# Flow zone presentation
 
The [district_square_meter_prices_time_feature_engineering](flow_zone:D2MOiLw) zone preprocesses the square meter price data to turn it into a usable time series.
 

![district-square-meter-prices-time.png](3OcGbnVNd6F9)


First, to forecast the evolution of prices, we need data for each district. As described in the [section 'Dataset |  real_estate_sales'](article:21), the granularity of our property transactions is:
> **Dataset Granularity**: 1 row = 1 property (address information) sold on one date.
 
However:
- We cannot be sure that a transaction was recorded on every date between 2016 and 2020.
- It is likely (and this is indeed the case) that some districts have no recorded sale on some dates between 2016 and 2020.
 
Our real estate price time series therefore has gaps, and we need to repair it before processing it.
 
- Repairing data on its granularity:
  - 1 : Recipe [compute_average_square_meter_price_per_district](recipe:compute_average_square_meter_price_per_district) computes the **average square meter price at the granularity date+district**. 
  - 2 : Recipes [compute_date_boundaries](recipe:compute_date_boundaries) and [compute_districts_on_each_date](recipe:compute_districts_on_each_date) generate the dataset [districts_on_each_date](dataset:districts_on_each_date), which has the granularity **date x census_district**, with one row for each combination of date and district in the dataset.
  - 3 : Recipe [compute_districts_square_meter_price_on_each_date](dataset:compute_districts_square_meter_price_on_each_date) then joins the outputs of *1* and *2* to generate the dataset [districts_square_meter_price_on_each_date](dataset:districts_square_meter_price_on_each_date), which has the granularity **date x census_district** and carries the **average square meter price at this granularity**. As expected, the square meter price is not available for every date and district: 43.7% of this information is missing.
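
The three granularity steps above can be sketched in pandas. This is only an illustration, not the project's recipes: the column names (`date`, `census_district`, `square_meter_price`), the sample rows, and the choice of a daily grid between the first and last sale date are all assumptions.

```python
import pandas as pd

# Hypothetical sample of sales; column names are assumptions, not the project's schema.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-03", "2016-01-03"]),
    "census_district": ["A", "A", "B", "A"],
    "square_meter_price": [3000.0, 3200.0, 4000.0, 3100.0],
})

# Step 1: average square meter price at the date x district granularity.
avg_price = sales.groupby(
    ["date", "census_district"], as_index=False
)["square_meter_price"].mean()

# Step 2: build the full grid, every date between the boundaries (assumed daily)
# crossed with every district seen in the data.
all_dates = pd.date_range(sales["date"].min(), sales["date"].max(), freq="D")
districts = sales["census_district"].unique()
grid = pd.MultiIndex.from_product(
    [all_dates, districts], names=["date", "census_district"]
).to_frame(index=False)

# Step 3: a left join keeps every grid row; cells without any sale stay NaN.
full = grid.merge(avg_price, on=["date", "census_district"], how="left")
missing_share = full["square_meter_price"].isna().mean()
```

In this toy example `missing_share` plays the role of the 43.7% figure quoted above: it measures how many date x district cells have no observed price.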
 
- Repairing data on its content: the plugin recipe [compute_square_meter_price_time_series](dataset:compute_square_meter_price_time_series) applies resampling techniques to the time series so that we have square meter price data for each *date x census district*.
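
To give an idea of what filling the gaps looks like, here is a minimal pandas sketch. The interpolation method used by the plugin recipe is not stated here, so linear interpolation with nearest-value filling at the edges is an assumption, as are the column names and sample values.

```python
import numpy as np
import pandas as pd

# Hypothetical date x district grid with missing prices (NaN).
full = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"] * 2),
    "census_district": ["A"] * 3 + ["B"] * 3,
    "square_meter_price": [3000.0, np.nan, 3200.0, np.nan, 4000.0, np.nan],
})

def fill_district(prices: pd.Series) -> pd.Series:
    # Assumed method: linear interpolation between known points;
    # leading and trailing gaps take the nearest known value.
    return prices.interpolate(method="linear", limit_direction="both")

# Each district is filled independently, in chronological order.
full = full.sort_values(["census_district", "date"])
full["square_meter_price"] = (
    full.groupby("census_district")["square_meter_price"].transform(fill_district)
)
```

Filling each district separately matters: interpolating across the whole column would let prices from one district leak into another.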
 
- Looking at the time series below for each district, we can observe huge price variations. We will smooth the time series by aggregating it at a **month x census_district** granularity. This reduces the number of points, but each point is estimated from more transactions, which gives a smoother, more stable time series.
![square-meter-price-daily-per-district.png](jzBk1zFW6Ht3)
 
  - Recipe [compute_square_meter_price_time_series_prepared](recipe:compute_square_meter_price_time_series_prepared) extracts the year and month from the transaction dates so that we can aggregate the average square meter price by **month x census_district** with the recipe [compute_square_meter_price_monthly_time_series](recipe:compute_square_meter_price_monthly_time_series). As the picture below shows, the time series is now smoother:
  ![square-meter-price-monthly-per-district.png](bwEkUHOp8IKN)
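
The monthly aggregation described above can be sketched as follows. The column names and sample values are assumptions made for illustration; the actual recipes are visual recipes in the flow.

```python
import pandas as pd

# Hypothetical daily series per district.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-10", "2016-01-20", "2016-02-05", "2016-01-15"]),
    "census_district": ["A", "A", "A", "B"],
    "square_meter_price": [3000.0, 3200.0, 3300.0, 4000.0],
})

# Preparation step: extract year and month from the date.
daily["year"] = daily["date"].dt.year
daily["month"] = daily["date"].dt.month

# Aggregation step: average price at the month x district granularity.
monthly = daily.groupby(
    ["census_district", "year", "month"], as_index=False
)["square_meter_price"].mean()
```

Keeping `year` alongside `month` in the grouping keys prevents, for example, January 2016 and January 2017 from being averaged together.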
 
 
- Before training our model, we compute a *month_number* for each **month x census_district** in recipe [compute_square_meter_price_monthly_time_series_windows](recipe:compute_square_meter_price_monthly_time_series_windows). This information is only used to split the data into train and test sets in the machine learning step.
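
As a rough illustration of how a *month_number* can drive a time-based split, here is a pandas sketch. The exact definition used by the window recipe is not given here, so a sequential index of months within each district, starting at 1, is an assumption, as is the `cutoff` value.

```python
import pandas as pd

# Hypothetical monthly series per district.
monthly = pd.DataFrame({
    "census_district": ["A", "A", "A", "B", "B", "B"],
    "year": [2016, 2016, 2016, 2016, 2016, 2016],
    "month": [1, 2, 3, 1, 2, 3],
    "square_meter_price": [3100.0, 3300.0, 3250.0, 4000.0, 4100.0, 4050.0],
})

# Assumed definition: rank of each month within its district, in chronological order.
monthly = monthly.sort_values(["census_district", "year", "month"]).reset_index(drop=True)
monthly["month_number"] = monthly.groupby("census_district").cumcount() + 1

# Time-based split: earlier months for training, later months for testing.
cutoff = 2  # hypothetical threshold
train = monthly[monthly["month_number"] <= cutoff]
test = monthly[monthly["month_number"] > cutoff]
```

Splitting on a month index rather than at random keeps the test set strictly later than the training set, which is what a forecasting evaluation needs.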