The aim of the [transactions_preprocessing](flow_zone:dMGG99Q) Flow Zone is to reshape the [transactions](dataset:transactions) dataset so that transactions are aggregated at a regular time granularity. Here, a **transaction** is the set of products that a customer bought at a given time in a given store.

![transactions-preprocessing.png](z9aigQZIGZFn)

First, the [compute_transactions_on_batch_date](recipe:compute_transactions_on_batch_date) recipe filters the transactional data to keep only the transactions within the time period defined by the global variables ```batch_start_date_app``` and ```batch_end_date_app```.
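
The date filter can be sketched in plain Python as follows. This is an illustration, not the recipe's actual implementation: the ```transaction_date``` column name, the inclusive bounds, and the hard-coded constants standing in for ```batch_start_date_app``` and ```batch_end_date_app``` are all assumptions.

```python
from datetime import date

# Hypothetical stand-ins for the global variables batch_start_date_app and
# batch_end_date_app; in the Flow their values come from the project settings.
BATCH_START_DATE_APP = date(2023, 1, 1)
BATCH_END_DATE_APP = date(2023, 3, 31)


def filter_transactions_on_batch_dates(rows, start, end):
    """Keep only the rows whose transaction_date lies in [start, end].

    Assumes each row is a dict holding a datetime.date under 'transaction_date'
    and that both bounds are inclusive (an assumption about the real recipe).
    """
    return [row for row in rows if start <= row["transaction_date"] <= end]


transactions = [
    {"transaction_id": "t1", "transaction_date": date(2022, 12, 31)},
    {"transaction_id": "t2", "transaction_date": date(2023, 1, 1)},
    {"transaction_id": "t3", "transaction_date": date(2023, 3, 31)},
    {"transaction_id": "t4", "transaction_date": date(2023, 4, 1)},
]

kept = filter_transactions_on_batch_dates(
    transactions, BATCH_START_DATE_APP, BATCH_END_DATE_APP
)
print([row["transaction_id"] for row in kept])  # ['t2', 't3']
```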

In parallel, a Python recipe computes the time granularity that corresponds to each date in this period, based on the granularity the user chose in the Project Setup.
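
One way such a date-to-granularity table could be built is sketched below. The granularity names (```day```, ```week```, ```month```), the choice of labelling each period by its first day, and the output column names are assumptions for illustration; the actual Python recipe may encode periods differently.

```python
from datetime import date, timedelta


def time_granularity_of(day, granularity):
    """Return the first day of the period that contains `day`.

    The supported granularity names are hypothetical examples.
    """
    if granularity == "day":
        return day
    if granularity == "week":
        # Weeks are assumed to start on Monday; weekday() is 0 for Monday.
        return day - timedelta(days=day.weekday())
    if granularity == "month":
        return day.replace(day=1)
    raise ValueError(f"Unsupported granularity: {granularity!r}")


def build_dates_granularities(start, end, granularity):
    """Return one row per calendar date in [start, end] with its time granularity."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                "transaction_date": current,
                "time_granularity": time_granularity_of(current, granularity),
            }
        )
        current += timedelta(days=1)
    return rows


dates_granularities = build_dates_granularities(
    date(2023, 1, 1), date(2023, 1, 10), "week"
)
```

Because the table has exactly one row per date, joining it to the transactions cannot duplicate any transaction.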

The [compute_transactions_with_dates_granularities](recipe:compute_transactions_with_dates_granularities) recipe then joins the outputs of these two recipes on the transaction date, which assigns its time granularity to each transaction.
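
A minimal sketch of this join, assuming a ```transaction_date``` key on both sides and inner-join semantics (the join type of the actual recipe is not stated here):

```python
from datetime import date


def join_on_transaction_date(transactions, dates_granularities):
    """Attach a time_granularity to each transaction through its transaction_date.

    Behaves like an inner join against a lookup table with one row per date:
    transactions whose date is missing from the table are dropped.
    """
    granularity_by_date = {
        row["transaction_date"]: row["time_granularity"] for row in dates_granularities
    }
    joined = []
    for row in transactions:
        if row["transaction_date"] in granularity_by_date:
            joined.append(
                {**row, "time_granularity": granularity_by_date[row["transaction_date"]]}
            )
    return joined


transactions = [
    {"transaction_id": "t1", "transaction_date": date(2023, 1, 3)},
    {"transaction_id": "t2", "transaction_date": date(2023, 1, 10)},
]
dates_granularities = [
    {"transaction_date": date(2023, 1, 3), "time_granularity": date(2023, 1, 2)},
    {"transaction_date": date(2023, 1, 10), "time_granularity": date(2023, 1, 9)},
]
transactions_with_granularity = join_on_transaction_date(transactions, dates_granularities)
```

Since the upstream filter restricts transactions to the same period that the date table covers, an inner join and a left join should give the same result here.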

Next, the [compute_sales_aggregated](recipe:compute_sales_aggregated) recipe groups the transactions by ```store_id```, ```time_granularity```, ```product_id```, and ```product_purchase_price```. For each group, it counts the transactions that include this product in this store during this period and sums the quantity sold. As a result, each row of the output dataset describes one product sold in a given store, during a given period, at a given price, with its total quantity and its number of transactions.
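
The grouping can be illustrated with the sketch below. The input column ```product_quantity``` is an assumed name, and ```transaction_id_count``` is computed as the number of rows per group, which assumes one row per product per transaction.

```python
from collections import defaultdict
from datetime import date

GROUP_KEYS = ("store_id", "time_granularity", "product_id", "product_purchase_price")


def aggregate_sales(transactions):
    """Group rows by GROUP_KEYS, counting rows and summing quantities per group."""
    totals = defaultdict(lambda: {"transaction_id_count": 0, "product_quantity_sum": 0})
    for row in transactions:
        key = tuple(row[column] for column in GROUP_KEYS)
        totals[key]["transaction_id_count"] += 1
        totals[key]["product_quantity_sum"] += row["product_quantity"]
    # Rebuild one flat row per group: the group keys plus the two aggregates.
    return [
        {**dict(zip(GROUP_KEYS, key)), **aggregates}
        for key, aggregates in totals.items()
    ]


week = date(2023, 1, 2)
rows = [
    {"store_id": "S1", "time_granularity": week, "product_id": "P1",
     "product_purchase_price": 2.5, "transaction_id": "t1", "product_quantity": 3},
    {"store_id": "S1", "time_granularity": week, "product_id": "P1",
     "product_purchase_price": 2.5, "transaction_id": "t2", "product_quantity": 1},
    {"store_id": "S1", "time_granularity": week, "product_id": "P1",
     "product_purchase_price": 2.0, "transaction_id": "t3", "product_quantity": 5},
]
sales_aggregated = aggregate_sales(rows)
```

Because ```product_purchase_price``` is part of the grouping key, the same product sold at two prices in the same store and period yields two rows, as with the 2.5 and 2.0 prices above.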

Then, the Prepare recipe [compute_sales_aggregated_prepared](recipe:compute_sales_aggregated_prepared) rounds ```product_purchase_price``` to two decimal places, creates a ```product_revenue``` column by multiplying ```product_purchase_price``` by ```product_quantity_sum```, and finally rounds ```product_revenue``` to two decimal places.

![compute_sales_aggregated.png](1Z6Zfs75HBRC)
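
The rounding and revenue steps of the Prepare recipe can be sketched as follows. The tie-breaking rule (halves rounded away from zero) is an assumption about the recipe's rounding mode; Python's built-in `round()` behaves differently on ties, so this sketch uses the `decimal` module instead.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def round_2dp(value):
    """Round to two decimal places, with ties rounded away from zero."""
    # str() avoids carrying binary floating-point noise into the Decimal.
    return Decimal(str(value)).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


def prepare_sales_row(row):
    """Round the price, derive product_revenue from the rounded price, then round it."""
    price = round_2dp(row["product_purchase_price"])
    quantity = Decimal(str(row["product_quantity_sum"]))
    revenue = round_2dp(price * quantity)
    return {
        **row,
        "product_purchase_price": float(price),
        "product_revenue": float(revenue),
    }


prepared = [
    prepare_sales_row({"product_purchase_price": 2.675, "product_quantity_sum": 3}),
    prepare_sales_row({"product_purchase_price": 1.333, "product_quantity_sum": 3}),
]
```

The order of operations matters: with the rounded price 1.33 the revenue is 3.99, whereas multiplying the unrounded 1.333 by 3 gives 3.999, which would round to 4.00.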

The resulting [sales_aggregated_prepared](dataset:sales_aggregated_prepared) dataset therefore contains the following columns:
- store_id
- time_granularity
- product_id
- product_purchase_price
- transaction_id_count
- product_quantity_sum
- product_revenue

The last recipe of this Flow Zone is the join recipe [compute_transactions_with_all_data](recipe:compute_transactions_with_all_data). It joins this dataset with two others: the [products](dataset:products) dataset, to add the product category and subcategory information (columns ```target_category``` and ```sub_category_X```), and the [stores_with_geopoint](dataset:stores_with_geopoint) dataset, to add the location of each store (column ```geo_point```).
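
The enrichment step might look like the sketch below. The join keys (```product_id``` and ```store_id```), the left-join semantics, and the sample ```geo_point``` value are assumptions for illustration; subcategory columns are detected by their ```sub_category_``` prefix so that any number of levels is carried over.

```python
def enrich_sales(sales_rows, products, stores):
    """Left-join product categories and store locations onto the sales rows."""
    product_by_id = {p["product_id"]: p for p in products}
    store_by_id = {s["store_id"]: s for s in stores}
    # Collect every subcategory column so that all output rows share one schema.
    sub_category_columns = sorted(
        {column for p in products for column in p if column.startswith("sub_category_")}
    )
    enriched = []
    for row in sales_rows:
        product = product_by_id.get(row["product_id"], {})
        store = store_by_id.get(row["store_id"], {})
        new_row = dict(row)
        new_row["target_category"] = product.get("target_category")
        for column in sub_category_columns:
            new_row[column] = product.get(column)
        new_row["geo_point"] = store.get("geo_point")
        enriched.append(new_row)
    return enriched


products = [{"product_id": "P1", "target_category": "Beverages",
             "sub_category_1": "Juice", "sub_category_2": "Orange"}]
stores = [{"store_id": "S1", "geo_point": "POINT(2.35 48.85)"}]
sales = [
    {"store_id": "S1", "product_id": "P1", "product_revenue": 8.04},
    {"store_id": "S2", "product_id": "P9", "product_revenue": 1.0},
]
sales_with_all_data = enrich_sales(sales, products, stores)
```

With a left join, sales rows whose product or store has no match are kept, with empty (`None`) category and location values, rather than silently dropped.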