The [sales_per_category_dashboard_preparation](flow_zone:QpVLY0r) Flow Zone creates the datasets behind the charts that are included in the dashboards.

The dashboard is divided into three parts: one focused on sub-categories, one on products, and one on store sales performance.

## Sub-Categories Datasets
![subcategories datasets.png](KhE97OvasCaM)
This part of the Flow Zone creates one dataset used to build the [Radar chart of revenue per item per cluster and sub category](insight:W4gC0mF), which is added to the dashboard:
![radar chart.png](X5XJPqvzn6N4)

It also creates one dataset that is displayed in the [Sales per Category Segmentation](dashboard:JiM4DuH) dashboard:
![sub category performance ranking.png](S4JumHg0F19P)

### Radar Chart: sub_categories_joined_by_cluster_labels dataset
The radar chart shows the sales per item (also called revenue per item) for each subcategory of the smallest level. Sales per item is calculated by dividing the subcategory revenue by the number of sold items. This metric lets the user compare the performance of subcategories with one another.

This radar chart is created on the [sub_categories_joined_by_cluster_labels](dataset:sub_categories_joined_by_cluster_labels) dataset.

To build it, we need the cluster labels and, for each product, the store where it was bought, the number of times it was bought, the quantity of sold items, and the product purchase price. We also need the target category and the subcategories of the product.

The gathering of all this information is done using the join recipe [compute_product_revenue_with_cluster_and_categ](recipe:compute_product_revenue_with_cluster_and_categ).

Then, for each subcategory level, the data is grouped by `cluster_labels`, `target_category`, and the `sub_category_X` column, and we sum the number of transactions, the product quantity, and revenue.

From these sums, a prepare recipe (such as [compute_sub_category_1_prepared](recipe:compute_sub_category_1_prepared)) calculates the subcategory sales per transaction and sales per item.
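The group and prepare steps above can be sketched in pandas. This is only an illustration of the logic, not the project's recipes: the sample rows and the column names `nb_transactions` and `product_quantity` are assumptions, and only subcategory level 1 is shown.

```python
import pandas as pd

# Hypothetical rows standing in for the joined product dataset.
# nb_transactions and product_quantity are assumed column names.
products = pd.DataFrame({
    "cluster_labels": ["cluster_0", "cluster_0", "cluster_1"],
    "target_category": ["food", "food", "food"],
    "sub_category_1": ["dairy", "dairy", "dairy"],
    "nb_transactions": [4, 6, 5],
    "product_quantity": [10, 10, 8],
    "product_revenue": [30.0, 50.0, 40.0],
})

# Group step: sum transactions, quantity, and revenue per
# cluster, target category, and subcategory.
keys = ["cluster_labels", "target_category", "sub_category_1"]
grouped = products.groupby(keys, as_index=False)[
    ["nb_transactions", "product_quantity", "product_revenue"]
].sum()

# Prepare step: derive the two ratios from the sums.
grouped["sales_per_transaction"] = (
    grouped["product_revenue"] / grouped["nb_transactions"]
)
grouped["sales_per_item"] = (
    grouped["product_revenue"] / grouped["product_quantity"]
)

print(grouped)
```

In the Flow, the same steps are repeated for each subcategory level before the results are gathered.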

After that, all the data is gathered in one dataset, [sub_categories_joined](dataset:sub_categories_joined), through the [compute_sub_categories_joined](recipe:compute_sub_categories_joined) recipe.

Finally, the recipe [compute_sub_categories_joined_by_cluster_labels](recipe:compute_sub_categories_joined_by_cluster_labels) pivots on the `cluster_labels` column to get, for each subcategory of the smallest level, its revenue per item. The result is a dataset with one column per cluster, each column containing the revenue per item of the subcategory. Here is what the dataset can look like:
![sub categories by cluster label.png](f5qUOvbMVjAz)
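As a rough pandas equivalent of this pivot, assuming a long-format input with one row per cluster and subcategory (the `sub_category` and `sales_per_item` column names and the values are illustrative only):

```python
import pandas as pd

# Hypothetical long-format input: one row per (cluster, subcategory).
joined = pd.DataFrame({
    "cluster_labels": ["cluster_0", "cluster_1", "cluster_0", "cluster_1"],
    "sub_category": ["dairy", "dairy", "bakery", "bakery"],
    "sales_per_item": [4.0, 5.0, 2.5, 3.0],
})

# Pivot on cluster_labels: one row per subcategory,
# one column per cluster holding the revenue per item.
by_cluster = joined.pivot(
    index="sub_category",
    columns="cluster_labels",
    values="sales_per_item",
).reset_index()
by_cluster.columns.name = None  # drop the leftover axis label

print(by_cluster)
```

Each cluster column can then serve as one series of the radar chart, with the subcategories as its axes.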

### Subcategory Performance Ranking: sub_categories_joined_topn dataset
The dataset [sub_categories_joined_topn](dataset:sub_categories_joined_topn) is used to analyze the performance of subcategories at a finer level. For each subcategory level, this dataset provides information about the revenue, the number of transactions, the number of sold items, the sales per transaction, and the sales per item. The sales per transaction is calculated by dividing the revenue by the number of transactions, and the sales per item is calculated by dividing the revenue by the number of items. 
Here is the schema of this dataset:
- cluster_labels
- target_category
- sub_category_1
- sub_category_1_revenue
- sub_category_1_number_of_transactions
- sub_category_1_number_of_sold_items
- sub_category_1_sales_per_transaction
- sub_category_1_sales_per_item
- sub_category_2
- sub_category_2_revenue
- sub_category_2_number_of_transactions
- sub_category_2_number_of_sold_items
- sub_category_2_sales_per_transaction
- sub_category_2_sales_per_item

This dataset is built from the [sub_categories_joined](dataset:sub_categories_joined) dataset, thanks to a sort recipe. This recipe sorts the rows in decreasing subcategory revenue order for each cluster label and target category. This outputs the displayed dataset [sub_categories_joined_topn](dataset:sub_categories_joined_topn).
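A minimal pandas sketch of such a sort, assuming rows are ordered by cluster label and target category first and then by decreasing revenue (the sample values are made up, and only level 1 columns are shown):

```python
import pandas as pd

# Hypothetical subset of sub_categories_joined.
joined = pd.DataFrame({
    "cluster_labels": ["cluster_1", "cluster_0", "cluster_0", "cluster_1"],
    "target_category": ["food", "food", "food", "food"],
    "sub_category_1": ["dairy", "bakery", "dairy", "bakery"],
    "sub_category_1_revenue": [40.0, 25.0, 80.0, 60.0],
})

# Within each cluster label and target category,
# put the highest-revenue subcategories first.
ranked = joined.sort_values(
    ["cluster_labels", "target_category", "sub_category_1_revenue"],
    ascending=[True, True, False],
).reset_index(drop=True)

print(ranked)
```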

## Product-Level Datasets
The second part of the Flow Zone focuses on creating datasets for product-level charts.
![products level datasets.png](AOtvj6v3wbJy)

It creates two datasets: [product_revenue_and_share_top10](dataset:product_revenue_and_share_top10) and [product_revenue_and_share_bottom10](dataset:product_revenue_and_share_bottom10). The objective is to get, for each cluster and target category, the top and bottom five selling products, ranked by the total revenue of each product. In addition, we calculate the product revenue share within its category, to show the user how much of the category sales the product represents.

Here is what it looks like on the dashboard:
![product level dashboard.png](jNk4rEIpd4II)

Let's see how the [product_revenue_and_share_top10](dataset:product_revenue_and_share_top10) dataset is created. It also starts from the [product_revenue_with_cluster_and_categ](dataset:product_revenue_with_cluster_and_categ) dataset, where the rows are grouped by `cluster_labels`, `target_category`, and `product_id`. At the same time, we sum the `product_revenue` column to get the total revenue of a product for each cluster and each category.

Then, the Python recipe [compute_product_category_revenue](dataset:compute_product_category_revenue) calculates the percentage of sales for each product within its associated category.
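The actual code of this recipe is not reproduced here. One way to compute such a share in pandas, assuming it is taken within each cluster and target category and stored in a hypothetical `revenue_share_pct` column, is:

```python
import pandas as pd

# Hypothetical output of the group step: total revenue per
# cluster, target category, and product.
revenue = pd.DataFrame({
    "cluster_labels": ["cluster_0", "cluster_0", "cluster_0", "cluster_1"],
    "target_category": ["food", "food", "food", "food"],
    "product_id": ["p1", "p2", "p3", "p1"],
    "product_revenue_sum": [50.0, 30.0, 20.0, 10.0],
})

# Total revenue of the category within each cluster,
# broadcast back to every product row.
category_total = revenue.groupby(
    ["cluster_labels", "target_category"]
)["product_revenue_sum"].transform("sum")

# Share of each product in its category, as a percentage.
revenue["revenue_share_pct"] = (
    100 * revenue["product_revenue_sum"] / category_total
)

print(revenue)
```

Using `transform("sum")` keeps one row per product, so the share can be added as a new column without a separate join.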

Once that is done, there is one sort recipe for each top and bottom product performance ranking: [compute_product_revenue_and_share_top10](recipe:compute_product_revenue_and_share_top10) and [compute_product_revenue_and_share_bottom10](recipe:compute_product_revenue_and_share_bottom10).
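The ranking logic can be approximated in pandas as follows. The helper `rank_products` and the sample data are hypothetical, and `n` is left as a parameter rather than fixed to the value used in the project:

```python
import pandas as pd

# Hypothetical per-product revenue, one row per cluster and product.
product_revenue = pd.DataFrame({
    "cluster_labels": ["cluster_0", "cluster_0", "cluster_0",
                       "cluster_1", "cluster_1"],
    "target_category": ["food"] * 5,
    "product_id": ["p1", "p2", "p3", "p1", "p2"],
    "product_revenue_sum": [50.0, 30.0, 20.0, 10.0, 40.0],
})


def rank_products(df, n, best=True):
    """Keep the n best (or worst) selling products per cluster and category."""
    ordered = df.sort_values(
        ["cluster_labels", "target_category", "product_revenue_sum"],
        ascending=[True, True, not best],
    )
    # head(n) keeps the first n rows of each group in sorted order.
    return (
        ordered.groupby(["cluster_labels", "target_category"])
        .head(n)
        .reset_index(drop=True)
    )


top2 = rank_products(product_revenue, n=2, best=True)
bottom2 = rank_products(product_revenue, n=2, best=False)
print(top2)
print(bottom2)
```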

## Store Sales Performances
The third part creates two useful datasets: [stores_clustered_with_sales_evolution](dataset:stores_clustered_with_sales_evolution) and [stores_clustered_with_sales_per_category_data_sorted](dataset:stores_clustered_with_sales_per_category_data_sorted).
![cluster stores charts.png](mRbMJRoqQ4z2)

### stores_clustered_with_sales_evolution dataset
The [stores_clustered_with_sales_evolution](dataset:stores_clustered_with_sales_evolution) dataset allows us to create the [Total Revenue by Cluster pie chart](insight:NfsCR2V), the [Revenue Trends by Cluster chart](insight:OEclxy8), and the [Revenue Trends by Category chart](insight:vi2GYFc).

The most important variable in this dataset is `time_granularity`, which allows us to create trend charts like this one:
![revenue trends by cluster.png](iJOOqnBIV7Pr)

On the above chart, the pink line represents the total revenue of all stores. To show this metric on the chart, we use the [transactions_with_all_data](dataset:transactions_with_all_data) dataset, which is the output of the [transactions_preprocessing](flow_zone:dMGG99Q) Flow Zone. We group its rows by time granularity and calculate the total revenue at each time granularity, regardless of the store.

Then, the [compute_transactions_with_all_data_by_time_granularity_prepared](recipe:compute_transactions_with_all_data_by_time_granularity_prepared) recipe renames the `product_revenue_sum` column to `total_revenue` for clarity.

After that, we join the output dataset of the prepare recipe, [compute_transactions_with_all_data_by_time_granularity_prepared](dataset:compute_transactions_with_all_data_by_time_granularity_prepared), with the output dataset of the [sales_per_category_clustering](flow_zone:hjoEdjT) Flow Zone. With this operation, we get, for each time granularity, the store where the transaction happened, the product revenue, the cluster label, all categories and subcategories, and the `total_revenue` (the total revenue of all stores at that time granularity, so it is the same for every row sharing that time granularity). Thanks to this final [stores_clustered_with_sales_evolution](dataset:stores_clustered_with_sales_evolution) dataset, we can create the above charts.
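The group, rename, and join steps can be sketched in pandas as below. The `store_id` column, the sample values, and the use of `time_granularity` as the join key are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for transactions_with_all_data.
transactions = pd.DataFrame({
    "time_granularity": ["2023-01", "2023-01", "2023-02"],
    "store_id": ["s1", "s2", "s1"],
    "product_revenue": [10.0, 15.0, 20.0],
})

# Group step: total revenue per time granularity, regardless of the store.
totals = transactions.groupby("time_granularity", as_index=False).agg(
    product_revenue_sum=("product_revenue", "sum")
)

# Prepare step: rename the aggregated column for clarity.
totals = totals.rename(columns={"product_revenue_sum": "total_revenue"})

# Hypothetical rows standing in for the clustering Flow Zone output.
clustered = pd.DataFrame({
    "time_granularity": ["2023-01", "2023-01", "2023-02"],
    "store_id": ["s1", "s2", "s1"],
    "cluster_labels": ["cluster_0", "cluster_1", "cluster_0"],
    "product_revenue": [10.0, 15.0, 20.0],
})

# Join step: attach the overall total to every row of the same period.
evolution = clustered.merge(totals, on="time_granularity", how="left")

print(evolution)
```

Plotting `total_revenue` against `time_granularity` alongside the per-cluster revenue gives an overall line comparable to the one in the chart.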

### stores_clustered_with_sales_per_category_data_sorted dataset
The [stores_clustered_with_sales_per_category_data_sorted](dataset:stores_clustered_with_sales_per_category_data_sorted) dataset is the sorted version of the output dataset from the [sales_per_category_clustering](flow_zone:hjoEdjT) Flow Zone. It is sorted so that the map's legend is displayed in alphabetical order (cluster_0, cluster_1, ...).
