The goal of this project is to forecast weekly retail sales for the next 6 months (26 weeks) across multiple stores and departments, using this Kaggle [dataset](https://www.kaggle.com/manjeetsingh/retaildataset?select=sales+data-set.csv). We use the Time Series Forecast [plugin](https://www.dataiku.com/product/plugins/timeseries-forecast/) to model sales.

# Data preparation

The [input dataset](dataset:sales) is a CSV downloaded from Kaggle with Store, Department, Date and Weekly Sales columns.

The first [Prepare recipe](recipe:compute_sales_prepared) parses the dates.

The [window recipe](recipe:compute_sales_prepared_windows) counts the number of dates for each Store - Dept pair. We then remove the Store - Dept time series that have too few dates to train on.
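The counting and filtering step can be sketched outside the flow with pandas. This is only an illustration, not the recipe's actual implementation: the toy data, the `Weekly_Sales` and `date_count` column names and the `MIN_DATES` threshold are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data; column names loosely mirror the dataset described above.
sales = pd.DataFrame({
    "Store": [1, 1, 1, 1, 2, 2],
    "Dept": [1, 1, 1, 2, 1, 1],
    "Date": pd.to_datetime([
        "2010-02-05", "2010-02-12", "2010-02-19",
        "2010-02-05",
        "2010-02-05", "2010-02-12",
    ]),
    "Weekly_Sales": [100.0, 110.0, 105.0, 50.0, 80.0, 82.0],
})

MIN_DATES = 2  # assumed threshold, chosen only for this toy example

# Window-style aggregate: attach the per-group date count to every row.
sales["date_count"] = sales.groupby(["Store", "Dept"])["Date"].transform("count")

# Keep only the Store - Dept series that have enough dates.
filtered = sales[sales["date_count"] >= MIN_DATES].reset_index(drop=True)
```

Here the (Store 1, Dept 2) series has a single date and is dropped, while the other two series are kept.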

The time series preparation [resampling recipe](recipe:compute_sales_resample) transforms time series sampled at irregular intervals into equispaced data, which the time series forecast training recipe requires.
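As a rough illustration of what resampling to equispaced data means, the sketch below fills a missing week in a single series with pandas. The 7-day step, the linear interpolation and the toy values are assumptions for the example; the recipe's own interpolation settings may differ.

```python
import pandas as pd

# Hypothetical series with one missing week (2010-02-19).
raw = pd.DataFrame({
    "Store": [1, 1, 1],
    "Dept": [1, 1, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-26"]),
    "Weekly_Sales": [100.0, 110.0, 130.0],
})

pieces = []
for (store, dept), group in raw.groupby(["Store", "Dept"]):
    series = group.set_index("Date")["Weekly_Sales"].sort_index()
    # Put the series on a regular 7-day grid; missing weeks become NaN,
    # then get filled by linear interpolation (an assumed choice).
    regular = series.asfreq("7D").interpolate(method="linear")
    piece = regular.rename_axis("Date").reset_index()
    piece["Store"] = store
    piece["Dept"] = dept
    pieces.append(piece)

resampled = pd.concat(pieces, ignore_index=True)
```

Each Store - Dept series is resampled independently, so a gap in one series never borrows values from another.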

# Train and evaluate forecast models

The [Train and evaluate forecasting models recipe](recipe:compute_mC2n9iDZ) trains multiple forecasting models using the time series of every store and department. In the recipe settings, we specify the time column (Date), the two time series identifier columns (Store and Dept), the frequency of the time series (Weeks starting on Sundays) and the forecasting horizon (here we chose to forecast values for the next 26 weeks).

We also selected the AutoML - High performance mode, which intensively trains multiple deep learning models as well as two baseline models.

The trained models are stored in the [Trained model folder](managed_folder:mC2n9iDZ) and their scores are available in the [Performance metrics dataset](dataset:performance_metrics). Scores were obtained by withholding the last 26 weeks of values during training and comparing the forecasts for this period to the true values. These true and predicted values can be found in the [Evaluation dataset](dataset:evaluation_forecasts) and are visualized in the first slide of the [Forecasting dashboard](dashboard:pdAK1NI).
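The holdout evaluation described above can be sketched as follows. This is a minimal illustration, not the plugin's code: `holdout_scores` and `naive_last_value` are hypothetical helpers, the naive last-value forecaster merely stands in for a trained model, and MAPE and RMSE are assumed example metrics.

```python
import numpy as np

HORIZON = 26  # number of weeks withheld, as in the recipe settings


def naive_last_value(train, horizon):
    """Stand-in forecaster: repeat the last observed value."""
    return np.full(horizon, train[-1])


def holdout_scores(series, horizon, forecaster):
    """Withhold the last `horizon` points, forecast them, and score the result."""
    train, actual = series[:-horizon], series[-horizon:]
    predicted = forecaster(train, horizon)
    errors = actual - predicted
    return {
        "mape": float(np.mean(np.abs(errors / actual))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
    }


# Synthetic weekly sales: upward trend plus a yearly cycle (made-up values).
weeks = np.arange(130)
sales = 1000.0 + 2.0 * weeks + 50.0 * np.sin(2 * np.pi * weeks / 52)

scores = holdout_scores(sales, HORIZON, naive_last_value)
```

Because the last 26 weeks never reach the forecaster, the scores measure performance on data the model has not seen, which is what makes them comparable across models.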

# Forecast future values

With the [Forecast future values recipe](recipe:compute_future_forecasts), we use the trained models to forecast the weekly sales of each store and department over the next 26 weeks. We chose to use the best previously trained model according to a selected performance metric.

This recipe outputs a [Forecast dataset](dataset:future_forecasts) that contains the median forecast as well as lower and upper bound forecasts for the next 26 weeks, based on the selected confidence interval. These forecasts can be visualized in the second slide of the [Forecasting dashboard](dashboard:pdAK1NI).

# Decompose time series data

The [decomposition recipe](recipe:compute_sales_decomposed) decomposes the past sales into trends, seasonal components and residuals. The resulting dataset helps us detect anomalies and compare the trends of the different stores, as the third slide of the [Forecasting dashboard](dashboard:pdAK1NI) shows. 
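To make the three components concrete, here is a minimal sketch of a classical additive decomposition with pandas. It is not necessarily the method the recipe uses: `additive_decompose` is a hypothetical helper, the toy series is made up, and the centered moving average assumes an odd seasonal period for simplicity.

```python
import numpy as np
import pandas as pd


def additive_decompose(series, period):
    """Split a series into trend + seasonal + residual (odd `period` assumed)."""
    # Trend: centered moving average over one full seasonal period.
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend

    # Seasonal: average detrended value at each position in the cycle,
    # re-centered so the seasonal component sums to zero over a period.
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=series.index)

    # Residual: whatever the trend and seasonal parts do not explain.
    residual = series - trend - seasonal
    return pd.DataFrame({
        "observed": series,
        "trend": trend,
        "seasonal": seasonal,
        "residual": residual,
    })


# Toy series: linear trend plus a repeating zero-sum pattern of length 5.
pattern = np.array([2.0, -1.0, 0.0, 1.0, -2.0])
values = 10.0 + np.arange(30) + np.tile(pattern, 6)
result = additive_decompose(pd.Series(values), period=5)
```

The trend is undefined for the first and last few points because the centered window does not fit there. Unusually large residuals are natural candidates for anomalies.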

