This article walks through every step, from data input to the final forecasts. It starts with a prepared dataset of 300 rows of historical monthly revenue data, split across five business units, with four required columns and three driver columns. The second dataset contains 60 rows covering the next 12 months. It has the same columns as the historical dataset except the actual value column, which is the value to be forecasted.
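Assuming one row per business unit per month, the row counts follow from the panel structure: 5 business units × 60 months = 300 historical rows, and 5 × 12 months = 60 future rows. A minimal pandas sketch of that consistency check follows. The column names `date` and `category` and the start dates are hypothetical placeholders; the [Data Model](article:8) article defines the real required columns.

```python
import pandas as pd

# Hypothetical column names; the Data Model article defines the real required columns.
DATE, CATEGORY = "date", "category"


def make_skeleton(categories, start, n_months):
    """Build one row per (category, month), using month-start dates."""
    dates = pd.date_range(start, periods=n_months, freq="MS")
    index = pd.MultiIndex.from_product([categories, dates], names=[CATEGORY, DATE])
    return index.to_frame(index=False)


def check_panel(df, n_categories, n_months):
    """Raise ValueError unless df holds exactly one row per (category, month)."""
    if df[CATEGORY].nunique() != n_categories:
        raise ValueError("unexpected number of categories")
    months_per_category = df.groupby(CATEGORY)[DATE].nunique()
    if not (months_per_category == n_months).all():
        raise ValueError(f"months per category: {months_per_category.to_dict()}")
    if len(df) != n_categories * n_months:
        raise ValueError("duplicate (category, date) rows")


business_units = [f"BU {i}" for i in range(1, 6)]
history = make_skeleton(business_units, "2018-01-01", 60)  # placeholder start date
future = make_skeleton(business_units, "2023-01-01", 12)   # placeholder start date
check_panel(history, n_categories=5, n_months=60)  # 5 x 60 = 300 rows
check_panel(future, n_categories=5, n_months=12)   # 5 x 12 = 60 rows
```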

The [Data Model](article:8) article describes the format requirements the data must meet.

To configure and execute the process, we will follow the steps in the [Dataiku Application](article:18). 

The case study data is already in filesystem format, so we can skip the upload or connection steps and move straight to the forecasting settings. If you're using your own data, you'll need to upload or connect it before proceeding.

## Advanced forecast settings 

This section allows users to set the number of lag values and driver columns for advanced forecasting and analysis. To ensure the drivers' list reflects the newly uploaded data, the user must refresh the page before configuring the parameters.

![Screenshot 2023-01-20 at 18.28.46.png](OTlpdlY4WmDE)

In this use case, we first include two drivers (driver_A and driver_B) in the project to study their relationship with the actual value. The drivers are used as features in the advanced forecast and are also shown on the Drivers page of the dashboard. The lag values are used as additional features in the advanced forecast. Since the data is monthly, we include the last 12 data points (lag values) so that the features reach back to the same month of the preceding year.
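Lag features have to be computed within each business unit so that one unit's history does not leak into another's. A minimal pandas sketch of this idea follows; the column names `date`, `category`, and `actual_value` are hypothetical defaults, and the application's own implementation may differ.

```python
import pandas as pd


def add_lag_features(df, target="actual_value", category="category",
                     date="date", n_lags=12):
    """Return a copy of df with lag_1 ... lag_{n_lags} columns, per category.

    Column names are hypothetical defaults; adapt them to your dataset.
    """
    out = df.sort_values([category, date]).copy()
    grouped = out.groupby(category)[target]
    for k in range(1, n_lags + 1):
        # Target value k months earlier for the same category (NaN if unavailable)
        out[f"lag_{k}"] = grouped.shift(k)
    return out
```

With `n_lags=12`, `lag_12` holds the value from the same month one year earlier, and the first 12 rows of each business unit have no value for it.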

After setting the parameters, we click the "run" button to start building the project. Once the scenario has run successfully, we can open the Data exploration page by following the link to the [Dashboard](dashboard:j4Pjdob).

## Data Exploration

The top of the page shows metrics about the actual value over the period covered by the historical dataset. In our case, the actual_value increased by 110.61% over the 4.9-year period covered by the demo dataset, and it is split across five categories. The metric on the right shows the forecast horizon.

If the dashboard metrics do not display properly or show error messages, click the "compute metrics" button on the right.

![Screenshot 2023-01-20 at 10.53.06.png](4VqluKyg2vNP)
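The dashboard's exact metric definitions are not documented here. One plausible reading, assumed for this sketch, compares the all-category monthly total in the first and last months and measures the period in whole months. With 60 monthly points, the first and last months are 59 months apart, or about 4.9 years. Column names are hypothetical.

```python
import pandas as pd


def growth_summary(df, target="actual_value", date="date"):
    """Assumed definitions: % change of the all-category monthly total from the
    first to the last month, and the elapsed time between them in years."""
    monthly = df.groupby(date)[target].sum().sort_index()
    first, last = monthly.index[0], monthly.index[-1]
    pct_change = (monthly.iloc[-1] / monthly.iloc[0] - 1) * 100
    months = (last.year - first.year) * 12 + (last.month - first.month)
    return round(float(pct_change), 2), round(months / 12, 1)
```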

The filter section on the left allows you to select one or many categories and update the charts of the page accordingly. 

![Screenshot 2023-01-20 at 10.52.50.png](aPZh7Y83TC4R)

The slide features multiple graphs, each accompanied by an explanation box on the right. For example, the third visual, titled "Total Actual Value per Category," is a stacked bar chart that shows the distribution of the total actual value among different categories throughout the period.

![Screenshot 2023-01-19 at 10.55.26.png](P7RTuTSM5DLc)

The dates on the x-axis are divided into quarters and cover the entire range of the input historical dataset. The y-axis represents the total actual value. The data shows that BU 1 is relatively stable, while BU 4 experiences a significant increase. An upward trend is also observed in BU 3 and BU 5, accompanied by strong seasonal patterns. Conversely, BU 2 only displays seasonal patterns without any trend.
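The stacked bar chart aggregates monthly values into quarters per category. A minimal pandas sketch of that aggregation follows, assuming calendar quarters and hypothetical column names. Each row of the result is one quarterly bar and each column is one stacked segment.

```python
import pandas as pd


def quarterly_totals(df, target="actual_value", category="category", date="date"):
    """Sum the target per calendar quarter (rows) and category (columns).

    Column names are hypothetical defaults.
    """
    quarter = pd.to_datetime(df[date]).dt.to_period("Q")
    return (df.assign(quarter=quarter)
              .pivot_table(index="quarter", columns=category,
                           values=target, aggfunc="sum"))
```

If matplotlib is installed, `quarterly_totals(df).plot.bar(stacked=True)` draws a chart of the same shape.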

## Forecast comparison

The forecast comparison page allows the user to evaluate forecast performance by viewing forecasted values and analyzing them both overall and by category, as well as over different time horizons. Once again, the filter section on the left allows you to select one or many categories and update the charts of the page accordingly. 

![Screenshot 2023-01-20 at 18.20.57.png](7adELrtmaLcj)

The top of the Forecast comparison page presents the Mean Absolute Percentage Error (MAPE) for each forecasting method. In this scenario, the Advanced forecast performs best, with the lowest MAPE at 14.18%. The Manual forecast is second with a MAPE of 28.34%, and the Simple forecast comes a close third at 30.12%. These results suggest that the Advanced method is the most accurate, and its predictions should carry significant weight in decision-making.
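MAPE is the mean of |actual − forecast| / |actual| over the evaluated points, expressed as a percentage, so a MAPE of 14.18% means the forecast misses the actual value by about 14% on average. The sketch below implements this standard definition in plain Python; how the application aggregates it over periods and categories is not covered here. MAPE is undefined when an actual value is 0, so the sketch rejects that case.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (lower is better)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    # Absolute error of each point, relative to the actual value
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)
```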

![Screenshot 2023-01-20 at 18.21.38.png](9ZCXTs1iszkP)

The second line graph displays the predicted Actual Value for future horizons using the Advanced, Simple, and Manual forecasts. The orange line represents the Actual Value for the historical period up to the forecasting point, and the blue lines represent each method's predicted values for the upcoming horizons. The Manual forecast tends to predict higher values than the Simple and Advanced methods. The Simple forecast appears smoother, while the Advanced forecast appears more precise, which is consistent with its lower MAPE.

![Screenshot 2023-01-19 at 11.28.08.png](Eg461La97DQ9)

The bar chart titled "MAPE per forecasting method and subcategory" indicates the value of the MAPE per forecasting method for each category. A lower bar indicates a smaller MAPE and, as a result, a better capacity of the forecast to predict the actual value of a specific category. 

![Screenshot 2023-01-20 at 18.25.48.png](wDX4Iv31hP2x)

This chart highlights that the Simple forecast is inconsistent across business units, performing well in some and poorly in others, whereas the Manual forecast is relatively stable across all categories. For BU 1, the Advanced forecast performs best, with a MAPE of 6.23%. For BU 4, however, the Simple forecast performs better, with a MAPE of 7.63%.
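The per-category bars apply the same MAPE formula, grouped by forecasting method and category. A minimal pandas sketch follows, assuming a hypothetical long-format table with one row per method, category, and month, and columns `method`, `category`, `actual`, and `forecast`. `idxmin` then picks the method with the lowest MAPE in each category.

```python
import pandas as pd


def mape_table(df):
    """MAPE (%) per method (rows) and category (columns).

    Expects hypothetical columns: method, category, actual, forecast.
    """
    ape = (df["actual"] - df["forecast"]).abs() / df["actual"].abs() * 100
    return ape.groupby([df["method"], df["category"]]).mean().unstack()


def best_method_per_category(table):
    """Name of the method with the lowest MAPE in each category column."""
    return table.idxmin()
```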

## Drivers analysis 

The "Drivers" slide of the dashboard offers an overview of the drivers' values and illustrates their correlation with the target value. In this case, "driver_B" has a correlation coefficient of 0.59 with the actual value, while "driver_A" has a very weak relationship with the target value.

![Screenshot 2023-01-20 at 18.21.14.png](JkvmupOFY1ez)

Based on these findings, it may be worth removing "driver_A" from the project and re-evaluating the performance of the Advanced Forecast, as this could improve its accuracy.
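The source does not specify which correlation coefficient the Drivers page uses. Assuming the usual Pearson coefficient, the driver correlations can be computed with pandas `DataFrame.corrwith`. In this sketch the column names are hypothetical, and drivers are ranked by the strength of their relationship regardless of sign.

```python
import pandas as pd


def driver_correlations(df, drivers, target="actual_value"):
    """Pearson correlation of each driver with the target, strongest first.

    Assumes Pearson (the corrwith default); column names are hypothetical.
    """
    corr = df[drivers].corrwith(df[target])
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

A coefficient close to 0, as reported for driver_A, indicates a weak linear relationship, although a nonlinear one may still exist.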

## Simple forecast analysis

The Simple Forecast page highlights the results and methodology of the [Simple Forecast](article:37). At the top of the page, the predicted values are compared to the actual values. The following two graphs illustrate the Simple Forecast's predicted values for each business unit.

![Screenshot 2023-01-20 at 16.35.40.png](s6LVdj410bAz)

The bottom section of the slide delves into the time series models employed in constructing the Simple Forecast, as outlined in the [Simple Forecast](article:37) article. In this case, we can see that five ARIMA models were developed, one for each business unit. 

![Screenshot 2023-01-20 at 16.36.17.png](qfczRCKacmQ4)

These models were constructed by selecting the best set of parameters, which are displayed in the table located on the right side of the slide.
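The source does not state how the best parameters are selected. A common approach, assumed here, is to fit one ARIMA(p, d, q) model per business unit for each order in a small grid and keep the order with the lowest Akaike Information Criterion (AIC). The sketch keeps the fitting step pluggable: `aic_fn` is a hypothetical callable that fits a model and returns its AIC. With statsmodels installed, it could be `lambda y, order: ARIMA(y, order=order).fit().aic`, using `statsmodels.tsa.arima.model.ARIMA`. The grid bounds are illustrative.

```python
from itertools import product


def select_arima_order(series, aic_fn, p_values=range(3), d_values=range(2),
                       q_values=range(3)):
    """Return (best_order, best_aic) over a (p, d, q) grid; lower AIC wins.

    aic_fn(series, order) is a hypothetical callable that fits the model and
    returns its AIC. Returns (None, inf) if no order could be fitted.
    """
    best_order, best_aic = None, float("inf")
    for order in product(p_values, d_values, q_values):
        try:
            aic = aic_fn(series, order)
        except Exception:
            continue  # skip orders that fail to fit (e.g. convergence errors)
        if aic < best_aic:
            best_order, best_aic = order, aic
    return best_order, best_aic


def select_per_category(series_by_category, aic_fn, **grid):
    """Run one independent order search per category (e.g. per business unit)."""
    return {name: select_arima_order(series, aic_fn, **grid)
            for name, series in series_by_category.items()}
```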

## Advanced forecast analysis

The Advanced Forecast page presents the results and methodology of the [Advanced Forecast](article:38). At the top of the page, the predicted values are compared to the actual values. The following two graphs illustrate the Advanced Forecast's predicted values for each business unit.

![Screenshot 2023-01-20 at 18.21.55.png](dq4nL8QSoVpF)

The bottom section of the slide presents detailed information about the model used to generate the forecast, including a variable importance plot, partial dependence plot, what-if analysis, and subpopulation analysis.

![Screenshot 2023-01-20 at 18.38.47.png](7cDW9mMrsJ1j)

For example, the variable importance plot shows that the advanced model relies heavily on the "driver_B" value. Other features also contribute, such as seasonality, category, the time series forecast, and the fifth lag value.
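A partial dependence plot shows how the model's average prediction changes as one feature varies while the other features keep their observed values. A minimal sketch of that computation follows; `predict`, the row dictionaries, and the toy linear model are hypothetical stand-ins, not the model Dataiku trains. A what-if analysis is the single-row version of the same idea: change one input of one record and call `predict` again.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy each row, overriding only the feature under study
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(row) for row in modified))
    return curve


# Hypothetical toy model depending on driver_B and the fifth lag (illustration only)
def toy_predict(row):
    return 2 * row["driver_B"] + row["lag_5"]


rows = [{"driver_B": 0, "lag_5": 1}, {"driver_B": 5, "lag_5": 3}]
curve = partial_dependence(toy_predict, rows, "driver_B", [0, 1, 2])
```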