Data preparation steps in the solution include **data normalization** and **lag values creation**. The following sections detail the techniques used in both cases.

## Data normalization
To deal with the potential differences in the magnitude of values within each category, the target variable is scaled using a min-max normalization technique.

Min-max normalization (a common form of feature scaling) performs a linear transformation on the original time series data to scale numeric features to a given range, such as 0 to 1 or -1 to 1. The method works by subtracting the minimum value of the feature from each data point and then dividing the result by the range of the feature (i.e., the difference between the maximum and minimum values).

The formula to achieve this is the following:
![Screenshot 2023-01-03 at 15.24.58.png](cNAOCYpSnutQ)
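
As a minimal sketch of the transformation described above, the snippet below rescales a list of values with `(x - min) / (max - min)` and then stretches the result to the requested range. The function name `min_max_scale`, the `feature_range` parameter, and the sample values are illustrative assumptions, not the solution's actual code.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale `values` so the minimum maps to the lower bound
    of `feature_range` and the maximum maps to the upper bound."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant series has no range; map every point to the lower bound.
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]


# Hypothetical values for one time series.
sales = [120.0, 80.0, 200.0, 160.0]

scaled_0_1 = min_max_scale(sales)                      # range 0 to 1
scaled_sym = min_max_scale(sales, feature_range=(-1.0, 1.0))  # range -1 to 1
```

The guard for a constant series is a design choice made for this sketch: without it, the division by a zero range would fail.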

**Example of data before min-max normalization**:
![Screenshot 2023-01-03 at 15.27.26.png](wsGMOZk78f70)
We can see a strong difference in magnitude between business unit 4 and the other business units.

**Example of data after min-max normalization**:
![Screenshot 2023-01-03 at 15.38.17.png](G622DHoKEJo0)
This transformation scaled the values of each business unit's time series so that they fall within the desired range.
Note: The minimum and maximum values are computed on the training portion of the dataset only, which excludes the last 12 data points. This is why some values fall outside the expected 0 to 1 range.
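
The effect described in the note can be reproduced with a small sketch: the minimum and maximum are fitted on the training portion only (everything except the last 12 points) and then applied to the whole series. The helper names `fit_min_max` and `apply_min_max`, the `holdout` parameter, and the sample data are assumptions made for illustration.

```python
def fit_min_max(series, holdout=12):
    """Return (min, max) computed on the training part only,
    i.e. the series without its last `holdout` points."""
    train = series[: len(series) - holdout]
    return min(train), max(train)


def apply_min_max(series, v_min, v_max):
    """Scale every point, including held-out ones, with the fitted bounds."""
    span = v_max - v_min
    return [(v - v_min) / span for v in series]


# Hypothetical series of 24 points with an upward trend: 10, 11, ..., 33.
series = [float(v) for v in range(10, 34)]

v_min, v_max = fit_min_max(series)          # fitted on the first 12 points
scaled = apply_min_max(series, v_min, v_max)

# Points in the held-out part exceed the training maximum,
# so their scaled values end up above 1.
outside_range = [v for v in scaled if v < 0.0 or v > 1.0]
```

Fitting the bounds on the training part alone keeps information from the held-out points out of the scaling step, at the cost of scaled values that are not strictly bounded.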

## Lag values creation 
In a time series, a lag value refers to a past point in time. For example, if you have a time series with daily data, a lag value of 1 would refer to the previous day's data, a lag value of 2 would refer to the data from two days ago, and so on.

Lag values in this solution refer to the number of previous time periods that the [advanced forecast](article:13) method uses to predict the value of the next horizons. In the **Dataiku application**, the user can choose how many lag values of the target variable to include in the model. By default, one lag value is computed and included in the advanced forecast.
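
To make the idea concrete, the sketch below builds lag features for a single series: each row pairs the value at time `t` with the values observed 1 to `n_lags` periods earlier. The function `add_lags`, the `lag_k` column names, and the sample data are hypothetical and only illustrate the concept; they do not describe how the Dataiku application computes lags internally.

```python
def add_lags(series, n_lags=1):
    """Return one row per time step that has a full lag history.

    Each row holds the current value under "target" and the value
    observed k periods earlier under "lag_k", for k = 1..n_lags.
    """
    rows = []
    # The first n_lags points have no complete history, so they are skipped.
    for t in range(n_lags, len(series)):
        row = {"target": series[t]}
        for k in range(1, n_lags + 1):
            row[f"lag_{k}"] = series[t - k]
        rows.append(row)
    return rows


# Hypothetical target values for consecutive periods.
demand = [5, 7, 6, 9]

one_lag = add_lags(demand)             # default: a single lag value
two_lags = add_lags(demand, n_lags=2)  # two lag values per row
```

With the default of one lag, the first row pairs the second observation (7) with the one before it (5). Each additional lag shortens the usable history by one row.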



