Data preparation and feature engineering should be done in a separate project so that the resulting datasets match the expected input format. Respecting the expected data model when loading new data helps ensure that the results are reliable and easy to interpret.

## Input datasets format
The input data should be split into two groups of datasets, **historical** and **to forecast**, which share the same time frequency:

**3 Historical datasets**: historical time series data about the financial variable to forecast, manual forecasts, and drivers.

- (1) historical_actual_value_dataset: 

 ![Screenshot 2023-03-27 at 12.28.19.png](b2EmGJNPzTo5)
 
 **NB.** The current version of the solution does not yet support zero or negative values in the actual_value column.
 
- (2) historical_forecasts_dataset
 
 ![Screenshot 2023-03-27 at 12.29.48.png](qxZ2a5u8Isi1)
 
- (3) historical_drivers_dataset
 
 ![Screenshot 2023-03-27 at 12.29.58.png](PFQFUWLpgeUQ)
 
**2 To forecast datasets**: time series data about the period we want to forecast, including drivers’ expected values and manual forecasts.

- (1) to_forecast_forecasts_dataset: 

 ![Screenshot 2023-03-27 at 12.30.08.png](mz79dpKoOCBB)

- (2) to_forecast_drivers_dataset: 

 ![Screenshot 2023-03-27 at 12.30.22.png](Byt8yjngdVOk)
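
The restriction on the actual_value column noted above can be checked before loading the data. The following is a minimal sketch using pandas, not part of the solution itself; the `date` and `category` column names and the helper `find_non_positive_actuals` are assumptions made for illustration.

```python
import pandas as pd


def find_non_positive_actuals(df: pd.DataFrame, value_col: str = "actual_value") -> pd.DataFrame:
    """Return the rows whose actual value is zero, negative, or not a number."""
    values = pd.to_numeric(df[value_col], errors="coerce")
    # `values > 0` is False for zeros, negatives, and NaN, so all three are flagged.
    return df[~(values > 0)]


# Hypothetical sample data; only the actual_value column name comes from the text above.
historical = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
    "category": ["BU_A", "BU_A", "BU_A"],
    "actual_value": [120.0, 0.0, -5.0],
})

invalid_rows = find_non_positive_actuals(historical)
print(invalid_rows)
```

Any rows returned here would need to be handled during data preparation before the dataset is used.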

## Chronological order and time frequency
The data points to forecast must be more recent than those in the historical datasets. Respecting the chronological order allows the data to be interpreted correctly and helps ensure accurate forecasts.
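
As an illustration, the ordering rule can be verified with a short pandas check. This is a sketch under assumptions: the date column is assumed to be named `date`, and the helper name is hypothetical.

```python
import pandas as pd


def forecast_starts_after_history(historical: pd.DataFrame,
                                  to_forecast: pd.DataFrame,
                                  date_col: str = "date") -> bool:
    """True when every date to forecast is later than the last historical date."""
    return bool(to_forecast[date_col].min() > historical[date_col].max())


# Hypothetical sample data.
history = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-01-02"])})
future = pd.DataFrame({"date": pd.to_datetime(["2023-01-03", "2023-01-04"])})
overlapping = pd.DataFrame({"date": pd.to_datetime(["2023-01-02", "2023-01-03"])})

order_ok = forecast_starts_after_history(history, future)
order_bad = forecast_starts_after_history(history, overlapping)
print(order_ok, order_bad)
```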

The time intervals within each dataset must be consistent. For instance, if you choose a 3-day frequency, every timestamp should be exactly 3 days apart.
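
One possible way to check this is to compare the gaps between consecutive timestamps within each category. The sketch below assumes columns named `date` and `category` and a fixed-length step; calendar-based frequencies such as months have variable lengths and would need a different check.

```python
import pandas as pd


def has_constant_frequency(df: pd.DataFrame, step: str,
                           date_col: str = "date",
                           category_col: str = "category") -> bool:
    """True when consecutive timestamps in every category are exactly `step` apart."""
    expected = pd.Timedelta(step)
    for _, group in df.groupby(category_col):
        gaps = group[date_col].sort_values().diff().dropna()
        if not (gaps == expected).all():
            return False
    return True


# Hypothetical sample data with a 3-day frequency.
regular = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=4, freq="3D"),
    "category": "BU_A",
})
irregular = regular.drop(index=1)  # removing a row creates a 6-day gap

regular_ok = has_constant_frequency(regular, "3D")
irregular_ok = has_constant_frequency(irregular, "3D")
print(regular_ok, irregular_ok)
```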

## Manual forecast
Manual forecasts must include the **user-forecasted values** covering both the historical period and the period to forecast. Manual forecasts are usually built by the finance team without the help of machine learning.

## Target variable
The financial target variable can be **split into categories** at different levels of granularity, such as regions, streamlines, or product types. The user must keep a **long format dataset** by using the category column to allocate financial values and driver values as needed.
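
If the source data is in wide format (one column per category), it can be reshaped into the long format described above. This is a minimal sketch with pandas; the `date` column name and the region names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical wide-format data: one column per region.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    "EMEA": [100.0, 110.0],
    "APAC": [80.0, 85.0],
})

# Reshape to long format: one row per (date, category) pair.
long = wide.melt(id_vars="date", var_name="category", value_name="actual_value")
long = long.sort_values(["category", "date"]).reset_index(drop=True)
print(long)
```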

## Horizons to forecast
The user can choose the number of horizons to forecast by building the dataset to be forecasted accordingly. If the objective is to forecast the next 12 horizons, the dataset to be forecasted (**to forecast data**) must contain **12 rows** x **number of categories**. Each row corresponds to a date horizon to forecast.

**Example**:
If we have two business units in our company and want to forecast the actual_value for the next four days, we would need a dataset to be forecasted containing 8 rows (4 horizons x 2 categories) of data:

![Screenshot 2023-01-12 at 11.19.16.png](qBOonK0BqAE8)
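
The date and category skeleton of such a dataset could be generated as follows. This is a sketch only: the helper `build_forecast_skeleton`, the column names `date` and `category`, the business unit labels, and the last historical date are all hypothetical, and the manual forecast and driver columns would still need to be added afterwards.

```python
import pandas as pd


def build_forecast_skeleton(last_date: str, n_horizons: int,
                            categories: list, freq: str = "D") -> pd.DataFrame:
    """Build one row per (category, future date) pair after `last_date`.

    Assumes `last_date` lies on the frequency grid (always true for daily data).
    """
    # Generate n_horizons + 1 dates starting at last_date, then drop last_date itself.
    future_dates = pd.date_range(start=last_date, periods=n_horizons + 1, freq=freq)[1:]
    index = pd.MultiIndex.from_product([categories, future_dates],
                                       names=["category", "date"])
    return index.to_frame(index=False)


# Two business units, four daily horizons: 4 x 2 = 8 rows.
skeleton = build_forecast_skeleton("2023-01-12", 4, ["BU_1", "BU_2"])
print(skeleton)
```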

## Drivers
Drivers are features that could have a significant impact on the target variable being forecasted. Drivers included in the data can be company-specific or macroeconomic information chosen by the user, provided in a **numerical format**.

- Examples of **macroeconomic drivers**: economic growth, inflation, interest rates
- Examples of **company-specific drivers**: product or service innovation, marketing and sales efforts, customer satisfaction, operational efficiency

Drivers can be specific to each time series (e.g., number of employees per business unit at a specific date) or apply to all categories (e.g., number of employees in the whole company at a specific date). The user must adjust the data to keep the expected long-format dataset detailed above.
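
For example, a company-wide driver can be repeated for every category by joining it to the per-category data on the date. The sketch below uses pandas; all column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical driver specific to each business unit (already in long format).
per_category = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"]),
    "category": ["BU_1", "BU_2", "BU_1", "BU_2"],
    "employees_bu": [40, 60, 42, 61],
})

# Hypothetical driver that applies to the whole company: one row per date.
company_wide = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    "employees_total": [100, 103],
})

# The left join repeats the company-wide value for each category on the same date.
# validate="many_to_one" raises an error if a date appears twice in company_wide.
drivers = per_category.merge(company_wide, on="date", how="left", validate="many_to_one")
print(drivers)
```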