# Data Sources requirements

This solution requires only one dataset (sales). Two optional datasets can be added: one with pricing information and one with legacy forecasts. All datasets must be structured as **time series** and follow the formatting conventions below to ensure proper forecasting performance and validation.


## Sales Dataset (mandatory)

This solution requires a primary dataset containing the historical quantities sold or consumed, referred to as the **sales dataset**.

**[Sales Dataset schema](dataset:input_demand_data)**

- `date` (_date_): Time variable, must be equispaced (daily, weekly, or monthly).  
- `sales` (_integer_): Quantity sold or consumed at each time step.
- `product_id` (_string_): (Optional) Product or product group identifier — used as Time Series ID 1.
- `item_id` (_string_): (Optional) Additional segmentation — used as Time Series ID 2 (e.g., store, region, channel).

Notes:
- Each row must correspond to a past observation of sales at a specific time and scope.
- Time series must be complete (no gaps) and long enough for model training.
- The `sales` column must not contain missing values; fill them before running the solution so that each series has no gaps.
- **Column names in your dataset do not need to match these exactly** — they can be freely mapped during the Project Setup step. Original names are preserved in the flow.
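The completeness rules above can be checked before setup with a short script. The sketch below is a minimal, hypothetical example in plain Python and is not part of the solution. It assumes each row is a dict whose columns are already mapped to `date` (a `datetime.date`), `product_id`, `item_id` and `sales`, and that monthly dates fall on the first day of the month. For every series it reports dates missing from the regular grid, dates that fall off that grid, and rows with an empty `sales` value.

```python
from collections import defaultdict
from datetime import date, timedelta


def next_date(d, freq):
    """Return the date one time step after d.

    freq: "D" (daily), "W" (weekly) or "M" (monthly; assumes first-of-month dates).
    """
    if freq == "D":
        return d + timedelta(days=1)
    if freq == "W":
        return d + timedelta(weeks=1)
    if freq == "M":
        # Roll December over to January of the next year.
        return date(d.year + d.month // 12, d.month % 12 + 1, 1)
    raise ValueError(f"unsupported frequency: {freq}")


def check_sales(rows, freq="D"):
    """Validate sales rows that use the mapped column names.

    Returns (gaps, off_grid, missing):
      gaps:     {(product_id, item_id): [grid dates with no row]}
      off_grid: {(product_id, item_id): [dates not on the regular grid]}
      missing:  [(product_id, item_id, date)] where sales is None
    """
    series = defaultdict(set)
    missing = []
    for r in rows:
        # Optional IDs default to None, so an unsegmented dataset is one series.
        key = (r.get("product_id"), r.get("item_id"))
        series[key].add(r["date"])
        if r["sales"] is None:
            missing.append((*key, r["date"]))

    gaps, off_grid = {}, {}
    for key, dates in series.items():
        # Build the expected grid from the series' first to last date.
        grid, d, end = set(), min(dates), max(dates)
        while d <= end:
            grid.add(d)
            d = next_date(d, freq)
        if grid - dates:
            gaps[key] = sorted(grid - dates)
        if dates - grid:
            off_grid[key] = sorted(dates - grid)
    return gaps, off_grid, missing


# Toy rows: one series with a missing value on 2025-03-02 and no row for 2025-03-03.
rows = [
    {"date": date(2025, 3, 1), "product_id": "FOODS_1", "item_id": "CA_1", "sales": 42},
    {"date": date(2025, 3, 2), "product_id": "FOODS_1", "item_id": "CA_1", "sales": None},
    {"date": date(2025, 3, 4), "product_id": "FOODS_1", "item_id": "CA_1", "sales": 37},
]
gaps, off_grid, missing = check_sales(rows, freq="D")
# gaps     -> {('FOODS_1', 'CA_1'): [datetime.date(2025, 3, 3)]}
# off_grid -> {}
# missing  -> [('FOODS_1', 'CA_1', datetime.date(2025, 3, 2))]
```

The names `next_date` and `check_sales` are illustrative. Building the grid per series, rather than globally, lets series start and end on different dates while still requiring each one to be gap-free.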

Example schema:

| date       | product_id | item_id | sales |
|------------|------------|---------|-------|
| 2025-03-01 | FOODS_1    | CA_1    | 42    |

---

## Pricing Dataset (optional)

This dataset contains historical and future prices for each time series. It must align in time and segmentation with the sales dataset.
The price columns must be named `sell_price` and `base_price` to ensure compatibility with the forecasting solution.


**Pricing Dataset schema**

- `date` (_date_): Must match the `date` column in the sales dataset (same name, format and time step).
- `sell_price` (_float_): Price paid by the customer.
- `base_price` (_float_): Full (non-discounted) price.
- `product_id`, `item_id` (_string_): Must match the time series identifiers (TSIDs) in the sales dataset.

Notes:
- Must cover both the historical period and the forecast horizon.
- Useful to capture promotion effects or price sensitivity in forecasts.

Example schema:

| date       | product_id | item_id | sell_price | base_price |
|------------|------------|---------|------------|------------|
| 2025-01-01 | FOODS_1    | ITEM_1  | 5.99       | 7.99       |
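Because the pricing dataset must extend past the last sales date, a quick coverage check can catch missing future prices. The sketch below is a hypothetical helper, not part of the solution. It assumes rows are dicts with `datetime.date` values, a fixed step (daily or weekly; monthly data would need a calendar-aware step), and a forecast `horizon` expressed in time steps. It also rejects rows that lack `sell_price` or `base_price`.

```python
from datetime import date, timedelta

# Columns the pricing dataset must provide (names taken from the schema above).
REQUIRED_COLUMNS = {"date", "product_id", "item_id", "sell_price", "base_price"}


def missing_price_dates(sales_rows, price_rows, horizon, step=timedelta(days=1)):
    """Return {(product_id, item_id): [dates with no pricing row]}.

    Each series needs prices from its first sales date through its last
    sales date plus `horizon` steps (the forecast horizon).
    """
    for r in price_rows:
        absent = REQUIRED_COLUMNS.difference(r)  # iterating a dict yields its keys
        if absent:
            raise ValueError(f"pricing row lacks columns: {sorted(absent)}")

    priced = {(r["product_id"], r["item_id"], r["date"]) for r in price_rows}

    # First and last sales date per series.
    bounds = {}
    for r in sales_rows:
        key = (r["product_id"], r["item_id"])
        first, last = bounds.get(key, (r["date"], r["date"]))
        bounds[key] = (min(first, r["date"]), max(last, r["date"]))

    result = {}
    for key, (first, last) in bounds.items():
        end = last + horizon * step
        d, absent_dates = first, []
        while d <= end:
            if (*key, d) not in priced:
                absent_dates.append(d)
            d += step
        if absent_dates:
            result[key] = absent_dates
    return result


sales = [{"date": date(2025, 1, 1), "product_id": "FOODS_1", "item_id": "ITEM_1", "sales": 42}]
prices = [{"date": date(2025, 1, 1), "product_id": "FOODS_1", "item_id": "ITEM_1",
           "sell_price": 5.99, "base_price": 7.99}]
gaps = missing_price_dates(sales, prices, horizon=2)
# gaps -> {('FOODS_1', 'ITEM_1'): [datetime.date(2025, 1, 2), datetime.date(2025, 1, 3)]}
```

Here a two-step horizon reveals that prices exist only for the historical date, so both future dates are reported.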


---

## Legacy Forecast Dataset (optional)

This dataset contains forecasts produced by a previous (legacy) forecasting process. It must align in time and segmentation with the sales dataset.

**Legacy Forecast Dataset schema**

- `date` (_date_): Must match the `date` column in the sales dataset (same name, format and time step).
- `legacy_forecast` (_float_): Quantity forecast by the legacy process for each time step.
- `product_id`, `item_id` (_string_): Must match the time series identifiers (TSIDs) in the sales dataset.

Example schema:

| date       | product_id | item_id | legacy_forecast |
|------------|------------|---------|------------------|
| 2025-01-01 | FOODS_1    | ITEM_1  | 48.0             |
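A similar alignment check applies to legacy forecasts. The sketch below is a hypothetical helper, not part of the solution, under the same assumptions (dict rows, `datetime.date` values, a fixed daily or weekly step). It flags legacy rows for series absent from the sales dataset or dated off the series' time grid, and pairs the remaining rows with actual sales on matching `(date, product_id, item_id)` keys, which is one possible way to line the legacy forecast up against history.

```python
from datetime import date, timedelta


def align_legacy(sales_rows, legacy_rows, step=timedelta(days=1)):
    """Check legacy forecast rows against the sales dataset.

    Returns (unknown_series, off_grid, pairs):
      unknown_series: legacy rows whose (product_id, item_id) is not in sales
      off_grid:       legacy rows whose date is not a whole number of steps
                      from that series' first sales date
      pairs:          {(date, product_id, item_id): (sales, legacy_forecast)}
                      for keys present in both datasets
    """
    first_date, actuals = {}, {}
    for r in sales_rows:
        key = (r["product_id"], r["item_id"])
        first_date[key] = min(first_date.get(key, r["date"]), r["date"])
        actuals[(r["date"], *key)] = r["sales"]

    unknown_series, off_grid, pairs = [], [], {}
    for r in legacy_rows:
        key = (r["product_id"], r["item_id"])
        if key not in first_date:
            unknown_series.append(r)
            continue
        # timedelta % timedelta gives the remainder; zero means "on the grid".
        if (r["date"] - first_date[key]) % step != timedelta(0):
            off_grid.append(r)
            continue
        full_key = (r["date"], *key)
        if full_key in actuals:  # future dates have no actuals yet
            pairs[full_key] = (actuals[full_key], r["legacy_forecast"])
    return unknown_series, off_grid, pairs


sales = [
    {"date": date(2025, 1, 1), "product_id": "FOODS_1", "item_id": "ITEM_1", "sales": 42},
]
legacy = [
    {"date": date(2025, 1, 1), "product_id": "FOODS_1", "item_id": "ITEM_1", "legacy_forecast": 48.0},
    {"date": date(2025, 1, 1), "product_id": "FOODS_2", "item_id": "ITEM_1", "legacy_forecast": 10.0},
]
unknown, off_grid, pairs = align_legacy(sales, legacy)
# pairs   -> {(datetime.date(2025, 1, 1), 'FOODS_1', 'ITEM_1'): (42, 48.0)}
# unknown -> the FOODS_2 row, which has no sales history
```

Legacy rows dated beyond the last sales date stay valid (they are on the grid) but simply produce no pair, since there is no actual value to compare against yet.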
