This walkthrough takes you from demand data to validated forecasts, using a real-world dataset.

# Prepare Demand Data

For this example, we use publicly available sales data from Walmart, via the [Kaggle "M5 Forecasting Accuracy" Challenge](https://www.kaggle.com/c/m5-forecasting-accuracy/data).

![Sales Data Monthly.png](N3uXwOfH993t)

- **Input Data**: Transaction-level data was aggregated to a fixed time step (here, monthly), aligned with the expected [Data model](article:16).
- **TSIDs (Time Series Identifiers)**: We use `store_id` (9 stores) and `item_id` (30 items). The store-item combinations present in the data form 40 distinct time series.
- **Timeframe**: The `year_month` column runs from 2020-09-01 to 2024-12-01, covering 4 years and 4 months (52 months).
- **Volume**: 52 months × 40 time series = 2,080 rows.
- **Granularity Note**: The value for 2024-12-01 represents the sum of sales from 2024-12-01 00:00 (inclusive) to 2025-01-01 00:00 (exclusive).
- **Prep in Dataiku**: Load the dataset into a prep project and parse the `year_month` column as a date. If you also use Pricing or Legacy Forecast datasets, parse their date columns as well and rename their columns to match the expected [Data model](article:16).
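
The aggregation to a monthly time step described above can be sketched as follows. This is a minimal illustration in plain Python, not the preparation code used for this walkthrough: the input row layout (store, item, transaction date, units sold) and the sample values are assumptions. In a Dataiku project the same logic would more typically live in a visual Group recipe or a pandas-based Python recipe.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(transactions):
    """Sum sales per (store_id, item_id, first day of the month)."""
    totals = defaultdict(float)
    for store_id, item_id, day, sales in transactions:
        # Label each month by its first day, e.g. 2024-12-17 -> 2024-12-01.
        year_month = day.replace(day=1)
        totals[(store_id, item_id, year_month)] += sales
    return dict(totals)


# Hypothetical transaction rows: (store_id, item_id, transaction date, units sold).
transactions = [
    ("CA_1", "FOODS_1", date(2024, 12, 1), 3),
    ("CA_1", "FOODS_1", date(2024, 12, 31), 2),
    ("CA_1", "FOODS_1", date(2025, 1, 1), 4),  # falls into the next month
    ("TX_1", "FOODS_1", date(2024, 12, 15), 1),
]

monthly = aggregate_monthly(transactions)
for key, total in sorted(monthly.items()):
    print(key, total)
```

Each output row covers the half-open interval from the first day of the month (inclusive) to the first day of the next month (exclusive), which matches the granularity note above.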

# Configure and Run the Solution

Installing the **Demand Forecast** Solution creates a project with a visual [Project setup](article:11) interface:

- **Input**: Select the prepped sales dataset.
- **Mapping**:
  - Sales = `sales`
  - Time Series IDs = `store_id`, `item_id`
  - Timestamp (`year_month`) and time step (monthly) are auto-detected.
- **Forecast Horizon**: Set to 6 (months).
- Click **Run Now** to launch the forecasting workflow.

![Project Setup.png](XYqAQ8sWCFqK)

What happens behind the scenes:

- Multiple forecasting algorithms are trained and compared, including classical statistical methods and modern ones (e.g., Prophet, DeepAR).
- A backtest is performed to estimate forecast accuracy on data held out from training.
  - By default, the backtest period covers one forecast horizon (here, 6 months).
  - Training set = September 2020 to June 2024 = 46 months × 40 series = 1,840 rows.
  - Backtest set = July to December 2024 = 6 months × 40 series = 240 rows.

Forecasts are generated for both the backtest period and the future horizon period (January to June 2025).
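
The split arithmetic can be checked with a short sketch. It assumes only the figures quoted in this walkthrough (52 monthly time steps ending 2024-12-01, 40 series, a horizon of 6) and assumes that the backtest simply holds out the last horizon of observed months. The helper `add_months` is a hypothetical name introduced for illustration, and the Solution's internal splitting logic may differ in detail.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift a month-start date by n months (n may be negative)."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


# Figures quoted in this walkthrough.
last_observed = date(2024, 12, 1)
n_months = 52
n_series = 40
horizon = 6

# All observed month starts, oldest first.
months = [add_months(last_observed, -i) for i in range(n_months - 1, -1, -1)]

# Hold out the last `horizon` months for the backtest (assumed split rule).
train, backtest = months[:-horizon], months[-horizon:]

# The future horizon starts right after the last observed month.
future = [add_months(last_observed, i) for i in range(1, horizon + 1)]

for name, span in (("train", train), ("backtest", backtest), ("future", future)):
    print(name, span[0], "to", span[-1], "=", len(span) * n_series, "rows")
```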

# Validate Forecasts

## Technical Validation

Forecasts are generated by a Scoring recipe that uses a trained Forecasting Model. To review:

1. In the Flow, open the **Scoring recipe output dataset** to see the forecasted values.
   - These are the "raw" decimal numbers generated by the Model. They may need to be converted to integer quantities, depending on your usage.
   - The dataset has the same TSIDs as the sales dataset, timestamp values corresponding to the future horizon period, pricing columns if a Pricing dataset was provided during Project Setup, and a `forecast` column.
2. Click the **Forecasting Model** in the Flow to access version history (initially, one version).
3. Click into the model version to review Performance and Model Information.
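
Step 1 notes that the raw forecasts are decimal numbers that may need converting to integer quantities. One possible conversion is sketched below. The function name `to_integer_quantity`, the clamping of negative values to zero, and the choice of rounding modes are all assumptions made for illustration; the right rule depends on how the quantities are used (for example, rounding up when understocking is costlier than overstocking).

```python
import math


def to_integer_quantity(forecast: float, mode: str = "round") -> int:
    """Convert a raw decimal forecast to a non-negative integer quantity."""
    # Assumption: negative demand is not meaningful, so clamp at zero.
    value = max(forecast, 0.0)
    if mode == "ceil":
        return math.ceil(value)
    if mode == "floor":
        return math.floor(value)
    # Round half up; Python's built-in round() rounds halves to the nearest even integer.
    return math.floor(value + 0.5)


# Hypothetical raw values from the `forecast` column.
raw_forecasts = [12.2, 2.5, -0.3]
print([to_integer_quantity(f) for f in raw_forecasts])
print([to_integer_quantity(f, mode="ceil") for f in raw_forecasts])
```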

To further validate or to refine the Model, click on **View Original Analysis**. In the breadcrumbs at the top-left of the page, click on **Models** to access the modeling experiments that were run behind the scenes.

![Forecasting Results.png](LYFm9nw5kpVo)

In this interface:

- The **Result** tab shows performance metrics (e.g., Symmetric Mean Absolute Percentage Error) for various algorithms.
- Here, DeepAR = 21.4%, Seasonal Trend = 31%.
- This gives a sense of how far off the forecasted demand is, on average, for a given item at a given store.
- Note that the backtest period used to compute these results differs from the "backtest" shown in the Demand Explorer Dashboard. The former is an internal backtest used to evaluate the models. The latter is mainly useful for comparing forecasts against a fixed legacy approach implemented externally, supplied as a Legacy Forecast dataset during Project Setup.
- The **Design** tab allows you to review algorithm choice and settings. You can rerun the analysis to refine models as needed.
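
To make the Symmetric Mean Absolute Percentage Error (sMAPE) concrete, here is a minimal sketch of one common definition: the mean of `2 * |forecast - actual| / (|actual| + |forecast|)`, expressed in percent. Several sMAPE variants exist, so this is an assumption and not necessarily the exact formula behind the figures shown in the **Result** tab. Treating a time step where both values are zero as a zero error is also an assumption.

```python
def smape(actuals, forecasts):
    """Symmetric MAPE in percent, using the 2|F - A| / (|A| + |F|) variant."""
    terms = []
    for actual, forecast in zip(actuals, forecasts):
        denom = abs(actual) + abs(forecast)
        # Assumption: a perfect forecast of zero demand counts as zero error.
        terms.append(0.0 if denom == 0 else 2.0 * abs(forecast - actual) / denom)
    return 100.0 * sum(terms) / len(terms)


# Hypothetical monthly sales and forecasts for one item at one store.
actuals = [100, 80, 0]
forecasts = [110, 60, 0]
print(f"sMAPE = {smape(actuals, forecasts):.1f}%")
```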

## Business Validation

- Share the **[Demand Explorer Dashboard](dashboard:jrsMSKr)** with stakeholders.
- It includes a link to a Help page which embeds the corresponding [Wiki page](article:12) and explains how to use the Dashboard to validate forecasts.
- You can tailor the Dashboard and the Wiki page to the needs of the business and add contextual information (e.g., confirming forecast time step, caveats, etc.).

# Adapt for Ongoing Rolling Forecasts

To move into production with updates at every time step (here, monthly):

- **Keep input data current**: Ensure the Sales dataset in the prep project is updated with new observations as they become available. If you use a Pricing dataset, also update it with the intended prices for the time steps in the new forecast horizon.
- **Update the training window**:
  - Option 1: Adjust the cutoff date in the **Filter recipe** to include more recent time steps.
  - Option 2: Remove the Filter and retrain with all available history.
- **Forecast with new data**:
  - To **keep a history** of forecasts: Sync the Scoring output to a cumulative dataset and run in **append mode**. The Scoring output contains `smmd_savedModelId` and `smmd_predictionTime` columns, which disambiguate forecasts made for the same time step.
  - Automate updates by enabling the trigger of the [Rescoring Scenario](article:13), so that it runs when the input dataset refreshes.
- **Monitor Forecast Performance**:
  - Set up **Scenario Reporters** to trigger alerts if accuracy metrics (e.g., MAPE) exceed defined thresholds.
  - Compute these metrics on the latest time step or on longer moving windows (e.g., last 3–6 months).
  - Extend with business KPIs. For an inventory optimization use case, for example, we would want to ensure that forecast improvements lead to inventory reductions while avoiding understocking:
    - SLA adherence, such as high product availability for top-tier SKUs, next-day shelf replenishment for high-turnover items.
    - Store-level availability metrics, such as on-shelf availability and percentage of forecasted demand fulfilled by actual sales before stockouts occur.
