This sample project is based on data from a [Kaggle challenge](https://www.kaggle.com/c/rossmann-store-sales).

Many retail businesses need accurate forecasts of the revenue produced by each of their stores. These forecasts support planning and staffing optimization, and help ensure that each store has the necessary supply. Without them, a business may waste money by overstocking a store or, worse yet, lose revenue because a store does not have enough stock to meet demand.

In this project, we use historical data from the Rossmann pharmacy chain to build a predictive model that forecasts the revenue of each of their stores. This model can be run weekly or monthly to provide business stakeholders with predictions of the revenue for the coming days or weeks. This information can then be used to optimize business practices and streamline operations.


<br/>
# Business Goal

We want to build a project to answer the following questions:

- What is the expected revenue for each store on each day?
- What factors influence the revenue of a store most?

<br />

# How do we do this?

We start with 2 different data sources:
- two datasets with the revenue per store per day, split between our [historical data](/projects/DKU_STORESALES/datasets/historical_data/explore/) (used to train the model), and our [forecasting data](/projects/DKU_STORESALES/datasets/forecasting_data/explore/) (used to deploy our model)
- a dataset with [information about each store](/projects/DKU_STORESALES/datasets/store_descriptions/explore/).

Like many data projects, we then proceed with three steps:
1. **Data Cleaning:** we clean our data and build our features
2. **Predictive Modeling:** we build and deploy a predictive model
3. **Visualization:** we create a useful visualization of our predicted data

Let's go through each one of those steps in more detail to see what we did.

## Explore this sample project

Start by looking at the flow and visualising the different steps of the project. You can see the preparation steps in yellow and the predictive modelling steps in green.

<p class="text-center">
<a href="/projects/DKU_STORESALES/flow/"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />&nbsp;Flow</a><br/><br/>
</p>

### Data cleaning

We used a preparation script to parse dates and engineer features from them. This is a common need in many datasets: the relevant information in a column has to be extracted before it becomes useful.

<p class="text-center">
<a href="/projects/DKU_STORESALES/recipes/compute_historical_data_cleaned/"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />&nbsp;Visual Preparation</a><br/><br/>
</p>
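As a rough illustration of this kind of date-based feature engineering outside the visual recipe, here is a minimal Python sketch. The column names (`Store`, `Date`, `Sales`), the date format, the sample values, and the choice of derived features are assumptions made for the example, not a description of the actual preparation script.

```python
from datetime import datetime


def add_date_features(row, date_column="Date"):
    """Return a copy of `row` with features derived from its date string.

    The column name and the "%Y-%m-%d" format are assumptions for this sketch.
    """
    parsed = datetime.strptime(row[date_column], "%Y-%m-%d")
    enriched = dict(row)  # copy so the input row is left untouched
    enriched["year"] = parsed.year
    enriched["month"] = parsed.month
    enriched["day_of_week"] = parsed.isoweekday()  # 1 = Monday ... 7 = Sunday
    enriched["week_of_year"] = parsed.isocalendar()[1]  # ISO week number
    return enriched


# Hypothetical record with made-up values, used only to exercise the function.
sample = {"Store": 1, "Date": "2015-07-31", "Sales": 5263}
features = add_date_features(sample)
```

Each derived field becomes a separate column that a model can use directly, which a raw date string does not allow.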

We then used a join recipe to *enrich our data* with metadata about each store. This gives us more features, which will be essential for the next step: predictive modelling.

<p class="text-center">
<a href="/projects/DKU_STORESALES/recipes/join_store_descriptions/"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />&nbsp;Join recipe</a><br/><br/>
</p>
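Conceptually, this enrichment attaches the store attributes to every revenue row that shares the same store identifier. The sketch below shows the idea in plain Python. It assumes a left join on a hypothetical `Store` key, and the column names and values are made up for illustration; the actual join recipe may be configured differently.

```python
def left_join(sales_rows, store_rows, key="Store"):
    """Keep every sales row and add store attributes when the key matches."""
    stores_by_key = {store[key]: store for store in store_rows}
    joined = []
    for row in sales_rows:
        store_info = stores_by_key.get(row[key], {})
        # Sales fields are applied last so they win on any name collision.
        joined.append({**store_info, **row})
    return joined


# Hypothetical rows with illustrative values only.
sales = [
    {"Store": 1, "Date": "2015-07-31", "Sales": 5263},
    {"Store": 2, "Date": "2015-07-31", "Sales": 6064},
]
stores = [
    {"Store": 1, "StoreType": "c", "CompetitionDistance": 1270},
]
enriched = left_join(sales, stores)
```

With a left join, a revenue row whose store has no description is kept without the extra attributes rather than being dropped.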

### The Predictive Model


We built a model to predict the revenue of each store as accurately as possible. This project can be used in production to regularly produce forecasts for the coming week or month. The business can then use these numbers to optimize staffing or stock levels at each store.

<p class="text-center">
<a href="/projects/DKU_STORESALES/savedmodels/FwuGVsh1/versions/"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />&nbsp;Model</a><br/><br/>
</p> 
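This page does not say which metric was used to judge the model. As one hedged example of how forecast accuracy can be quantified, the sketch below computes the root mean square percentage error (RMSPE), which expresses errors relative to the actual revenue. The function name, the choice to skip days with zero actual revenue, and the sample numbers are assumptions for illustration.

```python
import math


def rmspe(actual, predicted):
    """Root mean square percentage error over days with non-zero actual revenue."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    squared = [((a - p) / a) ** 2 for a, p in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Made-up values: two days are off by 10 percent, and the zero-revenue day is skipped.
error = rmspe([100.0, 200.0, 0.0], [110.0, 180.0, 5.0])
```

A relative metric like this treats a 10 percent miss the same way for a small store and a large one, which is often what matters when comparing stores of different sizes.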

We can check the variable importance to see which factors matter most in predicting each store's revenue.

<p class="text-center">
<a href="/projects/DKU_STORESALES/savedmodels/FwuGVsh1/p/S-DKU_STORESALES-FwuGVsh1-1489064628152/#variables_importance"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />&nbsp;Feature Importance</a><br/><br/>
</p> 

After looking at this we can see that the most important predictors for revenue are:
- The day of the week
- Whether there's a sale or not
- How far the store is from a competitor's store


### The Dashboard

To communicate our model's results, we built a dashboard with visualizations of the predictions. Rather than reading an Excel-style table, a team can use these visualizations to quickly get a feel for the data and the revenue forecasts.

<p class="text-center">
<a href="/projects/DKU_STORESALES/dashboards/qv5eDn7_forecast-results/view/jIpOZHJ"  class="btn btn-datasets-color btn-cta-big-mod"><i class="icon-dku-sample_project" class="btn-cta-big-mod-icon" />&nbsp;Dashboard</a><br/><br/>
</p>

# Related content

- A [quick start recap article of all our Learn articles related to machine learning](http://www.dataiku.com/learn/guide/quickstart/machine-learning.html)
- A free training [video on how to deploy a model to predict a movie's budget](http://www.dataiku.com/learn/guide/free-training/deploy-predictive-application.html) in just 15 minutes