# 📊 Sales history dataset
This document describes the synthetic e-commerce dataset created for forecasting daily sales in France.  
It includes both the structure of the dataset and the rules and assumptions used to generate realistic retail behavior such as seasonality, promotions, holidays, delivery cutoffs, and website traffic patterns.

---

## 📂 Dataset structure

Each row in the dataset represents daily sales for a given product. The features are:

- `date` 📅 → Calendar day. 
- `product` 🏷️ → Product category (`tshirt`, `toy`, `laptop`). 

**Target variable**
- `sales` 🛒 → Units sold.

**External features - Past covariate (only known historically)**
- `website_traffic` 🌐 → Daily visits for the product category. 

**External features - Known in advance**
- `is_holiday_flag` 🎉 → 1 if the day is a French holiday, else 0. 
- `promotion_level` 💸 → `none`, `low`, `medium`, `high`. 
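
As a rough illustration of the schema above, one row could be modelled as follows. The column names and allowed values come from the lists above; the Python types (for example integer `sales` and `website_traffic`) and the `SalesRow` class itself are assumptions made for this sketch, not part of the dataset.

```python
import datetime as dt
from dataclasses import dataclass

PRODUCTS = ("tshirt", "toy", "laptop")
PROMOTION_LEVELS = ("none", "low", "medium", "high")


@dataclass(frozen=True)
class SalesRow:
    """One dataset row: daily sales for one product category (hypothetical helper)."""
    date: dt.date          # calendar day
    product: str           # one of PRODUCTS
    sales: int             # target variable: units sold (integer type is assumed)
    website_traffic: int   # past covariate: daily visits (integer type is assumed)
    is_holiday_flag: int   # known in advance: 1 on a French holiday, else 0
    promotion_level: str   # known in advance: one of PROMOTION_LEVELS

    def __post_init__(self):
        # Reject values outside the documented categories.
        if self.product not in PRODUCTS:
            raise ValueError(f"unknown product: {self.product!r}")
        if self.promotion_level not in PROMOTION_LEVELS:
            raise ValueError(f"unknown promotion level: {self.promotion_level!r}")
        if self.is_holiday_flag not in (0, 1):
            raise ValueError("is_holiday_flag must be 0 or 1")


row = SalesRow(dt.date(2024, 7, 14), "tshirt", 120, 900, 1, "high")
```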

---

## 📈 Long-term trends
- 👕 T-shirts: stable baseline with the highest starting sales.
- 🧸 Toys: mid-level baseline with slight, steady growth of +2% per year.
- 💻 Laptops: lowest baseline but the strongest long-term growth, +6% per year.
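
The growth rates above could be applied as in the sketch below. Two things are assumptions here: growth is compounded continuously over fractional years (the text does not say whether it is compound or linear), and "stable" is read as 0% growth. The baseline values are not given in this document, so `baseline` is a caller-supplied parameter.

```python
# Annual growth rates from the list above; "stable" T-shirts are read as 0%.
ANNUAL_GROWTH = {"tshirt": 0.00, "toy": 0.02, "laptop": 0.06}


def trend_level(product, baseline, days_elapsed):
    """Scale a baseline by compound annual growth (compounding is an assumption)."""
    years = days_elapsed / 365.25
    return baseline * (1.0 + ANNUAL_GROWTH[product]) ** years
```

For example, `trend_level("laptop", 100.0, 365.25)` gives about 106, while the T-shirt level stays at its baseline.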

---

## 📅 Weekly patterns
- 📆 Weekdays (Monday to Friday): sales slightly lower, about -10%.
- 📆 Saturday: small boost, +10% sales and +8% traffic.
- 📆 Sunday: big boost, +25% sales and +20% traffic, since physical shops are closed.
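
A minimal sketch of these day-of-week effects as multipliers. The sales and weekend traffic factors come from the list above; the weekday traffic factor is not stated, so it is assumed to be neutral (1.00). The function name is hypothetical.

```python
import datetime as dt

# Keys follow date.weekday(): Monday=0 ... Saturday=5, Sunday=6.
SALES_WEEKEND_FACTOR = {5: 1.10, 6: 1.25}
TRAFFIC_WEEKEND_FACTOR = {5: 1.08, 6: 1.20}


def weekly_factors(day):
    """Return (sales_factor, traffic_factor) for a calendar day."""
    weekday = day.weekday()
    sales = SALES_WEEKEND_FACTOR.get(weekday, 0.90)      # weekdays: about -10%
    traffic = TRAFFIC_WEEKEND_FACTOR.get(weekday, 1.00)  # assumption: no weekday change
    return sales, traffic
```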

---

## 🎉 Public holidays
- 🇫🇷 New Year's Day (January 1), Bastille Day (July 14), Christmas Day (December 25).
- 🏪 Physical stores are closed, so e-commerce benefits:
  - Traffic: +10%
  - Sales: +7%
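
The holiday rule can be sketched as below, using only the three fixed dates and the two uplifts listed above. The function names other than the `is_holiday_flag` column name are hypothetical.

```python
import datetime as dt

# (month, day) pairs for the three holidays listed above.
FIXED_HOLIDAYS = {(1, 1), (7, 14), (12, 25)}


def is_holiday_flag(day):
    """Return 1 if the day is one of the listed French holidays, else 0."""
    return int((day.month, day.day) in FIXED_HOLIDAYS)


def holiday_factors(day):
    """Return (sales_factor, traffic_factor): +7% sales and +10% traffic on holidays."""
    if is_holiday_flag(day):
        return 1.07, 1.10
    return 1.00, 1.00
```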

---

## 🌸☀️ Seasonality
- 👕 T-shirts  
  - Higher sales in spring and summer (March to August).  
  - Peak in June and July, about +40%.  
  - Weak demand in winter.  

- 💻 Laptops  
  - Slight dip in July and August.  
  - Otherwise steady.  

- 🧸 Toys  
  - Sales surge in November and December.  
  - Strongest in early December.  
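
The seasonal shapes above might be encoded as monthly multipliers, as in this sketch. Only the June and July T-shirt peak (+40%) is taken from the text. Every other number is a placeholder chosen to respect the described ordering (for example, toys strongest in early December), and the mid-December split date is also an assumption.

```python
import datetime as dt


def seasonal_factor(product, day):
    """Return a seasonal sales multiplier; most values are illustrative placeholders."""
    month = day.month
    if product == "tshirt":
        if month in (6, 7):
            return 1.40  # from the text: peak of about +40%
        if month in (3, 4, 5, 8):
            return 1.15  # placeholder: higher spring and summer sales
        if month in (12, 1, 2):
            return 0.80  # placeholder: weak winter demand
        return 1.00
    if product == "laptop":
        return 0.95 if month in (7, 8) else 1.00  # placeholder: slight summer dip
    if product == "toy":
        if month == 12:
            return 1.80 if day.day <= 15 else 1.50  # placeholders: early December strongest
        if month == 11:
            return 1.30  # placeholder: November surge
        return 1.00
    raise ValueError(f"unknown product: {product!r}")
```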

---

## 🛍️ Promotions
- Winter Soldes (January to February)  
  - 👕 T-shirts: ramp from medium to high.  
  - 💻 Laptops: constant low.  
  - 🧸 Toys: none.  

- Summer Soldes (late June to early August)  
  - 👕 T-shirts: ramp from medium to high.  
  - 💻 Laptops: constant low.  
  - 🧸 Toys: none.  

- Black Friday (Friday through Cyber Monday, 4 days)  
  - 💻 Laptops: high.  
  - 🧸 Toys: medium.  
  - 👕 T-shirts: medium.  

- Christmas promos (December 1 to 23)  
  - 🧸 Toys: medium.  
  - 💻 Laptops: low.  
  - 👕 T-shirts: none.  
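
The sketch below derives `promotion_level` for the two events with exact dates in the list above: the Black Friday window and the Christmas promos. It leaves out both Soldes periods because this document gives no exact start and end dates for them. Two further assumptions: Black Friday is taken as the day after the fourth Thursday of November, and when the two windows overlap (Cyber Monday can fall in early December) the higher level wins.

```python
import datetime as dt

LEVELS = ("none", "low", "medium", "high")  # ordered from weakest to strongest
BLACK_FRIDAY_LEVEL = {"laptop": "high", "toy": "medium", "tshirt": "medium"}
CHRISTMAS_LEVEL = {"toy": "medium", "laptop": "low", "tshirt": "none"}


def black_friday(year):
    """Day after the fourth Thursday of November (assumed definition)."""
    nov1 = dt.date(year, 11, 1)
    first_thursday = nov1 + dt.timedelta(days=(3 - nov1.weekday()) % 7)
    # +21 days reaches the fourth Thursday, +1 more reaches the Friday.
    return first_thursday + dt.timedelta(days=22)


def promotion_level(product, day):
    """Promotion level from Black Friday and Christmas promos only (Soldes omitted)."""
    candidates = ["none"]
    start = black_friday(day.year)
    if start <= day <= start + dt.timedelta(days=3):  # Friday through Cyber Monday
        candidates.append(BLACK_FRIDAY_LEVEL[product])
    if day.month == 12 and 1 <= day.day <= 23:
        candidates.append(CHRISTMAS_LEVEL[product])
    # Assumption: overlapping events resolve to the strongest level.
    return max(candidates, key=LEVELS.index)
```

For instance, `black_friday(2024)` is November 29, so the four-day window runs through Monday, December 2.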

---

## 🎁 Christmas delivery cutoff
- From December 24 onward, sales drop only for products that benefit from Christmas:  
  - 🧸 Toys: strong drop, mirroring the strong December boost.  
  - 💻 Laptops: mild drop.  
  - 👕 T-shirts: no drop (not a Christmas gift).  
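
One way to picture the cutoff is as a per-product multiplier that switches on from December 24. The magnitudes below are placeholders that only respect the ordering described above (toys strongest, laptops mild, T-shirts unaffected), and ending the window on December 31 is also an assumption, since the text only says "onward".

```python
import datetime as dt

# Placeholder multipliers: only the ordering (toy < laptop < tshirt = 1) is from the text.
CUTOFF_FACTOR = {"toy": 0.40, "laptop": 0.85, "tshirt": 1.00}


def cutoff_factor(product, day):
    """Sales multiplier after the Christmas delivery cutoff (December 24 to 31 assumed)."""
    if day.month == 12 and day.day >= 24:
        return CUTOFF_FACTOR[product]
    return 1.00
```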

---

## 🌐 Website traffic and conversion
- Traffic follows weekend and seasonal patterns similar to sales.  
- Customers browse more before big events such as Soldes and Black Friday.  
- Sales use traffic from today and the two previous days:  
  - About 45% from same-day traffic  
  - About 35% from yesterday  
  - About 15% from two days ago  
- In the week before Soldes (for clothing) and before Black Friday (for laptops), more weight goes to lagged traffic because people wait for the promotion to start.  
- During the first days of Soldes, same-day conversion is faster, so the same-day weight is raised.  
- Random dampening ensures that browsing does not always convert directly into sales.
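
The lagged-traffic rule can be sketched as a weighted sum over three days. The default weights are the approximate shares listed above (they add up to 0.95 as stated). The `conversion_rate` parameter and both function names are hypothetical, and the adjusted weights used before and during promotions are not given in this document, so they would have to be passed in through `weights`.

```python
# Approximate shares from the list above: same day, yesterday, two days ago.
LAG_WEIGHTS = (0.45, 0.35, 0.15)


def effective_traffic(traffic, t, weights=LAG_WEIGHTS):
    """Weighted traffic driving sales on day index t (requires two days of history)."""
    if t < len(weights) - 1:
        raise ValueError("not enough history for the lagged weights")
    return sum(w * traffic[t - lag] for lag, w in enumerate(weights))


def expected_sales(traffic, t, conversion_rate, weights=LAG_WEIGHTS):
    """Expected units sold before noise; conversion_rate is a hypothetical parameter."""
    return conversion_rate * effective_traffic(traffic, t, weights)
```

With a flat series of 1,000 visits per day, `effective_traffic` returns about 950 visits, and a spike in yesterday's traffic still lifts today's expected sales.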

---

## 🎲 Randomness
- Both traffic and sales include noise.  
- This keeps the data realistic rather than perfectly regular.
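
A minimal sketch of how such noise could be injected, assuming multiplicative Gaussian noise centred on 1 and clipped at zero. The noise distribution and the `sigma` value are not specified in this document, so both are placeholders, as is the `add_noise` name.

```python
import random


def add_noise(value, rng, sigma=0.05):
    """Multiply a value by Gaussian noise around 1, never going below zero."""
    return max(0.0, value * rng.gauss(1.0, sigma))


# A seeded generator keeps the synthetic series reproducible.
rng = random.Random(42)
noisy_sales = [add_noise(100.0, rng) for _ in range(5)]
```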
