This article describes the data preparation steps implemented in the solution.

## Create the consolidated input data model

To build the consolidated input data model, the solution generates one row for each combination of customer, product, and date. This makes it possible to determine whether the customer held the product at a given time. The steps involved are detailed in the dedicated [flow zone article](article:23).

**Example**:

- Number of unique customer IDs: 20
- Number of unique products: 10
- Number of unique dates: 30

`Number of rows in the consolidated input data model = 20 * 10 * 30 = 6,000 rows`
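The row count follows from a cross join of the three sets of unique values. The minimal Python sketch below builds that scaffold with `itertools.product`; the IDs, dates, and column names are hypothetical placeholders, not the solution's actual datasets or recipes.

```python
from itertools import product

# Hypothetical unique values; in the solution they come from the uploaded datasets.
customer_ids = [f"C{i:03d}" for i in range(20)]
product_ids = [f"P{i:02d}" for i in range(10)]
dates = [f"2023-01-{d:02d}" for d in range(1, 31)]

# One row per (customer, product, date) combination, i.e. a cross join.
scaffold = [
    {"customer_id": c, "product_id": p, "date": d}
    for c, p, d in product(customer_ids, product_ids, dates)
]

print(len(scaffold))  # 6000
```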

## Generate a target value for the classification model

The target value **subscription** is created in the [Input Dataset Preparation](article:27) flow zone.

In the consolidated input data model, each row is flagged to indicate whether the customer held the product on that date. This flag is stored in the **has_product** column as a binary value: 1 if the customer held the product, 0 otherwise.
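As an illustration, the flag could be derived from holding periods as in the sketch below. The input format (a start date and an optional end date per customer and product) and the names `holdings` and `has_product_flag` are hypothetical; the solution's recipes may compute the flag differently.

```python
from datetime import date

# Hypothetical holding periods: (customer_id, product_id) -> (start, end or None if still held).
holdings = {
    ("C001", "savings"): (date(2023, 2, 1), None),
    ("C001", "loan"): (date(2022, 6, 1), date(2023, 1, 31)),
}

def has_product_flag(customer_id: str, product_id: str, as_of: date) -> int:
    """Return 1 if the customer held the product on `as_of`, else 0."""
    period = holdings.get((customer_id, product_id))
    if period is None:
        return 0  # no holding record: the product was never held
    start, end = period
    held = start <= as_of and (end is None or as_of <= end)
    return 1 if held else 0

snapshots = [date(2023, 1, 1), date(2023, 2, 1), date(2023, 3, 1)]
print([has_product_flag("C001", "savings", d) for d in snapshots])  # [0, 1, 1]
print([has_product_flag("C001", "loan", d) for d in snapshots])     # [1, 0, 0]
```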

Computing the lead difference of the **has_product** column indicates whether the customer will subscribe to the product within the chosen horizon. The horizon is a parameter set in the Project Setup.

**Example**:

Assuming a horizon of 3, the window recipe aggregation on the **has_product** column is configured as follows:
![Window recipe aggregation on the has_product column with a horizon of 3](DoyCUQRk8KM8)
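The lead difference can be sketched in plain Python for a single customer/product series sorted by date. The sketch assumes the window recipe computes the current **has_product** value minus the value `horizon` rows later, which is consistent with -1 meaning "subscribed" in the value list that follows; the function name `lead_diff` and the sample series are hypothetical.

```python
from typing import List, Optional

def lead_diff(values: List[int], horizon: int) -> List[Optional[int]]:
    """Return values[i] - values[i + horizon], or None when no row exists that far ahead.

    `values` is assumed to be the has_product series for one customer/product pair,
    sorted by date with one row per date (like a window partitioned by customer and
    product and ordered by date).
    """
    return [
        values[i] - values[i + horizon] if i + horizon < len(values) else None
        for i in range(len(values))
    ]

# Hypothetical history: product acquired at the 4th date, dropped at the 7th.
has_product = [0, 0, 0, 1, 1, 1, 0, 0]
print(lead_diff(has_product, horizon=3))
# [-1, -1, -1, 1, 1, None, None, None]
```

Here -1 marks dates from which the customer goes from not holding to holding the product three rows later, 1 marks the reverse, and the trailing `None` values are rows with no date far enough ahead.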

After the window recipe, the **subscription** column can take three values: -1 (subscribed), 0 (did not subscribe), and 1 (unsubscribed). A subsequent Prepare recipe then converts these into the three final target values: 1 (subscribed), 0 (did not subscribe), and blank (undefined).
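The Prepare recipe step amounts to a simple lookup, sketched below. The names `TARGET_MAPPING` and `to_target` are hypothetical, and treating a missing lead difference (no row `horizon` dates ahead) as blank is an assumption rather than something the solution documents.

```python
from typing import Optional

# Hypothetical mapping mirroring the Prepare recipe step:
# -1 (subscribed) -> 1, 0 (did not subscribe) -> 0; 1 (unsubscribed) is left out on purpose.
TARGET_MAPPING = {-1: 1, 0: 0}

def to_target(lead_diff: Optional[int]) -> Optional[int]:
    """Map a lead-difference value to the final subscription target (None = blank)."""
    if lead_diff is None:
        return None  # assumption: no row far enough ahead means the target is undefined
    return TARGET_MAPPING.get(lead_diff)  # 1 (unsubscribed) is not a key, so it becomes None

print([to_target(v) for v in [-1, 0, 1, None]])  # [1, 0, None, None]
```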

## Develop additional features to enhance model performance

The additional features are created in the [Input Dataset Preparation](article:27) flow zone.

To enrich the input data model and enhance model performance, additional features are created, such as: 
- age of the customer
- age of the account
- total, average, and standard deviation of the revenue
- lag values of the additional information features and balances
- maximum, minimum, and count of products held
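A few of these features are sketched below in plain Python. This is an illustration under stated assumptions, not the solution's recipes: ages are counted in whole years, the standard deviation is the sample standard deviation, lags are counted in rows of a date-sorted series, and "products held" is read as the sum of **has_product** flags across products on a date, with maximum and minimum taken across dates. All function names and sample data are hypothetical.

```python
from datetime import date
from statistics import mean, stdev
from typing import Dict, List, Optional

def years_between(start: date, as_of: date) -> int:
    """Whole years elapsed from `start` to `as_of` (customer age or account age)."""
    years = as_of.year - start.year
    # Subtract one year if the anniversary has not yet been reached in `as_of`'s year.
    if (as_of.month, as_of.day) < (start.month, start.day):
        years -= 1
    return years

def revenue_stats(revenues: List[float]) -> Dict[str, float]:
    """Total, average, and sample standard deviation of revenue (needs at least 2 values)."""
    return {
        "revenue_total": sum(revenues),
        "revenue_avg": mean(revenues),
        "revenue_std": stdev(revenues),
    }

def lag(values: List[float], k: int) -> List[Optional[float]]:
    """Value k rows earlier in a date-sorted series; None for the first k rows."""
    return [values[i - k] if i >= k else None for i in range(len(values))]

def products_held(flags: Dict[str, int]) -> int:
    """Number of products held on one date: the sum of has_product flags."""
    return sum(flags.values())

# Hypothetical customer data for illustration.
print(years_between(date(1985, 6, 15), date(2023, 4, 21)))  # 37
print(revenue_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(lag([100.0, 120.0, 90.0], k=1))  # [None, 100.0, 120.0]

# Products held per date, then maximum and minimum across dates.
history = [
    {"savings": 1, "loan": 0, "card": 1},
    {"savings": 1, "loan": 1, "card": 1},
    {"savings": 0, "loan": 1, "card": 0},
]
counts = [products_held(day) for day in history]
print(counts, max(counts), min(counts))  # [2, 3, 1] 3 1
```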

Depending on the nature of the data uploaded to the solution by the user, it may be possible to create additional features by modifying the flow directly.

