Data preparation and feature engineering should be done in a separate project so that the resulting data matches the expected input dataset format. Respecting the expected data model when loading new data helps ensure the results are reliable and easy to interpret.

## Input datasets format

The input data should be split into five datasets:

![Screenshot 2023-04-20 at 16.37.54.png](Q1MQjXITB6wy)

The datasets revenues_data, product_holdings_data, and balances_data must cover the **same period** and have the **same time frequency**.
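The constraint above can be checked before loading the data. The following is a minimal sketch, not part of the solution itself: it assumes each dataset is available as a list of rows with an ISO-formatted date column named `date` (a hypothetical name; use the column defined in the data model for each dataset). It compares the first date, last date, and number of distinct dates of each dataset, which is a simple proxy for "same period and same time frequency".

```python
from datetime import date


def time_coverage(rows, date_col="date"):
    """Summarise a non-empty dataset as (first date, last date, distinct date count)."""
    dates = sorted({date.fromisoformat(row[date_col]) for row in rows})
    return dates[0], dates[-1], len(dates)


def check_time_alignment(datasets, date_col="date"):
    """Return the names of datasets whose coverage differs from the first dataset's."""
    coverage = {name: time_coverage(rows, date_col) for name, rows in datasets.items()}
    reference = next(iter(coverage.values()))
    return [name for name, cov in coverage.items() if cov != reference]


# Illustrative rows only; "date" is an assumed column name.
monthly = ["2023-01-31", "2023-02-28", "2023-03-31"]
revenues_data = [{"date": d} for d in monthly]
product_holdings_data = [{"date": d} for d in monthly]
balances_data = [{"date": d} for d in ["2023-01-31", "2023-03-31"]]  # February missing

mismatched = check_time_alignment(
    {
        "revenues_data": revenues_data,
        "product_holdings_data": product_holdings_data,
        "balances_data": balances_data,
    }
)
print(mismatched)
```

An empty result means the three datasets share the same start date, end date, and number of periods. Any name returned points to a dataset that should be fixed in the preparation project.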

- (1) revenues_data: 

Revenue: the income a bank generates from each financial product it offers.

![Screenshot 2023-04-20 at 16.40.56.png](6yxBtaaHSVOK)

- (2) product_holdings_data: 

![Screenshot 2023-09-13 at 11.49.57.png](Y8HucqajWwUn)

**Note**: A product type is a product category that can encompass multiple individual products.

- (3) customers_data: 

![Screenshot 2023-05-04 at 13.48.23.png](zEFV2BfRfSXn)

- (4) balances_data: 

Balance: money held by the client (like in a checking account) or owed by the client (like a credit card or mortgage). 

![Screenshot 2023-05-04 at 13.48.46.png](CkxJJtgpL4kj)

- (5) additional_information_data: 

![Screenshot 2023-05-04 at 13.48.58.png](Sa8hQMAKlycl)

The datasets containing revenue, balance, and product holding information include only mandatory columns. In contrast, the customers and additional information datasets may be enriched with optional columns that can improve the classification model's performance.

**Note**: The [Customer Segmentation for Banking](https://www.dataiku.com/solutions/catalog/customer-segmentation-banking/) solution complements the Next Best Offer solution within the marketing suite for banking. The user can plug in the same data as in the current solution and build an initial model. Moreover, the user can use the segmentation output as an additional column in the customer dataset of the Next Best Offer for Banking solution.
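
As an illustration of adding the segmentation output as an extra customer column, here is a minimal sketch. The column names `customer_id` and `segment` are assumptions made for the example, not names taken from either solution; replace them with the actual key and output columns of your datasets.

```python
def add_segment_column(customers, segments, key="customer_id", segment_col="segment"):
    """Left-join the segment label onto each customer row.

    Customers without a matching segment row get None in the new column.
    """
    segment_by_customer = {row[key]: row[segment_col] for row in segments}
    return [
        {**row, segment_col: segment_by_customer.get(row[key])}
        for row in customers
    ]


# Illustrative rows only; all column names and values are hypothetical.
customers_data = [
    {"customer_id": "C1", "age": 34},
    {"customer_id": "C2", "age": 51},
]
segmentation_output = [{"customer_id": "C1", "segment": "A"}]

enriched_customers = add_segment_column(customers_data, segmentation_output)
print(enriched_customers)
```

A left join keeps every customer, so customers missing from the segmentation output remain in the dataset with an empty segment rather than being dropped.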
