The Project Setup is the user's entry point to the project. It is where the data connection and project configuration are defined, and its action buttons rebuild the flow all the way to the dashboard, where outputs are consumed directly. This automation is made possible by the scenarios described in this [article](article:12). To build the project with the demo data, skip the data update steps and go straight to the Clustering Configuration section.

## Data update disclaimer 
Before starting the application setup, users should read the disclaimer carefully. There are three ways to replace the demo datasets with new data: uploading data from a computer, connecting to a database, or copying the connection settings and data from a Next Best Offer for Banking project.

## Option 1 - Data upload

If the user chooses to **upload data** from their computer, they upload the following datasets through the Dataiku application interface: [revenues_data](dataset:revenues_data), [balances_data](dataset:balances_data), [customers_data](dataset:customers_data), [product_holdings_data](dataset:product_holdings_data), and [additional_information_data](dataset:additional_information_data).

![Screenshot 2023-06-01 at 18.10.46.png](qFxj2qDj79WR)

To refresh the project with new data, users can delete the existing datasets by clicking the trash icon next to their names, then upload new files by dragging and dropping them or by clicking the "add a file" button.

After updating the datasets in the interface, users must press "check" to launch the [Data upload - check schema](scenario:001DATAUPLOADCHECKSCHEMA) scenario, which loads the data, verifies the schema, and builds sync recipes that update the values and enforce the schema of the [revenues](dataset:revenues), [balances](dataset:balances), [customers](dataset:customers), [product_holdings](dataset:product_holdings), and [additional_information](dataset:additional_information) datasets.
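The internals of the check scenario are not documented here, but the kind of comparison a schema check performs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the solution's implementation: the `EXPECTED_SCHEMA` columns and types are hypothetical placeholders for one input dataset, and `check_schema` is a helper invented for this example.

```python
# Hypothetical expected schema for one input dataset (e.g. revenues).
# The real column names and types are defined by the solution, not by this sketch.
EXPECTED_SCHEMA = {
    "customer_id": "string",
    "month": "date",
    "revenue": "double",
}


def check_schema(actual: dict[str, str], expected: dict[str, str]) -> list[str]:
    """Compare uploaded columns/types with the expected ones.

    Returns a list of human-readable problems; an empty list means the schema matches.
    """
    problems = []
    for column, expected_type in expected.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            problems.append(
                f"type mismatch on {column}: expected {expected_type}, got {actual[column]}"
            )
    # Columns present in the upload but unknown to the expected schema (sorted for stable output).
    for column in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected column: {column}")
    return problems
```

For instance, an upload whose `month` column was read as `string` and that carries an extra `notes` column yields two problems: a type mismatch on `month` and an unexpected `notes` column. Enforcing the schema, as the sync recipes do, is what prevents such drift from propagating downstream.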

## Option 2 - Data connection 

To update the project with data stored in a database, the user must edit the connection settings section.

![Screenshot 2023-06-01 at 18.11.01.png](cvL3WSBQdK4t)

The project comes pre-configured with all datasets using the filesystem connection. The user can either keep this setup or switch to their preferred connection by modifying the connection settings section and clicking "run" to reconfigure the flow connections.

Clicking the links to [revenues](dataset:revenues), [balances](dataset:balances), [customers](dataset:customers), [product_holdings](dataset:product_holdings), and [additional_information](dataset:additional_information) opens each dataset's connection settings page, where the user can change the path to their data.

After updating the datasets in the interface, users must press "check" to launch the [Data connection - check schema](scenario:001DATACONNECTIONSCHECKSCHEMA) scenario, which loads the data, verifies the schema, and removes the two sync recipes from the flow along with their respective input datasets.

## Option 3 - Copy connection and datasets

The Next Best Offer for Banking solution complements the customer segmentation solution within the marketing suite for banking. Both solutions can use the same input data, and the segmentation output can also serve as an input to the Next Best Offer for Banking solution.

![Screenshot 2023-07-11 at 15.53.00.png](Q1OjxRWdThwn)

## Clustering Configuration

![Screenshot 2023-06-02 at 08.36.01.png](Ql1etI4jykpU)

The user must set a few parameters to configure the project. First, press the refresh button to populate the reference date dropdown with the available dates. Then select the reference_date; the latest available date is a natural choice. Next, define the lookback period, expressed in number of months, which can also be interpreted as a reference period: features are computed both on a monthly basis and over this reference period. Finally, select the number of clusters. Standard values range from 3 to 6; depending on the use case, a higher number is possible, although each additional cluster requires more work to interpret.
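To make the interplay between the reference date and the lookback period concrete, here is a minimal Python sketch of computing a feature both per month and over the reference period. It is an illustration under assumptions, not the solution's feature engineering: the window is assumed to end with (and include) the reference month, `balance_features` and its output keys are hypothetical, balances for the same customer and month are summed, and months without data count as 0.0.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date


def lookback_months(reference_date: date, lookback: int) -> list[date]:
    """First day of each month in the lookback window, oldest first.

    Assumption: the window ends with (and includes) the reference month.
    """
    months = []
    year, month = reference_date.year, reference_date.month
    for _ in range(lookback):
        months.append(date(year, month, 1))
        # Step back one calendar month, wrapping from January to December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(months))


def balance_features(
    rows: Iterable[tuple[str, date, float]], reference_date: date, lookback: int
) -> dict[str, dict]:
    """Hypothetical features: one balance per month plus the lookback average.

    rows are (customer_id, month_start, balance) tuples. Balances for the same
    customer and month are summed; months with no rows count as 0.0.
    Customers with no rows inside the window are left out.
    """
    window = lookback_months(reference_date, lookback)
    in_window = set(window)
    monthly: dict[str, dict[date, float]] = defaultdict(dict)
    for customer_id, month_start, balance in rows:
        if month_start in in_window:  # drop rows outside the lookback period
            by_month = monthly[customer_id]
            by_month[month_start] = by_month.get(month_start, 0.0) + balance
    features = {}
    for customer_id, by_month in monthly.items():
        values = [by_month.get(m, 0.0) for m in window]
        features[customer_id] = {
            "monthly_balance": values,  # monthly basis
            "avg_balance_lookback": sum(values) / lookback,  # reference-period basis
        }
    return features
```

For instance, with a reference date in March 2023 and a lookback of 3 months, only January to March 2023 rows are kept, and a customer with monthly balances of 100, 200 and 300 gets an `avg_balance_lookback` of 200.0. A longer lookback smooths out short-term fluctuations at the cost of reacting more slowly to recent behavior.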

The Run button then triggers the [rebuild_flow](scenario:REBUILD_FLOW) scenario, which rebuilds the whole flow. This can take anywhere from a few minutes to several hours if the input data is very large. Finally, follow the link to the [dashboard](article:10) to access ready-made insights on this segmentation.


