# Walkthrough

## Clustering Configuration 

![Screenshot 2023-06-01 at 14.10.51.png](u3oCD1PDyW2u)

In the reference date dropdown, the latest date (2016-05-28) is selected. The lookback window, in months, is set to 3 and the number of clusters is set to 4. Both of these options may be modified to generate different results and insights. 
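
As a rough illustration of how these settings relate to one another, the sketch below derives the start of the lookback window from the reference date. It assumes the window is counted in calendar months, which this walkthrough does not confirm; `window_start` is a hypothetical helper and not part of the project.

```python
from datetime import date


def window_start(reference_date: date, lookback_months: int) -> date:
    """Return the date `lookback_months` calendar months before reference_date.

    Assumption: the day of month is clamped when the target month is shorter.
    """
    # Work on a zero-based month index so year boundaries are handled by divmod.
    month_index = reference_date.year * 12 + (reference_date.month - 1) - lookback_months
    year, month = divmod(month_index, 12)
    month += 1
    # Find the last valid day of the target month.
    next_month_first = date(year + month // 12, month % 12 + 1, 1)
    last_day = (next_month_first - date.resolution).day
    return date(year, month, min(reference_date.day, last_day))


reference_date = date(2016, 5, 28)  # value selected in the walkthrough
lookback_months = 3                 # value selected in the walkthrough
n_clusters = 4                      # value selected in the walkthrough

start = window_start(reference_date, lookback_months)
print(f"Window: {start} to {reference_date}, clusters: {n_clusters}")
```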

To complete the project setup, rebuild the Flow by pressing the Run button. As guidance for your own analysis: with our dataset and hardware setup, this final step takes about 10 minutes to complete. Once finished, a collection of powerful analyses is available in the precomputed [dashboard](dashboard:9J2VUGB). These provide insight into the overall portfolio and integrate the newly generated Segments to enhance our understanding further.

## Dashboard

In this section, we will go through each of the slides of the [dashboard](dashboard:9J2VUGB) and explain the kind of insights that can be extracted from their analysis.

### Segmentation Model

The first slide gives an overview of the Segmentation model. Initially, the segments will be named according to the variables that are most distinct and influential in associating a specific customer with that specific segment. We can edit these names to something shorter or more business-relevant by going into the Saved Model interface and entering our own custom titles (please note that changes made directly in the Dashboard, instead of in the Saved Model interface, will not save across the project).

![Screenshot 2023-06-01 at 16.54.20.png](NAskzSdwgcBi)

![Screenshot 2023-06-01 at 16.54.28.png](RF6RadQXe2MM)

The four segments identified by this project when using our simulated dataset, and then renamed for ease of analysis, are:

 - Emergents: Young customers with lower than average revenues, with potential value in the future.
 - Traditionalists: Older than average customers, with low but steady engagement.
 - Loyalists: Active and valuable customers with a diverse portfolio and a long-term engagement.
 - Sophisticates: Demanding customers that yield high revenues but might easily churn if dissatisfied.
 
As you can see, these segments are not unusual within the retail banking space. The exciting part is that each customer has been assigned to these segments using an entirely data-driven approach: no business rules were required. Customers were assigned to each segment using machine learning, and the underlying process used by that machine learning model can be understood using information available on this dashboard.

In our dataset, Emergents and Sophisticates make up most of the customer base, while the two other segments are much smaller.

![Screenshot 2023-06-01 at 16.57.45.png](XzookhBNvUOn)
![Screenshot 2023-06-01 at 16.58.13.png](YzBy1Ukbdu1u)

The variables listed in the Observations, Heatmap, and Variable Importance sections were key drivers in the separation of the various segments. Using these analyses, we can understand the underlying logic of the model, and compare and contrast it directly with our business knowledge and insight. For example, these visual analyses show that the differences between Loyalists and Sophisticates lie in the number and type of products they hold. In particular, a key distinction separating Sophisticates is their use of securities, with Loyalists more active in secured lending.

These are not only powerful tools for understanding the model from a business perspective, but are also effective from a data science perspective when considering tweaking the underlying model and re-running the results to improve quality even further. For example:

- In our dataset, customer age and account age are among the most important drivers of the segmentation. In our model, we did not apply a log transformation to these variables, though we did apply it to other numerical variables. If we applied that transformation to these variables and re-ran the model, we would shrink their variance, which would in turn reduce their weight in the segmentation.
- It is also possible to modify the way numerical variables are rescaled: we could shift from the standard rescaling used in this project (which preserves outliers) to a min-max rescaling, which would reduce the impact of those variables. 
- Where some of the segments created contain very few cases because they absorb outliers that were not good fits for other segments, the Visual ML interface allows defining a proportion of outliers that will not belong to any segment. This can produce a more stable and coherent model, at the cost of a small number of customers not being assigned to any segment.
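
The effect of the first two options can be sketched on a toy variable. The example below uses made-up account ages and plain Python rather than the Visual ML preprocessing itself; the function names are hypothetical. It shows that a log transformation compresses the spread of a skewed variable, so its extreme value stands out less after standard rescaling, and that min-max rescaling bounds the values to the [0, 1] interval.

```python
import math
import statistics


def standard_rescale(values):
    """Centre on the mean and divide by the standard deviation (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_rescale(values):
    """Map values linearly onto the [0, 1] interval."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Hypothetical account ages in months, with one long-standing outlier.
account_age = [6, 12, 18, 24, 30, 360]

# log1p compresses large values, pulling the outlier towards the rest.
logged = [math.log1p(v) for v in account_age]

raw_z = standard_rescale(account_age)
log_z = standard_rescale(logged)
bounded = min_max_rescale(account_age)

print("variance, raw vs logged:", statistics.pvariance(account_age), statistics.pvariance(logged))
print("largest z-score, raw vs logged:", max(raw_z), max(log_z))
print("min-max range:", min(bounded), max(bounded))
```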

### Segment Analysis

This slide on segment analysis focuses on the composition and value of each of the segments. 

![Screenshot 2023-06-01 at 17.02.10.png](KEFPaASEr00y)

We can immediately observe that the largest share of revenue is sourced from the Sophisticates segment, and this is driven most heavily by Securities products. The Loyalists have the next largest revenue share, heavily weighted in Unsecured Lending products. Traditionalists and Emergents provide low revenue levels. On the cross-sell side, observations are similar: Traditionalists and Emergents have low cross-sell on average, while Loyalists and Sophisticates are more likely to hold at least two different product types.

![Screenshot 2023-06-01 at 17.11.41.png](eeee9Q2wNJVi)

These segments are clearly delineated by age. The Emergents segment was named in part because it contains more young and lower-income customers; similarly, the Traditionalists have the highest proportion of older and wealthier customers.

![Screenshot 2023-06-01 at 17.12.57.png](rUL61RQ8h1Xp)

The above graph displays the most likely portfolios of products held by customers belonging to each segment. We can see that Emergents predominantly hold only a Deposit Account, while Loyalists, Traditionalists and Sophisticates hold more diverse portfolios, with Loyalists leaning more towards Secured Lending and Sophisticates towards Securities.

![Screenshot 2023-06-01 at 17.14.49.png](tvAcK0VWPCkw)

Finally, the pivot tables give succinct, quantified metrics showing the composition of each segment, and can be used to interpret the potential impact of 'shifting' a customer from one cross-sell level, tier, or segment into another.

### Tier Analysis

This slide focuses on the tiers, which in our example represent the existing categories the bank uses to classify its customers. These might be driven by business rules, or some other techniques, and likely determine which business units oversee a customer and what levels of service the customer receives. Note that the analytics present on this dashboard do not require a 'new' customer segmentation model in order to be valuable: you can think of this as showing how existing client analytics already performed or desired by your team can be handled easily in Dataiku.

In the demonstration dataset, customers have been assigned to one of three categories:

 1. GOLD: the highest tier, which carries the highest service costs for the bank but should also generate the most revenue per customer.
 2. SILVER: intermediate category, with lower costs and lower potential revenues.
 3. BRONZE: the minimum level of service.

![Screenshot 2023-06-01 at 17.18.06.png](K7gaQApgy0G5)

For each tier, revenues come from all product types but with slightly different proportions. Gold customers are overweight on Securities, Silver on Unsecured Lending, while Bronze revenues come mainly from Unsecured Lending. We can see in the cross-sell graph that the bulk of the customers are in either the Silver or Bronze tiers. Total revenue is nevertheless higher for Gold customers than for Bronze customers, meaning that, on average, Gold customers are more valuable, which matches the tier definitions given above.

![Screenshot 2023-06-01 at 17.21.04.png](b36T9uPgPEJn)

The categories seem quite strongly correlated with age: the distribution shifts towards higher values as the tier moves from Bronze to Gold.

![Screenshot 2023-06-01 at 17.22.16.png](7YMujWAfN4oo)

In terms of product mix, portfolios are much more diverse for Gold customers, and progressively less so for Silver and Bronze. More than 77% of Bronze customers hold only Deposit Accounts; this proportion goes down to 48% for Silver and to less than 9% for Gold.

![Screenshot 2023-06-01 at 17.23.17.png](yBHJw5ZojISk)

The grids allow an analyst to immediately see how cross-sell and tiering affect the profitability of a customer. The left table refers to the total revenue, and gives an idea of where the largest shares of revenue are coming from. The rightmost table shows the distribution of customers into each cross-sell and tier combination. This can be helpful in deciding which cells in the center table to consider analytically sound, by setting some cut-off of the number of clients in a cell needed before a result could be considered 'valid'. In the center we evaluate the average revenue generated per customer, by cross-sell and tier. There is a very clear pattern of increasing profitability through cross-sell, whatever the tier. 
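
The three grids can be reproduced in outline from customer-level records. The sketch below uses made-up customers and a hypothetical `MIN_CUSTOMERS` cut-off; it is an illustration of the idea, not the dashboard's actual computation. Cells with too few customers report no average, mirroring the suggestion above.

```python
from collections import defaultdict

# Hypothetical customer records: (tier, cross_sell, revenue).
customers = [
    ("GOLD", 3, 900.0), ("GOLD", 3, 1100.0), ("GOLD", 1, 200.0),
    ("SILVER", 2, 400.0), ("SILVER", 2, 500.0), ("SILVER", 1, 150.0),
    ("BRONZE", 1, 50.0), ("BRONZE", 1, 70.0), ("BRONZE", 2, 300.0),
]

MIN_CUSTOMERS = 2  # assumed cut-off below which an average is not reported

total_revenue = defaultdict(float)  # left grid: total revenue per cell
customer_count = defaultdict(int)   # right grid: number of customers per cell
for tier, cross_sell, revenue in customers:
    cell = (tier, cross_sell)
    total_revenue[cell] += revenue
    customer_count[cell] += 1

# Centre grid: average revenue per customer, hidden for sparse cells.
average_revenue = {
    cell: (total_revenue[cell] / n if n >= MIN_CUSTOMERS else None)
    for cell, n in customer_count.items()
}

for cell in sorted(average_revenue):
    print(cell, customer_count[cell], total_revenue[cell], average_revenue[cell])
```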

### Segment and Tier

The following slide compares segments with tiers to understand how they are linked to each other.

![Screenshot 2023-06-01 at 17.37.53.png](1IdWaMi3eQ8E)

From the right-hand side graph, we can observe that Gold customers contain a majority of Sophisticates, followed by Loyalists and Traditionalists, with very few Emergents. This makes sense given the definition of those segments. At the opposite end, Bronze customers are made up almost entirely of Emergents, with very few customers from other segments. Finally, Silver customers sit in the middle, with a larger proportion of Sophisticates, followed in decreasing order by Loyalists, Emergents and Traditionalists.

The left-hand side graph offers a slightly different perspective on the same data.

![Screenshot 2023-06-01 at 17.49.47.png](f9LfknWcFKYr)
  
For the revenue distribution, the first observation is that the breakdown by tier is quite different from the one above, whereas the breakdown by segment looks similar to the one for customer count. Within each tier, most of the revenue comes from the Loyalists and Sophisticates. On the left-hand side, the revenue distribution is nearly identical to the customer count distribution: this means that, on average, each customer within a segment yields roughly the same revenue regardless of tier. Thus, segments discriminate better than tiers between the most and least valuable customers.
 
![Screenshot 2023-06-01 at 17.50.26.png](84BCROTHYmZM)

Finally, the pivot tables let us look at the actual numbers behind the above graphs. The bulk of revenue comes from Sophisticates, and in particular Silver ones, although within each segment Silver customers generate a lower average revenue. Another observation is that average customer revenue does not vary much by tier within a segment, while it varies significantly by segment within a tier, which confirms the conclusion drawn from the previous graphs.
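
One way to quantify this comparison is to measure how much the average revenue per customer spreads when one dimension is held fixed and the other varies. The figures below are invented for illustration and `mean_spread` is a hypothetical helper; the sketch only shows the shape of the check, not the project's numbers.

```python
import statistics

# Hypothetical average revenue per customer for each (segment, tier) cell.
avg_revenue = {
    ("Emergents", "BRONZE"): 50.0,
    ("Emergents", "SILVER"): 55.0,
    ("Emergents", "GOLD"): 60.0,
    ("Loyalists", "BRONZE"): 480.0,
    ("Loyalists", "SILVER"): 470.0,
    ("Loyalists", "GOLD"): 500.0,
    ("Sophisticates", "BRONZE"): 900.0,
    ("Sophisticates", "SILVER"): 880.0,
    ("Sophisticates", "GOLD"): 950.0,
}


def mean_spread(group_index):
    """Average, over groups, of the standard deviation of cell values in each group.

    group_index 0 groups by segment (spread across tiers);
    group_index 1 groups by tier (spread across segments).
    """
    groups = {}
    for key, value in avg_revenue.items():
        groups.setdefault(key[group_index], []).append(value)
    return statistics.fmean([statistics.pstdev(values) for values in groups.values()])


spread_across_tiers = mean_spread(0)     # segment fixed, tier varies
spread_across_segments = mean_spread(1)  # tier fixed, segment varies

print("spread across tiers within a segment:", spread_across_tiers)
print("spread across segments within a tier:", spread_across_segments)
```

A much smaller spread across tiers than across segments would correspond to the pattern described above.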

### Segment Evolution

The final slide looks at the dynamics of the segmentation.

![Screenshot 2023-06-01 at 17.52.31.png](Z393bamjxInC)

The stability graph displays the likelihood of any specific client remaining in the same segment after a period of one month, or after the reference period (which is set to three months in our example). Since the reference period is longer than the one-month baseline, stability over the reference period is lower. Comparing segments, we can conclude that the most stable segments are Traditionalists and Loyalists and the least stable are Emergents, which is consistent with our expectations.
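
A stability figure of this kind can be sketched as the share of customers who start in a segment and are still in it at the later date. The snapshots below are invented, and the treatment of customers missing from the later snapshot is an assumption; the project's own computation may differ.

```python
from collections import Counter

# Hypothetical segment assignments at two dates, keyed by customer id.
segments_before = {1: "Emergents", 2: "Emergents", 3: "Loyalists",
                   4: "Sophisticates", 5: "Emergents"}
segments_after = {1: "Emergents", 2: "Sophisticates", 3: "Loyalists",
                  4: "Sophisticates", 5: "Loyalists"}


def stability(before, after):
    """Share of customers per starting segment found in the same segment later."""
    started = Counter()
    stayed = Counter()
    for customer_id, segment in before.items():
        if customer_id not in after:
            continue  # assumption: customers absent from the later snapshot are ignored
        started[segment] += 1
        if after[customer_id] == segment:
            stayed[segment] += 1
    return {segment: stayed[segment] / n for segment, n in started.items()}


stability_by_segment = stability(segments_before, segments_after)
print(stability_by_segment)
```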

![Screenshot 2023-06-01 at 18.03.00.png](S7rwxZ2mLwjr)

Sankey charts represent transitions from one segment to another over time. The first is plotted on a single-month transition, and the second on the reference period (which in this example is three months). Both show a similar pattern. Starting with the Emergents segment, which is likely to be a starting point for an incoming customer, the most common result is, of course, that they remain within the same segment (see the Stability charts above). However, the next most likely transition observed is into the Sophisticates segment, with a smaller share moving directly to Loyalists and an even smaller share to Traditionalists. Note that the width of each link is proportional to the number of moving customers divided by the number of customers in the source segment, so links from different source segments are not directly comparable in terms of numbers of customers.
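
The link-width definition above can be sketched as follows, using invented transitions. The example is only meant to show why two links carrying the same number of customers can have different widths when their source segments differ in size.

```python
from collections import Counter

# Hypothetical (source segment, target segment) pairs, one per customer.
transitions = [
    ("Emergents", "Emergents"), ("Emergents", "Emergents"),
    ("Emergents", "Sophisticates"), ("Emergents", "Loyalists"),
    ("Sophisticates", "Sophisticates"), ("Sophisticates", "Loyalists"),
]

pair_counts = Counter(transitions)
source_counts = Counter(source for source, _ in transitions)

# Width of each link: moving customers divided by customers in the source segment.
link_width = {
    (source, target): count / source_counts[source]
    for (source, target), count in pair_counts.items()
}

for link, width in sorted(link_width.items()):
    print(link, pair_counts[link], width)
```

Here the Emergents to Sophisticates link and the Sophisticates to Loyalists link each carry one customer, yet their widths differ because the source segments have different sizes.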

![Screenshot 2023-06-01 at 17.56.06.png](Y50NO5RpHCC3)

The last three graphs show the time series for the three core metrics, by segment: number of customers, total revenue, and average revenue per customer. They allow us to evaluate changes and trends in the composition of these segments, and see if marketing or other efforts are having measurable impacts. Some observations that can be made:

 - Many customers seem to have moved from Emergents to Sophisticates during April 2015; perhaps there was a notable event or campaign.
 - Total revenue coming from Sophisticates and Loyalists increased significantly over the whole period. Perhaps this suggests a strategic re-evaluation is in order.
 - Average revenue per customer decreased noticeably for Loyalists in April 2015, but otherwise, these values remained quite stable.
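
The three core metrics can be derived from monthly customer-level records along these lines. The records and field layout below are hypothetical, chosen only to show how the three series relate to one another.

```python
from collections import defaultdict

# Hypothetical monthly records: (month, segment, customer_id, revenue).
records = [
    ("2015-03", "Emergents", 1, 40.0), ("2015-03", "Emergents", 2, 60.0),
    ("2015-03", "Sophisticates", 3, 800.0),
    ("2015-04", "Emergents", 1, 45.0),
    ("2015-04", "Sophisticates", 2, 700.0), ("2015-04", "Sophisticates", 3, 900.0),
]

customer_ids = defaultdict(set)
revenue = defaultdict(float)
for month, segment, customer_id, amount in records:
    customer_ids[(month, segment)].add(customer_id)
    revenue[(month, segment)] += amount

# One entry per (month, segment) with the three metrics plotted on the slide.
metrics = {
    key: {
        "customers": len(ids),
        "total_revenue": revenue[key],
        "avg_revenue": revenue[key] / len(ids),
    }
    for key, ids in customer_ids.items()
}

for key in sorted(metrics):
    print(key, metrics[key])
```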

# Conclusion

These visualizations provide actionable insights that allow your marketing specialists to quickly understand revenue share, product mix, and much more, making it possible to adjust your strategy using your data.
