This zone is where the modelling takes place. The model is trained on the [alerts_train](dataset:alerts_train) dataset, and the scoring takes place on the [alerts_unlabelled](dataset:alerts_unlabelled) dataset.
 
![Alert Triage.png](eCHWNzsf0pFG)
 
The [model](saved_model:8RKdGUmB) is a two-class classification model that predicts the is_escalated variable. The dataset is imbalanced: around 4% of rows have the value 1 and the rest have the value 0. The training and validation sets are split chronologically, with the first 80% of days assigned to the training set. The imbalance is handled with class weights as the weighting strategy. All variables that make sense to include are used in the model and processed in a standard way (dummy encoding for categorical variables and normalization for numerical variables). The four top algorithms are selected with their default hyperparameter sets.
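
As an illustration of the chronological split and the class weighting described above, here is a minimal sketch in plain Python. It is not the platform's implementation: the `alert_date` column name, the toy data, and the "balanced" weighting formula `n_samples / (n_classes * class_count)` (the same heuristic as scikit-learn's `balanced` mode) are all assumptions made for the example.

```python
from collections import Counter
from datetime import date, timedelta


def chronological_split(rows, date_key="alert_date", train_fraction=0.8):
    """Assign the earliest `train_fraction` of distinct days to training,
    and the remaining, later days to validation."""
    days = sorted({row[date_key] for row in rows})
    n_train_days = max(1, int(len(days) * train_fraction))
    train_days = set(days[:n_train_days])
    train = [row for row in rows if row[date_key] in train_days]
    valid = [row for row in rows if row[date_key] not in train_days]
    return train, valid


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed heuristic):
    n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * count) for cls, count in counts.items()}


# Hypothetical toy data: 10 days, 10 alerts per day, 4% escalated overall.
start = date(2024, 1, 1)
rows = [
    {
        "alert_date": start + timedelta(days=day),
        "is_escalated": 1 if (day * 10 + i) % 25 == 24 else 0,
    }
    for day in range(10)
    for i in range(10)
]

train, valid = chronological_split(rows)
weights = balanced_class_weights([row["is_escalated"] for row in train])
```

Splitting on distinct days rather than on rows keeps all alerts from the same day on the same side of the split, so the validation set only contains days that come after every training day.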
 
In the design part of the model, we choose a custom cost matrix function to optimize the threshold. False negatives are heavily penalized, while true positives and false positives carry the same weight with opposite signs. The user can therefore set how important each outcome is through the parametrization of the model.
 
![Cost Matrix.png](GILXLyf6KhfC)
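
To make the idea of cost-matrix threshold optimization concrete, the sketch below scores every candidate threshold by its average gain and keeps the best one. The gain values (+1 for a true positive, -1 for a false positive, -10 for a false negative, 0 for a true negative) are hypothetical placeholders that only respect the relationships described above; the actual values are the ones set in the model's design. The function names and toy data are also assumptions.

```python
def cost_matrix_gain(y_true, y_proba, threshold,
                     tp_gain=1.0, fp_gain=-1.0, fn_gain=-10.0, tn_gain=0.0):
    """Average gain per alert when predicting 1 for probabilities >= threshold."""
    total = 0.0
    for label, proba in zip(y_true, y_proba):
        predicted = 1 if proba >= threshold else 0
        if predicted == 1 and label == 1:
            total += tp_gain
        elif predicted == 1 and label == 0:
            total += fp_gain
        elif predicted == 0 and label == 1:
            total += fn_gain
        else:
            total += tn_gain
    return total / len(y_true)


def best_threshold(y_true, y_proba, candidates=None, **gains):
    """Return the candidate threshold with the highest average gain
    (the lowest such threshold in case of ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(101)]
    return max(candidates, key=lambda t: cost_matrix_gain(y_true, y_proba, t, **gains))


# Hypothetical validation labels and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_proba = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

threshold = best_threshold(y_true, y_proba)
gain_at_best = cost_matrix_gain(y_true, y_proba, threshold)
gain_at_half = cost_matrix_gain(y_true, y_proba, 0.5)
```

Because a missed escalation costs far more than a false alarm in this setup, the selected threshold ends up below 0.5: the model accepts one extra false positive in order to catch the escalated alert scored at 0.40.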
 
The XGBoost algorithm is selected because it matches the performance of random forest while being lighter. Variable importance shows that orig_is_escalated_avg is the most important variable. This makes sense: knowing whether clients have previously had escalated alerts gives information about whether their new alerts will be escalated. Additional information about model explainability can be found [here](article:7). The model performs very well, so it will likely discriminate well between real and false alerts.
 
The deployed model is then used to score the new alerts. The main output of this scoring is the proba_1 column, which can be interpreted as a priority: compliance officers would investigate the high-priority alerts first, before processing the ones further down the list. The scored alerts are therefore sorted by priority and form the output of the project, to be used elsewhere.
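
The prioritization step can be sketched as a simple descending sort on proba_1. The `alert_id` field, the `priority_rank` column, and the sample scores below are hypothetical and only serve the example.

```python
def prioritise_alerts(scored_alerts, score_key="proba_1"):
    """Return a new list sorted by descending score, with a 1-based priority rank.
    Alerts with equal scores keep their original relative order."""
    ordered = sorted(scored_alerts, key=lambda alert: alert[score_key], reverse=True)
    return [dict(alert, priority_rank=rank) for rank, alert in enumerate(ordered, start=1)]


# Hypothetical scored alerts.
scored_alerts = [
    {"alert_id": "A-001", "proba_1": 0.12},
    {"alert_id": "A-002", "proba_1": 0.87},
    {"alert_id": "A-003", "proba_1": 0.45},
    {"alert_id": "A-004", "proba_1": 0.03},
]

work_queue = prioritise_alerts(scored_alerts)
```

The first entry of `work_queue` is the alert a compliance officer would investigate first.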
 
 

