# Model Error Analysis

## Debug model performance with error analysis

## Plugin information

|  |  |
| --- | --- |
| Version | 1.1.4 (DSS ≥ 10), 1.0.6 (DSS 9) |
| Author | Dataiku (Agathe Guillemot and Simona Maggio) |
| Released | 2021-05-31 |
| Last updated | 2022-07-25 |
| License | Apache Software License |
| Source code | GitHub |
| Reporting issues | GitHub |

## Description

After training an ML model, data scientists need to investigate its failures to build intuition about the critical subpopulations on which it performs poorly. This analysis is essential in the iterative process of model design and feature engineering, and is usually performed manually.

The Model Error Analysis plugin provides automatic tools to break down the model’s errors into meaningful groups that are easier to analyse, and to highlight the most frequent types of errors as well as the problematic characteristics correlated with them.

The plugin leverages the mealy package, which is developed and maintained by Dataiku’s research team.

## Setup

Right after installing the plugin, you will need to build its code environment. Note that this plugin requires Python 3.6 and only works with Python models trained on Python 3.

## Principle

Model Error Analysis streamlines the analysis of the samples that contribute most to the model’s mistakes. We call the model under investigation the *original model* (also referred to as the primary model).

This approach relies on an Error Tree, a secondary model **trained to predict whether the primary model prediction is correct or wrong, i.e. a success or a failure**. More precisely, the Error Tree is a binary **Decision Tree classifier** predicting whether the primary model will yield a *Correct Prediction* or an *Incorrect Prediction*.
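The idea can be sketched with scikit-learn. This is a minimal illustration and not the plugin's or the mealy package's actual implementation: the synthetic dataset, the random forest used as primary model, the tree hyperparameters, and the `feature_i` names are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data standing in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=2000, n_features=6, n_informative=4, flip_y=0.15, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# The primary model under investigation (any fitted classifier would do).
primary_model = RandomForestClassifier(n_estimators=50, random_state=0)
primary_model.fit(X_train, y_train)

# Error labels on the test set: 1 = Incorrect Prediction, 0 = Correct Prediction.
error_labels = (primary_model.predict(X_test) != y_test).astype(int)

# The Error Tree: a shallow binary decision tree trained to predict
# whether the primary model is wrong on a given sample.
error_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=0)
error_tree.fit(X_test, error_labels)

# Hypothetical feature names, used only to make the printed rules readable.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
print(export_text(error_tree, feature_names=feature_names))
```

Keeping the tree shallow is a design choice in this sketch: each leaf then corresponds to a short, readable rule describing a segment of the test data.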

The Model Error Analysis plugin automatically highlights information relevant to the model’s errors, helping the user focus on which features are problematic and what the typical values of these features are for the incorrectly predicted samples. This information can later be used to support the strategy selected by the user:

* **Improve model design**: removing a problematic feature, removing samples likely to be mislabeled, ensembling with a model trained on a problematic subpopulation, …

* **Enhance data collection**: gathering more data on the most error-prone or under-represented populations.

* **Select critical samples for manual inspection**: using the Error Tree to identify them, avoiding primary model predictions on them, and generating model assertions.

## How to use

### Model views: Model Error Analysis

*Reminder: A model view is an additional way to visualize information about a model. Model views appear in a deployed model’s version page. This feature was introduced in Dataiku DSS 6.0. If your models were deployed to your flows before v6.0 and you don’t see the “Views” tab, go back to the saved model screen, open Settings, and fill the **Model type** field with “**prediction/py\_memory**”.*

#### General metrics

The top panel highlights the main metrics:

* **Original model error rate**: proportion of samples in the test set the primary model predicts incorrectly.

* **Fraction of total error**: incorrect predictions present in a selected node over the total number of incorrect predictions in the whole population.

* **Local error**: incorrect predictions in a selected node over the number of samples in the node.
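The three definitions above can be written out as a short sketch in plain Python. The function name `error_metrics`, its arguments, and the toy data are hypothetical and only serve to illustrate the ratios; they are not part of the plugin.

```python
def error_metrics(is_error, in_node):
    """Compute the three panel metrics for one selected node.

    is_error[i] is True when the primary model mispredicted test sample i.
    in_node[i] is True when test sample i falls in the selected node.
    """
    total = len(is_error)
    total_errors = sum(is_error)
    node_size = sum(in_node)
    node_errors = sum(e and n for e, n in zip(is_error, in_node))
    return {
        # Errors over all test samples.
        "original_model_error_rate": total_errors / total if total else 0.0,
        # Errors in the node over all errors.
        "fraction_of_total_error": node_errors / total_errors if total_errors else 0.0,
        # Errors in the node over samples in the node.
        "local_error": node_errors / node_size if node_size else 0.0,
    }


# Toy data: 10 test samples, 4 errors overall, 5 samples in the node, 3 of them errors.
is_error = [True, False, True, True, False, False, False, True, False, False]
in_node = [True, True, True, True, True, False, False, False, False, False]

metrics = error_metrics(is_error, in_node)
print(metrics)
```

In this toy case the node holds half of the samples but three quarters of the errors, and its local error (0.6) exceeds the original model error rate (0.4), which is the pattern that makes a node worth inspecting.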

#### The tree and its nodes

The **fraction of total error** is represented by the width of the tree branches, which trace a path towards the nodes containing the majority of errors.

The **local error** is represented by the intensity of red in a node; it is the error rate of the node’s population.

The interesting nodes are the ones containing the majority of errors (thickest branches), possibly with the highest local error rates (deepest red, especially when higher than the original model error rate).

#### Left panel node analysis

When clicking on a node, a panel will appear on the left with several pieces of information:

* **Sample**: number and overall proportion of samples in the node.

* **Decision rule section**: specifies the decision path to reach the chosen node (column\_A < x and column\_B > y, etc.).

* **Fraction of total error**: the ratio between the number of errors in this node and the total number of errors.

* **Local error**: error rate in the node, again represented as a red level in a circle.

The Decision rule section allows the user to know at a glance the segment of data represented by the node.
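As an illustration of how a decision rule maps to a data segment, the sketch below treats a rule as a conjunction of threshold conditions and filters rows with it. The thresholds, the sample rows, and the helper `in_segment` are hypothetical; only the `column_A < x and column_B > y` shape comes from the description above.

```python
import operator

# Hypothetical decision path: column_A < 5.0 and column_B > 2.0.
decision_rule = [
    ("column_A", operator.lt, 5.0),
    ("column_B", operator.gt, 2.0),
]


def in_segment(row, rule):
    """Return True when the row satisfies every condition on the decision path."""
    return all(compare(row[column], threshold) for column, compare, threshold in rule)


# Made-up rows standing in for test samples.
rows = [
    {"column_A": 3.0, "column_B": 4.0},
    {"column_A": 7.0, "column_B": 4.0},
    {"column_A": 1.0, "column_B": 1.0},
    {"column_A": 4.5, "column_B": 2.5},
]

segment = [row for row in rows if in_segment(row, decision_rule)]
print(len(segment), "of", len(rows), "samples fall in the node")
```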

The **Univariate histogram section**:

* The five features most correlated with the errors are displayed from top to bottom.

* More features can be selected in the drop down menu.

* For each feature, we can compare its distribution in the node to its distribution in the whole test set thanks to the ‘in all samples’ toggle.

* For each bin, we show the percentage of correct/incorrect predictions.

* Looking at the discrepancy between the two distributions allows the user to see:

  + Which features are the source of the problem.

  + Which feature values characterise the erroneous samples.
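The comparison behind these histograms can be approximated in a few lines of plain Python. This is only a sketch of the idea, not the plugin's code: the function `binned_summary`, the bin edges, and the toy feature values are assumptions made for the example.

```python
from collections import Counter


def binned_summary(values, is_error, edges):
    """For each bin, return its share of the samples and its percentage of errors.

    Bins are [edges[b], edges[b + 1]); values outside the range are clipped
    into the first or last bin.
    """
    counts, errors = Counter(), Counter()
    for value, error in zip(values, is_error):
        # Number of interior edges at or below the value gives the bin index.
        b = sum(value >= edge for edge in edges[1:-1])
        counts[b] += 1
        errors[b] += int(error)
    n = len(values)
    return [
        {
            "bin": (edges[b], edges[b + 1]),
            "share": counts[b] / n if n else 0.0,
            "incorrect_pct": 100 * errors[b] / counts[b] if counts[b] else 0.0,
        }
        for b in range(len(edges) - 1)
    ]


# Toy data for one numerical feature over 8 test samples.
feature = [5, 12, 15, 25, 8, 22, 28, 18]
is_error = [False, True, True, False, False, False, True, False]
in_node = [False, True, True, False, False, False, True, True]
edges = [0, 10, 20, 30]

all_summary = binned_summary(feature, is_error, edges)
node_summary = binned_summary(
    [v for v, n in zip(feature, in_node) if n],
    [e for e, n in zip(is_error, in_node) if n],
    edges,
)

for all_bin, node_bin in zip(all_summary, node_summary):
    print(all_bin["bin"], "all:", all_bin["share"], "node:", node_bin["share"],
          "incorrect in node (%):", round(node_bin["incorrect_pct"], 1))
```

A bin whose share is much larger in the node than in the whole test set, with a high percentage of incorrect predictions, points to feature values that characterise the erroneous samples.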

### Template notebook: Model Error Analysis tutorial

For coders who want to go further in their analysis, a Python library is provided. It can be used with a visual DSS model or a custom scikit-learn model.

The plugin comes with a Python template notebook that can serve as a tutorial to help users understand how the Python library works.

#### Install in DSS

To install the plugin, open the Apps menu, click Plugins, and search for Model Error Analysis.

Alternatively, you can download a zipped version here.

