# Hands-On Tutorial: Scoring Data

In Hands-On Tutorial: Deploy the Model, we deployed our best-performing model from the Lab to the Flow. Our objective now is to use that same model (*Random Forest*) and apply it to a dataset of new customers (*customers\_unlabeled\_prepared*).

## Active Version of the Model

* Select and open the model that is deployed to the Flow.

Without going into too much detail in this tutorial, notice that the model is marked as the **Active version**. If your data were to evolve over time (which is very likely in real life!), you could train your model again by selecting **Actions** and then **Retrain** from this screen. In that case, new versions of the model would become available, and you would be able to select which version you’d like to use.

* Go back to the Flow and select the model again without opening it.

In the actions panel, you’ll see a **Retrain** button near the **Open** button. This is a shortcut to the function described above: you can update the model with new training data, and activate a new version.

## The Score Recipe

Finally, the **Score** icon is the one we need in order to apply the model to new data.

* Select the deployed model in the Flow.

* Choose the **Score** recipe.

Configure the input and output datasets as follows:

* Set the **input dataset** to *customers\_unlabeled\_prepared*.

* Name the **output dataset** `customers_unlabeled_scored`.

* Select the **Format** in which to store the results, such as “CSV”.

* Select **Create Recipe**.

You are now in the **Score recipe**.

As discussed in the Machine Learning Basics course, the threshold is the optimal value computed to maximize a given metric. In our case, it was set to 0.625. Rows with a predicted probability above the threshold are classified as high value; rows below it are classified as low value.
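To make the idea of an “optimal” threshold concrete, here is a minimal sketch of picking the cut-off that maximizes a metric. It assumes F1 as the metric and a small hand-picked list of candidate thresholds. The function names and the toy data are hypothetical, and this is not Dataiku's actual optimization code.

```python
def f1_at_threshold(probas, labels, threshold):
    """F1 score when rows with a probability above `threshold` are predicted True."""
    preds = [p > threshold for p in probas]
    tp = sum(1 for pred, y in zip(preds, labels) if pred and y)
    fp = sum(1 for pred, y in zip(preds, labels) if pred and not y)
    fn = sum(1 for pred, y in zip(preds, labels) if not pred and y)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probas, labels, candidates):
    """Return the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_at_threshold(probas, labels, t))


# Toy, made-up validation data: predicted probabilities and true labels.
toy_probas = [0.1, 0.4, 0.55, 0.7, 0.9]
toy_labels = [False, False, True, True, True]
chosen = best_threshold(toy_probas, toy_labels, candidates=[0.25, 0.5, 0.75])
print(chosen)
```

On real data the candidate list would typically be a finer grid, and the metric would be whichever one was selected when the model was designed.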

If you would like to return individual explanations of the prediction for each row in the *customers\_unlabeled\_prepared* dataset:

* Select the “Output explanations” checkbox. This action enables the “Force original backend” option so that scoring uses the same machine learning engine that was used to train the model. Selecting the checkbox also brings up some additional parameters:

+ For the “Computation method”, keep the default “ICE”.

+ For the “Number of explanations”, keep the default `3`. This returns the contributions of the three most influential features for each row.

* Select the **Run** button to score the dataset.

A few seconds later, you should see **Job succeeded**.

* Return to the Flow.

To recap, you:

* Started from the “historic data”.

* Applied a training recipe.

* Created a trained model.

* Applied the model to get the scores on the unlabeled dataset.

## Inspect the Scored Results

We’re almost done! Open the *customers\_unlabeled\_scored* dataset to see how the scored results look.

Four new columns have been added to the dataset:

* **proba\_False**

* **proba\_True**

* **prediction**

* **explanations** (as requested in the Score recipe)

The two “proba” columns are of particular interest. The model provides two probabilities, each a value between 0 and 1: the likelihood that a customer will *become* a high value customer (*proba\_True*), and the opposite likelihood that the customer will *not become* one (*proba\_False*).

The *prediction* column is the decision based on the probability and the threshold value of the Score recipe. Whenever *proba\_True* is above the threshold value (in this case, 0.625), Dataiku sets *prediction* to “True”.
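The relationship between these three columns can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Dataiku's implementation. The `score_row` helper is hypothetical, and treating a probability exactly equal to the threshold as “False” is an assumption based on the word “above”.

```python
THRESHOLD = 0.625  # threshold value used by the Score recipe in this tutorial


def score_row(proba_true, threshold=THRESHOLD):
    """Build the three scoring columns for one row from its proba_True value."""
    return {
        # In a two-class problem, the two probabilities sum to 1.
        "proba_False": 1.0 - proba_true,
        "proba_True": proba_true,
        # Assumption: strictly above the threshold means "True".
        "prediction": proba_true > threshold,
    }


likely_high_value = score_row(0.75)
likely_low_value = score_row(0.5)
print(likely_high_value["prediction"], likely_low_value["prediction"])
```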

The *explanations* column contains a JSON object with features as keys and their positive or negative influences as values. For example, the highlighted row in the following figure shows the three most influential features (*age\_first\_order*, *campaign*, and *pages\_visited\_avg*) and their corresponding contributions to the prediction outcome.
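As a rough illustration of what “three most influential features” means, the sketch below keeps the three contributions with the largest absolute value for one row. Ranking by absolute value is an assumption about how influence is measured. The contribution values and the `other_feature` entry are made up, and `top_contributions` is a hypothetical helper, not a Dataiku function.

```python
def top_contributions(contributions, k=3):
    """Return the k (feature, contribution) pairs with the largest absolute value."""
    ranked = sorted(
        contributions.items(),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    return dict(ranked[:k])


# Made-up contributions for a single row; the sign gives the direction of the influence.
row_contributions = {
    "other_feature": 0.05,
    "campaign": 0.31,
    "age_first_order": -0.42,
    "pages_visited_avg": 0.18,
}
top3 = top_contributions(row_contributions)
print(top3)
```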

To work with the JSON data in this column more easily, you can apply a Prepare recipe to the *customers\_unlabeled\_scored* dataset. In the recipe, apply the **Unnest object (flatten JSON)** processor to the *explanations* column.
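If you prefer to see what flattening does outside of a Prepare recipe, here is a minimal Python sketch of the same idea using only the standard library. It is not the processor's implementation. The `explanations_` column prefix, dropping the original column, the single level of nesting, and the sample values are all assumptions made for illustration.

```python
import json


def flatten_explanations(row, column="explanations"):
    """Return a copy of `row` with the JSON object in `column` spread into separate keys."""
    # Keep every other column as-is; the original JSON column is dropped here for brevity.
    flat = {key: value for key, value in row.items() if key != column}
    for feature, contribution in json.loads(row[column]).items():
        # Hypothetical naming scheme: <column>_<feature>.
        flat[f"{column}_{feature}"] = contribution
    return flat


# Made-up scored row; only the feature names come from the tutorial.
scored_row = {
    "prediction": True,
    "explanations": '{"age_first_order": -0.42, "campaign": 0.31, "pages_visited_avg": 0.18}',
}
flat_row = flatten_explanations(scored_row)
print(flat_row)
```

Each feature then sits in its own column, which makes it straightforward to filter or sort rows by a single feature's contribution.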

To learn more about individual prediction explanations, see the reference documentation.

## Learn More

That’s it! You now know enough to build your first predictive model, analyze its results, and deploy it. These are the first steps towards a more complex application. Great job!
