# Putting Active Learning in Place

We would like to improve the model's performance by labeling more rows and adding them to the training set. This is where active learning comes into play. We will:

* Set up the recipe that uses the logistic regression model to determine the order in which rows should be manually labeled.

* Set up a web app that makes it easier to manually label rows.

* Set up a scenario that automates the rebuild of the ML model and then updates the order in which rows should be manually labeled.

* Set up a dashboard to monitor how well our active learning project is progressing.

## Set Up the Query Sampler Recipe

* Click **+ Recipe > ML-assisted Labeling > Query sampler**.

* For the Classifier Model, select *Prediction (LOGISTIC\_REGRESSION) on clickbait\_stacked*.

* For the Unlabeled Data, select *clickbait\_to\_classify*.

* For the output Data to be labeled, create a new dataset called `clickbait_queries`.

* Click **Create**.

Ensure that **Smallest confidence sampling** is selected as the Query strategy. On binary classification tasks, all the strategies give the same results, but the smallest confidence sampling strategy is the least computationally expensive. **Run** the recipe.
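
To see why the strategies agree on a binary task, here is a minimal sketch in plain Python. It does not use the plugin's code: the function names and the probabilities are assumptions made for illustration, and margin and entropy stand in for the other common uncertainty strategies. With two classes, all three scores depend only on how far the predicted probability is from 0.5, so they rank the rows identically.

```python
import math

def smallest_confidence(p):
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - max(p, 1.0 - p)

def smallest_margin(p):
    """Uncertainty = negative gap between the two class probabilities."""
    return -abs(p - (1.0 - p))

def greatest_entropy(p):
    """Uncertainty = Shannon entropy of the predicted distribution."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)

def rank(probas, score):
    """Return row indices ordered from most to least uncertain."""
    return sorted(range(len(probas)), key=lambda i: score(probas[i]), reverse=True)

# Hypothetical predicted probabilities of the positive class for five unlabeled rows.
probas = [0.95, 0.52, 0.10, 0.70, 0.35]

order_confidence = rank(probas, smallest_confidence)
order_margin = rank(probas, smallest_margin)
order_entropy = rank(probas, greatest_entropy)

print(order_confidence)
print(order_margin == order_confidence == order_entropy)
```

Smallest confidence needs only a `max` per row, while entropy needs a logarithm per class, which is consistent with it being the cheapest option.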

Note

**Troubleshooting.** Did the recipe fail because of a code environment mismatch between the deployed model and the plugin? Rebuild and redeploy your model with a Python 3 code environment or talk to your DSS administrator.

## Set Up the Labeling Application

Start by creating a web application to label rows:

* Go to **</> > Webapps**.

* Click **+ New Webapp > Visual Webapp > Tabular data labeling**.

* Name the webapp `Clickbait labeling` and click **Create**.

* Choose the following web app settings:

  + **Input**

    - For Unlabeled data, select *clickbait\_to\_classify*.

    - For Categories, add `clickbait` and `legit` as categories. Optionally give them more detailed descriptions.

  + **Output**

    - For Labels dataset, create a new dataset named `clickbait_labeled`.

    - For Labeling metadata dataset, create a new dataset named `clickbait_metadata`.

    - Enter the Labels target column name as `clickbait`.

  + **Active Learning specific**

    - For Queries, select *clickbait\_queries*, which is the dataset created by the Query Sampler recipe.

* From the webapp’s **Actions** menu, click **Start**.

The webapp begins by initializing the output datasets. You could start labeling right away, but we can make the experience even better for the labeler by first setting up the whole data processing pipeline.

The first task is to integrate newly manually labeled data into the training set of the model. Go back to the Flow, where you should see that a dataset *clickbait\_labeled* has been created.

* Open the Stack recipe we created earlier and click **Add input**.

* Select **clickbait\_labeled**.

* **Run** the recipe and update the schema.

Every row that the webapp adds to *clickbait\_labeled* will now be fed into the model's training set.

Note

For identification purposes, the labeling webapp creates a unique hash for each sample. This is where the additional *clickbait\_id* column comes from. It is not used anywhere else, so you should not include this column in any processing.
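
As a rough picture of what stacking does with the new rows, the sketch below concatenates two small lists of row dictionaries and leaves the identifier column out. It is plain Python rather than the Stack recipe's implementation, and the `title` column, the sample rows, and the `stack` helper are all hypothetical.

```python
def stack(datasets, exclude=("clickbait_id",)):
    """Concatenate lists of row dicts, keeping the union of columns minus `exclude`."""
    columns = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in exclude and name not in columns:
                    columns.append(name)
    stacked = []
    for rows in datasets:
        for row in rows:
            # Columns missing from a row are filled with None.
            stacked.append({name: row.get(name) for name in columns})
    return stacked

# Hypothetical rows; the "title" column name is an assumption for illustration.
initial = [{"title": "Ten tricks doctors hate", "clickbait": "clickbait"}]
labeled = [
    {"title": "Parliament passes budget", "clickbait": "legit", "clickbait_id": "a1b2"}
]

training_rows = stack([initial, labeled])
print(training_rows)
```

The identifier only serves the webapp's bookkeeping, so dropping it before training keeps a meaningless, unique-per-row value out of the model's features.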

## Set Up the Training Scenario

The Flow is now complete; however, the model must be manually retrained in order to generate new queries. The active learning plugin provides a scenario trigger to do it automatically. Let’s set it up.

* From the **Jobs** dropdown in the top navigation bar, select **Scenarios**.

* Click **+ New Scenario**.

* Name it `Retrain clickbait` and click **Create**.

* Click **Add Trigger > ML-assisted Labeling > Trigger every n labeling**.

  + Name the trigger `Every 10 labeling`.

  + Set Run every (seconds) to `20`.

  + Set Labeling count to `10`.

  + Set the Metadata dataset to *clickbait\_metadata*.

  + Set the Queries dataset to *clickbait\_queries*.

* In the **Steps** tab:

  + Click **Add Step > Build/Train**.

    - Name it `Stack data`.

    - Click **Add Dataset to Build** and add *clickbait\_stacked*.

    - For the Build mode, select **Build only this dataset**.

  + Click **Add Step > Build/Train**.

    - Name it `Rebuild model`.

    - Click **Add Model to Build** and add *Prediction (LOGISTIC\_REGRESSION) on clickbait\_stacked*.

    - For the Build mode, select **Build only this dataset**.

  + Click **Add Step > Build/Train**.

    - Name it `Queries`.

    - Click **Add Dataset to Build** and add *clickbait\_queries*.

    - For the Build mode, select **Build only this dataset**.

  + Click **Add Step > Restart webapp**.

    - Name it `Clickbait labeling`.

    - Select *Clickbait labeling* as the web app to restart.

In the scenario **Settings** tab, turn on the scenario's auto-triggers. From now on, the scenario regenerates the queries every time 10 samples are labeled.
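
The trigger's behavior can be pictured as a periodic check on the number of labels. The sketch below is an assumption about the general logic, not the plugin's actual code: `should_trigger`, `simulate`, and the label counts are hypothetical. In the real setup the count comes from the metadata dataset, which is polled every 20 seconds.

```python
def should_trigger(labeled_total, labeled_at_last_run, labeling_count=10):
    """Return True once `labeling_count` new labels exist since the last run."""
    return labeled_total - labeled_at_last_run >= labeling_count

def simulate(polls, labeling_count=10):
    """Replay successive polls of the label count and record which polls fire."""
    fired = []
    labeled_at_last_run = 0
    for poll_index, labeled_total in enumerate(polls):
        if should_trigger(labeled_total, labeled_at_last_run, labeling_count):
            fired.append(poll_index)
            # Remember the count at which the scenario last ran.
            labeled_at_last_run = labeled_total
    return fired

# Hypothetical label counts observed at each 20-second poll.
polls = [0, 4, 9, 12, 15, 21, 23]
fired_polls = simulate(polls)
print(fired_polls)
```

In this sketch the scenario fires at the fourth poll (12 labels) and again at the seventh (23 labels, 11 more than at the previous run), so labelers always work from queries that are at most about 10 labels out of date.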

## Prepare a Monitoring Dashboard

In order to track model improvements over time, let’s create a dashboard. Go to Dashboards.

* Rename the default dashboard `AL monitoring`.

* Add a tile of type **Metrics** insight to the first slide, with the following settings:

  + Set Type to **Saved model**.

  + Set Source to *Prediction (LOGISTIC\_REGRESSION) on clickbait\_stacked*.

  + Set Metric to **AUC**.

  + Click **Add**.

* Set the Metrics options to **History**.
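
For readers who want to know what the tracked metric measures, here is a minimal sketch of AUC and of a history built across retrains. It is plain Python, not how DSS computes or stores the metric, and the labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive row outscores a random negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one row of each class")
    # Count pairwise wins; ties count for half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical held-out labels (1 = clickbait) and scores from two successive retrains.
labels = [1, 0, 1, 0]
scores_by_retrain = [
    [0.9, 0.8, 0.4, 0.1],
    [0.9, 0.3, 0.6, 0.1],
]

history = [auc(labels, scores) for scores in scores_by_retrain]
print(history)
```

A rising history, as in this toy example, is the pattern to look for on the dashboard: it suggests that the newly labeled rows are helping the model separate the two classes.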
