# The Deep Learning Model

In a Visual Analysis for the training dataset, create a new model with:

* **Prediction** as the task

* *polarity* as the target variable

* **Expert mode** as the prediction style

* **Deep learning** as the Expert mode

Then click **Create**.

This creates a new machine learning task and opens the Design tab for the task. On the Target panel, verify that Dataiku DSS has correctly identified this as a Two-class classification type of ML task.

## Features Handling

On the Features Handling panel, turn off *sentiment* as an input, since *polarity* is derived from *sentiment*.

Dataiku should recognize *text* as a column containing text data, set the variable type to Text, and implement custom preprocessing using the TokenizerProcessor.

We can set two parameters for the Tokenizer:

* *num\_words* is the maximum number of words that are kept in the analysis, sorted by frequency. In this case we are keeping only the top 10,000 words.

* *max\_len* is the maximum text length, in words. 32 words is too short for these reviews, so we’ll raise the limit to the first 500 words of each review.
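
To make the two parameters concrete, here is a minimal pure-Python sketch of what a tokenizer of this kind does conceptually. It is not the TokenizerProcessor implementation: the helper names `fit_vocabulary` and `encode` are hypothetical, and details such as how ids are assigned and which end of the text is padded or truncated are assumptions made for illustration.

```python
from collections import Counter

def fit_vocabulary(texts, num_words):
    """Map the num_words most frequent words to ids 1..num_words.

    Id 0 is reserved for padding (an assumption of this sketch).
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(num_words)
    return {word: idx for idx, (word, _) in enumerate(most_common, start=1)}

def encode(text, vocabulary, max_len):
    """Turn one text into a fixed-length list of integer ids."""
    # Words outside the vocabulary are simply dropped.
    ids = [vocabulary[w] for w in text.lower().split() if w in vocabulary]
    ids = ids[:max_len]                      # keep only the first max_len tokens
    return ids + [0] * (max_len - len(ids))  # pad short texts with zeros

# Toy data: the tutorial uses num_words=10000 and max_len=500 instead.
reviews = ["a great great movie", "a dull movie"]
vocab = fit_vocabulary(reviews, num_words=3)
encoded = [encode(r, vocab, max_len=5) for r in reviews]
```

With these toy settings, the rare word "dull" falls outside the top three words and is dropped, and every review becomes a list of exactly five ids. The same logic explains why a small *max\_len* loses most of a long review.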

## Deep Learning Architecture

We now have to define our network architecture in the `build_model()` function. We won’t use the default architecture, so remove all of the existing code. Then click **{} Code Samples** at the top right and search for “text”. Select the **CNN1D** architecture for text classification.

Insert the CNN1D code, then click **Display inputs** at the top left. You should see that the “main” feature is empty, because we are only using the review text, which arrives in the input *text\_preprocessed*.

To build the model, we only need to make one small change to the code. In the line that defines `text_input_name`, change `name_of_your_text_input_preprocessed` to `text_preprocessed`.
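
For orientation, a one-dimensional convolutional text classifier could look roughly like the sketch below. This is not the code sample shipped with Dataiku: the `build_model(input_shapes, n_classes)` signature, the layer sizes, the softmax output, and the use of `tensorflow.keras` are all assumptions made for illustration. The parts that matter for this tutorial are that the input name matches `text_preprocessed` and that the text length and vocabulary size agree with the tokenizer settings.

```python
from tensorflow.keras import layers, Model

def build_model(input_shapes, n_classes=None):
    # These values are assumed to match the tokenizer settings chosen earlier.
    text_length = 500        # max_len
    vocabulary_size = 10000  # num_words
    text_input_name = "text_preprocessed"

    # One integer id per token position.
    text_input = layers.Input(shape=(text_length,), name=text_input_name)

    # Learn a dense vector for each word id (embedding size is illustrative).
    x = layers.Embedding(input_dim=vocabulary_size, output_dim=64)(text_input)

    # Slide filters over windows of 5 consecutive words.
    x = layers.Conv1D(filters=128, kernel_size=5, activation="relu")(x)

    # Keep the strongest response of each filter across the whole review.
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dropout(0.2)(x)

    # One probability per class.
    predictions = layers.Dense(n_classes, activation="softmax")(x)
    return Model(inputs=text_input, outputs=predictions)

# Build the network for a two-class task to check the shapes.
model = build_model({"text_preprocessed": (500,)}, n_classes=2)
```

If the input name in the code does not match the name listed under **Display inputs**, the network has no way to receive the tokenized text, which is why the rename is the one required change.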

## Model Results

Click **Train**. When training completes, deploy the model to the Flow, create an evaluation recipe from the model, and evaluate it on the test data. In the resulting dataset, you can see that the model has an accuracy of about 80% and an AUC of about 0.89. We may be able to improve on these results by using pre-trained word embeddings.
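
As a reminder of what the two reported metrics measure, here is a small pure-Python sketch that computes accuracy and AUC on made-up scores. The data is a toy example, not output from the tutorial's model, and the evaluation recipe computes these metrics for you; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of rows where the predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive row is scored above a random
    negative row, counting ties as half a win."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

Accuracy depends on the chosen threshold, while AUC only depends on how the scores rank positive rows relative to negative ones, so the two numbers can move independently.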
