# How to use Azure AutoML from a Dataiku DSS Notebook

Azure Machine Learning supports many kinds of machine learning, from classical ML to deep learning, both supervised and unsupervised. You can write Python or R code, or use zero-code/low-code options such as the designer. Either way, you can build, train, and track machine learning and deep learning models in an Azure Machine Learning workspace. Automated ML (AutoML) lets Azure Machine Learning train and tune a model for you, optimizing a target metric that you specify.

You can currently use Azure's AutoML capabilities from within a DSS Python notebook. This tutorial describes the configuration needed on both the Azure and DSS sides. It also gives a code example that creates an Azure AutoML job from DSS and deploys the resulting model.

## Configuration

### Azure side

* Create a Machine Learning workspace.
* Create a storage account:
  + Choose the "StorageV2" account kind.
  + Once the account is created, go to the storage page -> Containers -> Create a container (this tutorial uses container-of-dku). DSS will use this container.

### Dataiku DSS side

* Create a new Azure Blob Storage connection in DSS that points to the storage account you just created:
  + The access key is in the Azure portal: go to your storage page -> Access keys.
  + Path restrictions -> Container -> container-of-dku
  + Advanced -> HDFS interface -> ABFS

* Create a Python 3 (mandatory) code environment with the following requirements, taken from Microsoft's AutoML environment setup:
  + matplotlib
  + numpy
  + cython
  + urllib3
  + scipy
  + scikit-learn
  + tensorflow
  + xgboost
  + azureml-sdk
  + azureml-widgets
  + azureml-explain-model
  + pandas-ml
  + azureml-defaults
  + azureml-dataprep[pandas]
  + azureml-train-automl
  + azureml-train
  + azureml-pipeline
  + azureml-contrib-interpret
  + pytorch-transformers==1.0.0
  + spacy==2.1.8
  + onnxruntime==1.0.0
  + `https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz`

* The AutoML API we will use requires uncompressed CSV files whose first line contains the column names. When DSS creates a managed Azure dataset, it does not write files this way by default. You must change a few format settings manually:
  + Go to the Azure dataset in the Flow that you intend to use as the input for your model. Then select Settings -> Previews and set the following values:
    - File compression: None
    - Enable "Parse next line as column headers"
  + Return to the parent recipe of the dataset and rerun it.

* Before going further, check the output. Go to your storage page -> Containers -> YOUR_CONTAINER -> dataiku -> YOUR_PROJECT -> YOUR_DATASET. The files in that folder should be named "out-sX.csv". Download the first file and check that its first line contains the column names.
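If you prefer to check the downloaded file from code rather than by eye, the following is a minimal standard-library sketch. It assumes the file is tab-separated, matching the `separator='\t'` used later when the dataset is read. The function name `check_dss_export` and the sample column `customerID` are hypothetical. The check tests the two requirements above: the file must not be gzip-compressed, and its first line must hold the column names.

```python
import csv
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip-compressed file starts with these two bytes


def check_dss_export(path, expected_columns, separator="\t"):
    """Return a list of problems found in a DSS part file such as out-s0.csv (empty if none)."""
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        return ["file is gzip-compressed: set File compression to None and rerun the recipe"]
    lines = raw.decode("utf-8").splitlines()
    # Parse only the first line, honouring quotes, with the dataset's separator.
    header = next(csv.reader(lines[:1], delimiter=separator), [])
    missing = [col for col in expected_columns if col not in header]
    if missing:
        return [f"first line does not contain the expected column names: {missing}"]
    return []
```

For example, `check_dss_export("out-s0.csv", ["Churn"])` should return an empty list once the format settings have been applied and the recipe rerun.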

## Train an AutoML model

* Create a Python code recipe that takes the desired training data, stored in Azure, as its input.

* Edit the recipe in a notebook and set the code environment to the AzureML code environment created earlier.

### Connect to the workspace

```python
from azureml.core import Workspace

# THIS INFORMATION CAN BE FOUND ON THE AZURE WORKSPACE UI
config = {
    "subscription_id": "XXXXXXXXXXX",
    "resource_group": "resource-of-dku",
    "workspace_name": "playground-of-dku"
}

subscription_id = config.get('subscription_id')
resource_group = config.get('resource_group')
workspace_name = config.get('workspace_name')

ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
```
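Instead of hardcoding the identifiers, you can download the workspace's `config.json` from the Azure portal and load it. The Azure ML SDK also provides `Workspace.from_config()` for this. The standard-library sketch below assumes the file contains the same three keys as the `config` dict above. It fails early if a key is missing or if the `XXXXXXXXXXX` placeholder is still there. The helper name `load_workspace_config` is hypothetical.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Read workspace identifiers from a JSON file and fail fast on gaps or placeholders."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing workspace settings: {missing}")
    if set(config["subscription_id"]) == {"X"}:
        raise ValueError("subscription_id is still the XXXXXXXXXXX placeholder")
    # Keep only the keys that the Workspace constructor expects.
    return {key: config[key] for key in REQUIRED_KEYS}
```

Because the returned keys match the constructor's parameter names, you can then connect with `ws = Workspace(**load_workspace_config())`.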

### Create an Experiment

```python
from azureml.core.experiment import Experiment

experiment_name = 'automl-experiment-of-dku'
experiment = Experiment(ws, experiment_name)
```

### Create a Compute Cluster

```python
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget

# Choose a name for your cluster.
amlcompute_cluster_name = "cluster-of-dku"

found = False
# Check if this compute target already exists in the workspace.
cts = ws.compute_targets
if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':
    found = True
    print('Found existing compute target.')
    compute_target = cts[amlcompute_cluster_name]

if not found:
    print('Creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size="Standard_D4_v2", max_nodes=10)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)

print('Checking cluster status...')
compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
```

Creating the cluster takes some time, so a wait of a few minutes is normal.

### Define the datastore for ML purposes

```python
from azureml.core import Datastore
import dataiku

client = dataiku.api_client()

my_dss_azure_connection_name = 'azure_blob_dku'  # CHANGE THIS TO YOUR CONNECTION
azure_connection = client.get_connection(my_dss_azure_connection_name)
# Read the connection parameters (container, storage account, access key) from DSS
azure_connection_info = azure_connection.get_info().get('params', {})

datastore_name = 'datastore_of_dku'  # CHANGE THIS
container_name = azure_connection_info.get('chcontainer')
account_name = azure_connection_info.get('storageAccount')
account_key = azure_connection_info.get('accessKey')
default_managed_folder = azure_connection_info.get('defaultManagedContainer')

blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name=datastore_name,
    container_name=container_name,
    account_name=account_name,
    account_key=account_key,
)
```
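If any of the connection parameters comes back empty, `register_azure_blob_container` is handed `None` values and fails further down the line. The sketch below fails earlier, with a clearer message. It reuses the parameter keys from the code above (`chcontainer`, `storageAccount`, `accessKey`). The helper name `extract_blob_credentials` is hypothetical. We also assume, without having verified it for every DSS version, that an empty access key usually means the connection's details are not readable by your user.

```python
def extract_blob_credentials(params):
    """Map DSS Azure connection params to Datastore.register_azure_blob_container arguments.

    `params` is the dict returned by azure_connection.get_info().get('params', {}).
    """
    wanted = {
        "container_name": "chcontainer",  # Path restrictions -> Container
        "account_name": "storageAccount",
        "account_key": "accessKey",
    }
    creds = {arg: params.get(key) for arg, key in wanted.items()}
    missing = [arg for arg, value in creds.items() if not value]
    if missing:
        # Assumption: a missing accessKey often means the connection details are not readable by this user.
        raise ValueError(f"DSS connection parameters are missing or empty: {missing}")
    return creds
```

The result can be splatted directly into the registration call: `Datastore.register_azure_blob_container(workspace=ws, datastore_name=datastore_name, **extract_blob_credentials(azure_connection_info))`.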

Represent your Azure dataset as a `TabularDataset` so that the AutoML API can use it:

```python
from azureml.core.dataset import Dataset

dataset = dataiku.Dataset('simple_table_azure')  # CHANGE THIS INPUT DATASET NAME
project_key = dataset.get_config().get('projectKey')
raw_path = dataset.get_config().get('params').get('path')
# The stored path contains a ${projectKey} variable that must be resolved
path = raw_path.replace('${projectKey}', project_key)

# The separator must match the dataset's format settings (tab here)
my_train_dataset = Dataset.Tabular.from_delimited_files(path=[(blob_datastore, path)], separator='\t')
```
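The `raw_path.replace('${projectKey}', project_key)` line only resolves that one variable. If the stored path contained any other `${...}` variable, it would silently pass through and point to a folder that does not exist. A slightly stricter sketch uses `string.Template` and raises on any unresolved variable. The helper name `resolve_dataset_path` and the sample path in the test are illustrative assumptions.

```python
from string import Template


def resolve_dataset_path(raw_path, project_key):
    """Substitute ${projectKey} in a DSS dataset path, failing on any other variable."""
    try:
        return Template(raw_path).substitute(projectKey=project_key)
    except KeyError as err:
        # Template.substitute raises KeyError for placeholders it has no value for.
        raise ValueError(f"unresolved DSS variable in path: {err}") from None
```

You would then write `path = resolve_dataset_path(raw_path, project_key)` instead of the plain `replace`.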

To check that the file format is configured correctly, preview the first rows:

```python
my_train_dataset.take(3).to_pandas_dataframe()
```

This should return a clean DataFrame whose columns carry the dataset's column names.

If everything looks good, register your `TabularDataset`. The name and description do not matter much, so you can keep the ones from the example.

```python
my_train_dataset = my_train_dataset.register(workspace=ws, name='simple_blob_train_dataset', description='Simple training data', create_new_version=True)
```

### Configure the AutoML task

```python
import logging

from azureml.train.automl import AutoMLConfig

target_column = 'Churn'  # CHANGE THIS TO THE NAME OF YOUR TARGET COLUMN

automl_settings = {
    "n_cross_validations": 2,
    "primary_metric": 'average_precision_score_weighted',
    # Called "whitelist_models" in older versions of the SDK
    "allowed_models": ['RandomForest'],
    "enable_early_stopping": True,
    "max_concurrent_iterations": 8,
    "max_cores_per_iteration": -1,
    "experiment_timeout_hours": 0.25,
    "iteration_timeout_minutes": 1,
    "verbosity": logging.INFO,
}

automl_config = AutoMLConfig(
    task='classification',
    debug_log='automl_errors.log',
    compute_target=compute_target,
    training_data=my_train_dataset,
    label_column_name=target_column,
    **automl_settings,
)
```
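Some of these settings interact with the cluster created earlier. Our understanding of Azure's guidance is that `max_concurrent_iterations` should not exceed the cluster's node count (`max_nodes=10` above). Also, a per-iteration timeout longer than the whole experiment timeout makes little sense. The pure-Python sketch below flags both cases before you submit. The helper name `check_automl_settings` and the exact rules it checks are our assumptions, not an Azure API.

```python
def check_automl_settings(settings, max_nodes):
    """Return human-readable warnings for AutoML settings that look inconsistent."""
    warnings = []
    concurrent = settings.get("max_concurrent_iterations", 1)
    if concurrent > max_nodes:
        warnings.append(
            f"max_concurrent_iterations={concurrent} exceeds the cluster's max_nodes={max_nodes}"
        )
    experiment_minutes = settings.get("experiment_timeout_hours", 0) * 60
    iteration_minutes = settings.get("iteration_timeout_minutes", 0)
    if experiment_minutes and iteration_minutes > experiment_minutes:
        warnings.append("iteration_timeout_minutes is longer than the whole experiment timeout")
    return warnings
```

With the settings above and the 10-node cluster, `check_automl_settings(automl_settings, max_nodes=10)` returns an empty list. With a 4-node cluster it would warn about the 8 concurrent iterations.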

### Submit the task

```python
remote_run = experiment.submit(automl_config, show_output=True)
```

And that’s it: now wait until the job completes. Because `show_output=True` is set, the notebook prints a summary of the training iterations and of the best model.

## Deploy the model

Deploying the model from code is more involved: you need to write 1) a Python scoring file and 2) a YAML configuration file.

In the Azure Machine Learning UI, however, deployment only takes a couple of clicks, so we recommend deploying via the UI.

You can deploy the model to either Azure Kubernetes Service (AKS) or Azure Container Instances (ACI).

To test the endpoint, you can try this code snippet in a notebook:

```python
import json

import requests

# URL for the web service
scoring_uri = "http://f9ed9cf6-3541-4f8e-aed3-525d8e731a38.eastus2.azurecontainer.io/score"

# If the service is authenticated, set the key or token
key = '<your key or token>'

# Rows to score
scoring_data = [[11.01, 717.2, 75.56]]
data = {"data": scoring_data}

# Convert to JSON string
input_data = json.dumps(data)

# Set the content type
headers = {'Content-Type': 'application/json'}

# If authentication is enabled, set the authorization header
# headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.status_code)
print(resp.text)
```
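The snippet above relies on the third-party `requests` package. If it is not in your code environment, you can make the same call with the standard library. This sketch separates building the request from sending it, so you can inspect the payload and headers first. It adds the `Authorization` header only when a key is given. The helper name `build_scoring_request` is hypothetical.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, key=None):
    """Build (but do not send) a POST request for an Azure ML scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if key:  # only needed when authentication is enabled on the service
        headers["Authorization"] = f"Bearer {key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")
```

To send it: `with urllib.request.urlopen(build_scoring_request(scoring_uri, scoring_data), timeout=60) as resp: print(resp.read().decode())`.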
