# Training MLflow models

As with any Python package, you can use MLflow and its related frameworks in any DSS recipe, notebook, or scenario.

This allows you to train and save an MLflow model directly in DSS. You can then import it as a saved model to use visual scoring, evaluation, drift analysis, and other saved model features.

Dataiku also integrates MLflow Experiment Tracking. When you use it, trained models are automatically stored in a configurable managed folder (see Deploying MLflow models).

## Training a model

You can train the MLflow model outside of DSS, in a Python recipe, in a Python notebook, or using Experiment Tracking.

The list of frameworks supported by MLflow is available in the MLflow documentation. These include the most common libraries such as PyTorch, TensorFlow, Scikit-learn, etc.

## Saving the MLflow model

You need to export your model in the standard format provided by MLflow Models, which is compatible with DSS.

MLflow provides a *save\_model* function for each supported machine learning framework.

For instance, saving a Keras model using MLflow in a *model\_directory* will look like this:

```python
# ... omitted Keras model training code

import mlflow

mlflow.keras.save_model(model, model_directory)
```

You can then import the exported model in DSS as a Saved Model.
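Before importing, it can be useful to confirm that the export produced a directory in MLflow format. The sketch below relies on the fact that MLflow writes an `MLmodel` descriptor file at the root of a saved model directory. The helper name `looks_like_mlflow_model` is hypothetical and is not part of MLflow or the DSS API.

```python
from pathlib import Path


def looks_like_mlflow_model(model_dir):
    """Return True if model_dir has an MLmodel descriptor file at its root.

    Hypothetical helper for illustration only: it checks for the descriptor
    file and does not validate the file's content.
    """
    return (Path(model_dir) / "MLmodel").is_file()
```

For example, calling `looks_like_mlflow_model(model_directory)` right after `save_model` should return `True` if the export succeeded.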

## Python recipe

The following snippet is a draft of a Python recipe:

* taking a train and an evaluation dataset as inputs

* training a model

* saving it in MLflow format

* adding it as a new version to the saved model defined as output

```python
import os
import shutil

import dataiku
import mlflow
from dataiku import recipe

client = dataiku.api_client()
project = client.get_project('PROJECT_ID')

# get the train and evaluation datasets (first and second recipe inputs)
train_dataset = recipe.get_inputs_as_datasets()[0]
evaluation_dataset = recipe.get_inputs_as_datasets()[1]

# get output saved model
sm = project.get_saved_model(recipe.get_output_names()[0])

# get train dataset as a pandas dataframe
df = train_dataset.get_dataframe()

# get the path of a local managed folder where to temporarily save the trained model
mf = dataiku.Folder("local_managed_folder")
path = mf.get_path()
model_subdir = "my_subdir"
model_dir = os.path.join(path, model_subdir)

# remove leftovers from a previous run
if os.path.exists(model_dir):
    shutil.rmtree(model_dir)

try:
    # ...train your model (my_model is the resulting trained model)...
    # ...save it with package specific MLflow method (here, SKlearn)...
    mlflow.sklearn.save_model(my_model, model_dir)
    # import the model, creating a new version
    mlflow_version = sm.import_mlflow_version_from_managed_folder(
        "version_name", "local_managed_folder", model_subdir, "code-env-with-mlflow-name")
finally:
    # ignore_errors avoids masking a training failure when nothing was saved
    shutil.rmtree(model_dir, ignore_errors=True)

# setting metadata (target name, classes,...); target_column is the name of the target column
mlflow_version.set_core_metadata(target_column, ["class0", "class1", ...],
                                 get_features_from_dataset=evaluation_dataset.name)

# evaluate the performance of this new version, to populate the performance screens of the saved model version in DSS
mlflow_version.evaluate(evaluation_dataset.name)
```
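The recipe above removes the temporary model directory before training and again in a `finally` block. The same pattern can be sketched as a small context manager, so the cleanup logic lives in one place. The name `scratch_model_dir` is hypothetical and is not part of the DSS or MLflow APIs. This is a minimal sketch using only the standard library, assuming the folder path comes from a local filesystem managed folder.

```python
import os
import shutil
from contextlib import contextmanager


@contextmanager
def scratch_model_dir(folder_path, subdir):
    """Yield a path under folder_path that is absent on entry and removed on exit.

    Hypothetical helper for illustration only.
    """
    model_dir = os.path.join(folder_path, subdir)
    # Remove leftovers from a previous run, if any
    shutil.rmtree(model_dir, ignore_errors=True)
    try:
        yield model_dir
    finally:
        # ignore_errors keeps a missing directory from masking an earlier failure
        shutil.rmtree(model_dir, ignore_errors=True)
```

Inside the `with scratch_model_dir(path, model_subdir) as model_dir:` block, you would save the model to `model_dir` and import it as a new version. The directory is removed when the block exits, whether or not an exception was raised.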

Note

Experiment Tracking logs models to a configurable managed folder, which does not need to be local.

Note

*local\_managed\_folder* must be a filesystem managed folder on the DSS host. The recipe uses the `dataiku.Folder.get_path()` method to retrieve the folder's path on the local filesystem, then builds a directory path where the ML package can save the trained model.

Note

As this recipe uses a local managed folder, it should not be executed in a container.

Note

The fourth parameter of `dataikuapi.dss.savedmodel.DSSSavedModel.import_mlflow_version_from_managed_folder()` is the name of the code environment to use when scoring the model. If it is not specified, the code environment of the project is resolved and used.

This code environment must contain the mlflow package and the packages of the machine learning library of your choice.
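One way to check this requirement is to run a small script in a notebook or recipe that uses the code environment in question. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and the `sklearn` entry is an assumption standing in for the import name of whichever machine learning library your model uses. It only inspects the environment in which it runs.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# mlflow plus the import name of the ML library used by the model (assumed: scikit-learn)
required = ["mlflow", "sklearn"]
print("Missing packages:", missing_packages(required))
```

An empty list means that all the listed packages can be imported in that environment.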

Note

A “Run checks” scenario step must be used to run the checks defined for the saved model on the metrics evaluated on the new version.

Warning

Recent versions of MLflow feature an `mlflow.evaluate` function. This function is different from `dataikuapi.dss.savedmodel.MLFlowVersionHandler.evaluate()`. Only the latter populates the interpretation screens of a saved model version in DSS.
