# Experiment tracking with LightGBM

In this tutorial, you will train a model using the LightGBM framework and use Dataiku's experiment tracking capabilities to log the training runs (parameters and performance metrics).

Pre-requisites

* Dataiku DSS 11.0.0 or higher

* Access to a Project with a Dataset that contains the UCI Bank Marketing data

* A code environment containing the `mlflow`, `lightgbm` and `scikit-learn` packages

The following code snippet provides a reusable example that trains a simple gradient boosting model in three main steps:

**(1)** Specify the categorical and numeric features and the target variable.

**(2)** Using the categorical and continuous variables specified above, set up a preprocessing `ColumnTransformer` with two branches. One branch one-hot-encodes the categorical variables. The other is a `Pipeline` that imputes missing values in the continuous variables (with the median) and then rescales them.

**(3)** Define a dictionary containing the search space for hyperparameter tuning, and a stratified k-fold cross-validation strategy to train and evaluate the classifier. Then loop over every combination of hyperparameters. For each combination, use the Experiment Tracking feature to log the parameters, the resulting metric (the mean ROC AUC across the cross-validation folds) and the model. The model artifact logged for each run is a `Pipeline` called `clf_pipeline` that encapsulates both the preprocessing and the model itself.
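To make step (2) concrete, the plain-Python sketch below mimics what the two preprocessing branches compute on toy values. It is only an illustration and does not reproduce scikit-learn's implementation. The helper names `impute_median_and_scale`, `fit_categories` and `one_hot` are hypothetical, and missing values are represented as `None`. Scaling uses the population standard deviation, which is also what `StandardScaler` divides by.

```python
import statistics
from collections.abc import Sequence
from typing import Optional


def impute_median_and_scale(column: Sequence[Optional[float]]) -> list[float]:
    """Mimic SimpleImputer(strategy="median") followed by StandardScaler on one column."""
    observed = [v for v in column if v is not None]
    median = statistics.median(observed)
    # Replace missing entries with the median of the observed values.
    filled = [median if v is None else v for v in column]
    mean = statistics.fmean(filled)
    # Divide by the population standard deviation (ddof=0); a constant
    # column is left unscaled instead of being divided by zero.
    std = statistics.pstdev(filled) or 1.0
    return [(v - mean) / std for v in filled]


def fit_categories(column: Sequence[str]) -> list[str]:
    """Learn the sorted list of categories seen at fit time."""
    return sorted(set(column))


def one_hot(value: str, categories: Sequence[str]) -> list[int]:
    """Encode one value; an unseen value becomes all zeros, as with handle_unknown="ignore"."""
    return [1 if value == c else 0 for c in categories]


if __name__ == "__main__":
    ages = [30.0, None, 50.0, 40.0]
    print(impute_median_and_scale(ages))  # roughly [-1.414, 0.0, 1.414, 0.0]

    categories = fit_categories(["married", "single", "divorced", "single"])
    print(categories)                     # ['divorced', 'married', 'single']
    print(one_hot("single", categories))  # [0, 0, 1]
    print(one_hot("widowed", categories)) # [0, 0, 0]
```

The all-zeros row for an unseen category is why `handle_unknown="ignore"` matters here. Without it, a category that appears only in scoring data would make the transform fail.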

```python
import dataiku
from lightgbm import LGBMClassifier
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import cross_validate, ParameterGrid, StratifiedKFold

# !! - Replace these values by your own - !!
USER_PROJECT_KEY = ""
USER_XPTRACKING_FOLDER_ID = ""
USER_EXPERIMENT_NAME = ""
USER_TRAINING_DATASET = ""
USER_MLFLOW_CODE_ENV_NAME = ""

client = dataiku.api_client()
project = client.get_project(USER_PROJECT_KEY)

ds = dataiku.Dataset(USER_TRAINING_DATASET)
df = ds.get_dataframe()

# (1) Features and target
cat_features = ["job", "marital", "education", "default",
                "housing", "loan", "month", "contact", "poutcome"]
num_features = ["age", "balance", "day", "duration",
                "campaign", "pdays", "previous"]
target = "y"

X = df.drop(target, axis=1)
y = df[target]

# (2) Preprocessing: impute and scale numeric columns, one-hot-encode categorical ones
num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler())
])
cat_transformer = OneHotEncoder(handle_unknown="ignore")
preprocessor = ColumnTransformer(
    transformers=[
        ("num", num_pipeline, num_features),
        ("cat", cat_transformer, cat_features),
    ],
    remainder="drop"
)

# (3) Hyperparameter search space and cross-validation strategy
hparams_dict = {"learning_rate": [1e-3, 1e-4],
                "n_estimators": [250, 500, 1000],
                "seed": [47]
                }
n_folds = 5
param_grid = ParameterGrid(hparams_dict)
cv = StratifiedKFold(n_splits=n_folds)

mf = project.get_managed_folder(USER_XPTRACKING_FOLDER_ID)
mlflow_extension = project.get_mlflow_extension()

with project.setup_mlflow(mf) as mlflow:
    mlflow.set_experiment(USER_EXPERIMENT_NAME)

    # One MLflow run per hyperparameter combination
    for hparams in param_grid:
        with mlflow.start_run() as run:
            run_id = run.info.run_id
            clf_pipeline = Pipeline(steps=[
                ("preprocessor", preprocessor),
                ("classifier", LGBMClassifier(**hparams))
            ])

            # Evaluate with stratified k-fold CV and log the mean ROC AUC over the folds
            scores = cross_validate(clf_pipeline, X, y, cv=cv, scoring='roc_auc')
            run_metric_mean = scores['test_score'].mean()
            mlflow.log_metric('train_mean_auc', run_metric_mean)
            for k, v in hparams.items():
                mlflow.log_param(k, v)

            # cross_validate only fits clones of the pipeline, so fit it on the
            # full dataset before logging it as the run's model artifact
            clf_pipeline.fit(X, y)
            mlflow.sklearn.log_model(sk_model=clf_pipeline, artifact_path='model')

            # Record prediction type, classes, code env and target so that
            # Dataiku can deploy or evaluate the logged model
            mlflow_extension.set_run_inference_info(run_id=run_id,
                                                    prediction_type="BINARY_CLASSIFICATION",
                                                    classes=['no', 'yes'],
                                                    code_env_name=USER_MLFLOW_CODE_ENV_NAME,
                                                    target=target)
```
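The loop in step (3) starts one MLflow run per hyperparameter combination, and each run fits the pipeline once per cross-validation fold. The plain-Python sketch below makes that bookkeeping explicit for the same search space. It is an illustration under stated assumptions, not scikit-learn code:

* `expand_grid` is a hypothetical stand-in for `ParameterGrid`, built on `itertools.product` over sorted keys.
* `stratified_fold_ids` is a hypothetical, simplified round-robin assignment. It shows the idea behind `StratifiedKFold` (every fold keeps roughly the class proportions of the target) but is not its exact algorithm.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence
from itertools import product
from typing import Any


def expand_grid(space: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Return every combination of hyperparameter values, one dict per run."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def stratified_fold_ids(labels: Sequence[Hashable], n_folds: int) -> list[int]:
    """Assign each sample a fold id so every fold gets a similar share of each class."""
    fold_ids = [0] * len(labels)
    seen_per_class = defaultdict(int)
    for i, label in enumerate(labels):
        # Deal the samples of each class round-robin across the folds.
        fold_ids[i] = seen_per_class[label] % n_folds
        seen_per_class[label] += 1
    return fold_ids


if __name__ == "__main__":
    hparams_dict = {"learning_rate": [1e-3, 1e-4],
                    "n_estimators": [250, 500, 1000],
                    "seed": [47]}
    n_folds = 5
    grid = expand_grid(hparams_dict)
    print(len(grid))            # 6 MLflow runs
    print(len(grid) * n_folds)  # 30 fits performed by cross_validate in total

    labels = ["no"] * 10 + ["yes"] * 5  # toy, imbalanced target
    folds = stratified_fold_ids(labels, n_folds)
    for fold in range(n_folds):
        # each fold: 2 'no' and 1 'yes'
        print(fold, Counter(l for l, f in zip(labels, folds) if f == fold))
```

With the grid from the tutorial, the experiment therefore ends up with six runs. Widening any list in `hparams_dict` multiplies both the number of runs and the training time accordingly.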
