# Load and re-use a Hugging Face model

Pre-requisites

* Dataiku >= 10.0.0.

* A Code Environment with the following packages:

§ transformers==4.24.0

Machine learning use cases can involve large amounts of input data and compute-heavy, and therefore expensive, model training. It is common to download *pre-trained models* from remote repositories and use them instead. Hugging Face hosts a well-known repository with models for image and text processing. In this tutorial, you will use Dataiku’s Code Environment resources feature to download and save a pre-trained DistilBERT language model from Hugging Face. You will then re-use that model to predict a masked word in a sentence.

## Downloading the pre-trained model

The first step is to download the required assets for your pre-trained model. To do so, in the *Resources* screen of your Code Environment, input the following **initialization script** then click on *Update*:

§ ## Base imports

§ import os

§ from dataiku.code_env_resources import clear_all_env_vars

§ from dataiku.code_env_resources import grant_permissions

§ from dataiku.code_env_resources import set_env_path

§ from dataiku.code_env_resources import set_env_var

§ # Clear all environment variables defined by a previously run script

§ clear_all_env_vars()

§ ## Hugging Face

§ # Set the Hugging Face cache directories

§ set_env_path("HF_HOME", "huggingface")

§ set_env_path("TRANSFORMERS_CACHE", "huggingface/transformers")

§ hf_home_dir = os.getenv("HF_HOME")

§ transformers_home_dir = os.getenv("TRANSFORMERS_CACHE")

§ # Import Hugging Face's transformers

§ import transformers

§ # Download pre-trained models

§ model_name = "distilbert-base-uncased"

§ MODEL_REVISION = "1c4513b2eedbda136f57676a34eea67aba266e5c"

§ model = transformers.DistilBertModel.from_pretrained(model_name, revision=MODEL_REVISION)

§ unmasker = transformers.DistilBertForMaskedLM.from_pretrained(model_name, revision=MODEL_REVISION)

§ tokenizer = transformers.DistilBertTokenizer.from_pretrained(model_name, revision=MODEL_REVISION)

§ # Grant everyone read access to pre-trained models in the HF_HOME folder

§ # (by default, only readable by the owner)

§ grant_permissions(hf_home_dir)

§ grant_permissions(transformers_home_dir)

This script retrieves a DistilBERT model from Hugging Face and stores it on the Dataiku instance.

Note that it only needs to run once. After that, all users allowed to use the Code Environment can leverage the pre-trained model without re-downloading it.
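To confirm that the model files are actually in place, you can list the revisions cached under `HF_HOME` from a notebook that uses this Code Environment. The following is a minimal sketch, not part of the Dataiku API: the helper name `list_cached_revisions` is hypothetical, and it assumes the cache layout `hub/models--<name>/snapshots/<revision>` that the inference code in the next section relies on.

```python
import os
from pathlib import Path


def list_cached_revisions(hf_home, model_name):
    """Return the revision hashes cached for `model_name` under `hf_home`.

    Assumed layout (same as in the inference code of this tutorial):
    <HF_HOME>/hub/models--<name>/snapshots/<revision>/
    """
    # Repository ids such as "org/name" are stored as "models--org--name".
    repo_dir = "models--" + model_name.replace("/", "--")
    snapshots_dir = Path(hf_home) / "hub" / repo_dir / "snapshots"
    if not snapshots_dir.is_dir():
        return []
    return sorted(p.name for p in snapshots_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    hf_home = os.getenv("HF_HOME")
    if hf_home is None:
        print("HF_HOME is not set: check which Code Environment is in use.")
    else:
        print(list_cached_revisions(hf_home, "distilbert-base-uncased"))
```

An empty list suggests that the initialization script has not run yet, or that the files were cached in a different location.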

## Using the pre-trained model for inference

You can now re-use this pre-trained model in your Dataiku Project’s Python Recipe or notebook. Here is an example adapted from a sample in the model repository that fills the masked parts of a sentence with the appropriate word:

§ import os

§ from transformers import pipeline

§ from transformers import DistilBertTokenizer, DistilBertForMaskedLM

§ # Define which pre-trained model to use

§ model = {"name": "distilbert-base-uncased",

§          "revision": "1c4513b2eedbda136f57676a34eea67aba266e5c"}

§ # Load pre-trained model from the local cache (no download)

§ hf_home_dir = os.getenv("HF_HOME")

§ model_path = os.path.join(hf_home_dir,

§                           f"hub/models--{model['name']}/snapshots/{model['revision']}")

§ unmasker = DistilBertForMaskedLM.from_pretrained(model_path, local_files_only=True)

§ tokenizer = DistilBertTokenizer.from_pretrained(model_path, local_files_only=True)

§ # Predict the masked output

§ unmask = pipeline("fill-mask", model=unmasker, tokenizer=tokenizer)

§ input_sentence = "Lend me your ears and I'll sing you a [MASK]"

§ resp = unmask(input_sentence)

§ for r in resp:

§     print(f"{r['sequence']} ({r['score']})")

Running this code should give you an output similar to this:

§ lend me your ears and i'll sing you a lullaby (0.29883989691734314)

§ lend me your ears and i'll sing you a tune (0.10296259075403214)

§ lend me your ears and i'll sing you a song (0.10061296075582504)

§ lend me your ears and i'll sing you a hymn (0.09704853594303131)

§ lend me your ears and i'll sing you a cappella (0.034581124782562256)

## Wrapping up

Your pre-trained model is now operational! From there you can easily reuse it, e.g. to process multiple text records stored in a Managed Folder or within a text column of a Dataset.
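As an illustration of processing several records, here is a minimal sketch that keeps only the highest-scoring completion for each sentence. The helper names `best_fill` and `fill_all` are hypothetical. The sketch assumes each sentence contains exactly one `[MASK]` token, so that each call returns a flat list of candidates with `sequence` and `score` keys, as in the output shown earlier. `fake_unmask` is a stand-in so the code runs without a model; in a recipe you would pass the `unmask` pipeline instead.

```python
def best_fill(unmask, sentence):
    """Return the highest-scoring completed sentence for one masked input."""
    candidates = unmask(sentence)
    best = max(candidates, key=lambda c: c["score"])
    return best["sequence"]


def fill_all(unmask, sentences):
    """Apply `best_fill` to each sentence, one call per record."""
    return [best_fill(unmask, s) for s in sentences]


def fake_unmask(sentence):
    """Stand-in for the real pipeline (assumption: same output shape)."""
    return [
        {"sequence": sentence.replace("[MASK]", "tune"), "score": 0.1},
        {"sequence": sentence.replace("[MASK]", "song"), "score": 0.6},
    ]


if __name__ == "__main__":
    records = ["sing me a [MASK]", "play that [MASK] again"]
    for filled in fill_all(fake_unmask, records):
        print(filled)
```

With a Dataset, the `sentences` argument could be the values of a text column, for instance a list built from a pandas DataFrame column.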
