# Code environments

## `py_310_sample_documentAI`

### Packages
``` 
pdf2image==1.17.0
opencv-python==4.10.0.84
pillow==10.4.0
unstructured==0.15.13
unstructured-client==0.25.9
unstructured-inference==0.7.36
unstructured.pytesseract==0.3.13
matplotlib==3.9.2
```
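
Once the environment is built, it can be useful to confirm that the installed versions match the pins above. The following is a minimal sketch, not part of the original project: `parse_pins` and `find_mismatches` are hypothetical helper names, and the check relies only on the standard-library `importlib.metadata`. It is meant to be run in a notebook or recipe that uses this code environment.

```python
from importlib import metadata

# Pins copied from the package list above
REQUIREMENTS = """
pdf2image==1.17.0
opencv-python==4.10.0.84
pillow==10.4.0
unstructured==0.15.13
unstructured-client==0.25.9
unstructured-inference==0.7.36
unstructured.pytesseract==0.3.13
matplotlib==3.9.2
"""


def parse_pins(requirements_text):
    """Return {package_name: version} for every `name==version` line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and specs that are not exact pins (e.g. git URLs)
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (expected, installed_or_None)} for pins that are not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(parse_pins(REQUIREMENTS)).items():
        print(f"{name}: expected {expected}, found {installed}")
```

The comparison is an exact string match, so a build that reports a local version suffix may show up as a mismatch even when the base version is the pinned one.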

### Initialization script
The `py_310_sample_documentAI` code environment also needs the following [initialization script](https://doc.dataiku.com/dss/latest/code-envs/operations-python.html#managed-code-environment-resources-directory) (see the "Resources" tab of the code environment):
```python
## Base imports
import os

from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import grant_permissions
from dataiku.code_env_resources import set_env_path

# Clear all environment variables defined by a previously run script
clear_all_env_vars()

## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")
set_env_path("TRANSFORMERS_CACHE", "huggingface/transformers")
hf_home_dir = os.getenv("HF_HOME")
transformers_home_dir = os.getenv("TRANSFORMERS_CACHE")

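# Import huggingface_hub only after HF_HOME is set, so that the cache location above is taken into account
from huggingface_hub import hf_hub_download

# Download the YOLOX layout detection model (ONNX) into the cache
hf_hub_download(repo_id="unstructuredio/yolo_x_layout", filename="yolox_l0.05.onnx")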

# Grant everyone read access to pretrained models in the HF_HOME folder
# (by default, only readable by the owner)
grant_permissions(hf_home_dir)
grant_permissions(transformers_home_dir)
```
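
To check that the download above landed in the resources directory, one option is to search `HF_HOME` for the file name. This is a minimal sketch, assuming it runs in a process that uses this code environment and therefore sees the `HF_HOME` variable set by the script; `find_cached_files` is a hypothetical helper, not a Dataiku or Hugging Face function.

```python
import os


def find_cached_files(cache_root, filename):
    """Return every path under cache_root whose base name equals filename."""
    matches = []
    # os.walk yields nothing if cache_root does not exist
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    hf_home = os.getenv("HF_HOME")
    if hf_home is None:
        print("HF_HOME is not set; run this with the code environment selected")
    else:
        found = find_cached_files(hf_home, "yolox_l0.05.onnx")
        print("\n".join(found) if found else "model file not found in HF_HOME")
```

Walking the directory avoids depending on the exact layout of the Hugging Face cache, at the cost of a slower search on large caches.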

## `py_310_sample_documentAI_InternVL2`

### Packages

``` 
torch==2.4.1
transformers==4.45.1
accelerate==0.34.2
bitsandbytes==0.44.1
pillow==10.4.0
sentencepiece==0.2.0
timm==1.0.9
einops==0.8.0
``` 

### Initialization script

The `py_310_sample_documentAI_InternVL2` code environment also needs the following [initialization script](https://doc.dataiku.com/dss/latest/code-envs/operations-python.html#managed-code-environment-resources-directory) (see the "Resources" tab of the code environment):
```python
## Base imports
import os

from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import grant_permissions
from dataiku.code_env_resources import set_env_path

# Clear all environment variables defined by a previously run script
clear_all_env_vars()

## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")
set_env_path("TRANSFORMERS_CACHE", "huggingface/transformers")
hf_home_dir = os.getenv("HF_HOME")
transformers_home_dir = os.getenv("TRANSFORMERS_CACHE")

# Import transformers only after the cache paths are set
import transformers

# Load the model and tokenizer once so that their files are downloaded into the cache
model_name = "OpenGVLab/InternVL2-8B"
model = transformers.AutoModel.from_pretrained(model_name, trust_remote_code=True)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)


# Grant everyone read access to pre-trained models in the HF_HOME folder
# (by default, only readable by the owner)
grant_permissions(hf_home_dir)
grant_permissions(transformers_home_dir)
```

## `py_310_sample_documentAI_Qwen2-VL_UDOP`

### Packages

``` 
torch==2.4.1
torchvision==0.19.1
transformers==4.45.1
accelerate==0.34.2
bitsandbytes==0.44.1
dash==2.18.1
dash-bootstrap-components==1.6.0
outlines @ git+https://github.com/dottxt-ai/outlines.git@e07f5500fb92acd3647f2aa13aebaf292b3fcbbe
peft==0.13.0
pillow==10.4.0
protobuf==5.28.2
pytesseract==0.3.13
qwen-vl-utils==0.0.8
sentencepiece==0.2.0
``` 

### Initialization script
The `py_310_sample_documentAI_Qwen2-VL_UDOP` code environment also needs the following [initialization script](https://doc.dataiku.com/dss/latest/code-envs/operations-python.html#managed-code-environment-resources-directory) (see the "Resources" tab of the code environment):
```python
## Base imports
import os

from dataiku.code_env_resources import clear_all_env_vars
from dataiku.code_env_resources import grant_permissions
from dataiku.code_env_resources import set_env_path

# Clear all environment variables defined by a previously run script
clear_all_env_vars()

## Hugging Face
# Set HuggingFace cache directory
set_env_path("HF_HOME", "huggingface")
set_env_path("TRANSFORMERS_CACHE", "huggingface/transformers")
hf_home_dir = os.getenv("HF_HOME")
transformers_home_dir = os.getenv("TRANSFORMERS_CACHE")

# Import transformers only after the cache paths are set
import transformers

# Load the UDOP processor and model once so that their files are downloaded into the cache
model_name = "microsoft/udop-large-512-300k"
processor_UDOP = transformers.AutoProcessor.from_pretrained(model_name)
model_UDOP = transformers.UdopForConditionalGeneration.from_pretrained(model_name)

# Do the same for the Qwen2-VL model and processor
model_name = "Qwen/Qwen2-VL-7B-Instruct"
model_qwen = transformers.Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    attn_implementation="eager",
)
processor_qwen = transformers.AutoProcessor.from_pretrained(model_name)

# Grant everyone read access to pre-trained models in the HF_HOME folder
# (by default, only readable by the owner)
grant_permissions(hf_home_dir)
grant_permissions(transformers_home_dir)
```
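
Each initialization script ends by calling `grant_permissions` so that, per the script comments, the cached models are readable by everyone and not only by the owner. The sketch below is one way to confirm this after the script has run. It assumes a POSIX file system and a process in which `HF_HOME` is set; `find_not_world_readable` is a hypothetical helper, not part of the Dataiku API.

```python
import os
import stat


def find_not_world_readable(root):
    """Return paths under root (root included) that 'other' users cannot read."""
    blocked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # A directory needs both the read and execute bits for others to list and enter it
        dir_mode = os.stat(dirpath).st_mode
        if not (dir_mode & stat.S_IROTH and dir_mode & stat.S_IXOTH):
            blocked.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                file_mode = os.stat(path).st_mode  # follows symlinks
            except OSError:
                blocked.append(path)  # for example, a dangling symlink
                continue
            if not file_mode & stat.S_IROTH:
                blocked.append(path)
    return sorted(blocked)


if __name__ == "__main__":
    hf_home = os.getenv("HF_HOME")
    if hf_home is None:
        print("HF_HOME is not set; run this with the code environment selected")
    else:
        for path in find_not_world_readable(hf_home):
            print("not readable by others:", path)
```

An empty result means every file and directory found under `HF_HOME` carries the read bit for other users. Directories that the current user cannot enter are silently skipped by `os.walk`, so the check is best run as the owner of the resources directory.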