# How to Automate Project Cleaning and Maintenance

Contents

* Introduction and Definitions
* Create the Admin Project
* Automate Project Cleaning and Maintenance
  + Create a Scenario
  + Trigger the Scenario to Run as Admin
* Maintenance Macro Descriptions
  + Backup Internal Databases
  + Clear App Instances
  + Clear Continuous Activities Logs
  + Clear Deleted Experiments
  + Clear Internal Databases
  + Clear Job Logs
  + Clear Model Versions
  + Clear Scenario Run Logs
  + Drop Pipeline Views
  + List Datasets on Connection
  + Remove Old Container Images
  + Remove Old Exports
  + Kill Jupyter Sessions

## Introduction and Definitions

Project cleaning and maintenance can quickly become repetitive. Built-in maintenance macros let you automate these tasks across one or all projects on your instance. To automate project cleaning and maintenance across a Dataiku DSS (DSS) instance, you create an Admin Project.

By the end of this article, you’ll be able to perform the following tasks:

* Create an Admin Project and run maintenance macros.

* Understand and describe the DSS project cleaning and maintenance macros.

* Create a scenario to automate cleaning and maintenance tasks.

**What is a Maintenance Macro?**

Maintenance macros let you clean and maintain the projects on your DSS instance by performing tasks such as deleting job logs and temporary files. Some maintenance macros can also be executed from a scenario step and configured to run on one project or on all projects on the instance.

To view DSS maintenance macros, navigate to the **More Options (“…”)** menu and choose **Macros**.

Visit DSS Macros for technical details about macros.

**What is an Admin Project?**

An Admin Project is a blank project you create that is accessible only to admins on the instance. An Admin Project contains a scenario with one step for each macro you want to execute.

After following the steps described in this article, your Admin Project will look like this:

## Create the Admin Project

To begin, create a new, blank project. You’ll need to do this while signed in to the instance as a user with administrator privileges: without them, the scenario steps you create later will not offer the option to apply macros to all projects on the instance.

* In your DSS instance, create a new, blank project and give it a name such as `Admin Project`.

Next, we’ll set the project visibility and permissions:

* From the top navigation bar in your project, navigate to the **More Options (“…”)** menu and choose **Security**.

* Set the **Project visibility** to **Private**.

* Set **Project access requests** to **Disabled**.

We’ll want administrators to have access to this project, so let’s grant access to the administrators group.

* Choose **administrators** and click **Grant Access to Group**.

* Select **Admin** to give administrators Admin permissions.

* Save your changes.

Our next step is to configure a scenario to run our macros.

## Automate Project Cleaning and Maintenance

You can run macros manually or automatically from a scenario step. In this section, we’ll create a scenario that executes the following five maintenance macros, which are recommended as part of project maintenance best practices:

* Clear job logs

* Clear scenario run logs

* Kill Jupyter sessions

* Clear internal databases (if you are using the default H2 internal database)

  Note

  For production environments, an externally hosted PostgreSQL runtime database is recommended. Visit Externally hosting runtime databases for more information.

* Clear continuous activities logs

  Note

  Clearing continuous activities logs is recommended if you are using streaming features.
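The conditions attached to the last two macros amount to a small decision rule. The helper below is hypothetical (`recommended_macros` is not a DSS function); it only encodes the recommendations listed above, in the order the scenario steps are added.

```python
def recommended_macros(uses_h2_internal_db: bool, uses_streaming: bool) -> list[str]:
    """Return the recommended maintenance macros, in scenario-step order."""
    # These three macros are recommended on every instance.
    macros = ["Clear job logs", "Clear scenario run logs", "Kill Jupyter sessions"]
    # Only relevant while the runtime databases still use the default H2 setup.
    if uses_h2_internal_db:
        macros.append("Clear internal databases")
    # Continuous activities logs only accumulate when streaming features are used.
    if uses_streaming:
        macros.append("Clear continuous activities logs")
    return macros


# Example: an instance still on H2 that does not use streaming.
print(recommended_macros(uses_h2_internal_db=True, uses_streaming=False))
```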

Let’s configure a scenario to run the recommended macros.

### Create a Scenario

To create your scenario:

* From the **Jobs** menu, navigate to the **Scenarios** panel, and create a new scenario.

* Ensure **Sequence of steps** is selected, and name it `Maintenance Scenario`.

#### Add a Trigger

Let’s create a time-based trigger so that our scenario runs each hour.

* Within the Triggers panel of the **Settings** tab, click the **Add Trigger** dropdown menu.

* Add a **Time-based trigger**.

* Rename it from the default “Time-based” to `Every 1 hour`.

* Set **Repeat every** to 1 hour.

* Make sure its activity status is toggled to **ON**.
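To make the **Repeat every** setting concrete, here is a simplified model of the run times an hourly time-based trigger produces. It is an illustration under stated assumptions: `next_runs` is a hypothetical helper that ignores timezones and scheduling delays, and it is not how the DSS scheduler is implemented.

```python
from datetime import datetime, timedelta


def next_runs(start: datetime, repeat_every_hours: int, count: int) -> list[datetime]:
    """List the next `count` run times of a trigger that repeats every N hours."""
    if repeat_every_hours < 1:
        raise ValueError("repeat_every_hours must be at least 1")
    step = timedelta(hours=repeat_every_hours)
    # Each run is scheduled a fixed interval after the previous one.
    return [start + i * step for i in range(count)]


# A trigger starting at 09:00 with "Repeat every" set to 1 hour.
for run in next_runs(datetime(2024, 1, 1, 9, 0), repeat_every_hours=1, count=3):
    print(run.isoformat())
```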

Next, we’ll add steps, one for each macro we want to run. We’ll configure each step to execute the macro on all projects on the instance.

#### Add a Step to Clear Job Logs

Our first step will run a macro to clear job logs.

* Navigate to the **Steps** tab.

* From the **Add Step** dropdown, choose to add an **Execute macro** step.

* Name it `Clear job logs`, then select **Clear job logs** as the macro.

* Set the **Max age (days)** to `9`.

* Select **Perform deletion**.

* Select **All projects**.

This tells DSS to delete job logs older than nine days for all projects on the instance.

* Save your changes.
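The **Max age (days)** and **Perform deletion** settings follow a common retention pattern: select everything older than a cutoff, and only delete when explicitly asked. The sketch below applies that pattern to a local directory and treats an unchecked **Perform deletion** as a dry run that only reports matches. `clear_old_files` is a hypothetical helper; it does not reflect how DSS stores or deletes job logs.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def clear_old_files(log_dir: Path, max_age_days: float, perform_deletion: bool,
                    now: Optional[float] = None) -> List[Path]:
    """Return files under log_dir older than max_age_days.

    Files are only deleted when perform_deletion is True; otherwise the call is a
    dry run that reports what would be removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = sorted(p for p in log_dir.rglob("*") if p.is_file() and p.stat().st_mtime < cutoff)
    if perform_deletion:
        for path in expired:
            path.unlink()
    return expired


# Demo: one file backdated by 10 days and one fresh file, with a 9-day max age.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old.log").write_text("old job log")
    (root / "recent.log").write_text("recent job log")
    ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
    os.utime(root / "old.log", (ten_days_ago, ten_days_ago))

    dry_run = clear_old_files(root, max_age_days=9, perform_deletion=False)
    print([p.name for p in dry_run], (root / "old.log").exists())  # ['old.log'] True
    clear_old_files(root, max_age_days=9, perform_deletion=True)
    print(sorted(p.name for p in root.iterdir()))  # ['recent.log']
```

Running the dry run first is a cheap way to check what a new max-age value would remove before enabling deletion.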

#### Add a Step to Clear Scenario Run Logs

This step will clear scenario run logs.

* From the **Add Step** dropdown, choose to add an **Execute macro** step.

* Name it `Clear scenario run logs`, then select **Clear scenario run logs** as the macro.

* Set the **Max age (days)** to `9`.

* Select **Perform deletion**.

* Select **All projects**.

This tells DSS to delete all logs and temporary files of scenario runs that are older than nine days.

#### Add a Step to Kill Jupyter Sessions

This step kills Jupyter sessions that have been running for too long or have been idle for too long, freeing up memory on the instance.

* From the **Add Step** dropdown, choose to add an **Execute macro** step.

* Name it `Kill Jupyter sessions`, then select **Kill Jupyter sessions** as the macro.

* Leave the default settings.

This tells DSS to kill old and unused Jupyter sessions.

#### Add a Step to Clear Internal Databases

Adding a step to clear internal databases can help resolve performance degradation caused by internal databases that have grown too large.

* From the **Add Step** dropdown, choose to add an **Execute macro** step.

* Name it `Clear internal databases`, then select **Clear internal databases** as the macro.

* Select **Clear for all projects**.

* Set **Max age** to `9`.

This tells DSS to truncate jobs, scenarios and metrics histories for all projects.
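Conceptually, truncating a history means deleting the rows whose timestamp falls before the max-age cutoff. The sketch below shows this on an in-memory SQLite database with hypothetical table and column names (`job_history`, `recorded_at`, and so on). It does not reflect the schema of the DSS internal databases (H2 by default), which should only be cleared through the macro.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical history tables; the real DSS internal databases use their own schema.
HISTORY_TABLES = ("job_history", "scenario_run_history", "metrics_history")


def truncate_histories(conn: sqlite3.Connection, max_age_days: int, now: datetime) -> dict:
    """Delete history rows older than max_age_days and return the number removed per table."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    removed = {}
    with conn:  # commit all deletes as one transaction
        for table in HISTORY_TABLES:
            # Table names come from the constant above, never from user input.
            cur = conn.execute(f"DELETE FROM {table} WHERE recorded_at < ?", (cutoff,))
            removed[table] = cur.rowcount
    return removed


conn = sqlite3.connect(":memory:")
now = datetime(2024, 1, 31)
for table in HISTORY_TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, recorded_at TEXT)")
    # Rows recorded 1, 5, 20 and 40 days before `now`, stored as ISO-8601 strings.
    conn.executemany(
        f"INSERT INTO {table} (recorded_at) VALUES (?)",
        [((now - timedelta(days=d)).isoformat(),) for d in (1, 5, 20, 40)],
    )

print(truncate_histories(conn, max_age_days=9, now=now))
# {'job_history': 2, 'scenario_run_history': 2, 'metrics_history': 2}
```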

#### Add a Step to Clear Continuous Activities Logs

When working with continuous activities such as streaming features, the continuous activities logs and temporary files can grow very quickly, so it is worth deleting those older than a certain number of days.

* From the **Add Step** dropdown, choose to add an **Execute macro** step.

* Name it `Clear continuous activities logs`, then select **Clear continuous activities logs** as the macro.

* Set the **Max age (days)** to `9`.

* Select **Perform deletion**.

* Select **All projects**.

This tells DSS to delete all logs and temporary files of continuous activities older than nine days.

* Save your changes.
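To review the five steps at a glance, the scenario can be written down as data. `MacroStep` and the parameter keys (`max_age_days`, `perform_deletion`, `all_projects`, `max_age`) are hypothetical names chosen to mirror the UI fields above; this is not the format DSS uses to store scenarios.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MacroStep:
    """One 'Execute macro' step; params mirror the settings chosen in the UI."""
    name: str
    macro: str
    params: Dict[str, object] = field(default_factory=dict)


# Hypothetical representation of the Maintenance Scenario built in this section.
MAINTENANCE_SCENARIO: List[MacroStep] = [
    MacroStep("Clear job logs", "Clear job logs",
              {"max_age_days": 9, "perform_deletion": True, "all_projects": True}),
    MacroStep("Clear scenario run logs", "Clear scenario run logs",
              {"max_age_days": 9, "perform_deletion": True, "all_projects": True}),
    MacroStep("Kill Jupyter sessions", "Kill Jupyter sessions"),  # default settings
    MacroStep("Clear internal databases", "Clear internal databases",
              {"max_age": 9, "all_projects": True}),
    MacroStep("Clear continuous activities logs", "Clear continuous activities logs",
              {"max_age_days": 9, "perform_deletion": True, "all_projects": True}),
]


def steps_missing_all_projects(steps: List[MacroStep]) -> List[str]:
    """Names of steps that have a project scope but are not set to all projects."""
    return [s.name for s in steps if "all_projects" in s.params and not s.params["all_projects"]]


print([s.name for s in MAINTENANCE_SCENARIO])
print(steps_missing_all_projects(MAINTENANCE_SCENARIO))  # []
```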

### Trigger the Scenario to Run as Admin

Now that we have configured our scenario, let’s set it to run automatically by enabling auto-triggers. We’ll also tell DSS to run the scenario as **admin**, which is required to run maintenance macros on every project on the instance.

* Within the **Run** panel of the **Settings** tab, toggle the **Auto-triggers** to **ON**.

* Set the **Run as** option to **admin**.

* Leave the other settings at their defaults.

* Save your changes.

You can now run the scenario.
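The same two settings can also be applied programmatically. The sketch below assumes the `dataikuapi` client library as we understand its public API (`DSSClient.get_project`, `DSSProject.get_scenario`, and the `active` and `run_as` properties of the scenario settings object); verify these against the API reference for your DSS version. `ADMINPROJECT` and `MAINTENANCESCENARIO` are placeholder identifiers: use the project key and scenario ID shown on your instance.

```python
def enable_maintenance_scenario(client, project_key: str, scenario_id: str, run_as: str = "admin"):
    """Turn on auto-triggers and set the run-as user of an existing scenario.

    `client` is assumed to be a dataikuapi.DSSClient; project_key and scenario_id
    are placeholders for your own Admin Project and Maintenance Scenario.
    """
    scenario = client.get_project(project_key).get_scenario(scenario_id)
    settings = scenario.get_settings()
    settings.active = True    # equivalent of toggling Auto-triggers to ON
    settings.run_as = run_as  # user on whose behalf the scenario runs
    settings.save()
    return settings


# Hypothetical usage against a real instance (host and API key are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
# enable_maintenance_scenario(client, "ADMINPROJECT", "MAINTENANCESCENARIO")
```

Passing the client in as a parameter keeps the function easy to test without a running instance.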

## Maintenance Macro Descriptions

This section describes the DSS maintenance macros.

### Backup Internal Databases

This macro backs up the contents of the internal databases of DSS, so that they can be truncated.

Note

This macro does not work with externally hosted internal databases.

Warning

This macro locks the databases while exporting their contents, which can block usage of DSS for seconds or minutes depending on the size of the databases.

When should I run this macro?

You can run this macro when you want to have a backup of your internal database, e.g., before externalizing your database.

### Clear App Instances

This macro deletes all instances of an app whose last activity is older than a certain number of days.

When should I run this macro?

You can run this macro when you want to free up some memory on your instance and maintain control of instances of a deployed app.

### Clear Continuous Activities Logs

This macro deletes all logs and temporary files of continuous activities older than a certain number of days. The operation also deletes the runs from the user interface.

When should I run this macro?

Run this macro when you are working with continuous activities such as streaming features.

### Clear Deleted Experiments

This macro permanently deletes Experiment Tracking experiments and runs in the “deleted” lifecycle stage. It also deletes all artifacts and metadata.

This macro is equivalent to the MLflow “gc” command.

When should I run this macro?

Run this macro when you want to permanently remove experiments and runs that are already in the “deleted” stage, along with their artifacts and metadata.
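In MLflow, deleting a run or experiment only moves it to the “deleted” lifecycle stage, from which it can still be restored; `mlflow gc` removes it for good. The toy store below illustrates that two-phase behavior. `ExperimentStore` is a hypothetical class for illustration only and is not part of MLflow or DSS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperimentStore:
    """Toy store mapping run_id -> lifecycle stage ("active" or "deleted")."""
    runs: Dict[str, str] = field(default_factory=dict)

    def delete(self, run_id: str) -> None:
        # A normal delete is reversible: the run only moves to the "deleted" stage.
        self.runs[run_id] = "deleted"

    def restore(self, run_id: str) -> None:
        self.runs[run_id] = "active"

    def gc(self) -> List[str]:
        # Permanently remove everything in the "deleted" stage; it cannot be restored afterwards.
        purged = sorted(r for r, stage in self.runs.items() if stage == "deleted")
        for run_id in purged:
            del self.runs[run_id]
        return purged


store = ExperimentStore({"run-1": "active", "run-2": "active"})
store.delete("run-2")
print(store.gc())   # ['run-2']
print(store.runs)   # {'run-1': 'active'}
```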

### Clear Internal Databases

You can configure this macro to clear jobs histories, metrics, checks, scenario runs history, and activity timelines.

When should I run this macro?

You can run this macro when your internal database becomes too large and DSS is experiencing performance degradation.

### Clear Job Logs

This macro deletes all logs and temporary files of jobs older than a certain number of days.

The information about the job itself is not deleted. After running this macro, DSS still displays jobs in the jobs list, but all details about the old jobs will be removed.

When should I run this macro?

You can run this macro when you want to control disk space on your DSS server and the retention of job log files.

### Clear Model Versions

This macro deletes the oldest inactive model versions for the project.

When should I run this macro?

You can run this macro when you want to free up disk space.
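“Oldest inactive versions” implies two rules: the active version is never touched, and among inactive versions the most recent are kept while older ones go. The sketch below assumes the number of inactive versions to keep is a parameter (`keep_inactive`, a hypothetical name); check the macro’s own settings for its actual options.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelVersion:
    version_id: str
    trained_at: int   # any sortable timestamp
    active: bool


def versions_to_delete(versions: List[ModelVersion], keep_inactive: int) -> List[str]:
    """Return IDs of inactive versions beyond the `keep_inactive` most recent ones.

    The active version is never selected.
    """
    inactive = sorted((v for v in versions if not v.active), key=lambda v: v.trained_at, reverse=True)
    return [v.version_id for v in inactive[keep_inactive:]]


versions = [
    ModelVersion("v1", 1, False),
    ModelVersion("v2", 2, False),
    ModelVersion("v3", 3, True),
    ModelVersion("v4", 4, False),
]
print(versions_to_delete(versions, keep_inactive=1))  # ['v2', 'v1']
```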

### Clear Scenario Run Logs

This macro deletes all logs and temporary files of scenario runs older than a certain number of days.

The information about the scenario run itself is not deleted. After running this macro, DSS still displays the runs in the **Last runs** tab of the scenario, but all details about the old runs are removed.

When should I run this macro?

You can run this macro to control disk space on your DSS server and the retention of your scenario run log files.
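The split between a run record, which survives, and its details, which are removed, can be pictured with a toy model. `ScenarioRun` and its fields are hypothetical and do not describe how DSS stores scenario runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ScenarioRun:
    run_id: str
    started_at: datetime
    outcome: str                 # kept: the run still appears in the run list
    log: Optional[str] = None    # dropped once the run is older than the max age


def clear_run_details(runs: List[ScenarioRun], max_age_days: int, now: datetime) -> int:
    """Drop the logs of runs older than max_age_days, keep the records, return how many were cleared."""
    cutoff = now - timedelta(days=max_age_days)
    cleared = 0
    for run in runs:
        if run.started_at < cutoff and run.log is not None:
            run.log = None
            cleared += 1
    return cleared


now = datetime(2024, 3, 1)
runs = [
    ScenarioRun("r1", now - timedelta(days=30), "SUCCESS", "step 1 ok"),
    ScenarioRun("r2", now - timedelta(days=2), "FAILED", "step 2 failed"),
]
print(clear_run_details(runs, max_age_days=9, now=now))  # 1
print([(r.run_id, r.outcome, r.log) for r in runs])
# [('r1', 'SUCCESS', None), ('r2', 'FAILED', 'step 2 failed')]
```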

### Drop Pipeline Views

This macro drops any views on a connection that remain after executing one or more SQL pipeline jobs. DSS drops these views for all projects on the instance, not just the project where you run the macro.

When should I run this macro?

Leaving views behind at the end of a pipeline can cause problems if you later try to drop the table from which the view was derived. Run this macro to clean up any old pipeline views on a connection.
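The cleanup boils down to listing the views on a connection and dropping those that belong to pipelines. The sketch below uses SQLite’s `sqlite_master` catalog and a hypothetical `pipeline_` name prefix purely for illustration; how DSS identifies its pipeline views is not described here, and on a database such as PostgreSQL you would query its own catalog (for example `information_schema.views`) instead.

```python
import sqlite3
from typing import List


def drop_views_with_prefix(conn: sqlite3.Connection, prefix: str) -> List[str]:
    """Drop every view whose name starts with `prefix` and return the dropped names."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name")]
    # Filter in Python: in SQL LIKE patterns, "_" would act as a wildcard.
    dropped = [n for n in names if n.startswith(prefix)]
    for name in dropped:
        conn.execute(f'DROP VIEW "{name}"')  # names come from the catalog, quoted as identifiers
    return dropped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
# Leftover intermediate views, named with the hypothetical pipeline prefix.
conn.execute("CREATE VIEW pipeline_step1 AS SELECT id FROM orders")
conn.execute("CREATE VIEW pipeline_step2 AS SELECT id, amount FROM orders")
conn.execute("CREATE VIEW reporting AS SELECT * FROM orders")
print(drop_views_with_prefix(conn, "pipeline_"))  # ['pipeline_step1', 'pipeline_step2']
```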

### List Datasets on Connection

This macro lists all datasets using a given connection.

When should I run this macro?

You can run this macro when you want to list all datasets using a particular connection, for example when you plan to remove that connection.
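Before removing a connection, the useful answer is which projects still depend on it. The helper below groups a hypothetical inventory of `(project_key, dataset_name, connection_name)` rows; the macro produces this list for you, and the inventory format here is only an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical inventory rows: (project_key, dataset_name, connection_name).
Inventory = Iterable[Tuple[str, str, str]]


def datasets_on_connection(inventory: Inventory, connection: str) -> Dict[str, List[str]]:
    """Group the datasets that use `connection` by project, sorted for stable output."""
    found: Dict[str, List[str]] = defaultdict(list)
    for project_key, dataset, conn_name in inventory:
        if conn_name == connection:
            found[project_key].append(dataset)
    return {project: sorted(names) for project, names in sorted(found.items())}


inventory = [
    ("SALES", "orders", "postgres-prod"),
    ("SALES", "orders_prepared", "postgres-prod"),
    ("MARKETING", "leads", "s3-raw"),
    ("FINANCE", "ledger", "postgres-prod"),
]
print(datasets_on_connection(inventory, "postgres-prod"))
# {'FINANCE': ['ledger'], 'SALES': ['orders', 'orders_prepared']}
```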

### Remove Old Container Images

This macro tries to remove all outdated container images that are no longer in use. It will not remove images that have dependent child images or images that are being used by containers.

When should I run this macro?

You can run this macro to keep control of the container images deployed by DSS, since it removes old and unused container images built inside DSS.
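The three conditions in the description above (outdated, not used by a container, no dependent child images) can be expressed as a filter. The `Image` records and the notion of “current tags” are hypothetical simplifications; the macro itself works against the actual container runtime.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Image:
    image_id: str
    tag: str
    parent_id: Optional[str] = None


def removable_images(images: List[Image], current_tags: Set[str], used_by_containers: Set[str]) -> List[str]:
    """IDs of outdated images that no container uses and that no other image is built on."""
    parents = {img.parent_id for img in images if img.parent_id}
    return [
        img.image_id
        for img in images
        if img.tag not in current_tags                # outdated
        and img.image_id not in used_by_containers    # not used by a container
        and img.image_id not in parents               # no dependent child images
    ]


images = [
    Image("a1", "dss-base:v1"),
    Image("a2", "dss-base:v2"),
    Image("b1", "dss-env-py:v1", parent_id="a1"),
    Image("c1", "dss-env-r:v1"),
]
print(removable_images(images, current_tags={"dss-base:v2"}, used_by_containers={"c1"}))  # ['b1']
```

Because parents are protected while children exist, a later run can free `a1` once `b1` is gone, which is one reason to run such cleanup periodically.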

### Remove Old Exports

This macro deletes exports such as datasets, notebook results, and script results that are older than a certain number of days.

When should I run this macro?

Run this macro to help control disk usage.

### Kill Jupyter Sessions

This macro kills Jupyter sessions that have either been running for too long or have been idle for too long.

When should I run this macro?

You can run this macro when you want to free up memory on your instance; it kills old and unused Jupyter sessions.
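The kill rule combines two independent criteria: a session qualifies if it has been running too long or has been idle too long. The sketch below models that rule with a hypothetical `NotebookSession` record; the thresholds in the example are illustrative only and are not the macro’s defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class NotebookSession:
    session_id: str
    started_at: datetime
    last_activity: datetime


def sessions_to_kill(sessions: List[NotebookSession], now: datetime,
                     max_running: timedelta, max_idle: timedelta) -> List[str]:
    """Select sessions running longer than max_running OR idle longer than max_idle."""
    return [
        s.session_id
        for s in sessions
        if now - s.started_at > max_running or now - s.last_activity > max_idle
    ]


now = datetime(2024, 5, 1, 12, 0)
sessions = [
    NotebookSession("long-running", now - timedelta(days=3), now - timedelta(minutes=5)),
    NotebookSession("idle", now - timedelta(hours=5), now - timedelta(hours=4)),
    NotebookSession("active", now - timedelta(hours=1), now - timedelta(minutes=1)),
]
# Illustrative thresholds only.
print(sessions_to_kill(sessions, now, max_running=timedelta(days=1), max_idle=timedelta(hours=2)))
# ['long-running', 'idle']
```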
