# Dataiku DSS

Welcome to the Product Documentation for Dataiku Data Science Studio (DSS). This site covers installing and configuring Dataiku DSS in your environment, using it through the browser interface, and driving it through the API.

## Is This the Help You’re Looking For?

You might also find these other resources useful:

* The Knowledge Base covers a variety of topics that can help you learn more about Dataiku DSS or find solutions to problems without having to ask for help.

* Dataiku Academy provides guided learning paths that help you upskill and earn certification on Dataiku DSS.

* Dataiku Community is a place where you can join the discussion, get support, share best practices and engage with other Dataiku users.

## Reference Doc Contents

* DSS concepts

+ Homepage

+ Projects

+ Data

+ Datasets

+ Recipes

+ Building datasets

+ Managed and external datasets

+ Partitioning

* Connecting to data

+ Supported connections

+ SQL databases

+ Amazon S3

+ Azure Blob Storage

+ Google Cloud Storage

+ Upload your files

+ HDFS

+ Cassandra

+ MongoDB

+ Elasticsearch

+ File formats

+ Managed folders

+ “Files in folder” dataset

+ Metrics dataset

+ Internal stats dataset

+ “Editable” dataset

+ kdb+

+ FTP

+ SCP / SFTP (aka SSH)

+ HTTP

+ HTTP (with cache)

+ Server filesystem

+ Dataset plugins

+ Making relocatable managed datasets

+ Clearing non-managed Datasets

+ Data ordering

+ PI System / PIWebAPI server

* Exploring your data

+ Sampling

+ Analyze

* Schemas, storage types and meanings

+ Definitions

+ Basic usage

+ Schema for data preparation

+ Creating schemas of datasets

+ Handling of schemas by recipes

+ List of recognized meanings

+ User-defined meanings

+ Handling and display of dates

* Data preparation

+ How to Copy Prepare Recipe Steps

+ Sampling

+ Execution engines

+ Processors reference

+ Filtering and flagging rows

+ Managing dates

+ Reshaping

+ Geographic processors

* Charts

+ The Charts Interface

+ Sampling & Engine

+ Basic Charts

+ Tables

+ Scatter Charts

+ Map Charts

+ Other Charts

+ Common Chart Elements

+ Color palettes

+ Formatting

+ Filter settings

* Interactive statistics

+ The Worksheet Interface

+ Univariate Analysis

+ Bivariate Analysis

+ Fit curves and distributions

+ Statistical Tests

+ Multivariate Analysis

+ Time Series Analysis

+ Assisted Data Exploration

* Machine learning

+ Prediction (Supervised ML)

+ Clustering (Unsupervised ML)

+ Automated machine learning

+ Model Settings Reusability

+ Features handling

+ Algorithms reference

+ Advanced models optimization

+ Models ensembling

+ Model Document Generator

+ Time Series Forecasting

+ Deep Learning

+ Models lifecycle

+ Scoring engines

+ Writing custom models

+ Exporting models

+ Partitioned Models

+ ML Diagnostics

+ ML Assertions

+ Computer vision

+ Image labeling

* The Flow

+ Visual Grammar

+ Flow zones

+ Rebuilding Datasets

+ Limiting Concurrent Executions

+ Exporting the Flow to PDF or images

+ How to Manage Large Flows with Flow Folding

+ Flow Document Generator

* Visual recipes

+ Prepare: Cleanse, Normalize, and Enrich

+ Sync: copying datasets

+ Grouping: aggregating data

+ Window: analytics functions

+ Distinct: get unique rows

+ Join: joining datasets

+ Fuzzy join: joining two datasets

+ Geo join: joining datasets based on geospatial features

+ Splitting datasets

+ Top N: retrieve first N rows

+ Stacking datasets

+ Sampling datasets

+ Sort: order values

+ Pivot recipe

+ Push to editable recipe

+ Download recipe

+ List Folder Contents

* Recipes based on code

+ The common editor layout

+ Python recipes

+ R recipes

+ SQL recipes

+ Hive recipes

+ Impala

+ Spark-Scala recipes

+ PySpark recipes

+ Spark / R recipes

+ SparkSQL recipes

+ Shell recipes

+ Variables expansion in code recipes

* Code notebooks

+ SQL notebook

+ Python notebooks

+ Predefined notebooks

+ Containerized notebooks

+ Installing Jupyter Extensions

* MLOps

+ Feature Store

+ Models evaluations

+ Model Comparisons

+ Drift analysis

+ MLflow Models

+ Experiment Tracking

* Webapps

+ “Standard” web apps

+ Shiny web apps

+ Bokeh web apps

+ Dash web apps

+ Publishing webapps on the dashboard

+ Public webapps

+ Webapps and security

+ Scaling webapps on Kubernetes

+ Introduction to DSS webapps

+ Example use cases

* Code Studios

+ Concepts

+ Preparing Code Studio templates

+ Running Code Studios

+ Publish a Code Studio as a webapp

* Code reports

+ R Markdown reports

* Dashboards

+ Dashboard concepts

+ Display settings

+ Exporting dashboards to PDF or images

+ Filters

+ Filtering a dashboard using a query parameter in the URL

+ Insights reference

* Workspaces

+ Sharing DSS objects into a workspace

+ Managing Workspaces

+ Discussions

* Dataiku Applications

+ Application tiles

+ Application-as-recipe

+ Introduction

+ Using a Dataiku application

+ Developing a Dataiku application

+ Sharing a Dataiku application

+ Initiating an application instantiation request

+ Managing an application execution request

* Working with partitions

+ Partitioning files-based datasets

+ Partitioned SQL datasets

+ Specifying partition dependencies

+ Partition identifiers

+ Recipes for partitioned datasets

+ Partitioned Hive recipes

+ Partitioned SQL recipes

+ Partitioning variables substitutions

+ Partitioned Models

+ The two partitioning models

* DSS and SQL

+ SQL datasets

+ SQL write and execution

+ Partitioning

+ SQL pipelines in DSS

* DSS and Python

+ Installing Python packages

+ Reusing Python code

+ Using Matplotlib

+ Using spaCy

+ Using Bokeh

+ Using Plot.ly

+ Using Ggplot

+ Using Jupyter Widgets

* DSS and R

+ Installing R packages

+ Reusing R code

+ Using ggplot2

+ Using Dygraphs

+ Using googleVis

+ Using ggvis

+ Installing STAN or Prophet

+ RStudio integration

* DSS and Spark

+ Usage of Spark in DSS

+ Spark configurations

+ Interacting with DSS datasets

+ Spark pipelines

+ Limitations and attention points

+ Setting up Spark integration

* Code environments

+ Operations (Python)

+ Operations (R)

+ Base packages

+ Using Conda

+ Automation nodes

+ Non-managed code environments

+ Plugins’ code environments

+ Custom options and environment

+ Troubleshooting

+ Code env permissions

* Collaboration

+ Wikis

+ Discussions

+ Markdown

+ Tags

+ Working with Git

+ Version control of projects

+ Importing code from Git in project libraries

+ Importing Jupyter Notebooks from Git

+ Requests

* Time Series

+ Understanding time series data

+ Format of time series data

+ Time series preparation

+ Time series visualization

+ Time series forecasting

* Geographic data

+ Geographic data types

+ Geographic data

+ Visualizing geographic data

+ Geographic data preparation

+ Geo join

+ Geocoding and reverse geocoding

+ Georouting and isochrones

+ Geographic formula functions

* Text

* Images

* Audio

* Video

* Automation scenarios, metrics, and checks

+ Definitions

+ Scenario steps

+ Launching a scenario

+ Reporting on scenario runs

+ Custom scenarios

+ Variables in scenarios

+ Step-based execution control

+ Metrics

+ Checks

+ Custom probes and checks

* Production deployments and bundles

+ Setting up the Deployer

+ Creating a bundle

+ Deployment infrastructures

+ Deploying bundles with the Project Deployer

+ Manually importing bundles

* API Node & API Deployer: Real-time APIs

+ Introduction

+ Concepts

+ Installing API nodes

+ Setting up the API Deployer and deployment infrastructures

+ First API (with API Deployer)

+ First API (without API Deployer)

+ Types of Endpoints

+ Enriching prediction queries

+ Security

+ Managing versions of your endpoint

+ Deploying on Kubernetes

+ APINode APIs reference

+ Operations reference

* Governance

+ Definitions

+ Installing and setting up Govern

+ Govern Projects, Models and Bundles

+ Sign-off Scenario

+ Model Registry

+ Bundle Registry

+ Blueprint Designer

+ Public REST API

* Python APIs

+ Using the APIs inside of DSS

+ Using the APIs outside of DSS

+ Datasets (introduction)

+ Datasets (reading and writing data)

+ Datasets (other operations)

+ Datasets (reference)

+ Feature Store

+ Managed folders

+ Streaming Endpoints

+ Interaction with PySpark

+ The main DSSClient class

+ Projects

+ Project folders

+ Project libraries

+ Recipes

+ Interaction with saved models

+ Scenarios

+ Scenarios (in a scenario)

+ Flow creation and management

+ Machine learning

+ Experiment Tracking

+ Statistics worksheets

+ Code studios

+ API Designer & Deployer

+ Project Deployer

+ Static insights

+ Jobs

+ Authentication information and impersonation

+ Importing tables as datasets

+ Wikis

+ Discussions

+ Performing SQL, Hive and Impala queries

+ SQL Query

+ Meanings

+ Users and groups

+ Connections

+ Code envs

+ Plugins

+ Macros

+ Dataiku applications

+ Metrics and checks

+ Model Evaluation Stores

+ Other administration tasks

+ Utilities

+ Reference API documentation of `dataiku`

+ Reference API documentation of `dataikuapi`

+ API for plugin components

+ Clusters

+ API for Fleet Manager

+ API for Dataiku Govern

+ Workspaces

+ Webapps

* R API

+ Using the R API inside of DSS

+ Using the R API outside of DSS

+ Reference documentation

+ Authentication information

+ Creating static insights

* Public REST API

+ Features

+ Public API Keys

+ The REST API

* Additional APIs

+ The JavaScript API

+ The Scala API

* Installing and setting up

+ Dataiku Cloud Stacks for AWS

+ Dataiku Cloud Stacks for Azure

+ Dataiku Cloud Stacks for GCP

+ Custom Dataiku install on Linux

+ Other installation options

* Elastic AI computation

+ Concepts

+ Initial setup

+ Managed Kubernetes clusters

+ Using Amazon Elastic Kubernetes Service (EKS)

+ Using Microsoft Azure Kubernetes Service (AKS)

+ Using Google Kubernetes Engine (GKE)

+ Using code envs with containerized execution

+ Dynamic namespace management

+ Customization of base images

+ Unmanaged Kubernetes clusters

+ Using OpenShift

+ Troubleshooting

+ Using Docker instead of Kubernetes

* DSS in the cloud

+ DSS in AWS

+ DSS in Azure

+ DSS in GCP

* DSS and Hadoop

+ Setting up Hadoop integration

+ Connecting to secure clusters

+ Hadoop filesystems connections (HDFS, S3, EMRFS, WASB, ADLS, GS)

+ Hive

+ Impala

+ Spark

+ Hive datasets

+ Hadoop user isolation

+ Distribution-specific notes

+ Teradata Connector For Hadoop

+ Multiple Hadoop clusters

+ Dynamic AWS EMR clusters

+ Dynamic Google Dataproc clusters

* Metastore catalog

* Operating DSS

+ dsscli tool

+ The data directory

+ Backing up

+ Audit trail

+ The runtime databases

+ Logging in DSS

+ DSS Macros

+ Managing DSS disk usage

+ Understanding and tracking DSS processes

+ Tuning and controlling memory usage

+ Using cgroups for resource control

+ Monitoring DSS

+ HTTP proxies

+ DSS license

+ Compute resource usage reporting

* Security

+ Project Access

+ Main project permissions

+ Connections security

+ User profiles

+ Shared objects

+ Workspaces & dashboards authorizations

+ User secrets

+ Audit Trail

+ Govern Security: Roles and Permissions

+ Configuring LDAP authentication

+ Single Sign-On

+ Multi-Factor Authentication

+ Passwords security

+ Advanced security options

* User Isolation

+ Capabilities of User Isolation Framework

+ Concepts

+ Prerequisites and limitations

+ Initial Setup

+ Troubleshooting

+ Reference architectures

+ Details of UIF capabilities

+ Advanced topics

* Plugins

+ Installing plugins

+ Managing installed plugins

+ Developing plugins

* Streaming data

+ Concepts

+ Kafka

+ AWS SQS

+ HTTP Server-Sent Events

+ Continuous sync

+ Continuous Python

+ Streaming Spark Scala

* Formula language

+ Basic usage

+ Reading column values

+ Variables typing and autotyping

+ Boolean values

+ Operators

+ Array and object operations

+ Object notations

+ DSS variables

+ Array functions

+ Boolean functions

+ Date functions

+ Math functions

+ Object functions

+ String functions

+ Geometry functions

+ Value access functions

+ Control structures

+ Tests

* Custom variables expansion

+ Defining variables

+ Using variables in the code of a recipe

+ Using variables in configuration fields

+ Using override tables

+ Modifying the value of variables

* Sampling methods

+ Generic sampling methods

+ Exploration / Visual data preparation

* Accessibility

+ Global Shortcuts

+ Project Navigation

+ Within the Flow

+ Within a Dataset

+ Within a Prepare Recipe

+ Within a Code Recipe

+ Within any Recipe

+ Within any Code Editor (Excluding Notebooks)

+ Within any Flow Object

+ Within Plugins Development

+ Within a Dataset Insight

* Troubleshooting

+ Diagnosing and debugging issues

+ Obtaining support

+ Support tiers

+ Common issues

+ Error codes

* Release notes

+ DSS 11 Release notes

+ DSS 10.0 Release notes

+ DSS 9.0 Release notes

+ DSS 8.0 Release notes

+ DSS 7.0 Release notes

+ DSS 6.0 Release notes

+ DSS 5.1 Release notes

+ DSS 5.0 Release notes

+ DSS 4.3 Release notes

+ DSS 4.2 Release notes

+ DSS 4.1 Release notes

+ DSS 4.0 Release notes

+ DSS 3.1 Release notes

+ DSS 3.0 Release notes

+ DSS 2.3 Release notes

+ DSS 2.2 Release notes

+ DSS 2.1 Release notes

+ DSS 2.0 Release notes

+ DSS 1.4 Release notes

+ DSS 1.3 Release notes

+ DSS 1.2 Release notes

+ DSS 1.1 Release notes

+ DSS 1.0 Release notes

+ Pre versions

* Other Documentation

+ Older DSS versions

+ Other Dataiku products

* Third-party acknowledgements

+ DSS

+ Mac version only

+ Dataiku Online
