# Usage of Spark in DSS

When Spark support is enabled in DSS, a large number of components feature additional options to run jobs on Spark.

## SparkSQL recipes

SparkSQL recipes work broadly like SQL recipes, but are not limited to SQL datasets: DSS fetches the data and passes it on to Spark.

You can set the Spark configuration in the Advanced tab.

See SparkSQL recipes for more information.

## Visual recipes

You can run Prepare and some other visual recipes on Spark. To do so, select Spark as the execution engine and select the appropriate Spark configuration.

For each visual recipe that supports a Spark engine, you can select the engine under the “Run” button in the recipe’s main tab, and set the Spark configuration in the “Advanced” tab.

All visual data-transformation recipes support running on Spark, including:

* Prepare

* Sync

* Sample / Filter

* Group

* Distinct

* Join

* Pivot

* Sort

* Split

* Top N

* Window

* Stack

## Python code

You can write Spark code using Python:

* In a PySpark recipe

* In a Python notebook

### Note about Spark code in Python notebooks

All Python notebooks use the same named Spark configuration. See Spark configurations for more information about named Spark configurations.

When you change the named Spark configuration used by notebooks, you need to restart DSS afterwards.

## R code

Warning

**Tier 2 support**: Support for SparkR and sparklyr is covered by Tier 2 support.

You can write Spark code using R:

* In a Spark R recipe

* In an R notebook

Both the recipe and the notebook support two different APIs for accessing Spark:

* The “SparkR” API, i.e. the native API bundled with Spark

* The “sparklyr” API

### Note about Spark code in R notebooks

All R notebooks use the same named Spark configuration. See Spark configurations for more information about named Spark configurations.

When you change the named Spark configuration used by notebooks, you need to restart DSS afterwards.

## Scala code

You can write Spark code using Scala:

* In a Spark Scala recipe

* In a Scala notebook

### Spark Scala, PySpark & SparkR recipes

PySpark and SparkR recipes are like regular Python and R recipes, with the Spark libraries available. You can also use Scala, Spark’s native language, to implement your custom logic. The Spark configuration is set in the recipe’s Advanced tab.

Interaction with DSS datasets is provided through a dedicated DSS Spark API that makes it easy to read datasets into SparkSQL dataframes and to write dataframes back to datasets.
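
As a rough illustration, a PySpark recipe built on that API might look like the sketch below. It assumes the `dataiku.spark` module with its `get_dataframe` and `write_with_schema` helpers, as found in DSS PySpark recipe templates. The dataset names (`orders`, `orders_by_customer`) and column names (`amount`, `customer_id`) are hypothetical and only serve the example. The DSS and Spark imports are placed inside the function because those packages are normally importable only inside a DSS environment with Spark enabled.

```python
def keep_positive_amounts(df):
    """Pure Spark DataFrame transformation, independent of DSS.

    The column names "amount" and "customer_id" are hypothetical.
    """
    # Keep rows with a positive amount, then count rows per customer.
    return df.filter("amount > 0").groupBy("customer_id").count()


def run_recipe(input_name="orders", output_name="orders_by_customer"):
    """Read a DSS dataset as a Spark DataFrame, transform it, write it back.

    The default dataset names are hypothetical placeholders.
    """
    # Imported lazily: these packages are assumed to be available
    # only inside DSS with Spark support enabled.
    import dataiku
    from dataiku import spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sql_context = SQLContext(sc)

    # Read the input dataset as a SparkSQL dataframe.
    input_dataset = dataiku.Dataset(input_name)
    df = dkuspark.get_dataframe(sql_context, input_dataset)

    result = keep_positive_amounts(df)

    # Write the result to the output dataset, setting its schema
    # from the dataframe.
    output_dataset = dataiku.Dataset(output_name)
    dkuspark.write_with_schema(output_dataset, result)
```

In an actual recipe, the script would end with a call to `run_recipe()`, using the names of the recipe’s real input and output datasets. Keeping the transformation in its own function separates the Spark logic from the DSS read and write calls, so the logic can be checked without a DSS instance.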

### Spark Scala, PySpark & SparkR notebooks

The Jupyter notebook built into DSS supports Spark in Python, R and Scala. See Code notebooks for more information.

Warning

The Spark-Scala notebook requires a separate installation of Spark to work when deployed on CDH.

Warning

The Spark-Scala notebook requires DSS to be run with Java 8.

### Note about Spark code in Scala notebooks

All Scala notebooks use the same named Spark configuration. See Spark configurations for more information about named Spark configurations.

When you change the named Spark configuration used by notebooks, you need to restart DSS afterwards.

## Machine Learning with MLLib

See the dedicated MLLib page.

## Machine Learning with H2O Sparkling Water

See the dedicated H2O page.
