# Hive datasets

* Use cases
  + Hive views
  + No read access on source files
  + ACID tables (ORC)
  + DATE and DECIMAL data types
* Creating a Hive dataset
  + New dataset
  + Import
* Using a Hive dataset
  + Hive recipes
  + Visual recipes with Hive as execution engine
  + Spark recipes
  + Visual recipes with Spark as execution engine
  + Limitations

Most of the time, to read and write data in the Hadoop ecosystem, DSS uses HDFS datasets, that is, file-oriented datasets pointing to files residing on one or several HDFS-like filesystems.

DSS can also handle Hive datasets. Hive datasets are pointers to Hive tables already defined in the Hive metastore.

* Hive datasets can only be used for reading, not for writing.

* To read data from Hive datasets, DSS uses HiveServer2 (through a JDBC connection). In essence, a Hive dataset is a SQL-like dataset.

## Use cases

The HDFS dataset remains the “go-to” dataset for interacting with Hadoop-hosted data. It provides the most features and the greatest ability to parallelize work and execute it on the cluster.

However, there are some cases of existing source data that the HDFS dataset cannot read properly. In these cases, using a Hive dataset as the source of your Flow allows you to read the data. Since a Hive dataset is read-only, only the sources of the Flow use Hive datasets, and subsequent parts of the Flow revert to regular HDFS datasets.

### Hive views

If existing data is only available through a Hive view, no HDFS files materialize this data. In that case, you cannot use an HDFS dataset and should use a Hive dataset.

### No read access on source files

Through the Hive security mechanisms (Sentry and Ranger), it is possible to have read access to existing tables in the Hive metastore through HiveServer2 without having read access to the underlying HDFS files.

In that case, you cannot use an HDFS dataset and should use a Hive dataset.

### ACID tables (ORC)

You can create ACID tables in Hive (in the ORC format). These tables support UPDATE statements that regular Hive tables don’t support. These tables are stored in a very specific format that only HiveServer2 can read. DSS cannot properly read the underlying files of these tables.

In that case, you cannot use an HDFS dataset and should use a Hive dataset.
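
As an illustration, an ACID table of this kind could be declared and updated with HiveQL along the following lines. This is a minimal sketch, not something DSS generates: the table name `customers_acid`, its columns, and the bucket count are hypothetical, and the statements are simply held as Python strings to be submitted through whichever HiveServer2 client you use.

```python
# HiveQL statements kept as plain strings; nothing is executed here.
# Table name, columns and bucket count are hypothetical.

# ACID tables are stored as ORC and flagged as transactional.
# Bucketing is included because older Hive versions require it for
# ACID tables; newer versions may not need it.
CREATE_ACID_TABLE = """
CREATE TABLE customers_acid (
  id INT,
  name STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true')
"""

# UPDATE is accepted on an ACID table; a regular Hive table rejects it.
UPDATE_ROW = "UPDATE customers_acid SET name = 'Alice' WHERE id = 1"

if __name__ == "__main__":
    print(CREATE_ACID_TABLE.strip())
    print(UPDATE_ROW)
```

A table created this way is the kind of source that should be exposed to the Flow as a Hive dataset.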

### DATE and DECIMAL data types

There are various difficulties in reading tables that contain these kinds of columns. Hive datasets are recommended when reading such tables.
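
The use cases in this section can be summarized as a small decision rule. The sketch below is only an illustration of that rule: `TableInfo` and both functions are hypothetical helpers, not part of any DSS API, and the flags have to be filled in from your own knowledge of the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableInfo:
    """Hypothetical description of an existing Hive table (not a DSS API)."""
    is_view: bool = False              # exposed through a Hive view, no HDFS files
    can_read_files: bool = True        # read access to the underlying HDFS files
    is_acid: bool = False              # ACID (transactional) ORC table
    has_date_or_decimal: bool = False  # contains DATE or DECIMAL columns


def reasons_for_hive_dataset(table: TableInfo) -> List[str]:
    """List which of the documented use cases apply to this table."""
    reasons = []
    if table.is_view:
        reasons.append("Hive view")
    if not table.can_read_files:
        reasons.append("no read access on source files")
    if table.is_acid:
        reasons.append("ACID table (ORC)")
    if table.has_date_or_decimal:
        reasons.append("DATE or DECIMAL columns")
    return reasons


def recommended_dataset_type(table: TableInfo) -> str:
    """Return "Hive" if any use case applies, otherwise the default "HDFS"."""
    return "Hive" if reasons_for_hive_dataset(table) else "HDFS"


if __name__ == "__main__":
    print(recommended_dataset_type(TableInfo(is_acid=True)))  # prints "Hive"
```

When none of the flags applies, the helper falls back to the HDFS dataset, which remains the default choice.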

## Creating a Hive dataset

You do not need to set up a connection to create a Hive dataset. As soon as connectivity with Hadoop (and your HiveServer2) is established, you can create Hive datasets.

### New dataset

* Select New Dataset > Hive

* Select the database and the table

* Click on test to retrieve the schema

* Your Hive dataset is ready to use

### Import

From either the catalog or the connections explorer, when you select an existing Hive table, you have the option to import it as either an HDFS dataset or a Hive dataset.

## Using a Hive dataset

A Hive dataset can be used with most kinds of DSS recipes.

### Hive recipes

You can create Hive recipes with Hive datasets as inputs.

Note

The recipe MUST be in “Hive CLI (global metastore)” or HiveServer2 mode for this to work. Please see Hive for more information.

### Visual recipes with Hive as execution engine

You can create visual recipes with Hive datasets as inputs and select the Hive execution engine (when available).

Note

The recipe MUST be in “Hive CLI (global metastore)” or HiveServer2 mode for this to work. Please see Hive for more information.

### Spark recipes

You can create Spark (code) recipes with Hive datasets as inputs.

Note

The recipe MUST be in “Use global metastore” mode for this to work.

Note

You must have filesystem-level access to the underlying files of this Hive table for this to work.

### Visual recipes with Spark as execution engine

You can create visual recipes with Hive datasets as inputs and select the Spark execution engine (when available).

Note

The recipe MUST be in “Use global metastore” mode for this to work.

Note

You must have filesystem-level access to the underlying files of this Hive table for this to work.

### Limitations

* SQL recipes cannot be used. Use a Hive recipe instead.

* The Spark engine (and Spark recipes) cannot be used if you don’t have filesystem access to the files underlying the tables.
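
The recipe rules of this section can be condensed into a small lookup. The function below is a hypothetical helper written for illustration, not a DSS API; it only restates the constraints described above and assumes the required metastore modes are already selected on the recipes.

```python
from typing import Dict


def usable_recipe_types(has_filesystem_access: bool) -> Dict[str, bool]:
    """Which recipe types can take a Hive dataset as input.

    Assumes Hive-based recipes run in "Hive CLI (global metastore)" or
    HiveServer2 mode, and Spark-based recipes in "Use global metastore" mode.
    """
    return {
        "Hive recipe": True,
        "Visual recipe (Hive engine)": True,
        # Spark reads the underlying files directly, so it needs
        # filesystem-level access to them.
        "Spark recipe": has_filesystem_access,
        "Visual recipe (Spark engine)": has_filesystem_access,
        # SQL recipes are never available on a Hive dataset.
        "SQL recipe": False,
    }


if __name__ == "__main__":
    for recipe, usable in usable_recipe_types(has_filesystem_access=False).items():
        print(f"{recipe}: {'yes' if usable else 'no'}")
```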
