# Hands-On: Explore the Interactive Statistics Interface

Dataiku provides the ability to perform exploratory data analysis (EDA) through the **Statistics** tab of a dataset. Using this feature, you can implement and visualize the following tasks:

* **Descriptive statistics**: Univariate and bivariate analysis, curve and distribution fitting, and correlation computation.

* **Inferential statistics**: Hypothesis testing.

* **Dimensionality reduction**: Principal Component Analysis (PCA).
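To make the first category concrete, here is a minimal sketch of what a univariate summary and a correlation computation produce, written in plain Python rather than through the Dataiku interface. The sample values and the helper names `summarize` and `pearson` are illustrative assumptions, not part of the tutorial project or of any Dataiku API.

```python
import math
import statistics

# Illustrative values only; they are not read from the tutorial dataset.
alcohol = [9.4, 9.8, 10.5, 11.2, 12.0]
density = [0.9978, 0.9968, 0.9960, 0.9951, 0.9940]


def summarize(values):
    """Univariate summary: count, mean, sample standard deviation, min, max."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


alcohol_summary = summarize(alcohol)
alcohol_density_corr = pearson(alcohol, density)
```

In this toy sample, density falls as alcohol rises, so the coefficient is negative. The Statistics tab computes comparable quantities for you and displays them as cards.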

This course walks you through EDA tasks on the wine quality dataset [1], which is available in the UCI Machine Learning Repository. The original dataset consists of 12 features (or variables). In this tutorial, we add a column, *type*, to indicate whether an observation is a red wine or a white wine. For the purpose of this course, the *type* and *quality* variables are treated as categorical variables, while all other variables are numerical.

## Prerequisites

This tutorial assumes that you have access to an instance of Dataiku version 8.0 or above.

## Create Your Project

The first step is to create a new Dataiku **Project**.

* From the Dataiku homepage, click **+New Project > DSS Tutorials > ML Practitioner > Interactive Visual Statistics (Tutorial)**.

Note

You can also download the starter project from this website and import it as a zip file.

Notice that the project contains input datasets that have been stacked to create the *winequality* dataset. The following figure shows a snippet of the *winequality* dataset, with the red box highlighting the storage type for one of the numerical columns.

This dataset has been prepared as follows:

* A new column, *type*, was created to indicate the data source (*white* or *red*).

* The storage type for the numerical columns has been changed from “string” to “double”. This is so that Dataiku can treat these columns as numerical variables instead of categorical variables.
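The preparation steps above can be sketched outside Dataiku as follows. This is only an illustration of the logic (stack the two sources, tag each row with its origin, cast numeric columns from string to floating point). It is not how the starter project implements it, and the inline sample rows, column subset, and helper names are assumptions made for the example.

```python
import csv
import io

# Tiny illustrative samples standing in for the red and white input datasets.
RED_CSV = "fixed acidity;alcohol;quality\n7.4;9.4;5\n7.8;9.8;5\n"
WHITE_CSV = "fixed acidity;alcohol;quality\n7.0;8.8;6\n"


def load_rows(text, wine_type):
    """Parse a CSV string and tag every row with its data source."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        row["type"] = wine_type  # new column: "red" or "white"
        rows.append(row)
    return rows


def cast_numeric(rows, numeric_columns):
    """Convert the named columns from str to float (a "double" in Dataiku)."""
    for row in rows:
        for column in numeric_columns:
            row[column] = float(row[column])
    return rows


# Stack the two sources, then fix the storage type of the numerical columns.
stacked = load_rows(RED_CSV, "red") + load_rows(WHITE_CSV, "white")
winequality = cast_numeric(stacked, ["fixed acidity", "alcohol"])
```

Note that *quality* is deliberately left as a string here, mirroring the tutorial's choice to treat it as a categorical variable.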

## The Statistics Interface

The **Statistics** tab of a dataset allows you to generate statistical reports on your data by creating **worksheets**.

* Go to the **Flow**.

* Open the *winequality* dataset to explore it.

* Navigate to the **Statistics** tab of the dataset, and click **+Create Your First Worksheet**.

This brings up a window that contains a selection of card types.

Here, you have the option to select **Automatically suggest analyses**. This runs a smart assistant that helps you discover patterns in the data by suggesting analyses on variables of interest. It is particularly useful when the dataset has many columns or when you are unsure where to begin your analysis. You can also skip the assistant and manually select the type of statistical report you wish to generate.

### Reference

[1] Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. Modeling wine preferences by data mining from physicochemical properties. 2009.
