# Hands-On Tutorial: Group the Data

Note

This lesson is a continuation of the Basics 102 Hands-On Tutorial: Statistics Worksheets and Cards.

At this point, we have a prepared t-shirts dataset and have done some preliminary statistical exploration.

If our ultimate goal is to understand our customers, we’ll need to group all past orders by unique customers, aggregating their past interactions.

## Group Orders by Customer

To do this, we’ll use another visual recipe, **Group**.

* With the *orders\_prepared* dataset open, look in the upper-right corner for the **Actions** menu. From this menu, choose **Group** in the list of Visual recipes.

* An alternative path is to select (but not open) the *orders\_prepared* dataset from the Flow and find the plus icon at the top of the right sidebar.

The Group recipe lets you aggregate the values of some columns by the values of one or more **keys**.

* In the recipe dialog, choose to group by *customer\_id*.

* Change the name of the output dataset to `orders_by_customer`.

* Select **Create Recipe**.

The Group recipe has several steps (on the left). The core step is the Group step, where you choose which columns serve as keys and which aggregations to perform.

We won’t need some columns, such as *order\_id* and *tshirt\_category*, in the new dataset. For the others, make the following selections:

* *order\_date*: **Min**

* *pages\_visited*: **Avg**

* *total*: **Sum**

For each customer, this gives us the date of the first order, the average number of pages visited per order, and the sum of all order totals. We’ll also compute the count of rows in each group, which is a default setting.
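To make the aggregation concrete, here is a minimal plain-Python sketch of the computation these selections describe. It is not how the Group recipe is implemented, only an illustration of the result. The sample rows are invented, and the output names other than `order_date_min` (`pages_visited_avg`, `total_sum`, `count`) are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sample rows: column names follow the tutorial, values are invented.
orders = [
    {"customer_id": "c1", "order_date": date(2016, 3, 1), "pages_visited": 4, "total": 20.0},
    {"customer_id": "c1", "order_date": date(2016, 1, 15), "pages_visited": 6, "total": 35.0},
    {"customer_id": "c2", "order_date": date(2016, 2, 10), "pages_visited": 3, "total": 12.5},
]


def group_by_customer(rows):
    """Aggregate order rows into one summary row per customer_id."""
    # Collect the rows that share the same key value.
    groups = defaultdict(list)
    for row in rows:
        groups[row["customer_id"]].append(row)

    # Apply one aggregation per selected column, plus the group count.
    result = []
    for customer_id, group in groups.items():
        result.append({
            "customer_id": customer_id,
            "order_date_min": min(r["order_date"] for r in group),         # Min
            "pages_visited_avg": mean(r["pages_visited"] for r in group),  # Avg
            "total_sum": sum(r["total"] for r in group),                   # Sum
            "count": len(group),                                           # rows per group
        })
    return result


orders_by_customer = group_by_customer(orders)
for row in orders_by_customer:
    print(row)
```

The two rows for customer `c1` collapse into a single row, while *order\_id* and *tshirt\_category* do not appear in the output because no aggregation was requested for them.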

Note

The recipe reminds us of the storage type of each column in the “Per field aggregations”. We are able to retrieve the minimum of *order\_date* because its storage type is a date. If it were a string, the “minimum” would be the first result in alphabetical order.
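The difference between the two storage types can be illustrated with a short Python sketch. The date values and the day/month/year text format are invented for this example. The point is only that a minimum taken over strings follows character order, while a minimum taken over dates follows chronological order.

```python
from datetime import datetime

# Hypothetical order dates stored as text in day/month/year format (invented values).
as_strings = ["15/01/2016", "01/03/2016", "10/02/2015"]

# Compared as strings, values are ordered character by character.
string_min = min(as_strings)

# Parsed into dates, values are ordered chronologically.
as_dates = [datetime.strptime(s, "%d/%m/%Y").date() for s in as_strings]
date_min = min(as_dates)

print(string_min)  # 01/03/2016
print(date_min)    # 2015-02-10
```

Here the string minimum is a date in March 2016, even though the earliest order was placed in February 2015.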

Before running the recipe, check the **Output** step. Here we can rename the columns of the output dataset.

* Rename **order\_date\_min** to `first_order_date`.

Select **Run** to create the new grouped dataset.

When exploring the output dataset, use the Analyze tool on the *customer\_id* column. Note that all values are unique. We have exactly one record for every customer.
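The same uniqueness check can be expressed as a small Python sketch, assuming the *customer\_id* values have been read into a list. The helper name `duplicated_values` and the sample IDs are hypothetical.

```python
from collections import Counter

# Hypothetical customer_id values from a grouped output (invented values).
customer_ids = ["c1", "c2", "c3"]


def duplicated_values(values):
    """Return the sorted values that appear more than once."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


# An empty list means every key appears exactly once.
print(duplicated_values(customer_ids))

# A repeated key is reported, which would indicate the data was not grouped.
print(duplicated_values(["c1", "c2", "c1"]))
```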

## Learn More

Now that you have a few datasets and recipes in the Flow, it’s time to take stock of what you’ve accomplished in the next lesson, Hands-On Tutorial: Explore the Flow.
