# How to Generate Features by Group

In **By group** mode, features are aggregated by group, and the output has one row per group.

For example, we may have a dataset recording customer activity on an e-commerce website. Each row corresponds to an event at a specific timestamp. For a given fixed date, we want to predict who is most likely to churn. So we need to group the input dataset by user. The output dataset will then have one row per user.
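To make the "one row per user" idea concrete, here is a minimal sketch in plain Python. It is not how the plugin is implemented; the sample rows, the `event_type` column, and the three output features (`event_count`, `first_event`, `last_event`) are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per event (values are illustrative only).
events = [
    {"user_id": "u1", "event_timestamp": "2018-01-05 10:00:00", "event_type": "view"},
    {"user_id": "u1", "event_timestamp": "2018-02-11 09:30:00", "event_type": "purchase"},
    {"user_id": "u2", "event_timestamp": "2018-01-20 14:15:00", "event_type": "view"},
]


def aggregate_by_user(rows):
    """Collapse an event log into one feature row per user."""
    grouped = defaultdict(list)
    for row in rows:
        ts = datetime.strptime(row["event_timestamp"], "%Y-%m-%d %H:%M:%S")
        grouped[row["user_id"]].append(ts)

    features = []
    for user_id, timestamps in sorted(grouped.items()):
        features.append({
            "user_id": user_id,
            "event_count": len(timestamps),   # how many events the user generated
            "first_event": min(timestamps),   # earliest recorded activity
            "last_event": max(timestamps),    # most recent recorded activity
        })
    return features


user_features = aggregate_by_user(events)
for row in user_features:
    print(row["user_id"], row["event_count"], row["last_event"])
```

Three input events from two users become two output rows, which is the shape the by-group mode produces.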

For reference, the final flow will look like the following.

## Preparing for the Events Aggregator

The flow begins with two CSV source files. Download the archive containing these files, extract them from the archive, and use them to create two new Uploaded Files datasets.

While creating each dataset, go to the **Schema** tab and click **Infer types from data**.

Next, we use Sync recipes to copy these datasets to SQL datasets, because the plugin works on SQL datasets. The *user\_activity* SQL dataset is then ready for the Events Aggregator.

## Applying the Events Aggregator

In the Flow, select **+Recipe > Feature factory: events-aggregator** to open the plugin recipe. Set *user\_activity* as the input dataset, create *user\_features* as the output dataset, and create the recipe.

We want to aggregate features for each customer, ending with a dataset that has one row per customer, so we define the groups by *user\_id* and select **By group** as the level of aggregation. We also select *event\_timestamp* as the column that defines when the events occurred.

We let the recipe define the input features automatically. The columns it selects are used to generate the aggregated output features.

To capture the short- to mid-term trend, we add a temporal window with a **window width in months** of 6 and check the box to **add a window with all history**. In addition to computing features over the entire event history, the recipe then generates features for events in the 6 months prior to the present.

If we wanted to generate features for each 6-month window going back a year, we would change the **Windows of each type** setting from 1 to 2.
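The window settings can be illustrated with a small sketch that counts events per user over all history and over each temporal window. This is only a model of the idea, not the plugin's code: the function names, the fixed `reference` date standing in for "the present", the half-open window boundaries, and the assumption that multiple windows are consecutive and non-overlapping are all choices made for this example.

```python
import calendar
from collections import defaultdict
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`, clamping the day of month."""
    index = day.year * 12 + (day.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def window_bounds(reference, width_months, windows_of_each_type):
    """List (start, end) pairs, most recent first; start inclusive, end exclusive."""
    bounds = []
    for k in range(windows_of_each_type):
        end = months_before(reference, k * width_months)
        start = months_before(reference, (k + 1) * width_months)
        bounds.append((start, end))
    return bounds


def count_events(events, reference, width_months=6, windows_of_each_type=1):
    """Count events per user: slot 0 is all history, slots 1..n are the windows."""
    bounds = window_bounds(reference, width_months, windows_of_each_type)
    counts = defaultdict(lambda: [0] * (len(bounds) + 1))
    for user_id, day in events:
        if day >= reference:
            continue  # assumption: events on or after the reference date are ignored
        counts[user_id][0] += 1
        for slot, (start, end) in enumerate(bounds, start=1):
            if start <= day < end:
                counts[user_id][slot] += 1
    return dict(counts)


# Hypothetical events as (user_id, event date) pairs.
events = [
    ("u1", date(2017, 3, 2)),
    ("u1", date(2017, 9, 15)),
    ("u1", date(2018, 4, 20)),
    ("u2", date(2018, 6, 30)),
]
reference = date(2018, 7, 1)

one_window = count_events(events, reference, windows_of_each_type=1)
two_windows = count_events(events, reference, windows_of_each_type=2)
print(one_window)
print(two_windows)
```

With one window, each user gets an all-history count plus a count for the most recent 6 months. With two windows, a third count covers the 6 months before that, so the feature set grows with the number of windows.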

## Machine Learning

The output dataset now contains dozens of features. We join this dataset with the *user\_label* dataset and split the result into train and test sets with a 70/30 random split. The new features can then be used directly to predict the *target* column in the Visual Machine Learning interface. Finally, we deploy a Random Forest model to the Flow and use it to score the test set.
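For readers who want to see the join and split steps spelled out, here is a rough stdlib-only sketch. The rows are made up, an inner join on *user\_id* is assumed, and the split cuts a shuffled list at exactly 70%, which may differ from how the visual Split recipe assigns rows. Model training is left to the Visual Machine Learning interface and is not reproduced here.

```python
import random

# Hypothetical rows standing in for the user_features and user_label datasets.
user_features = [{"user_id": f"u{i}", "event_count": i % 7} for i in range(10)]
user_label = [{"user_id": f"u{i}", "target": i % 2} for i in range(10)]


def join_on_user(features, labels):
    """Inner join of two row lists on user_id (assumed join type)."""
    label_by_user = {row["user_id"]: row["target"] for row in labels}
    return [
        {**row, "target": label_by_user[row["user_id"]]}
        for row in features
        if row["user_id"] in label_by_user
    ]


def random_split(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of the rows and cut it at train_ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


joined = join_on_user(user_features, user_label)
train, test = random_split(joined)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so the same users land in the test set on every run.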
