# How to Generate Features by Event

In this mode, the recipe aggregates, for each event, all the events that precede it within the same group. The output contains one row per event.

For example, suppose a dataset records sensor activity for different machines, where each row corresponds to an event at a specific timestamp. At any given time, we want to predict how long it will be until the machine fails. To do so, we group the input dataset by machine and aggregate by event. The output dataset then has one row per machine and timestamped event.
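To make the idea concrete, here is a minimal sketch of by-event aggregation in plain Python. It is not the plugin's implementation (the plugin works on SQL datasets); it only illustrates the logic. The column names *engine\_no* and *time* match the ones used later in this tutorial, while `sensor_1`, the function name `aggregate_by_event`, and the choice of count, mean, and max as aggregates are hypothetical. The sketch also assumes that the current event is included in its own aggregate, which the plugin may handle differently.

```python
from collections import defaultdict


def aggregate_by_event(rows, group_key, time_key, value_key):
    """Return one output row per input event, with running aggregates per group."""
    # Sort by group, then by time, so each group's events are seen in order.
    ordered = sorted(rows, key=lambda r: (r[group_key], r[time_key]))
    # Running state per group. Assumption: the current event is included.
    state = defaultdict(lambda: {"count": 0, "total": 0.0, "max": None})
    out = []
    for row in ordered:
        s = state[row[group_key]]
        value = row[value_key]
        s["count"] += 1
        s["total"] += value
        s["max"] = value if s["max"] is None else max(s["max"], value)
        out.append({
            group_key: row[group_key],
            time_key: row[time_key],
            f"{value_key}_count": s["count"],
            f"{value_key}_mean": s["total"] / s["count"],
            f"{value_key}_max": s["max"],
        })
    return out


# Hypothetical sample data: three events for engine 1, one for engine 2.
events = [
    {"engine_no": 1, "time": 1, "sensor_1": 10.0},
    {"engine_no": 1, "time": 2, "sensor_1": 14.0},
    {"engine_no": 2, "time": 1, "sensor_1": 7.0},
    {"engine_no": 1, "time": 3, "sensor_1": 12.0},
]
features = aggregate_by_event(events, "engine_no", "time", "sensor_1")
```

Each input event yields exactly one output row, and the aggregates for one engine never mix with those of another, which is the property the **By event** level of aggregation relies on.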

For reference, the final flow will look like the following. *Note:* in the flow below, we loaded the data into PostgreSQL tables, so the flow begins with PostgreSQL datasets, rather than Uploaded Files datasets.

## Preparing for Events Aggregation

The flow begins with four datasets: train observations, test observations, and their corresponding label datasets. Both the train and test observation datasets contain sensor measurements of the engines as well as operational settings. The label datasets record how long it will be before a given engine, at a given timestamp, requires maintenance.

These datasets are based on data simulated by NASA to better understand engine degradation under different operational conditions. We have modified the FD002 sets to include a *time* column. You can download the archive containing our modified files, extract them from the archive, and use them to create four new Uploaded Files datasets. For each dataset, go to the **Schema** tab and click **Infer types from data**.

Next, we can use Sync recipes to sync each of these datasets to SQL datasets, because the plugin works on SQL datasets. Now the *train\_obs* and *test\_obs* SQL datasets are ready for the Events Aggregator.

## Applying the Events Aggregator

In the Flow, select **+Recipe > Feature factory: events-aggregator** to open the plugin recipe. Set *train\_obs* as the input dataset, create *train\_features* as the output dataset, and create the recipe.

We want to build rolling features for each engine, ending with a dataset that has one row per event per engine, so we define the groups by *engine\_no* and select **By event** as the level of aggregation. We also select *time* as the column that defines when the events occurred.

We’ll allow the recipe to automatically select the raw features, and we won’t define any temporal windows.

Repeat the process, using the *test\_obs* dataset as the input and creating *test\_features* as the output.

## Machine Learning

The output datasets now have over 100 features. We join the observations datasets with the label datasets, and can then use the new features directly to predict the label in the Visual Machine Learning interface. We deployed a Random Forest model to the flow and used it to score the test set.
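In Dataiku the join is built visually, but the underlying logic can be sketched in a few lines of plain Python. This is an illustration only: it assumes the join keys are *engine\_no* and *time*, and the column names `sensor_1_mean` and `label`, like the helper `join_on_keys`, are hypothetical.

```python
def join_on_keys(features, labels, keys):
    """Inner-join two lists of dict rows on the given key columns."""
    # Index the label rows by their key tuple for constant-time lookup.
    label_index = {tuple(row[k] for k in keys): row for row in labels}
    joined = []
    for row in features:
        match = label_index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Keep the feature columns and add the label columns.
            joined.append({**row, **match})
    return joined


# Hypothetical sample rows; real datasets would have many more columns.
features = [
    {"engine_no": 1, "time": 1, "sensor_1_mean": 10.0},
    {"engine_no": 1, "time": 2, "sensor_1_mean": 12.0},
    {"engine_no": 2, "time": 1, "sensor_1_mean": 7.0},
]
labels = [
    {"engine_no": 1, "time": 1, "label": 120},
    {"engine_no": 1, "time": 2, "label": 119},
]
training_rows = join_on_keys(features, labels, ["engine_no", "time"])
```

Because this is an inner join, events without a matching label row (engine 2 in the sample data) are dropped, so every remaining row has both features and a target for training.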
