# Component: Preparation Processor

Preparation processors are additional steps you can add to a Prepare recipe Script.

To create a new Preparation processor, we recommend that you use the plugin developer tools (see the tutorial for an introduction). In the Definition tab, click “+ ADD COMPONENT”, choose “PREPARATION PROCESSOR”, and enter the identifier for your new processor. A new folder named after your identifier is created, containing two files: `processor.json` and `processor.py`.

A basic processor definition looks like:

§ {
§     "meta" : {
§         "label" : "Custom processor",
§         "description" : "",
§         "icon" : "icon-puzzle-piece"
§     },
§     "mode": "CELL",
§     "params" : [
§         {
§             "name": "param1",
§             "label": "Parameter 1",
§             "type": "STRING",
§             "description": "Some documentation for parameter1",
§             "mandatory": true
§         }
§     ]
§ }

A basic implementation looks like:

§ def process(row):
§     # row is a dict of the row on which the step is applied
§     # params is not defined in this file; it is expected to be provided by DSS at runtime
§     param1_value = params.get('param1')
§     return param1_value

The `meta` field works the same way as in all other kinds of DSS components.

## Output single column

When `mode` is set to `CELL` in the descriptor, the preparation processor outputs a single column.

To generate the values for this output column, DSS calls the processor's `process` function once for each row of the dataset and stores the returned value in the output column for that row.

The following implementation creates a new column containing a salutation message using the `Name` column in the input dataset.

§ def process(row):
§     return "Dear " + row['Name']
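The per-row calling behaviour described above can be tried outside DSS with a small simulation. This is only a sketch: `sample_rows` and `output_column` are hypothetical names standing in for the dataset and the generated column, and the loop merely imitates what DSS does in `CELL` mode.

```python
# Hypothetical stand-in for the dataset: DSS passes each row to process() as a dict.
sample_rows = [
    {"Name": "Ada"},
    {"Name": "Grace"},
]


def process(row):
    # Same implementation as in the example above.
    return "Dear " + row["Name"]


# Imitate CELL mode: one call per row, each return value becomes one output cell.
output_column = [process(row) for row in sample_rows]
print(output_column)
```

Running the processor this way is a convenient check of the logic before loading the plugin into DSS.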

To let end users select the input column, add a parameter of type `COLUMN` to the descriptor and read its value in `process`.

§ {
§     "meta" : {
§         "label" : "Custom processor (cell)",
§         "description" : "",
§         "icon" : "icon-puzzle-piece"
§     },
§     "mode": "CELL",
§     "params" : [
§         {
§             "name": "input_column",
§             "label": "Input column",
§             "type": "COLUMN",
§             "description": "Column containing the name of the person",
§             "columnRole": "main",
§             "mandatory": true
§         }
§     ]
§ }

§ def process(row):
§     input_column = params.get('input_column')
§     return "Dear " + row[input_column]
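The implementation above raises an error if the selected column is absent from the row or holds no value. The following is a minimal defensive sketch, not part of the original reference. It assumes that an empty cell may reach `process` as `None`, an empty string, or a missing key, and that returning `None` leaves the output cell empty; check both assumptions against your DSS version. The module-level `params` dict is a hypothetical stand-in for the object DSS provides at runtime.

```python
# Hypothetical stand-in for the `params` object that DSS provides at runtime.
params = {"input_column": "Name"}


def process(row):
    input_column = params.get("input_column")
    # row is a dict, so .get() returns None instead of raising on a missing key.
    name = row.get(input_column)
    if name is None or name == "":
        # Assumption: returning None leaves the output cell empty.
        return None
    return "Dear " + str(name)


print(process({"Name": "Ada"}))
print(process({"Name": ""}))
```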

## Output multiple columns

To output or modify more than one column, `mode` must be set to `ROW` in the descriptor.

Your `process` function must return a `dict` in which each key/value pair represents a cell in the row. You will usually return the `row` object received as argument, after applying your modifications to it.

To let users configure the names of the output columns, add one parameter per output column.

§ {
§     "meta" : {
§         "label" : "Custom processor (row)",
§         "description" : "",
§         "icon" : "icon-puzzle-piece"
§     },
§     "mode": "ROW",
§     "params" : [
§         {
§             "name": "input_column",
§             "label": "Input column",
§             "type": "COLUMN",
§             "description": "Column containing the name of the person",
§             "columnRole": "main",
§             "mandatory": true
§         },
§         {
§             "name": "salutation_column",
§             "label": "Salutation column",
§             "type": "COLUMN",
§             "description": "Output for salutation message",
§             "columnRole": "output_salutation"
§         },
§         {
§             "name": "greeting_column",
§             "label": "Greeting column",
§             "type": "COLUMN",
§             "description": "Output column for greeting message",
§             "columnRole": "output_greeting"
§         }
§     ]
§ }

For example, to generate two additional columns containing a salutation and a greeting message for each person in the dataset, use the above descriptor with this implementation:

§ def process(row):
§     input_column = params.get('input_column')
§     salutation_column = params.get('salutation_column')
§     if salutation_column is not None and salutation_column != "":
§         row[salutation_column] = "Dear " + row[input_column]
§     greeting_column = params.get('greeting_column')
§     if greeting_column is not None and greeting_column != "":
§         row[greeting_column] = "Hello " + row[input_column]
§     return row
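To see how the optional output parameters behave, the `ROW` implementation can be exercised locally. This is a sketch under stated assumptions: the module-level `params` dict is a hypothetical stand-in for the values DSS provides, with the greeting column deliberately left empty to show that an unconfigured output is skipped.

```python
# Hypothetical stand-in for the `params` object that DSS provides at runtime.
# The greeting column is left empty, as if the user had not filled it in.
params = {
    "input_column": "Name",
    "salutation_column": "Salutation",
    "greeting_column": "",
}


def process(row):
    # Same implementation as in the example above.
    input_column = params.get("input_column")
    salutation_column = params.get("salutation_column")
    if salutation_column is not None and salutation_column != "":
        row[salutation_column] = "Dear " + row[input_column]
    greeting_column = params.get("greeting_column")
    if greeting_column is not None and greeting_column != "":
        row[greeting_column] = "Hello " + row[input_column]
    return row


result = process({"Name": "Ada"})
print(result)
```

Only the salutation column is added here, because the guard on `greeting_column` skips empty values.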

## Using code environment for a processor

To run the processor in the code environment defined for it, add a top-level `useKernel` field set to `true` in `processor.json`.

The updated processor definition looks like:

§ {
§     "meta" : {
§         "label" : "Custom processor",
§         "description" : "",
§         "icon" : "icon-puzzle-piece"
§     },
§     "mode": "CELL",
§     "params" : [
§         {
§             "name": "param1",
§             "label": "Parameter 1",
§             "type": "STRING",
§             "description": "Some documentation for parameter1",
§             "mandatory": true
§         }
§     ],
§     "useKernel" : true
§ }

Reload the plugin after making this change so that it takes effect.
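Because `useKernel` sits next to `mode` and `params` rather than inside the `params` list, it is easy to misplace. The following sketch is a local sanity check, not a DSS API: `check_descriptor` is a hypothetical helper, and the descriptor text is inlined here where a real check would read the contents of `processor.json`.

```python
import json

# Example descriptor text; in a real plugin this would be the contents of processor.json.
descriptor_text = """
{
    "meta": {"label": "Custom processor", "description": "", "icon": "icon-puzzle-piece"},
    "mode": "CELL",
    "params": [],
    "useKernel": true
}
"""


def check_descriptor(text):
    """Hypothetical helper: return a list of problems found in a descriptor."""
    descriptor = json.loads(text)
    problems = []
    # CELL and ROW are the two modes described on this page.
    if descriptor.get("mode") not in ("CELL", "ROW"):
        problems.append("mode should be CELL or ROW")
    if not isinstance(descriptor.get("useKernel"), bool):
        problems.append("useKernel is missing or not a top-level boolean")
    return problems


problems = check_descriptor(descriptor_text)
print(problems or "descriptor looks consistent")
```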
