# Data preparation

Visual data preparation in DSS lets you create data cleansing, normalization and enrichment scripts in a visual and interactive way.

You can create these scripts directly in a Prepare recipe, or in a Visual Analysis that can be deployed to the Flow as a Prepare recipe.

Note

For a step-by-step introduction to the data preparation component of Data Science Studio, we recommend that you follow our Basic Courses. This section focuses on *advanced* and *reference* topics related to the data preparation component.

We also have courses on advanced data preparation.

* How to Copy Prepare Recipe Steps
  + Copying Steps
* Sampling
* Execution engines
  + Design of the preparation
  + Execution in analysis
  + Execution of the recipe
  + Details on the in-database (SQL) engine
  + Details on the Spark engine
* Processors reference
  + Extract from array
  + Fold an array
  + Sort array
  + Concatenate JSON arrays
  + Discretize (bin) numerical values
  + Change coordinates system
  + Copy column
  + Rename columns
  + Concatenate columns
  + Delete/Keep columns by name
  + Column Pseudonymization
  + Count occurrences
  + Convert currencies
  + Create if, then, else statements
  + Extract date elements
  + Compute difference between dates
  + Format date with custom format
  + Parse to standard date format
  + Split e-mail addresses
  + Enrich from French department
  + Enrich from French postcode
  + Enrich with build context
  + Enrich with record context
  + Extract ngrams
  + Extract numbers
  + Fill column
  + Fill empty cells with fixed value
  + Filter rows/cells on date
  + Filter rows/cells with formula
  + Filter invalid rows/cells
  + Filter rows/cells on numerical range
  + Filter rows/cells on value
  + Find and replace
  + Flag rows/cells on date range
  + Flag rows with formula
  + Flag invalid rows
  + Flag rows on numerical range
  + Flag rows on value
  + Fold multiple columns
  + Fold multiple columns by pattern
  + Fold object keys
  + Formula
  + Fuzzy join with other dataset (memory-based)
  + Generate Big Data
  + Compute distance between geopoints
  + Extract from geo column
  + Geo-join
  + Resolve GeoIP
  + Create area around a geopoint
  + Create GeoPoint from lat/lon
  + Extract lat/lon from GeoPoint
  + Extract with grok
  + Flag holidays
  + Split invalid cells into another column
  + Join with other dataset (memory-based)
  + Extract with JSONPath
  + Group long-tail values
  + Compute the average of numerical values
  + Translate values using meaning
  + Normalize measure
  + Merge long-tail values
  + Move columns
  + Negate boolean value
  + Force numerical range
  + Generate numerical combinations
  + Convert number formats
  + Nest columns
  + Unnest object (flatten JSON)
  + Extract with regular expression
  + Pivot
  + Python function
  + Split HTTP Query String
  + Remove rows where cell is empty
  + Round numbers
  + Simplify text
  + Split and fold
  + Split and unfold
  + Split column
  + Switch case
  + Transform string
  + Tokenize text
  + Transpose rows to columns
  + Triggered unfold
  + Unfold
  + Unfold an array
  + Convert a UNIX timestamp to a date
  + Fill empty cells with previous/next value
  + Split URL (into protocol, host, port, …)
  + Classify User-Agent
  + Generate a best-effort visitor id
  + Zip JSON arrays
* Filtering and flagging rows
  + Common filtering actions
  + Columns selection
  + Filter on value
  + Filter on numerical range
  + Filter on date range
  + Filter on formula
  + Filter on bad meaning
* Managing dates
  + Working with dates in Preparation
  + Parsing Dates
  + Using dates
* Reshaping
  + Split and Fold
  + Fold multiple columns
  + Fold multiple columns by pattern
  + Unfold
  + Unfold an array
  + Split and Unfold
  + Triggered Unfold
* Geographic processors
  + Geopoint converters
  + Resolve GeoIP
  + Reverse geocoding
  + Zipcode geocoding
  + Change coordinates system
  + Compute distances between points
  + Create area around a geopoint
  + Extract from geo column
