# Flow creation and management

* Programmatically building a Flow
  + Creating a Python recipe
  + Creating a Sync recipe
  + Creating and modifying a grouping recipe
  + A complete example
* Working with flow zones
  + Creating a zone and adding items in it
  + Listing and getting zones
  + Changing the settings of a zone
  + Getting the zone of a dataset
* Navigating the flow graph
  + Finding sources of the Flow
  + Enumerating the graph in order
  + Replacing an input everywhere in the graph
* Schema propagation
* Exporting a flow documentation
* Reference documentation

## Programmatically building a Flow

The Flow, including its datasets and recipes, can be created and managed entirely through the API.

Datasets can be created and managed using the methods detailed in Datasets (other operations).

Recipes can be created using the project.new\_recipe method. This follows a builder pattern: new\_recipe returns a recipe creator object, on which you set options, and then you call its create() method to actually create the recipe.

The builder objects reproduce the functionality available in the recipe creation modals in the UI, so for more control over the recipe’s setup, it is often necessary to get its settings after creation, modify them, and save them again.

### Creating a Python recipe

§ builder = project.new\_recipe("python")

§ # Set the input

§ builder.with\_input("myinputdataset")

§ # Create a new managed dataset for the output in the filesystem\_managed connection

§ builder.with\_new\_output\_dataset("grouped\_dataset", "filesystem\_managed")

§ # Set the code - builder is a PythonRecipeCreator, and has a ``with\_script`` method

§ builder.with\_script("""

§ import dataiku

§ from dataiku import recipe

§ input\_dataset = recipe.get\_inputs\_as\_datasets()[0]

§ output\_dataset = recipe.get\_outputs\_as\_datasets()[0]

§ df = input\_dataset.get\_dataframe()

§ df = df.groupby("something").count()

§ output\_dataset.write\_with\_schema(df)

§ """)

§ recipe = builder.create()

§ # recipe is now a ``DSSRecipe`` representing the new recipe, and we can now run it

§ job = recipe.run()

### Creating a Sync recipe

§ builder = project.new\_recipe("sync")

§ builder = builder.with\_input("input\_dataset\_name")

§ builder = builder.with\_new\_output("output\_dataset\_name", "hdfs\_managed", format\_option\_id="PARQUET\_HIVE")

§ recipe = builder.create()

§ job = recipe.run()

### Creating and modifying a grouping recipe

Recipe creation mostly handles setting up the inputs and outputs of the recipe. Most of the remaining setup therefore has to be done by retrieving the recipe’s settings, altering and saving them, and then applying schema changes to the output.

§ builder = project.new\_recipe("grouping")

§ builder.with\_input("dataset\_to\_group\_on")

§ # Create a new managed dataset for the output in the "filesystem\_managed" connection

§ builder.with\_new\_output("grouped\_dataset", "filesystem\_managed")

§ builder.with\_group\_key("column")

§ recipe = builder.create()

§ # After the recipe is created, you can edit its settings

§ recipe\_settings = recipe.get\_settings()

§ recipe\_settings.set\_column\_aggregations("myvaluecolumn", sum=True)

§ recipe\_settings.save()

§ # And you may need to apply new schemas to the outputs

§ # This will add the myvaluecolumn\_sum to the "grouped\_dataset" dataset

§ recipe.compute\_schema\_updates().apply()

§ # It should be noted that running a recipe is equivalent to building its output(s)

§ job = recipe.run()
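The create-then-configure cycle above can be wrapped in a small helper. The following is a minimal sketch, not part of the Dataiku API: the function name `create_grouping_recipe` and its parameters are hypothetical, and it only relies on the builder and settings calls shown in this section. It assumes `project` is a `DSSProject` handle.

```python
def create_grouping_recipe(project, input_name, output_name, connection,
                           group_key, sum_columns):
    """Create a grouping recipe, then enable a sum aggregation per column.

    Hypothetical helper: it is not part of the Dataiku API and only chains
    the calls shown in the example above.
    """
    builder = project.new_recipe("grouping")
    builder.with_input(input_name)
    builder.with_new_output(output_name, connection)
    builder.with_group_key(group_key)
    recipe = builder.create()

    # The builder only wires inputs and outputs; aggregations are set afterwards
    settings = recipe.get_settings()
    for column in sum_columns:
        settings.set_column_aggregations(column, sum=True)
    settings.save()

    # Push the new aggregate columns to the schema of the output dataset
    recipe.compute_schema_updates().apply()
    return recipe
```

The returned object can then be run as in the example above, with `recipe.run()`.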

### A complete example

This example shows a complete chain:

* Creating an external dataset

* Automatically detecting the settings of the dataset (see Datasets (other operations) for details)

* Creating a prepare recipe to clean up the dataset

* Then chaining a grouping recipe, setting an aggregation on it

* Running the entire chain

§ dataset = project.create\_sql\_table\_dataset("mydataset", "PostgreSQL", "my\_sql\_connection", "mytable", "myschema")

§ dataset\_settings = dataset.autodetect\_settings()

§ dataset\_settings.save()

§ # As a shortcut, we can call new\_recipe on the DSSDataset object. This way, we don't need to call "with\_input"

§ prepare\_builder = dataset.new\_recipe("prepare")

§ prepare\_builder.with\_new\_output("mydataset\_cleaned", "filesystem\_managed")

§ prepare\_recipe = prepare\_builder.create()

§ # Add a step to clean values in "doublecol" that are not valid doubles

§ prepare\_settings = prepare\_recipe.get\_settings()

§ prepare\_settings.add\_filter\_on\_bad\_meaning("DoubleMeaning", "doublecol")

§ prepare\_settings.save()

§ prepare\_recipe.compute\_schema\_updates().apply()

§ prepare\_recipe.run()

§ # Grouping recipe

§ grouping\_builder = project.new\_recipe("grouping")

§ grouping\_builder.with\_input("mydataset\_cleaned")

§ grouping\_builder.with\_new\_output("mydataset\_cleaned\_grouped", "filesystem\_managed")

§ grouping\_builder.with\_group\_key("column")

§ grouping\_recipe = grouping\_builder.create()

§ grouping\_recipe\_settings = grouping\_recipe.get\_settings()

§ grouping\_recipe\_settings.set\_column\_aggregations("myvaluecolumn", sum=True)

§ grouping\_recipe\_settings.save()

§ # compute\_schema\_updates() and run() are methods of the recipe, not of its settings

§ grouping\_recipe.compute\_schema\_updates().apply()

§ grouping\_recipe.run()

## Working with flow zones

### Creating a zone and adding items in it

§ flow = project.get\_flow()

§ zone = flow.create\_zone("zone1")

§ # First way of adding an item to a zone

§ dataset = project.get\_dataset("mydataset")

§ zone.add\_item(dataset)

§ # Second way of adding an item to a zone

§ dataset = project.get\_dataset("mydataset")

§ dataset.move\_to\_zone("zone1")

§ # Third way of adding an item to a zone

§ dataset = project.get\_dataset("mydataset")

§ dataset.move\_to\_zone(zone)

### Listing and getting zones

§ # List zones

§ for zone in flow.list\_zones():

§     print("Zone id=%s name=%s" % (zone.id, zone.name))

§     print("Zone has the following items:")

§     for item in zone.items:

§         print("Zone item: %s" % item)

§ # Get a zone by id - beware, id not name

§ zone = flow.get\_zone("21344ZsQZ")

§ # Get the "Default" zone

§ zone = flow.get\_default\_zone()
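Since `get_zone` expects an id rather than a name, looking a zone up by name means scanning `list_zones()`. The following is a minimal sketch of that lookup, falling back to `create_zone` when no zone matches. The helper name `get_or_create_zone` is hypothetical, and the sketch assumes that returning the first zone with a matching name is acceptable (nothing here guarantees that zone names are unique).

```python
def get_or_create_zone(flow, name):
    """Return the first zone called `name`, creating it if none exists.

    Hypothetical helper built on list_zones() and create_zone();
    `flow` is expected to be the object returned by project.get_flow().
    """
    for zone in flow.list_zones():
        if zone.name == name:
            return zone
    # No zone with that name yet: create it with the default color
    return flow.create_zone(name)
```

A typical call would be `zone = get_or_create_zone(project.get_flow(), "zone1")`, after which items can be added with `zone.add_item(...)`.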

### Changing the settings of a zone

§ flow = project.get\_flow()

§ zone = flow.get\_zone("21344ZsQZ")

§ settings = zone.get\_settings()

§ settings.name = "New name"

§ settings.save()

### Getting the zone of a dataset

§ dataset = project.get\_dataset("mydataset")

§ zone = dataset.get\_zone()

§ print("Dataset is in zone %s" % zone.id)
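The same call can be repeated over several datasets to see how they are spread across zones. This is a minimal sketch under the assumption that `project` is a `DSSProject` handle and that every name in `dataset_names` refers to an existing dataset. The helper name `group_datasets_by_zone` is hypothetical.

```python
from collections import defaultdict


def group_datasets_by_zone(project, dataset_names):
    """Return a dict mapping each zone id to the dataset names found in it.

    Hypothetical helper: it only uses project.get_dataset() and
    DSSDataset.get_zone(), as in the example above.
    """
    zones = defaultdict(list)
    for name in dataset_names:
        zone = project.get_dataset(name).get_zone()
        zones[zone.id].append(name)
    return dict(zones)
```

Datasets that are not in any explicit zone are expected to be reported under the id of the default zone, since that is the zone returned for them.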

## Navigating the flow graph

DSS builds the Flow graph dynamically by enumerating datasets, folders, models and recipes and linking them together through the inputs and outputs of the recipes. Since navigating this graph can be complex, the `dataikuapi.dss.flow.DSSProjectFlow` class gives you access to helpers for it.

### Finding sources of the Flow

§ flow = project.get\_flow()

§ graph = flow.get\_graph()

§ for source in graph.get\_source\_computables(as\_type="object"):

§     print("Flow graph has source: %s" % source)

### Enumerating the graph in order

This method returns all items in the graph, “from left to right”. Each item is returned as a `DSSDataset`, `DSSManagedFolder`, `DSSSavedModel`, `DSSStreamingEndpoint` or `DSSRecipe`.

§ flow = project.get\_flow()

§ graph = flow.get\_graph()

§ for item in graph.get\_items\_in\_traversal\_order(as\_type="object"):

§     print("Next item in the graph is %s" % item)
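Because each item comes back as a typed object, the traversal can also be used to summarize what a Flow contains. The following is a minimal sketch that counts items per Python class name. The helper name `count_flow_items_by_type` is hypothetical, and `flow` is assumed to be the object returned by `project.get_flow()`.

```python
from collections import Counter


def count_flow_items_by_type(flow):
    """Count the items of the Flow graph, grouped by their class name.

    Hypothetical helper based on get_items_in_traversal_order().
    """
    graph = flow.get_graph()
    items = graph.get_items_in_traversal_order(as_type="object")
    return Counter(type(item).__name__ for item in items)
```

The result maps class names such as `DSSDataset` or `DSSRecipe` to the number of items of that class.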

### Replacing an input everywhere in the graph

This method allows you to replace an input (a dataset, for example) in every recipe where it appears as an input.

§ flow = project.get\_flow()

§ flow.replace\_input\_computable("old\_dataset", "new\_dataset")

§ # Or to replace a managed folder

§ flow.replace\_input\_computable("oldid", "newid", type="MANAGED\_FOLDER")
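When several inputs have to be swapped at once, the call can be driven from a mapping. This is a minimal sketch: the helper name `replace_inputs` and its parameters are hypothetical. Like `replace_input_computable` itself, it performs no compatibility check between the old and new computables.

```python
def replace_inputs(flow, replacements, computable_type="DATASET"):
    """Apply several input replacements and return how many were performed.

    `replacements` maps the current reference to the new reference.
    Hypothetical helper around flow.replace_input_computable().
    """
    performed = 0
    for current_ref, new_ref in replacements.items():
        if current_ref == new_ref:
            continue  # replacing a reference by itself would be a no-op
        flow.replace_input_computable(current_ref, new_ref, type=computable_type)
        performed += 1
    return performed
```

For managed folders, the same helper would be called with `computable_type="MANAGED_FOLDER"` and folder ids as keys and values.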

## Schema propagation

When the schema of an input dataset is modified, or when the settings of a recipe are modified, you need to propagate this schema change across the flow.

This can be done from the UI, but can also be automated through the API:

§ flow = project.get\_flow()

§ # A propagation always starts from a source dataset and will move from left to right till the end of the Flow

§ propagation = flow.new\_schema\_propagation("sourcedataset")

§ future = propagation.start()

§ future.wait\_for\_result()

There are many options for propagation; see `dataikuapi.dss.flow.DSSSchemaPropagationRunBuilder`.
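As an illustration of these options, the following minimal sketch disables or enables automatic rebuilds and marks some recipes as stopping points before starting the propagation. The helper name `propagate_schema` and its parameters are hypothetical. The builder methods it calls (`set_auto_rebuild`, `stop_at`, `start`) are the ones listed in the reference documentation below.

```python
def propagate_schema(flow, source_dataset, stop_at_recipes=(), auto_rebuild=True):
    """Configure and run a schema propagation, then wait for its result.

    Hypothetical helper around DSSSchemaPropagationRunBuilder.
    """
    propagation = flow.new_schema_propagation(source_dataset)
    propagation.set_auto_rebuild(auto_rebuild)

    # Propagation stops at each of these recipes
    for recipe_name in stop_at_recipes:
        propagation.stop_at(recipe_name)

    future = propagation.start()
    return future.wait_for_result()
```

For example, `propagate_schema(flow, "sourcedataset", stop_at_recipes=["my_recipe"], auto_rebuild=False)` would propagate from `sourcedataset` without rebuilding datasets, where `my_recipe` is a placeholder recipe name.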

## Exporting a flow documentation

This sample shows how to generate and download flow documentation from a template.

See Flow Document Generator for more information.

§ # project is a DSSProject object

§ flow = project.get\_flow()

§ # Launch the flow document generation by either

§ # using the default template by calling without arguments

§ # or specifying a managed folder id and the path to the template to use in that folder

§ future = flow.generate\_documentation(FOLDER\_ID, "path/my\_template.docx")

§ # Alternatively, use a custom uploaded template file

§ with open("my\_template.docx", "rb") as f:

§     future = flow.generate\_documentation\_from\_custom\_template(f)

§ # Wait for the generation to finish, retrieve the result and download the generated

§ # flow documentation to the specified file

§ result = future.wait\_for\_result()

§ export\_id = result["exportId"]

§ flow.download\_documentation\_to\_file(export\_id, "path/my\_flow\_documentation.docx")
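The generate, wait and download steps above can be combined into one function. This is a minimal sketch: the helper name `export_flow_documentation` and its parameters are hypothetical, and it assumes the result of the future contains an `exportId` key, as in the sample above.

```python
def export_flow_documentation(flow, output_path, folder_id=None, template_path=None):
    """Generate the flow documentation and download it to `output_path`.

    Hypothetical helper. When folder_id and template_path are both None,
    the default template is used.
    """
    future = flow.generate_documentation(folder_id, template_path)

    # Block until the generation is finished, then fetch the export id
    result = future.wait_for_result()
    export_id = result["exportId"]

    flow.download_documentation_to_file(export_id, output_path)
    return export_id
```

Calling `export_flow_documentation(flow, "path/my_flow_documentation.docx")` would use the default template, while passing a managed folder id and a template path selects a template stored in that folder.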

## Reference documentation

*class* `dataikuapi.dss.flow.``DSSProjectFlow`(*client*, *project*)

`get_graph`()

`create_zone`(*name*, *color='#2ab1ac'*)

Creates a new flow zone

* Returns: the newly created zone

* Return type: `DSSFlowZone`

`get_zone`(*id*)

Gets a single Flow zone by id

* Return type: `DSSFlowZone`

`get_default_zone`()

Returns the default zone of the Flow

* Return type: `DSSFlowZone`

`list_zones`()

Lists all zones in the Flow

* Return type: list of `DSSFlowZone`

`get_zone_of_object`(*obj*)

Finds the zone to which this object belongs.

If the object is not found in any specific zone, it belongs to the default zone, and the default zone is returned

* Parameters: **obj** (*object*) – A `dataikuapi.dss.dataset.DSSDataset`, `dataikuapi.dss.managedfolder.DSSManagedFolder`,
or `dataikuapi.dss.savedmodel.DSSSavedModel` to search

* Return type: `DSSFlowZone`

`replace_input_computable`(*current\_ref*, *new\_ref*, *type='DATASET'*)

This method replaces all references to a “computable” (Dataset, Managed Folder or Saved Model) as input of recipes in the whole Flow by a reference to another computable.

No specific checks are performed. It is your responsibility to ensure that the schema of the new dataset will be compatible with the previous one (in the case of datasets).

If new\_ref references an object in a foreign project, this method will automatically ensure that new\_ref is exposed to the current project

* Parameters: * **current\_ref** (*str*) – Either a “simple” object name (dataset name, model id, managed folder id)
or a foreign object reference in the form “FOREIGN\_PROJECT\_KEY.local\_id”
* **new\_ref** (*str*) – Either a “simple” object name (dataset name, model id, managed folder id)
or a foreign object reference in the form “FOREIGN\_PROJECT\_KEY.local\_id”
* **type** (*str*) – The type of object being replaced (DATASET, SAVED\_MODEL or MANAGED\_FOLDER)

`generate_documentation`(*folder\_id=None*, *path=None*)

Start the flow document generation from a template docx file in a managed folder, or from the default template if no folder id and path are specified.

* Parameters: * **folder\_id** – (optional) the id of the managed folder
* **path** – (optional) the path to the file from the root of the folder

* Returns: A `DSSFuture` representing the flow document generation process

`generate_documentation_from_custom_template`(*fp*)

Start the flow document generation from a docx template (as a file object).

* Parameters: **fp** (*object*) – A file-like object pointing to a template docx file

* Returns: A `DSSFuture` representing the flow document generation process

`download_documentation_stream`(*export\_id*)

Download a flow documentation, as a binary stream.

Warning: this stream will monopolize the DSSClient until closed.

* Parameters: **export\_id** – the id of the generated flow documentation returned as the result of the future

* Returns: the flow documentation, as a binary stream

`download_documentation_to_file`(*export\_id*, *path*)

Download a flow documentation into the given output file.

* Parameters: * **export\_id** – the id of the generated flow documentation returned as the result of the future
* **path** – the path where to download the flow documentation

* Returns: None

`start_tool`(*type*, *data={}*)

Start a tool or open a view in the flow

* Parameters: * **type** (*str*) – one of {COPY, CHECK\_CONSISTENCY, PROPAGATE\_SCHEMA} (tools) or {TAGS, CUSTOM\_FIELDS, CONNECTIONS, COUNT\_OF\_RECORDS, FILESIZE, FILEFORMATS, RECIPES\_ENGINES, RECIPES\_CODE\_ENVS, IMPALA\_WRITE\_MODE, HIVE\_MODE, SPARK\_ENGINE, SPARK\_CONFIG, SPARK\_PIPELINES, SQL\_PIPELINES, PARTITIONING, PARTITIONS, SCENARIOS, CREATION, LAST\_MODIFICATION, LAST\_BUILD, RECENT\_ACTIVITY, WATCH} (views)
* **data** (*dict*) – initial data for the tool (optional)

* Returns: a `flow.DSSFlowTool` handle to interact with the newly-created tool or view

`new_schema_propagation`(*dataset\_name*)

Start an automatic schema propagation from a dataset

* Parameters: **dataset\_name** (*str*) – name of a dataset to start propagating from

* Returns: a `DSSSchemaPropagationRunBuilder` to set options and start the propagation

*class* `dataikuapi.dss.flow.``DSSProjectFlowGraph`(*flow*, *data*)

`get_source_computables`(*as\_type='dict'*)

Returns the list of source computables.

* Parameters: **as\_type** – How to return the source computables. Possible values are “dict” and “object”

* Returns: if as\_type=dict, each computable is returned as a dict containing at least “ref” and “type”.
if as\_type=object, each computable is returned as a `dataikuapi.dss.dataset.DSSDataset`,
`dataikuapi.dss.managedfolder.DSSManagedFolder`,
`dataikuapi.dss.savedmodel.DSSSavedModel`, or streaming endpoint

`get_source_recipes`(*as\_type='dict'*)

Returns the list of source recipes.

* Parameters: **as\_type** – How to return the source recipes. Possible values are “dict” and “object”

* Returns: if as\_type=dict, each recipe is returned as a dict containing at least “ref” and “type”.
if as\_type=object, each recipe is returned as a `dataikuapi.dss.recipe.DSSRecipe`

`get_source_datasets`()

Returns the list of source datasets for this project.

* Return type: list of `dataikuapi.dss.dataset.DSSDataset`

`get_successor_recipes`(*node*, *as\_type='dict'*)

Returns a list of recipes that are a successor of a graph node

* Parameters: * **node** – Either a name or `dataikuapi.dss.dataset.DSSDataset` object
* **as\_type** – How to return the successor recipes. Possible values are “dict” and “object”

* Returns: if as\_type=dict, each recipe is returned as a dict containing at least “ref” and “type”.
if as\_type=object, each recipe is returned as a `dataikuapi.dss.recipe.DSSRecipe`
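As an illustration, the following minimal sketch lists the references of the recipes that consume a given dataset, which can be useful before replacing an input. The helper name `recipes_consuming` is hypothetical. It relies only on the “ref” key that each returned dict is documented to contain.

```python
def recipes_consuming(flow, dataset_name):
    """Return the "ref" of every recipe that directly follows a dataset.

    Hypothetical helper around DSSProjectFlowGraph.get_successor_recipes().
    """
    graph = flow.get_graph()
    successors = graph.get_successor_recipes(dataset_name, as_type="dict")
    return [node["ref"] for node in successors]
```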

`get_successor_computables`(*node*, *as\_type='dict'*)

Returns a list of computables that are a successor of a given graph node

* Parameters: **as\_type** – How to return the successor computables. Possible values are “dict” and “object”

* Returns: if as\_type=dict, each computable is returned as a dict containing at least “ref” and “type”.
if as\_type=object, each computable is returned as an object, as in `get_source_computables()`

`get_items_in_traversal_order`(*as\_type='dict'*)

*class* `dataikuapi.dss.flow.``DSSFlowZone`(*flow*, *data*)

A zone in the Flow. Do not create this object manually, use `DSSProjectFlow.get_zone()` or `DSSProjectFlow.list_zones()`

*property* `id`

*property* `name`

*property* `color`

`get_settings`()

Gets the settings of this zone in order to modify them

* Return type: `DSSFlowZoneSettings`

`add_item`(*obj*)

Adds an item to this zone.

The item will automatically be moved from its existing zone. Additional items may be moved to this zone as a result of the operation (notably the recipe generating obj).

* Parameters: **obj** (*object*) – A `dataikuapi.dss.dataset.DSSDataset`, `dataikuapi.dss.managedfolder.DSSManagedFolder`,
or `dataikuapi.dss.savedmodel.DSSSavedModel` to add to the zone

`add_items`(*items*)

Adds items to this zone.

The items will automatically be moved from their existing zones. Additional items may be moved to this zone as a result of the operations (notably the recipe generating the items).

* Parameters: **items** (*list*) – A list of objects, either `dataikuapi.dss.dataset.DSSDataset`, `dataikuapi.dss.managedfolder.DSSManagedFolder`,
or `dataikuapi.dss.savedmodel.DSSSavedModel` to add to the zone

*property* `items`

The list of items explicitly belonging to this zone.

This list is read-only. To modify it, use `add_item()` and `remove_item()`.

Note that the “default” zone never has any items, as it contains all items that are not explicitly in a zone. To get the full list of items in a zone, including in the “default” zone, use the `get_graph()` method.

* Return type: list of zone items, either `dataikuapi.dss.dataset.DSSDataset`, `dataikuapi.dss.managedfolder.DSSManagedFolder`,
or `dataikuapi.dss.savedmodel.DSSSavedModel` or `dataikuapi.dss.recipe.DSSRecipe`

`add_shared`(*obj*)

Share an item to this zone.

The item will not be automatically unshared from its existing zone.

* Parameters: **obj** (*object*) – A `dataikuapi.dss.dataset.DSSDataset`, `dataikuapi.dss.managedfolder.DSSManagedFolder`,
or `dataikuapi.dss.savedmodel.DSSSavedModel` to share to the zone

`remove_shared`(*obj*)

Remove a shared item from this zone.

* Parameters: **obj** (*object*) – A `dataikuapi.dss.dataset.DSSDataset`, `dataikuapi.dss.managedfolder.DSSManagedFolder`,
or `dataikuapi.dss.savedmodel.DSSSavedModel` to unshare from the zone

*property* `shared`

The list of items that have been explicitly pre-shared to this zone.

This list is read-only. To modify it, use `add_shared()` and `remove_shared()`.

* Return type: list of shared zone items, either `dataikuapi.dss.dataset.DSSDataset`, `dataikuapi.dss.managedfolder.DSSManagedFolder`,
or `dataikuapi.dss.savedmodel.DSSSavedModel` or `dataikuapi.dss.recipe.DSSRecipe`

`get_graph`()

`delete`()

Deletes the zone. All its items are moved to the default zone.

*class* `dataikuapi.dss.flow.``DSSFlowZoneSettings`(*zone*)

The settings of a flow zone. Do not create this directly, use `DSSFlowZone.get_settings()`

`get_raw`()

Gets the raw settings of the zone.

You cannot modify the items and shared elements through this class. Instead, use `DSSFlowZone.add_item()` and others

*property* `name`

*property* `color`

`save`()

Saves the settings of the zone

*class* `dataikuapi.dss.flow.``DSSSchemaPropagationRunBuilder`(*project*, *client*, *dataset\_name*)

Do not create this directly, use `DSSProjectFlow.new_schema_propagation()`

`set_auto_rebuild`(*auto\_rebuild*)

Sets whether to automatically rebuild datasets if needed while propagating (default true)

`set_default_partitioning_value`(*dimension*, *value*)

In the case of partitioned flows, sets the default partition value to use when rebuilding, for a specific dimension name

* Parameters: * **dimension** (*str*) – a partitioning dimension name
* **value** (*str*) – a partitioning dimension value

`set_partition_for_computable`(*full\_id*, *partition*)

In the case of partitioned flows, sets the partition id to use when building a particular computable. Overrides the default partitioning value per dimension

* Parameters: * **full\_id** (*str*) – Full name of the computable, in the form PROJECTKEY.id
* **partition** (*str*) – a full partition id (all dimensions)
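For partitioned flows, the two partition-related setters can be applied from plain dictionaries before the propagation is started. This is a minimal sketch: the helper name `configure_partitions` and its parameters are hypothetical, and `propagation` is assumed to be a `DSSSchemaPropagationRunBuilder` obtained from `new_schema_propagation()`.

```python
def configure_partitions(propagation, default_values, overrides=None):
    """Set default partition values, then per-computable overrides.

    `default_values` maps a dimension name to a partition value.
    `overrides` maps a full computable id (PROJECTKEY.id) to a full
    partition id. Hypothetical helper around the builder's setters.
    """
    for dimension, value in default_values.items():
        propagation.set_default_partitioning_value(dimension, value)

    # Overrides take precedence over the per-dimension defaults
    for full_id, partition in (overrides or {}).items():
        propagation.set_partition_for_computable(full_id, partition)

    return propagation
```

Because the builder is returned, the call can be followed directly by `.start()`.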

`stop_at`(*recipe\_name*)

Marks a recipe as a recipe where propagation stops

`mark_recipe_as_ok`(*name*)

Marks a recipe as always considered as OK during propagation

`set_grouping_update_options`(*recipe=None*, *remove\_missing\_aggregates=True*, *remove\_missing\_keys=True*, *new\_aggregates={}*)

Sets update options for grouping recipes

* Parameters: **recipe** (*str*) – if None, applies to all grouping recipes. Else, applies only to this name

`set_window_update_options`(*recipe=None*, *remove\_missing\_aggregates=True*, *remove\_missing\_in\_window=True*, *new\_aggregates={}*)

Sets update options for window recipes

* Parameters: **recipe** (*str*) – if None, applies to all window recipes. Else, applies only to this name

`set_join_update_options`(*recipe=None*, *remove\_missing\_join\_conditions=True*, *remove\_missing\_join\_values=True*, *new\_selected\_columns={}*)

Sets update options for join recipes

* Parameters: **recipe** (*str*) – if None, applies to all join recipes. Else, applies only to this name

`start`()

Starts the actual propagation. Returns a future to wait for completion

* Return type: `dataikuapi.dss.future.DSSFuture`
