# Preparing Graph Nodes
Given the complex nature of chemical pathology, mapping the connections between compounds, genes and diseases requires a multigraph structure with heterogeneous nodes. The graph connects seven types of nodes:

 - Genes
 - Diseases
 - Drugs
 - Pathways
 - Symptoms
 - Anatomy
 - Ontology Nodes: Cellular Components, Molecular Functions, Biological Processes

These nodes serve as the anchor points of the final graph network and are written to the Neo4J database in the [Node Exports](flow_zone:33Lpzn3) part of the flow. The node datasets are derived from the various source tables, with only the minimal cleaning needed to keep variable names consistent across the project.

For the most part, getting node datasets ready requires little more than a prepare recipe before the export. Some nodes, such as the [Ontologies](flow_zone:tq6pmMD) or [Pathways](flow_zone:3jsEbuV), use basic Python recipes to unnest JSON data into a tabular format. The prepare recipes after these steps further unnest the data into the relevant features and drop any extraneous metadata.
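As a rough illustration of the unnesting step, the sketch below flattens JSON records into one flat row per node using only the Python standard library. It is not the project's actual recipe: the record layout, the field names (`id`, `name`, `genes`, `meta`) and the output column names are all hypothetical, and a real Dataiku recipe would read from and write to datasets rather than an in-memory list.

```python
import json

# Hypothetical raw records: one JSON string per pathway (layout is an assumption).
raw_rows = [
    '{"id": "PW1", "name": "Glycolysis", "meta": {"source": "demo"}, "genes": ["G1", "G2"]}',
    '{"id": "PW2", "name": "TCA cycle", "meta": {"source": "demo"}, "genes": []}',
]


def unnest_pathways(rows):
    """Parse each JSON string and keep only flat, node-level features."""
    nodes = []
    for row in rows:
        record = json.loads(row)
        nodes.append(
            {
                # Consistent, explicit column names for the node dataset.
                "pathway_id": record["id"],
                "pathway_name": record["name"],
                # Summarise the nested list; the "meta" block is dropped entirely.
                "gene_count": len(record.get("genes", [])),
            }
        )
    return nodes


pathway_nodes = unnest_pathways(raw_rows)
```

Each resulting dictionary corresponds to one row of a tabular node dataset, with the nested `meta` block left out.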

# Exporting Graph Nodes

Once the datasets are ready, we use the Neo4J plugin to export these nodes to the database. The export recipe takes a single dataset of nodes and lets you define the node label and choose a column in the dataset to act as the unique identifier for each node. Because nodes are keyed on this identifier, the recipe will not write duplicate nodes to the graph. Additionally, all or selected columns can be added as properties on the node, which gives downstream analytics more information to work with. For more information on how to use the export features of this plugin, please see the [Neo4J Plugin Page](https://www.dataiku.com/product/plugins/neo4j/).
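To make the "no duplicate nodes" behaviour more concrete, the sketch below builds the kind of Cypher `MERGE` statement that matches on a unique identifier and then sets the remaining columns as properties. This is only a conceptual illustration and not the plugin's actual implementation: the helper `build_merge_query`, the `Gene` label and the column names are hypothetical, and the function does no escaping of labels or column names.

```python
def build_merge_query(label, id_column, property_columns):
    """Build a Cypher statement that upserts one node per unique identifier.

    Hypothetical helper for illustration only. MERGE matches an existing node
    on the identifier or creates it, so re-running the statement on the same
    rows does not create duplicates.
    """
    query = (
        f"UNWIND $rows AS row "
        f"MERGE (n:{label} {{{id_column}: row.{id_column}}})"
    )
    # Remaining columns become node properties, if any were selected.
    set_clause = ", ".join(f"n.{col} = row.{col}" for col in property_columns)
    if set_clause:
        query += f" SET {set_clause}"
    return query


# Example with assumed label and column names.
gene_query = build_merge_query("Gene", "gene_id", ["symbol", "description"])
```

The `$rows` parameter stands for a list of row dictionaries, one per node, that would be supplied when the statement is executed against the database.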