The Dataiku Application configures the Solution with a user-defined ClinicalTrials.gov API query.

# Connections Configuration
Select the connection for the datasets and folders that the Flow will build. The optimal engine is then applied to the recipes within the Flow. The Solution also requires storage for the text embedding vectors and the similarity index, which are saved as pickle files.
![dataiku-application-instance.png](aZOy0WShRpit)
 1. Select a connection.
 2. Click the **RECONFIGURE** button.
 
<div class="alert">
The current version supports filesystem, S3, and Snowflake connections. Because of an incompatibility with the geometry data type, the Databricks, Redshift, and PostgreSQL SQL connections are not supported.  
</div>
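The embedding and similarity-index artifacts mentioned above are plain pickle files, so the selected folder connection only needs to store and return binary files. The sketch below is a minimal, stdlib-only illustration of that round trip, assuming a local directory stands in for the managed folder. The file names `embeddings.pkl` and `similarity_index.pkl`, the helper names, and the dictionary layouts are hypothetical and do not describe the Solution's actual artifacts.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical artifact names; the Solution's real file names may differ.
EMBEDDINGS_FILE = "embeddings.pkl"
INDEX_FILE = "similarity_index.pkl"


def save_artifacts(folder: Path, embeddings: dict, index: dict) -> None:
    """Pickle the embedding vectors and the similarity index into a folder."""
    folder.mkdir(parents=True, exist_ok=True)
    with open(folder / EMBEDDINGS_FILE, "wb") as f:
        pickle.dump(embeddings, f)
    with open(folder / INDEX_FILE, "wb") as f:
        pickle.dump(index, f)


def load_artifacts(folder: Path) -> tuple[dict, dict]:
    """Read both pickle files back from the folder."""
    with open(folder / EMBEDDINGS_FILE, "rb") as f:
        embeddings = pickle.load(f)
    with open(folder / INDEX_FILE, "rb") as f:
        index = pickle.load(f)
    return embeddings, index


if __name__ == "__main__":
    # Toy data: study ID -> embedding vector, study ID -> most similar study IDs.
    embeddings = {"NCT00000001": [0.1, 0.2, 0.3], "NCT00000002": [0.3, 0.1, 0.0]}
    index = {"NCT00000001": ["NCT00000002"], "NCT00000002": ["NCT00000001"]}
    with tempfile.TemporaryDirectory() as tmp:
        save_artifacts(Path(tmp), embeddings, index)
        loaded_embeddings, loaded_index = load_artifacts(Path(tmp))
        print(loaded_embeddings == embeddings and loaded_index == index)  # True
```

Because pickle files are opaque binaries, a connection that can only hold tabular datasets is not enough; the storage behind the folder must accept arbitrary files.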

 
# Define Study Scope
This section defines the customized query sent to the [ClinicalTrials.gov API](https://beta.clinicaltrials.gov/data-about-studies/learn-about-api). In other words, the query establishes the scope of the clinical trials that feed into this Solution's intelligence. The query syntax follows the API [documentation](https://beta.clinicaltrials.gov/data-about-studies/learn-about-api). 
![dataiku-application-query.png](3VMt4229LT6z)
 1. Select or type in the search terms to define your customized query.
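For readers who want to preview a scope programmatically, the sketch below assembles a study-search URL and, optionally, asks the API for the total number of matching studies. It assumes the ClinicalTrials.gov REST API v2 endpoint `https://clinicaltrials.gov/api/v2/studies` with its `query.cond`, `query.locn`, `query.term`, `pageSize`, and `countTotal` parameters and the `totalCount` response field; the Solution's own query format may differ, so check the API documentation linked above. The helper names `build_query_url` and `count_studies` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: ClinicalTrials.gov REST API v2.
API_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_query_url(condition=None, location=None, term=None,
                    page_size=1, count_total=True):
    """Assemble a study-search URL from optional search terms (hypothetical helper)."""
    params = {}
    if condition:
        params["query.cond"] = condition   # condition or disease
    if location:
        params["query.locn"] = location    # location terms
    if term:
        params["query.term"] = term        # other free-text terms
    params["pageSize"] = page_size
    if count_total:
        # Ask the API to report the total number of matching studies.
        params["countTotal"] = "true"
    return f"{API_URL}?{urlencode(params)}"


def count_studies(url, timeout=30):
    """Fetch the first page and return the reported total (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("totalCount")


if __name__ == "__main__":
    url = build_query_url(condition="cancer", location="United States")
    print(url)
    # print(count_studies(url))  # uncomment to query the live API
```

Keeping URL construction separate from the network call makes it easy to inspect the query before sending it.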


# Include Demographic & Social Determinants of Health Dataset
If included, this optional dataset augments the study enrollment rate prediction model and the clinical site intelligence. The current release is limited to SDOH data for US counties. Read [Social Determinants Of Health](project:SOL_SDOH) for more information.
![dataiku-application-sdoh.png](49GwQ059bDYC)
 1. Check the box to include the Demographic and SDOH dataset.


# Build the Flow
Building the Flow processes the configuration above and creates all pipelines and models in the Flow. 
![dataiku-application-build.png](euLkN6vwJXQv)
 1. Click the **BUILD** button to create the Flow.
 
<div class="alert">
 Due to the scale and complexity of this Solution, the Flow may take a long time to build. The run time is proportional to the scope of the ClinicalTrials.gov API query. For the data packaging of this release, we queried cancer studies in the US after 2018, which returned about 15k unique studies. On a cloud DSS instance with 2 CPUs and 16 GB of RAM, building all flow zones (including models) took around 270 minutes. We encourage users to test their query on the official [ClinicalTrials.gov](https://clinicaltrials.gov/) website to estimate the size of the result, and then extrapolate the expected run time to build the Solution.  
 Because the queried studies are used to train the enrollment rate prediction model and to build the similarity index, we recommend including at least a few thousand studies to achieve good model performance.
</div>
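The extrapolation suggested above can be written out explicitly. This minimal sketch assumes run time scales linearly with the number of studies and uses the reference build from this release (about 15,000 studies in around 270 minutes on 2 CPUs and 16 GB of RAM). Actual times will vary with hardware, connection, and engine, so treat the result as a rough estimate; the function name `estimate_build_minutes` is hypothetical.

```python
# Reference build reported for this release (approximate figures).
REFERENCE_STUDIES = 15_000
REFERENCE_MINUTES = 270


def estimate_build_minutes(n_studies: int) -> float:
    """Linearly extrapolate Flow build time from the reference build."""
    if n_studies < 0:
        raise ValueError("n_studies must be non-negative")
    return REFERENCE_MINUTES * n_studies / REFERENCE_STUDIES


if __name__ == "__main__":
    # e.g. a query that returns about 5,000 studies on the ClinicalTrials.gov website
    print(f"{estimate_build_minutes(5_000):.0f} minutes")  # 90 minutes
```

On different hardware, the reference figures would need to be replaced with a measured build on that instance.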


# Launch Web App 
Launch the Clinical Sites Intelligence Web App to review insights from study similarity analysis and clinical site intelligence.
![dataiku-application-launch.png](n9gRaRYbAQx7)
 1. Click the web app hyperlink to open it.
 
 
# Create Sponsor Dashboard
Create a Sponsor Dashboard to get an overview of the studies and sites sponsored by a selected lead sponsor.
![dataiku-application-dashboard.png](1wpsAt1Sspjt)
 1. Select a lead sponsor
 2. Click the **CREATE** button that matches your SDOH dataset configuration: if you included the SDOH dataset in your setup, choose **Create Sponsor Dashboard with SDOH Data**; otherwise, choose **Create Sponsor Dashboard**.
 3. View the dashboard!