The [Data Analytics / Modelling](flow_zone:sjc55sv) flow zone prepares the data for modelling, as described in the article [Regression Analysis](article:16).

![model.png](B7uYZmnOMVk2)

- The joined dataset [svi_vulnerability_cdc_joined](dataset:svi_vulnerability_cdc_joined) gathers the features for regression analysis and clustering. Each row describes a unique tract and holds both the percentage and the percentile value of each social vulnerability factor, along with the percentage disease value of each chronic disease. Metadata such as state, county, and population are used to segment values in the model analysis and reporting phase.
 
 The following process is run manually, once for each disease in ```set_diseases = {Coronary Heart Disease, Stroke, Chronic Kidney Disease, Current Asthma, Cancer (except skin), Diabetes}```.
 
For each ```disease``` in ```set_diseases```:
 1. [Filter](recipe:compute_svi_vulnerability_cdc_by_disease) on ```disease```.
 2. [Train](recipe:train_Regression_Models__Percent_Disease_) a [Ridge Regression Model](saved_model:QOCx9pEZ) with the 16 ```Percent Social Vulnerability factors``` as input features and the ```disease value``` as the target variable.
 3. [Deploy](https://knowledge.dataiku.com/latest/courses/scoring/deploy-model/deploy-model-summary.html) the model to the flow. 
 4. [Score](recipe:score_svi_vulnerability_cdc_by_disease) the input dataset. In the score recipe settings, enable individual explanations so that the output shows how each feature influenced the model's prediction (SHAP values, explained in the article [Regression Analysis](article:16)).
 5. Repeat the process for the next ```disease```.
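
The per-disease loop above can be sketched outside Dataiku as follows. This is only an illustration under stated assumptions, not the flow's actual implementation: the column names (`disease`, `disease_value`, `pct_factor_a`, `pct_factor_b`), the `alpha` value, and the sample rows are hypothetical, and the closed-form ridge fit stands in for the model trained in the visual ML interface. The per-feature contributions use `coef_j * (x_j - mean_j)`, which matches SHAP values for a linear model only when features are treated as independent; the explanations computed by the score recipe may differ.

```python
import numpy as np

# Hypothetical feature columns; the real flow uses 16 percent SVI factors.
FEATURES = ["pct_factor_a", "pct_factor_b"]

# Hypothetical long-format rows: one row per (tract, disease).
demo_rows = [
    {"disease": "Stroke", "pct_factor_a": 10.0, "pct_factor_b": 5.0, "disease_value": 3.1},
    {"disease": "Stroke", "pct_factor_a": 20.0, "pct_factor_b": 7.0, "disease_value": 3.9},
    {"disease": "Stroke", "pct_factor_a": 30.0, "pct_factor_b": 4.0, "disease_value": 4.4},
    {"disease": "Stroke", "pct_factor_a": 15.0, "pct_factor_b": 9.0, "disease_value": 3.8},
    {"disease": "Diabetes", "pct_factor_a": 10.0, "pct_factor_b": 5.0, "disease_value": 9.0},
    {"disease": "Diabetes", "pct_factor_a": 20.0, "pct_factor_b": 7.0, "disease_value": 11.5},
    {"disease": "Diabetes", "pct_factor_a": 30.0, "pct_factor_b": 4.0, "disease_value": 12.8},
    {"disease": "Diabetes", "pct_factor_a": 15.0, "pct_factor_b": 9.0, "disease_value": 11.0},
]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def linear_explanations(X, coef):
    """Per-row, per-feature contributions: coef_j * (x_j - mean_j)."""
    return (X - X.mean(axis=0)) * coef


def run_per_disease(rows, feature_names, alpha=1.0):
    """Filter, train, and score once per disease (steps 1, 2, and 4)."""
    results = {}
    for disease in sorted({r["disease"] for r in rows}):
        subset = [r for r in rows if r["disease"] == disease]  # step 1: filter
        X = np.array([[r[f] for f in feature_names] for r in subset], dtype=float)
        y = np.array([r["disease_value"] for r in subset], dtype=float)
        coef, intercept = fit_ridge(X, y, alpha)  # step 2: train
        results[disease] = {
            "prediction": X @ coef + intercept,  # step 4: score
            "explanations": linear_explanations(X, coef),
        }
    return results


results = run_per_disease(demo_rows, FEATURES)
```

For each row, the contributions sum to the difference between that row's prediction and the mean prediction for the disease, which is the property that makes them readable as per-feature explanations.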

- Recipe [compute_svi_vulnerability_cdc_stacked](recipe:compute_svi_vulnerability_cdc_stacked) stacks the data into a long-format [dataset](dataset:svi_vulnerability_cdc_stacked) that contains the disease name, the original percent disease value, the metadata for each tract, and the model output explanations (SHAP values) for each input feature, stored in dictionary format. A SHAP value is the average marginal contribution of a feature value to the prediction across all possible coalitions.

- Recipes [compute_svi_vulnerability_cdc_standardize](recipe:compute_svi_vulnerability_cdc_standardize) and [compute_svi_vulnerability_cdc_prepared](recipe:compute_svi_vulnerability_cdc_prepared) standardize the FIPS code and unfold the dictionary-format explanations.
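
As a rough illustration of these two clean-up steps, the sketch below standardizes a FIPS code and unfolds an explanations dictionary into one column per feature. It rests on assumptions that the recipes themselves do not confirm: standardization is taken to mean zero-padding to the 11-character tract code, the explanations are taken to be stored as a JSON string, and the function names, the `explanations` column, the `shap_` prefix, and the feature names are all hypothetical.

```python
import json


def standardize_fips(value, width=11):
    """Return the FIPS code as a zero-padded string (assumed width: 11)."""
    text = str(value).strip()
    if text.endswith(".0"):  # code was read as a float, e.g. 1001020100.0
        text = text[:-2]
    return text.zfill(width)


def unfold_explanations(row, column="explanations", prefix="shap_"):
    """Replace the dictionary column with one numeric column per feature."""
    out = {key: val for key, val in row.items() if key != column}
    raw = row.get(column) or "{}"
    values = json.loads(raw) if isinstance(raw, str) else raw
    for feature, contribution in values.items():
        out[prefix + feature] = float(contribution)
    return out


# Hypothetical stacked row: one tract, one disease, explanations as JSON.
stacked_row = {
    "fips": 1001020100,
    "disease": "Stroke",
    "explanations": '{"pct_factor_a": 0.4, "pct_factor_b": -0.1}',
}

prepared_row = unfold_explanations(stacked_row)
prepared_row["fips"] = standardize_fips(prepared_row["fips"])
```

Storing the FIPS code as a padded string keeps the leading zero of state codes below 10, which is lost whenever the code is parsed as a number.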



