## This solution consists of two essential data sources:
## 1. The solution accesses public databases to analyze reported molecules
The Python recipe [compute_database_query](recipe:compute_database_query) retrieves all previously studied molecules for the target protein of interest from the public database (see [Systems and Databases](article:15)) selected by the user (see [Solution Parameters](article:16)). It produces the two output datasets that form the basis of this solution (their features are explained in [Solution Parameters](article:16)):

 - [Metadata](dataset:metadata), which captures an overview of the user query and the output results.
  ![metadata.png](m7XTDiz6KWxH)

 - [Molecules](dataset:molecules), which stores all previously studied molecules reported in the database.
  ![molecules.png](LkRQXSyg85pK)

## 2. Users submit novel molecules for testing
To score novel molecules or potential drug candidates, the user must upload their own [test_data](dataset:test_dataff) dataset with the following structure:

| Column Name | Column Type |
| ------------ | ------------ |
| molecule_id | STRING |
| canonical_smiles | STRING |

![test_data.png](oZgM2tygvehF)
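
Before uploading, it can help to check that a file matches the structure above. The following is a minimal sketch using only the Python standard library, assuming the data is available as CSV text. The function name `validate_test_data` is hypothetical and not part of the solution, and the requirement that both columns be non-empty is an assumption.

```python
import csv
import io

# Column names taken from the table above.
REQUIRED_COLUMNS = ("molecule_id", "canonical_smiles")


def validate_test_data(csv_text):
    """Return a list of problems found in the table (empty list if none).

    Hypothetical helper for illustration; not part of the solution itself.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Both required columns must be present in the header row.
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return [f"missing column: {name}" for name in missing]

    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for name in REQUIRED_COLUMNS:
            # Assumption: an empty value in either column is an error.
            if not (row[name] or "").strip():
                problems.append(f"row {row_number}: empty {name}")
    return problems


sample = "molecule_id,canonical_smiles\nmol_001,CCO\nmol_002,\n"
print(validate_test_data(sample))
```

This check only covers the table layout. It does not verify that each SMILES string is chemically valid, which would require a cheminformatics toolkit.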

## 3. Pre-uploaded ClinTox dataset

The [ClinTox](https://paperswithcode.com/dataset/clintox) dataset is a benchmark dataset from MoleculeNet, designed for predicting drug toxicity from chemical structure. It contains 1,491 drug compounds with known toxicity profiles, compiled from FDA approvals and clinical trial failures. The information in this dataset can help:
- Predict drug toxicity risk before experimental validation.
- Prevent late-stage drug failures due to clinical toxicity.
- Reduce drug discovery costs by focusing on compounds that are likely to be safe.

**ClinTox Data Sources** 
- FDA-approved drugs from [SWEETLEAD](https://simtk.org/projects/sweetlead) database: A curated collection of known drugs that have been successfully approved.
- Drugs that failed due to toxicity, from the AACT (Aggregate Analysis of ClinicalTrials.gov) database: A dataset tracking drugs that failed in clinical trials because of toxicity concerns.

The dataset includes a binary classification task:
- `CT_TOX` (0/1): Whether the drug failed due to toxicity.


| Column Name | Column Type |
| ------------ | ------------ |
| canonical_smiles | STRING |
| CT_TOX | BINARY |


| canonical_smiles | CT_TOX |
| ------------ | ------------ |
|CC(=O)OC1=CC=CC=C1C(=O)|0|
|CN1CCN(CC1)C2=CC=C(C=C2)C3=NC=CN=C3N|0|
|CCC(C)C1=CC(=O)NC(=O)N1C|1|
|CCOC(=O)C1=CC=CC=C1OC|1|
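
Because `CT_TOX` is a binary label, a useful first check on the dataset is its class balance. The following is a minimal sketch using only the Python standard library. It assumes the data is available as CSV text with the two columns shown above and uses the four example rows from the table as input. The function name `count_labels` is hypothetical and not part of the solution.

```python
import csv
import io
from collections import Counter

# The four example rows from the table above, written as CSV text.
SAMPLE = """canonical_smiles,CT_TOX
CC(=O)OC1=CC=CC=C1C(=O),0
CN1CCN(CC1)C2=CC=C(C=C2)C3=NC=CN=C3N,0
CCC(C)C1=CC(=O)NC(=O)N1C,1
CCOC(=O)C1=CC=CC=C1OC,1
"""


def count_labels(csv_text):
    """Count rows per CT_TOX label, rejecting anything other than 0 or 1.

    Hypothetical helper for illustration; not part of the solution itself.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = int(row["CT_TOX"])
        if label not in (0, 1):
            raise ValueError(f"CT_TOX must be 0 or 1, got {label}")
        counts[label] += 1
    return counts


counts = count_labels(SAMPLE)
# Fraction of rows labeled as failed due to toxicity (CT_TOX = 1).
toxic_fraction = counts[1] / sum(counts.values())
print(counts[0], counts[1], toxic_fraction)  # prints: 2 2 0.5
```

The even split here reflects only the four example rows and says nothing about the balance of the full dataset.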






