# Amazon Comprehend Medical

## This plugin provides recipes to call the Amazon Comprehend Medical APIs

## Plugin information

| Property | Value |
| --- | --- |
| Version | 1.0.2 |
| Author | Dataiku (Alex COMBESSIE) |
| Released | 2020-06 |
| Last updated | 2020-10 |
| License | Apache Software License |
| Source code | GitHub |
| Reporting issues | GitHub |

With this plugin, you will be able to:

* Extract Protected Health Information (PHI) from a medical text record

* Recognize medical entities (medical conditions, treatments, etc.) in a medical text record

Note that the Amazon Comprehend Medical API is a paid service. You can consult the API pricing page to estimate the cost of your usage.

## How to set up

If you are both a Dataiku and an AWS admin, follow these configuration steps right after installing the plugin. If you are not an admin, you can forward this section to your admin and scroll down to the **How to use** section.

### 1. Create an IAM user with the Amazon Comprehend Medical policy – in AWS

Let’s assume that your AWS account has already been created and that you have full admin access. If not, please follow this guide.

Start by creating a dedicated IAM user to centralize access to the Comprehend Medical API, or select an existing one. Next, attach a policy to this user by following this documentation. We recommend using the *“ComprehendMedicalFullAccess”* managed policy.

Alternatively, you can create a custom IAM policy that allows *“comprehendmedical:\*”* actions. After completing this step, you will be able to retrieve the user's *Access key ID* and *Secret access key*.
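For the custom-policy route, a minimal policy document allowing every Comprehend Medical action could look like the one generated below. This is a sketch under assumptions: granting `comprehendmedical:*` on all resources (`"Resource": "*"`) mirrors the broad scope described above, and you may prefer to restrict it. Python's standard `json` module is used only to produce text you could paste into the IAM console's JSON policy editor; the function name `comprehend_medical_policy` is hypothetical.

```python
import json


def comprehend_medical_policy() -> str:
    """Return a hypothetical custom IAM policy allowing all Comprehend Medical actions."""
    policy = {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "comprehendmedical:*",  # every Comprehend Medical API action
                "Resource": "*",  # assumption: no resource-level restriction
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(comprehend_medical_policy())
```

The printed JSON is the part that matters; the Python wrapper only keeps the example in the same language as the other sketches on this page.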

### 2. Create an API configuration preset – in Dataiku DSS

In Dataiku DSS, navigate to the Plugin page > Settings > API configuration and create your first preset.

### 3. Configure the preset – in Dataiku DSS

* **Fill in the AUTHENTICATION settings.**

+ Copy-paste your *Access key ID* and *Secret access key* from **Step 1** into the corresponding fields.

+ The *AWS region* parameter must be set to one of the regions in this list.

+ Alternatively, you may leave these fields empty so that the credentials are resolved from the server environment. If you choose this option, follow this documentation on the server hosting DSS.
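To illustrate what leaving the fields empty implies, here is a minimal sketch, not the plugin's actual code, of how preset values could be turned into keyword arguments for the AWS SDK for Python (boto3). The helper `build_client_kwargs` is hypothetical: filled-in fields are passed explicitly, while empty ones are omitted so that boto3 falls back to its default credential and region resolution (environment variables, shared credentials file, instance role, and so on). Treating an empty region the same way is an assumption.

```python
from typing import Dict, Optional


def build_client_kwargs(
    access_key_id: Optional[str],
    secret_access_key: Optional[str],
    region: Optional[str],
) -> Dict[str, str]:
    """Hypothetical helper: map preset fields to boto3.client() keyword arguments.

    Empty fields are left out so boto3 resolves them from the server environment.
    """
    kwargs: Dict[str, str] = {}
    if region:
        kwargs["region_name"] = region
    # Only pass explicit keys when both halves of the key pair are provided.
    if access_key_id and secret_access_key:
        kwargs["aws_access_key_id"] = access_key_id
        kwargs["aws_secret_access_key"] = secret_access_key
    return kwargs
```

With boto3 installed, the result would be used as `boto3.client("comprehendmedical", **build_client_kwargs(...))`. Requiring both keys together avoids mixing one explicit key with another resolved from the environment.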

* **(Optional) Review the API QUOTA settings.**

+ With the default API Quota settings, each recipe calling the API is throttled to 5 requests (*Rate limit* parameter) per second (*Period* parameter). In other words, after sending 5 requests, it waits for 1 second, then sends another 5, and so on.

+ This default quota is defined by Amazon. You can request a quota increase, as documented on this page.

+ You may need to decrease the *Rate limit* parameter if you expect multiple recipes to call the API concurrently. For instance, to allow 5 concurrent DSS activities, you can set this parameter to 5 / 5 = 1 request per second.
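The throttling behaviour described above (at most *Rate limit* requests per *Period*, then a pause) can be sketched as a fixed-window limiter. This is an illustrative assumption, not the plugin's implementation; the class `FixedWindowRateLimiter` and the helper `per_recipe_rate_limit` are hypothetical. The clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable


class FixedWindowRateLimiter:
    """Hypothetical limiter: at most `rate_limit` calls per `period`-second window."""

    def __init__(
        self,
        rate_limit: int = 5,
        period: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate_limit = rate_limit
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.calls_in_window = 0

    def wait(self) -> None:
        """Block until one more API request is allowed, then count it."""
        now = self.clock()
        if now - self.window_start >= self.period:
            # The previous window has expired: start a fresh one.
            self.window_start = now
            self.calls_in_window = 0
        if self.calls_in_window >= self.rate_limit:
            # Quota used up: sleep until the current window ends.
            self.sleep(self.period - (now - self.window_start))
            self.window_start = self.clock()
            self.calls_in_window = 0
        self.calls_in_window += 1


def per_recipe_rate_limit(account_rate_limit: float, concurrent_activities: int) -> float:
    """Split the account-wide quota evenly across concurrent DSS activities."""
    return account_rate_limit / concurrent_activities
```

With the defaults, the sixth call in a window waits until that window has elapsed, and `per_recipe_rate_limit(5, 5)` gives the 1 request per second from the example above. A fixed window is the simplest reading of the description; a token bucket would smooth bursts but is a different technique.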

* **(Optional) Review the PARALLELIZATION settings.**

+ The default *Concurrency* parameter means that 4 threads will call the API in parallel. This parallelization operates within the API Quota settings defined above.

+ We do not recommend changing this default parameter unless your server has a much higher number of CPU cores.
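A rough sketch of how a *Concurrency* of 4 can coexist with the API quota: a pool of 4 worker threads shares one lock-protected throttle, so the threads together never exceed the rate limit. Everything here (`SharedThrottle`, `call_api_in_parallel`, and the `call` argument standing in for one API request) is hypothetical and uses only the Python standard library.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class SharedThrottle:
    """Hypothetical fixed-window throttle shared by all worker threads."""

    def __init__(self, rate_limit: int = 5, period: float = 1.0) -> None:
        self.rate_limit = rate_limit
        self.period = period
        self.lock = threading.Lock()
        self.window_start = time.monotonic()
        self.calls_in_window = 0

    def wait(self) -> None:
        # The lock is held while sleeping on purpose: once the quota for the
        # current window is used up, no thread may send a request until it ends.
        with self.lock:
            now = time.monotonic()
            if now - self.window_start >= self.period:
                self.window_start, self.calls_in_window = now, 0
            if self.calls_in_window >= self.rate_limit:
                time.sleep(self.period - (now - self.window_start))
                self.window_start, self.calls_in_window = time.monotonic(), 0
            self.calls_in_window += 1


def call_api_in_parallel(
    call: Callable[[T], R],
    rows: Iterable[T],
    concurrency: int = 4,
    rate_limit: int = 5,
    period: float = 1.0,
) -> List[R]:
    """Run `call` on every row with `concurrency` threads, within the rate limit."""
    throttle = SharedThrottle(rate_limit, period)

    def throttled_call(row: T) -> R:
        throttle.wait()
        return call(row)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in input order, so result i matches row i.
        return list(pool.map(throttled_call, rows))
```

Threads help mostly by overlapping network latency; the shared throttle is what keeps them inside the quota, which is why raising *Concurrency* alone does not raise throughput past the *Rate limit*.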

* **Set the Permissions of your preset.**

+ You can declare yourself as Owner of this preset and make it available to everybody, or to a specific group of users.

+ Any user belonging to one of these groups on your Dataiku DSS instance will be able to see and use this preset.

Voilà! Your preset is ready to be used.

Later, you (or another Dataiku admin) will be able to add more presets. This can be useful to segment plugin usage by user group. For instance, you can create a “Default” preset for everyone and a “High performance” one for your Marketing team, with separate billing for each team.

## How to use

Let’s assume that you have a Dataiku DSS project with a dataset containing medical records. The records must be stored in a text column, with one row per record.

As an example, we will use a sample of the PubMed dataset. You can follow the same steps with your own data.

To create your first recipe, navigate to the Flow, click on the **+ RECIPE** button and access the **Natural Language Processing** menu.

### Protected Health Information Extraction

#### Input

* **Dataset with a text column**

#### Output

* **Dataset with 12 additional columns**

+ One column for each PHI entity type (see this list), containing the list of detected entities

+ Raw response from the API in JSON format

+ Error message from the API if any

+ Error type (module and class name) if any
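To make this output layout concrete, here is a hedged sketch of how a raw DetectPHI-style response could be split into one list-valued column per PHI type, plus the raw JSON and the two error columns. It assumes the response shape of the Comprehend Medical `DetectPHI` operation (an `Entities` array whose items carry `Text` and `Type`); the function `phi_row` and column names such as `phi_name` are hypothetical, not the plugin's actual naming.

```python
import json
from typing import Any, Dict, Iterable, List


def phi_row(raw_response: Dict[str, Any], phi_types: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical: flatten one DetectPHI-style response into output columns."""
    # One (possibly empty) list per requested PHI type keeps the schema stable.
    texts_by_type: Dict[str, List[str]] = {t: [] for t in phi_types}
    for entity in raw_response.get("Entities", []):
        entity_type = entity.get("Type")
        if entity_type in texts_by_type:
            texts_by_type[entity_type].append(entity["Text"])
    row: Dict[str, Any] = {
        f"phi_{phi_type.lower()}": texts for phi_type, texts in texts_by_type.items()
    }
    row["raw_response"] = json.dumps(raw_response)  # full API output as JSON text
    row["error_message"] = ""  # would be filled only when the API call fails
    row["error_type"] = ""
    return row
```

Passing the type list explicitly means every row gets the same columns: for a made-up response with one `NAME` and one `AGE` entity and `phi_types=["NAME", "AGE", "DATE"]`, the `phi_date` column would hold an empty list.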

#### Settings

* **Fill** **INPUT PARAMETERS**

+ Set the *Text column* parameter to the column containing your text data.

+ Only English is currently supported by the API.

* **(Optional) Review CONFIGURATION parameters**

+ The *API configuration preset* parameter is automatically filled by the default one made available by your Dataiku admin.

+ You may select another one if multiple presets have been created.

* **(Optional) Review ADVANCED parameters**

+ You can activate the *Expert mode* to access advanced parameters.

+ The *Minimum score* parameter can be raised from 0 up to 1 to filter out low-confidence results. The default is 0, so no filtering is applied.

+ The *Error handling* parameter determines how the recipe will behave if the API returns an error.

- In “Log” mode, the error is recorded in the output dataset but does not cause the recipe to fail.

- We do not recommend switching to “Fail” mode unless you want any API error to make the whole recipe fail.
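The *Minimum score* and *Error handling* parameters can be pictured together as one wrapper around a single API call. This is a minimal sketch under assumptions: `call` is a hypothetical stand-in for the Comprehend Medical request, entity confidence is read from a `Score` field between 0 and 1, and the error type is formatted as `module.ClassName` to match the output column described earlier. Whether the plugin applies the threshold before or after storing the raw response is not stated; here it filters the returned entities.

```python
from typing import Any, Callable, Dict


def call_with_error_handling(
    call: Callable[[str], Dict[str, Any]],
    text: str,
    min_score: float = 0.0,
    error_handling: str = "Log",
) -> Dict[str, Any]:
    """Hypothetical wrapper: filter entities by score and apply Log/Fail handling."""
    try:
        response = call(text)
    except Exception as exc:
        if error_handling == "Fail":
            raise  # "Fail" mode: the error stops the whole recipe
        # "Log" mode: record the error in the row and keep going
        return {
            "response": None,
            "error_message": str(exc),
            "error_type": f"{type(exc).__module__}.{type(exc).__qualname__}",
        }
    # Keep only entities whose confidence score reaches the threshold.
    kept = [e for e in response.get("Entities", []) if e.get("Score", 0.0) >= min_score]
    return {"response": {**response, "Entities": kept}, "error_message": "", "error_type": ""}
```

With the default `min_score=0.0`, every entity passes, matching the documented default of no filtering.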

### Medical Entity Recognition

#### Input

* **Dataset with a text column**

#### Output

* **Dataset with additional columns**

+ One column for each selected medical entity type with a list of entities

+ Raw response from the API in JSON format

+ Error message from the API if any

+ Error type (module and class name) if any

#### Settings

The parameters are almost the same as those of the Protected Health Information Extraction recipe (see above). The only addition is the *Entity types* parameter under **CONFIGURATION**, where you can select multiple entity types from this list.

Happy natural language processing!

## Install in DSS

To install the plugin, open the Apps menu, click Plugins and search for Amazon Comprehend Medical.

Alternatively, you can download a zipped version here.

