Claim Severity is modeled with a GLM, much like [Claim Frequency](article:16). This article highlights the differences between the two.

![Claim Severity Modeling.png](xfGxoBrLfVzN)

# Input Datasets

Claim Severity is the analysis of the amount of a claim, conditional on a claim existing. We therefore first filter the dataset to keep only observations where ClaimNb is greater than 0.
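As an illustration of this filter, here is a minimal pandas sketch. The toy rows are made up for the example, and the output name simply mirrors the claim_severity_analysis training dataset mentioned later in this article; the actual flow may perform this step with a visual recipe rather than code.

```python
import pandas as pd

# Toy data for illustration only: two policies with claims, two without.
claims = pd.DataFrame(
    {
        "ClaimNb": [0, 1, 2, 0],
        "ClaimAmount": [0.0, 1200.0, 3100.0, 0.0],
    }
)

# Keep only observations with at least one claim, since severity is
# defined conditional on a claim existing.
claim_severity_analysis = claims[claims["ClaimNb"] > 0].reset_index(drop=True)

print(claim_severity_analysis)
```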

Scoring also relies on a ClaimNb prediction to compute a ClaimAmount prediction. We therefore first score the claim_test dataset with the Claim Frequency model, then join the result back to claim_test to build the dataset we want to score.
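The join can be sketched in pandas as follows. The key column `policy_id`, the feature `VehAge`, and the prediction column `ClaimNb_prediction` are hypothetical names chosen for the example; the real column names depend on the project and on how the Claim Frequency scoring output is configured.

```python
import pandas as pd

# Hypothetical test set: one row per policy, with one example feature.
claim_test = pd.DataFrame(
    {
        "policy_id": [1, 2, 3],
        "VehAge": [2, 7, 4],
    }
)

# Hypothetical output of scoring claim_test with the Claim Frequency model.
frequency_scored = pd.DataFrame(
    {
        "policy_id": [1, 2, 3],
        "ClaimNb_prediction": [0.05, 0.12, 0.08],
    }
)

# Join the frequency predictions back onto the test set. validate= raises
# an error if the key is duplicated on either side, which would silently
# multiply rows otherwise.
severity_scoring_input = claim_test.merge(
    frequency_scored,
    on="policy_id",
    how="left",
    validate="one_to_one",
)

print(severity_scoring_input)
```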

# Modeling

The script is the same as for [Claim Frequency](article:16). However, the target differs: here we are predicting ClaimAmount. Because ClaimAmount likely follows a gamma distribution, the metric also differs: we monitor gamma deviance. The variables and preprocessing steps to include will differ somewhat from those chosen for Claim Frequency, because the two targets do not depend on the same factors. This is described in more detail in the [Dashboard articles](article:10).
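To make the metric concrete, here is a small pure-Python sketch of the mean gamma deviance, using the per-observation definition 2 · (log(μ/y) + y/μ − 1), which is the one scikit-learn's `mean_gamma_deviance` uses. Whether the modeling tool reports the mean or the total deviance is an assumption to check; the function name and sample values below are illustrative only.

```python
import math


def mean_gamma_deviance(y_true, y_pred):
    """Average gamma deviance between observed and predicted amounts.

    Both inputs must be strictly positive, which holds for claim amounts
    once the dataset is restricted to rows with at least one claim.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if y <= 0 or mu <= 0:
            raise ValueError("gamma deviance requires strictly positive values")
        total += 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return total / len(y_true)


# A perfect prediction gives zero deviance; any miss gives a positive value.
perfect = mean_gamma_deviance([100.0, 250.0], [100.0, 250.0])
overestimate = mean_gamma_deviance([100.0], [200.0])
print(perfect, overestimate)
```

Because the deviance depends only on the ratio y/μ, predicting 200 for an observed 100 is penalized the same as predicting 2000 for an observed 1000, which suits skewed claim amounts.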

ClaimAmount is modeled using a GLM with the following parameters:
- Elastic Net Penalty: 0
- Distribution: Gamma
- Link function: Log
- Offset mode: Offsets/Exposures
- Training dataset: claim_severity_analysis
- Offset columns: None
- Exposure columns: ClaimNb. Claim amounts are normalized by claim number rather than by exposure because, in the end, we want to model the amount per claim.
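The effect of using ClaimNb as the exposure can be written out as follows. This sketch assumes the exposure enters the linear predictor as a log offset, which is the usual treatment with a log link; the symbols $x_i$ (feature vector of observation $i$) and $\beta$ (coefficients) are notation introduced here for illustration.

```latex
\log \mathbb{E}[\mathrm{ClaimAmount}_i]
  = \log(\mathrm{ClaimNb}_i) + x_i^{\top}\beta
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[\mathrm{ClaimAmount}_i]}{\mathrm{ClaimNb}_i}
  = \exp\!\left(x_i^{\top}\beta\right)
```

Under this assumption, the coefficients describe the expected amount per claim, and the total predicted amount for an observation scales proportionally with its claim count.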