Credit scoring models typically rely on [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression), an extension of linear regression to binary classification. Binary classification is exactly the task at hand: determining whether an applicant will be a good or a bad borrower.

# Logistic Regression

[Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) consists of building a probability function, whose output lies between 0 and 1, and setting a threshold. An applicant whose predicted probability exceeds the threshold is categorized as bad; otherwise, the applicant is categorized as good. The probability function is written as follows:

```math
p(\mathbf{X}) = \frac{1}{1 + \exp\left(-\boldsymbol{\beta}^\top \mathbf{X}\right)}
```

with $`\boldsymbol{\beta}`$ the vector of coefficients and $`\mathbf{X}`$ the vector of independent (explanatory) variables for one observation, including a constant term 1 so that the intercept $`\beta_0`$ is part of the product. As for other generalized linear models, the logistic regression coefficients are fitted by maximizing the log-likelihood. The fitted coefficients give an idea of the strength of each variable in the model, and their sign indicates whether the variable is positively or negatively correlated with credit risk. However, the values themselves are not easily interpretable, since each coefficient is the change in the log-odds of being bad per unit change of its variable. This is why it is common practice to turn the coefficients into a more practical scorecard.
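As a minimal sketch in plain Python (no external libraries assumed), the probability function, the maximum-likelihood fit and the threshold rule can be written as follows. Unlike the vector notation above, this sketch keeps the intercept `intercept` separate from the coefficient list `coefs` instead of appending a constant 1 to $`\mathbf{X}`$. The function names, the toy data and the 0.5 threshold are hypothetical, and plain gradient ascent is used only for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probability_bad(x, intercept, coefs):
    """p(X): probability that an applicant with features x is bad."""
    return sigmoid(intercept + sum(b * v for b, v in zip(coefs, x)))

def log_likelihood(X, y, intercept, coefs):
    """Bernoulli log-likelihood of labels y (1 = bad, 0 = good)."""
    total = 0.0
    for x, label in zip(X, y):
        p = probability_bad(x, intercept, coefs)
        total += label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Maximize the log-likelihood with plain (batch) gradient ascent."""
    n, k = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * k
    for _ in range(n_iter):
        # Gradient of the log-likelihood: sum over observations of (y - p) * x
        g0, g = 0.0, [0.0] * k
        for x, label in zip(X, y):
            residual = label - probability_bad(x, intercept, coefs)
            g0 += residual
            for j, v in enumerate(x):
                g[j] += residual * v
        intercept += lr * g0 / n
        coefs = [b + lr * gj / n for b, gj in zip(coefs, g)]
    return intercept, coefs

def classify(x, intercept, coefs, threshold=0.5):
    """Categorize as 'bad' when p(X) exceeds the threshold, 'good' otherwise."""
    return "bad" if probability_bad(x, intercept, coefs) > threshold else "good"

# Hypothetical toy data: one feature, label 1 = bad applicant
X = [[0], [0], [1], [1], [2], [2], [3], [3]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
intercept, coefs = fit_logistic(X, y)
print(intercept, coefs, classify([3], intercept, coefs))
```

On this toy data the fitted coefficient is positive, reflecting that higher feature values go with more bad applicants. `math.exp(coefs[0])` is the factor by which the odds of being bad are multiplied per unit increase of the feature, which is one way to read a raw coefficient. In practice, a library such as scikit-learn or statsmodels would do the fitting instead of hand-written gradient ascent.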

# Building the Scorecard

The scorecard assigns a number of points to each possible value (or bin) a feature can take, and the final score is simply the sum of these points. This additive property comes from logistic regression being linear in the log-odds; it would not be possible with a tree-based model, for instance. The threshold mentioned above is then set directly on the scores, based on business expertise and with the help of the weight of evidence (WoE) encoding of the features. The procedure to build the scorecard is explained in detail in Siddiqi's book, listed in [the resources](article:4). Three parameters need to be set by the user:

- base score (`base_score`): threshold for a score being good
- base odds (`base_odds`): expected good-to-bad odds when the score equals `base_score`
- points to double odds (`points_to_double_odds`): the number of points needed to double the odds

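Then, where Odds denotes the good-to-bad odds and $`woe_i`$ the WoE value of feature $`i`$ (with coefficient $`\beta_i`$):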

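```math
\mathrm{Score} = \mathrm{Offset} + \mathrm{Factor} \times \ln(\mathrm{Odds})
```
```math
\mathrm{Factor} = \frac{\mathrm{points\_to\_double\_odds}}{\ln 2}
```
```math
\mathrm{Offset} = \mathrm{base\_score} - \mathrm{Factor} \times \ln(\mathrm{base\_odds})
```
```math
\ln(\mathrm{Odds}) = -\left(\sum_{i} \mathrm{woe}_i \, \beta_i + \beta_0\right)
```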
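These formulas can be sketched in plain Python under a few assumptions: WoE follows the common convention ln(%good / %bad), and the intercept and the offset are spread evenly across the features, which is a common way to derive points for each attribute. The function names, the example parameters (600 points at 50:1 odds, 20 points to double the odds), the coefficients and the bin counts are all hypothetical illustration values.

```python
import math

def scaling(base_score, base_odds, pdo):
    """Return (factor, offset) from the three user-set parameters."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def woe(n_good, n_bad, total_good, total_bad):
    """WoE of one bin, assuming the ln(%good / %bad) convention."""
    return math.log((n_good / total_good) / (n_bad / total_bad))

def attribute_points(woe_i, beta_i, beta_0, n_features, factor, offset):
    """Points for one attribute; intercept and offset are split evenly over features."""
    return -(woe_i * beta_i + beta_0 / n_features) * factor + offset / n_features

factor, offset = scaling(base_score=600, base_odds=50, pdo=20)

# Hypothetical model fitted on two WoE-encoded features (predicting "bad")
beta_0 = -2.0
betas = [-0.9, -0.6]
# Hypothetical WoE of the bins one applicant falls into (good/bad counts per bin)
woes = [woe(400, 20, 1000, 100), woe(300, 40, 1000, 100)]

# Additive scorecard: one block of points per feature, summed into the score
points = [attribute_points(w, b, beta_0, len(betas), factor, offset)
          for w, b in zip(woes, betas)]
score = sum(points)

# The same score computed directly from Score = Offset + Factor * ln(Odds)
ln_odds = -(sum(w * b for w, b in zip(woes, betas)) + beta_0)
print(points, score, offset + factor * ln_odds)
```

Summing the per-attribute points gives exactly Offset + Factor × ln(Odds), which is the additive property described earlier. Real scorecards usually round each attribute's points to integers, so the rounded total can differ slightly from the unrounded score.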


