# Weight of Evidence and Information Value Framework:

Weight of Evidence (WoE) is calculated as \(WoE = \ln(\frac{\%\ of\ non\ events}{\%\ of\ events})\). For credit risk modeling, a non-event is a good customer who doesn't default and an event is a bad customer who defaults. The percentage of non-events is the share of all good customers that fall in a particular group, and the percentage of events is the share of all bad customers that fall in that group.

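Information Value (IV) can now be calculated as \(IV = \Sigma{(\%\ of\ non\ events - \%\ of\ events) * WoE}\), summed over all groups. Taking the difference in the same order as the ratio inside WoE makes every term non-negative.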
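
As a quick worked example with made-up numbers, suppose a group holds 20% of all good customers and 10% of all bad customers. Its WoE and its contribution to IV, with the difference taken in the same order as the WoE ratio so that the term is non-negative, would be:

\[
WoE = \ln\left(\frac{0.20}{0.10}\right) = \ln 2 \approx 0.693, \qquad (0.20 - 0.10) \times 0.693 \approx 0.069
\]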

To calculate WoE, we follow these four steps:

```
1. For a continuous variable, discretize it by splitting its value range into 10 bins.
2. Calculate the number of events and non-events for each bin.
3. Calculate % of events and % of non-events for each bin.
4. Calculate WoE by applying the formula above.
```
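
The four steps can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `woe_iv`, the equal-width binning, and the choice to leave WoE as `None` for bins with a zero count are all assumptions made for this example.

```python
import math

def woe_iv(values, events, n_bins=10):
    """Return (rows, iv) for one continuous variable.

    values: feature values; events: 1 = event (bad), 0 = non-event (good).
    Each row is (bin_index, non_events, events, woe); woe is None when a
    bin has zero events or zero non-events, since the log is undefined.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("variable is constant; nothing to bin")
    width = (hi - lo) / n_bins

    # Steps 1 and 2: assign each value to an equal-width bin and count.
    counts = [[0, 0] for _ in range(n_bins)]  # [non_events, events]
    for v, e in zip(values, events):
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum
        counts[idx][e] += 1

    total_good = sum(c[0] for c in counts)
    total_bad = sum(c[1] for c in counts)
    rows, iv = [], 0.0
    for idx, (good, bad) in enumerate(counts):
        woe = None
        if good > 0 and bad > 0:
            # Step 3: share of all non-events / events landing in this bin.
            pct_good, pct_bad = good / total_good, bad / total_bad
            # Step 4: WoE, plus this bin's contribution to IV.
            woe = math.log(pct_good / pct_bad)
            iv += (pct_good - pct_bad) * woe
        rows.append((idx, good, bad, woe))
    return rows, iv
```

Calling `woe_iv(values, events)` with the default `n_bins=10` mirrors the 10-bin split in step 1. Bins that come back with `woe` set to `None` are skipped in the IV sum and are candidates for merging with a neighbor.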

Discrete variables do not need to be discretized. For continuous variables, `fine classing` means creating 10 or 20 bins and then calculating WoE and IV. `Coarse classing` means combining categories or bins with similar WoE scores, since similar WoE means the two categories or bins behave similarly.
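
One simple way to approximate coarse classing is a greedy pass that merges adjacent bins whose WoE values are close. The sketch below is only an illustration: the helper names `woe` and `coarse_class`, the tolerance `tol`, and the greedy left-to-right strategy are assumptions, and it expects every bin to have non-zero counts of both classes.

```python
import math

def woe(good, bad, total_good, total_bad):
    """WoE of a bin given its counts and the overall class totals."""
    return math.log((good / total_good) / (bad / total_bad))

def coarse_class(bins, tol=0.1):
    """Greedily merge adjacent (non_events, events) bins with similar WoE.

    bins must be ordered and each must have non-zero counts of both classes.
    Two neighbors are merged when their WoE values differ by less than tol.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    merged = [bins[0]]
    for good, bad in bins[1:]:
        prev_good, prev_bad = merged[-1]
        prev_woe = woe(prev_good, prev_bad, total_good, total_bad)
        cur_woe = woe(good, bad, total_good, total_bad)
        if abs(cur_woe - prev_woe) < tol:
            # Similar behavior: pool the counts into the previous bin.
            merged[-1] = (prev_good + good, prev_bad + bad)
        else:
            merged.append((good, bad))
    return merged
```

For example, `coarse_class([(90, 10), (89, 11), (50, 50), (20, 80)], tol=0.2)` pools the first two bins and leaves the other two alone. In practice the tolerance would be tuned by inspecting the fine-classed WoE values.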

We can quickly select or eliminate variables by comparing their Information Value against the following table.

Information Value | Predictive Power |
---|---|
< 0.02 | Useless |
0.02 ~ 0.1 | Weak |
0.1 ~ 0.3 | Medium |
0.3 ~ 0.5 | Strong |
> 0.5 | Suspicious |
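
The table can be turned into a small lookup for screening variables. This is a hedged sketch: the function name `iv_strength` is hypothetical, and because the table's ranges share endpoints, the choice of which side each boundary falls on is an assumption.

```python
def iv_strength(iv):
    """Map an Information Value to the predictive-power label in the table.

    Boundary handling is an assumption: each lower bound is inclusive,
    and exactly 0.5 is still counted as "Strong".
    """
    if iv < 0.02:
        return "Useless"
    if iv < 0.1:
        return "Weak"
    if iv < 0.3:
        return "Medium"
    if iv <= 0.5:
        return "Strong"
    return "Suspicious"

def screen(ivs):
    """Keep variables whose IV is neither useless nor suspicious.

    ivs: dict mapping variable name to its Information Value.
    """
    return {name: iv for name, iv in ivs.items()
            if iv_strength(iv) not in ("Useless", "Suspicious")}
```

A variable flagged as suspicious is usually worth checking for leakage of the target before discarding it.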

There are some basic rules relating to WoE:

- Each bin should have at least 5% of the observations
- Each bin should have a non-zero count of both non-events and events
- WoE should be distinct for each bin; bins with similar WoE should be merged
- WoE should be monotonic across the ordered bins
- Missing values are binned separately
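
Most of these rules can be checked mechanically once the bins are counted. The sketch below is one possible check, not a standard routine: the function name `check_bins` and the message wording are made up for illustration, and it assumes the missing-value bin has already been set aside so that the remaining bins are in value order.

```python
import math

def check_bins(bins, min_share=0.05):
    """Return a list of rule violations for ordered (non_events, events) bins.

    The missing-value bin is assumed to be excluded by the caller.
    Strict monotonicity also implies that neighboring WoE values are distinct.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    total = total_good + total_bad
    problems, woes = [], []
    for i, (good, bad) in enumerate(bins):
        if (good + bad) / total < min_share:
            problems.append(f"bin {i}: fewer than {min_share:.0%} of observations")
        if good == 0 or bad == 0:
            problems.append(f"bin {i}: zero events or non-events")
        else:
            woes.append(math.log((good / total_good) / (bad / total_bad)))
    # Monotonic means the WoE differences between neighbors share one sign.
    diffs = [b - a for a, b in zip(woes, woes[1:])]
    if not (all(d > 0 for d in diffs) or all(d < 0 for d in diffs)):
        problems.append("WoE is not strictly monotonic across bins")
    return problems
```

An empty list means the binning passes these checks; any message points at a bin that is a candidate for merging with a neighbor.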

Note that Information Value as a feature selection method is designed mainly for logistic regression, because WoE is expressed on the same log-odds scale that the model fits. Tree-based models can detect non-linear relationships very well, so using IV for feature selection with tree ensembles might not produce the most accurate and robust predictive model.

# Reference:

Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring

```
@misc{leehanchung,
author = {Lee, Hanchung},
title = {Weight of Evidence and Information Value},
year = {2021},
howpublished = {Github Repo},
url = {https://leehanchung.github.io/2021-04-30-woe-iv/}
}
```