Weight of Evidence and Information Value Framework:

Weight of Evidence (WoE) is calculated as \(WoE = \ln\left(\frac{\%\ of\ non\ events}{\%\ of\ events}\right)\). In credit risk modeling, a non-event is a good customer who does not default, and an event is a bad customer who defaults. The % of non-events is the share of all good customers that fall into a particular group (bin), and the % of events is the share of all bad customers that fall into that group.
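As a quick numeric check of the formula, here is a minimal Python sketch. The function name `woe` and the counts below are hypothetical, chosen only for illustration.

```python
import math

def woe(good_in_bin, bad_in_bin, total_good, total_bad):
    """WoE of one bin: ln(% of non-events / % of events)."""
    pct_non_events = good_in_bin / total_good  # share of all good customers in this bin
    pct_events = bad_in_bin / total_bad        # share of all bad customers in this bin
    return math.log(pct_non_events / pct_events)

# Hypothetical data: 900 good and 100 bad customers overall;
# this bin holds 90 of the good ones and 20 of the bad ones.
print(round(woe(90, 20, 900, 100), 4))  # -0.6931
```

A negative WoE means the bin holds a larger share of the bad customers than of the good ones, so it is riskier than average; a positive WoE means the opposite.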

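Information Value (IV) can then be calculated as \(IV = \sum_i (\%\ of\ non\ events_i - \%\ of\ events_i) \times WoE_i\), summing over all bins \(i\). With WoE defined as above, every term is non-negative, because the difference and the log ratio always have the same sign.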
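The IV sum can be sketched in Python as follows, assuming each bin is given as a hypothetical `(non_events, events)` count pair with both counts non-zero. The function name and the example counts are illustrative only.

```python
import math

def information_value(bins):
    """IV = sum over bins of (% non-events - % events) * WoE.

    bins: list of (non_events, events) counts, both non-zero in every bin.
    """
    total_non_events = sum(ne for ne, _ in bins)
    total_events = sum(e for _, e in bins)
    iv = 0.0
    for non_events, events in bins:
        pct_non_events = non_events / total_non_events
        pct_events = events / total_events
        woe = math.log(pct_non_events / pct_events)
        # Difference and log ratio share a sign, so each term is >= 0
        iv += (pct_non_events - pct_events) * woe
    return iv

# Hypothetical two-bin variable: 900 non-events and 100 events in total
print(round(information_value([(450, 20), (450, 80)]), 4))  # 0.4159
```

If every bin has the same good/bad mix as the whole population, all WoE values are 0 and the IV is 0.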

To calculate WoE, we follow these four steps:

1. For a continuous variable, discretize it by splitting its value range into 10 bins.
2. Calculate the number of events and non-events for each bin.
3. Calculate % of events and % of non-events for each bin.
4. Calculate WoE by applying the formula above.
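The four steps can be sketched in plain Python as below. This is a minimal sketch under stated assumptions: equal-width bins (one reading of "by value ranges"), a 0/1 target where 1 marks an event (a bad customer who defaults), and at least one event and one non-event in the data. `woe_table` is a hypothetical name. Bins where either count is zero get `None`, since their WoE is undefined.

```python
import math

def woe_table(values, targets, n_bins=10):
    """Equal-width binning of a continuous variable, then WoE per bin.

    targets: 1 = event (bad, defaults), 0 = non-event (good).
    Returns rows of (bin_low, bin_high, non_events, events, woe).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    events = [0] * n_bins
    non_events = [0] * n_bins
    for x, y in zip(values, targets):
        # Step 1: assign the value to one of n_bins equal-width ranges
        idx = min(int((x - lo) / width), n_bins - 1)  # max value goes to the last bin
        # Step 2: count events and non-events per bin
        if y == 1:
            events[idx] += 1
        else:
            non_events[idx] += 1
    total_events = sum(events)
    total_non_events = sum(non_events)
    rows = []
    for i in range(n_bins):
        # Step 3: % of events and % of non-events in this bin
        pct_events = events[i] / total_events
        pct_non_events = non_events[i] / total_non_events
        # Step 4: WoE = ln(% non-events / % events); undefined if a count is zero
        if events[i] and non_events[i]:
            woe = math.log(pct_non_events / pct_events)
        else:
            woe = None
        rows.append((lo + i * width, lo + (i + 1) * width,
                     non_events[i], events[i], woe))
    return rows

# Hypothetical toy data, split into 2 bins to keep the output short
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
targets = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
for row in woe_table(values, targets, n_bins=2):
    print(row)
```

Equal-width bins can end up very unevenly populated on skewed data, which is one reason the binning rules further down ask for a minimum share of observations per bin.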

Discrete variables do not need to be discretized, since each category already forms its own group. For continuous variables, fine classing means creating 10 or 20 bins and then calculating WoE and IV. Coarse classing means combining categories or bins with similar WoE, because similar WoE indicates that they have similar behavior.
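Coarse classing of an ordered continuous variable can be sketched as a greedy merge of adjacent fine-classed bins. The `threshold` on the WoE gap (0.1 here) and the name `coarse_class` are hypothetical choices for illustration, not a standard rule. Bins are `(non_events, events)` pairs in variable order with all counts non-zero; unordered categories would need a different pairing strategy.

```python
import math

def _woe(non_events, events, total_non_events, total_events):
    return math.log((non_events / total_non_events) / (events / total_events))

def coarse_class(bins, threshold=0.1):
    """Greedily merge adjacent bins whose WoE differs by less than `threshold`.

    bins: ordered list of (non_events, events) counts, all non-zero.
    """
    # Totals do not change when bins are merged
    total_non_events = sum(ne for ne, _ in bins)
    total_events = sum(e for _, e in bins)
    merged = [tuple(b) for b in bins]
    changed = True
    while changed and len(merged) > 1:
        changed = False
        woes = [_woe(ne, e, total_non_events, total_events) for ne, e in merged]
        # Find the adjacent pair with the smallest WoE gap
        gaps = [abs(woes[i + 1] - woes[i]) for i in range(len(merged) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] < threshold:
            # Merge the pair by summing their counts, then re-check
            ne = merged[i][0] + merged[i + 1][0]
            e = merged[i][1] + merged[i + 1][1]
            merged[i:i + 2] = [(ne, e)]
            changed = True
    return merged

# Hypothetical fine classes: the first two behave identically
print(coarse_class([(100, 10), (100, 10), (100, 40)]))  # [(200, 20), (100, 40)]
```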

Using the calculated Information Value, we can quickly select or eliminate variables according to the following rule of thumb:

| Information Value | Predictive Power |
|---|---|
| < 0.02 | Useless |
| 0.02 to 0.1 | Weak |
| 0.1 to 0.3 | Medium |
| 0.3 to 0.5 | Strong |
| > 0.5 | Suspicious |
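The rule-of-thumb table translates directly into a lookup. In this sketch the name `iv_strength` is hypothetical, and treating each lower bound as inclusive (with 0.5 itself counted as Strong) is an assumption, since the table does not say which side the boundaries fall on.

```python
def iv_strength(iv):
    """Map an Information Value to its rule-of-thumb predictive power.

    Assumption: lower bounds are inclusive; exactly 0.5 counts as Strong.
    """
    if iv < 0.02:
        return "Useless"
    if iv < 0.1:
        return "Weak"
    if iv < 0.3:
        return "Medium"
    if iv <= 0.5:
        return "Strong"
    return "Suspicious"  # too good to be true: check for target leakage

# Hypothetical IV values
for iv in (0.01, 0.05, 0.2, 0.4, 0.8):
    print(iv, iv_strength(iv))
```

A "Suspicious" IV usually warrants a closer look at how the variable was built rather than automatic inclusion.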

There are some basic rules relating to WoE:

  1. Each bin should contain at least 5% of the observations.
  2. Each bin should contain a non-zero number of both events and non-events; otherwise its WoE is undefined.
  3. WoE should be distinct for each bin; bins with similar WoE should be aggregated.
  4. WoE should be monotonic across the ordered bins, i.e. consistently increasing or decreasing.
  5. Missing values are binned separately.
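Rules 1, 2 and 4 can be checked mechanically. The sketch below assumes the bins are hypothetical `(non_events, events)` pairs in variable order, excluding the separate missing-value bin (rule 5); rule 3 is left to a coarse-classing step. The name `check_woe_rules` is illustrative, and the monotonicity check is strict.

```python
import math

def check_woe_rules(bins, min_share=0.05):
    """Return a list of rule violations (empty if none).

    bins: ordered (non_events, events) counts for the non-missing bins.
    """
    problems = []
    total = sum(ne + e for ne, e in bins)
    total_non_events = sum(ne for ne, _ in bins)
    total_events = sum(e for _, e in bins)
    woes = []
    for i, (ne, e) in enumerate(bins):
        # Rule 1: each bin holds at least min_share of all observations
        if (ne + e) / total < min_share:
            problems.append(f"bin {i}: fewer than {min_share:.0%} of observations")
        # Rule 2: both counts must be non-zero, otherwise WoE is undefined
        if ne == 0 or e == 0:
            problems.append(f"bin {i}: zero events or non-events")
            continue  # skip this bin in the monotonicity check
        woes.append(math.log((ne / total_non_events) / (e / total_events)))
    # Rule 4: WoE must move strictly in one direction across the bins
    diffs = [b - a for a, b in zip(woes, woes[1:])]
    if not (all(d > 0 for d in diffs) or all(d < 0 for d in diffs)):
        problems.append("WoE is not monotonic")
    return problems

# Hypothetical binned variable whose bad rate rises bin by bin
print(check_woe_rules([(400, 10), (300, 30), (200, 60)]))  # []
```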

Note that Information Value as a feature selection method is designed mainly for logistic regression models. Tree-based models capture non-linear relationships very well on their own, so using IV to select features for tree ensembles may discard useful variables and might not produce the most accurate and robust predictive model.

Reference:

Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring

@misc{leehanchung,
author = {Lee, Hanchung},
title = {Weight of Evidence and Information Value},
year = {2021},
howpublished = {Github Repo},
url = {https://leehanchung.github.io/2021-04-30-woe-iv/}
}