Calculation of Weight of Evidence (WOE)
Weight of Evidence (WOE):
$$\mathrm{WOE}_i = \ln\left(\frac{\%\mathrm{pos}_i}{\%\mathrm{neg}_i}\right)$$
where $i = 1, 2, \dots, k$ and $k$ is the number of bins.
Calculation of Information Value (IV)
Information Value (IV):
$$\mathrm{IV} = \sum_{i=1}^{k} \left(\%\mathrm{pos}_i - \%\mathrm{neg}_i\right) \times \mathrm{WOE}_i$$
WOE and IV work for both continuous and categorical variables.
CONTINUOUS/CATEGORICAL -> CATEGORICAL (discrete numeric values, one WOE value per bin)
Calculation
Step 1: Binning (the binning method itself is out of the scope of this post)
- CONTINUOUS: calculate the relative frequencies of pos and neg (%pos and %neg) for each interval
- CATEGORICAL: calculate the relative frequencies of pos and neg (%pos and %neg) for each category
Optionally, there can be a MISSING bin for missing values.
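As a rough sketch of Step 1 for a categorical variable, the relative frequencies can be computed by counting pos and neg per bin and dividing by the respective totals. The function name `bin_shares`, the toy data, and the convention that label 1 means pos and label 0 means neg are assumptions made for this illustration only.

```python
from collections import Counter

def bin_shares(bins, labels):
    """Return {bin: (%pos, %neg)} for parallel sequences of bin names and 0/1 labels.

    Assumption: label 1 marks a pos observation, label 0 a neg observation.
    """
    pos = Counter(b for b, y in zip(bins, labels) if y == 1)
    neg = Counter(b for b, y in zip(bins, labels) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    # Share of all pos (resp. neg) observations that fall into each bin.
    return {
        b: (pos[b] / total_pos, neg[b] / total_neg)
        for b in sorted(set(bins))
    }

# Hypothetical toy data; None values go into a MISSING bin.
raw = ["A", "A", "B", None, "B", "A", None, "B"]
y = [1, 0, 1, 1, 0, 0, 0, 1]
bins = ["MISSING" if v is None else v for v in raw]
shares = bin_shares(bins, y)
```

For a continuous variable the same counting applies once every value has been assigned to an interval; how the intervals are chosen is the binning problem this post leaves aside.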
Step 2: Calculate WOE for each bin
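$$\mathrm{WOE}_i = \ln\left(\frac{\%\mathrm{pos}_i}{\%\mathrm{neg}_i}\right) = \ln\left(\frac{\mathrm{pos}_i / \sum_{j=1}^{k} \mathrm{pos}_j}{\mathrm{neg}_i / \sum_{j=1}^{k} \mathrm{neg}_j}\right)$$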
Step 3: Calculate IV
$$\mathrm{IV}_i = \left(\%\mathrm{pos}_i - \%\mathrm{neg}_i\right) \times \mathrm{WOE}_i$$
Step 4: Sum Up
$$\mathrm{IV} = \sum_{i=1}^{k} \mathrm{IV}_i$$
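Putting everything together:
$$\mathrm{IV} = \sum_{i=1}^{k} \left(\%\mathrm{pos}_i - \%\mathrm{neg}_i\right) \ln\left(\frac{\%\mathrm{pos}_i}{\%\mathrm{neg}_i}\right)$$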
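Steps 2 to 4 can be sketched in a few lines of Python, starting from raw pos and neg counts per bin. The function name `woe_iv` and the example counts are hypothetical. The sketch assumes every bin has at least one pos and one neg observation; a bin with a zero count would raise an error here and needs separate handling.

```python
import math

def woe_iv(pos_counts, neg_counts):
    """Return (rows, total_iv); each row is (%pos, %neg, WOE, IV) for one bin."""
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    rows = []
    for pos, neg in zip(pos_counts, neg_counts):
        pct_pos = pos / total_pos            # %pos_i
        pct_neg = neg / total_neg            # %neg_i
        woe = math.log(pct_pos / pct_neg)    # Step 2: natural log of the ratio
        iv = (pct_pos - pct_neg) * woe       # Step 3: per-bin IV
        rows.append((pct_pos, pct_neg, woe, iv))
    total_iv = sum(row[3] for row in rows)   # Step 4: sum over bins
    return rows, total_iv

# Hypothetical counts for two bins: 20 pos / 10 neg and 20 pos / 30 neg.
rows, total_iv = woe_iv([20, 20], [10, 30])
```

Working from counts rather than percentages keeps the normalisation in one place, so %pos and %neg each sum to 1 by construction.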
Example
(The data is made up and serves only to illustrate the calculation.)
| bin | %pos | %neg | WOE | IV |
|---|---|---|---|---|
| MISSING | 0.1 | 0.05 | 0.693 | 0.035 |
| 1 | 0.15 | 0.05 | 1.099 | 0.110 |
| 2 | 0.15 | 0.1 | 0.405 | 0.020 |
| 3 | 0.2 | 0.2 | 0.0 | 0.0 |
| 4 | 0.2 | 0.25 | -0.223 | 0.011 |
| 5 | 0.2 | 0.35 | -0.560 | 0.084 |
| Sum | 1.0 | 1.0 | | 0.260 |
- WOE of (e.g.) MISSING: $\mathrm{WOE}_{\mathrm{MISSING}} = \ln(0.1 / 0.05) = 0.693$
- IV of (e.g.) MISSING: $\mathrm{IV}_{\mathrm{MISSING}} = (0.1 - 0.05) \times 0.693 = 0.035$
- Total IV: $0.035 + 0.110 + 0.020 + 0.0 + 0.011 + 0.084 = 0.260$
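The figures in the example can be reproduced with a short script. This is only a sketch: the dictionary `table` copies the %pos and %neg columns from the example, and the variable names are chosen here for illustration.

```python
import math

# (%pos, %neg) per bin, copied from the example table.
table = {
    "MISSING": (0.10, 0.05),
    "1": (0.15, 0.05),
    "2": (0.15, 0.10),
    "3": (0.20, 0.20),
    "4": (0.20, 0.25),
    "5": (0.20, 0.35),
}

# WOE and IV per bin, then the total IV.
woe = {b: math.log(p / n) for b, (p, n) in table.items()}
iv = {b: (p - n) * woe[b] for b, (p, n) in table.items()}
total_iv = sum(iv.values())

for b in table:
    print(f"{b:>7}  WOE={woe[b]:+.3f}  IV={iv[b]:.3f}")
print(f"Total IV = {total_iv:.3f}")
```

Rounded to three decimals, the per-bin values and the total of 0.260 match the table.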
Observations
- if %pos > %neg, WOE is positive
- if %pos < %neg, WOE is negative
- if %pos = %neg, WOE is 0
- IV is never negative (it is 0 only when %pos = %neg in every bin, as in bin 3 of the example)
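The last observation follows from the first three. A short derivation, writing $a = \%\mathrm{pos}_i$ and $b = \%\mathrm{neg}_i$ (shorthand introduced here) and assuming both are strictly positive so that the logarithm is defined:

$$
\operatorname{sign}\left(\ln\frac{a}{b}\right) = \operatorname{sign}(a - b)
\quad\Longrightarrow\quad
\mathrm{IV}_i = (a - b)\,\ln\frac{a}{b} \ge 0,
\quad \text{with equality iff } a = b.
$$

Each bin therefore contributes a non-negative term, so the sum $\mathrm{IV} = \sum_{i=1}^{k} \mathrm{IV}_i$ cannot be negative. This is why bins 4 and 5 in the example have negative WOE but still add positive amounts to the total.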