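ML LoR: A full walkthrough of developing a general-purpose credit risk scorecard model with the LoR (logistic regression) algorithm on a credit card dataset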
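# 1. Define the dataset
# Load the German credit dataset: credit data that classifies debtors, each described by a set of attributes, as good or bad credit risks.
Dataset: UCI Machine Learning Repository: Data Set
# 1.1 Preview part of the data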
 | status.of.existing.checking.account | duration.in.month | credit.history | purpose | credit.amount | savings.account.and.bonds | present.employment.since | installment.rate.in.percentage.of.disposable.income | personal.status.and.sex | other.debtors.or.guarantors | present.residence.since | property | age.in.years | other.installment.plans | housing | number.of.existing.credits.at.this.bank | job | number.of.people.being.liable.to.provide.maintenance.for | telephone | foreign.worker | creditability |
0 | ... < 0 DM | 6 | critical account/ other credits existing (not at this bank) | radio/television | 1169 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | real estate | 67 | none | own | 2 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
1 | 0 <= ... < 200 DM | 48 | existing credits paid back duly till now | radio/television | 5951 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | real estate | 22 | none | own | 1 | skilled employee / official | 1 | none | yes | bad |
2 | no checking account | 12 | critical account/ other credits existing (not at this bank) | education | 2096 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 3 | real estate | 49 | none | own | 1 | unskilled - resident | 2 | none | yes | good |
3 | ... < 0 DM | 42 | existing credits paid back duly till now | furniture/equipment | 7882 | ... < 100 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | guarantor | 4 | building society savings agreement/ life insurance | 45 | none | for free | 1 | skilled employee / official | 2 | none | yes | good |
4 | ... < 0 DM | 24 | delay in paying off in the past | car (new) | 4870 | ... < 100 DM | 1 <= ... < 4 years | 3 | male : divorced/separated | none | 4 | unknown / no property | 53 | none | for free | 2 | skilled employee / official | 2 | none | yes | bad |
5 | no checking account | 36 | existing credits paid back duly till now | education | 9055 | unknown/ no savings account | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | unknown / no property | 35 | none | for free | 1 | unskilled - resident | 2 | yes, registered under the customers name | yes | good |
6 | no checking account | 24 | existing credits paid back duly till now | furniture/equipment | 2835 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 1 | skilled employee / official | 1 | none | yes | good |
7 | 0 <= ... < 200 DM | 36 | existing credits paid back duly till now | car (used) | 6948 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 35 | none | rent | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | good |
8 | no checking account | 12 | existing credits paid back duly till now | radio/television | 3059 | ... >= 1000 DM | 4 <= ... < 7 years | 2 | male : divorced/separated | none | 4 | real estate | 61 | none | own | 1 | unskilled - resident | 1 | none | yes | good |
9 | 0 <= ... < 200 DM | 30 | critical account/ other credits existing (not at this bank) | car (new) | 5234 | ... < 100 DM | unemployed | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 28 | none | own | 2 | management/ self-employed/ highly qualified employee/ officer | 1 | none | yes | bad |
10 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | car (new) | 1295 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 25 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
11 | ... < 0 DM | 48 | existing credits paid back duly till now | business | 4308 | ... < 100 DM | ... < 1 year | 3 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 24 | none | rent | 1 | skilled employee / official | 1 | none | yes | bad |
12 | 0 <= ... < 200 DM | 12 | existing credits paid back duly till now | radio/television | 1567 | ... < 100 DM | 1 <= ... < 4 years | 1 | male : divorced/separated | none | 1 | car or other, not in attribute Savings account/bonds | 22 | none | own | 1 | skilled employee / official | 1 | yes, registered under the customers name | yes | good |
13 | ... < 0 DM | 24 | critical account/ other credits existing (not at this bank) | car (new) | 1199 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 60 | none | own | 2 | unskilled - resident | 1 | none | yes | bad |
14 | ... < 0 DM | 15 | existing credits paid back duly till now | car (new) | 1403 | ... < 100 DM | 1 <= ... < 4 years | 2 | male : divorced/separated | none | 4 | car or other, not in attribute Savings account/bonds | 28 | none | rent | 1 | skilled employee / official | 1 | none | yes | good |
15 | ... < 0 DM | 24 | existing credits paid back duly till now | radio/television | 1282 | 100 <= ... < 500 DM | 1 <= ... < 4 years | 4 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 32 | none | own | 1 | unskilled - resident | 1 | none | yes | bad |
16 | no checking account | 24 | critical account/ other credits existing (not at this bank) | radio/television | 2424 | unknown/ no savings account | ... >= 7 years | 4 | male : divorced/separated | none | 4 | building society savings agreement/ life insurance | 53 | none | own | 2 | skilled employee / official | 1 | none | yes | good |
17 | ... < 0 DM | 30 | no credits taken/ all credits paid back duly | business | 8072 | unknown/ no savings account | ... < 1 year | 2 | male : divorced/separated | none | 3 | car or other, not in attribute Savings account/bonds | 25 | bank | own | 3 | skilled employee / official | 1 | none | yes | good |
18 | 0 <= ... < 200 DM | 24 | existing credits paid back duly till now | car (used) | 12579 | ... < 100 DM | ... >= 7 years | 4 | male : divorced/separated | none | 2 | unknown / no property | 44 | none | for free | 1 | management/ self-employed/ highly qualified employee/ officer | 1 | yes, registered under the customers name | yes | bad |
19 | no checking account | 24 | existing credits paid back duly till now | radio/television | 3430 | 500 <= ... < 1000 DM | ... >= 7 years | 3 | male : divorced/separated | none | 2 | car or other, not in attribute Savings account/bonds | 31 | none | own | 1 | skilled employee / official | 2 | yes, registered under the customers name | yes | good |
# 1.2 Summarize variable types, counts, and related information
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 status.of.existing.checking.account 1000 non-null category
1 duration.in.month 1000 non-null int64
2 credit.history 1000 non-null category
3 purpose 1000 non-null object
4 credit.amount 1000 non-null int64
5 savings.account.and.bonds 1000 non-null category
6 present.employment.since 1000 non-null category
7 installment.rate.in.percentage.of.disposable.income 1000 non-null int64
8 personal.status.and.sex 1000 non-null category
9 other.debtors.or.guarantors 1000 non-null category
10 present.residence.since 1000 non-null int64
11 property 1000 non-null category
12 age.in.years 1000 non-null int64
13 other.installment.plans 1000 non-null category
14 housing 1000 non-null category
15 number.of.existing.credits.at.this.bank 1000 non-null int64
16 job 1000 non-null category
17 number.of.people.being.liable.to.provide.maintenance.for 1000 non-null int64
18 telephone 1000 non-null category
19 foreign.worker 1000 non-null category
20 creditability 1000 non-null object
dtypes: category(12), int64(7), object(2)
memory usage: 84.0+ KB
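# 2. Data preprocessing
# 2.1 Variable filtering
# Use the var_filter function to screen variables by missing rate, IV, identical-value rate, and similar criteria, with y as the target variable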
var_filter(dt, y, x=None, iv_limit=0.02, missing_limit=0.95,
identical_limit=0.95, var_rm=None, var_kp=None,
return_rm_reason=False, positive='bad|1')
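'''
Purpose: a variable is removed when its IV is below iv_limit (0.02), its missing rate is above missing_limit (95%), or its identical-value rate (excluding nulls) is above identical_limit (95%).
Parameters (see the function's own documentation for details):
var_rm: variables to force-remove, default None;
var_kp: variables to force-keep, default None;
return_rm_reason: whether to return the reason each variable was removed, default False;
positive: the value that marks bad samples, default "bad|1".
'''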
 | age.in.years | other.debtors.or.guarantors | savings.account.and.bonds | credit.amount | installment.rate.in.percentage.of.disposable.income | status.of.existing.checking.account | credit.history | present.employment.since | purpose | housing | property | other.installment.plans | duration.in.month | creditability |
0 | 67 | none | unknown/ no savings account | 1169 | 4 | ... < 0 DM | critical account/ other credits existing (not at this bank) | ... >= 7 years | radio/television | own | real estate | none | 6 | 0 |
1 | 22 | none | ... < 100 DM | 5951 | 2 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | real estate | none | 48 | 1 |
2 | 49 | none | ... < 100 DM | 2096 | 2 | no checking account | critical account/ other credits existing (not at this bank) | 4 <= ... < 7 years | education | own | real estate | none | 12 | 0 |
3 | 45 | guarantor | ... < 100 DM | 7882 | 2 | ... < 0 DM | existing credits paid back duly till now | 4 <= ... < 7 years | furniture/equipment | for free | building society savings agreement/ life insurance | none | 42 | 0 |
4 | 53 | none | ... < 100 DM | 4870 | 3 | ... < 0 DM | delay in paying off in the past | 1 <= ... < 4 years | car (new) | for free | unknown / no property | none | 24 | 1 |
5 | 35 | none | unknown/ no savings account | 9055 | 2 | no checking account | existing credits paid back duly till now | 1 <= ... < 4 years | education | for free | unknown / no property | none | 36 | 0 |
6 | 53 | none | 500 <= ... < 1000 DM | 2835 | 3 | no checking account | existing credits paid back duly till now | ... >= 7 years | furniture/equipment | own | building society savings agreement/ life insurance | none | 24 | 0 |
7 | 35 | none | ... < 100 DM | 6948 | 2 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | car (used) | rent | car or other, not in attribute Savings account/bonds | none | 36 | 0 |
8 | 61 | none | ... >= 1000 DM | 3059 | 2 | no checking account | existing credits paid back duly till now | 4 <= ... < 7 years | radio/television | own | real estate | none | 12 | 0 |
9 | 28 | none | ... < 100 DM | 5234 | 4 | 0 <= ... < 200 DM | critical account/ other credits existing (not at this bank) | unemployed | car (new) | own | car or other, not in attribute Savings account/bonds | none | 30 | 1 |
10 | 25 | none | ... < 100 DM | 1295 | 3 | 0 <= ... < 200 DM | existing credits paid back duly till now | ... < 1 year | car (new) | rent | car or other, not in attribute Savings account/bonds | none | 12 | 1 |
11 | 24 | none | ... < 100 DM | 4308 | 3 | ... < 0 DM | existing credits paid back duly till now | ... < 1 year | business | rent | building society savings agreement/ life insurance | none | 48 | 1 |
12 | 22 | none | ... < 100 DM | 1567 | 1 | 0 <= ... < 200 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 12 | 0 |
13 | 60 | none | ... < 100 DM | 1199 | 4 | ... < 0 DM | critical account/ other credits existing (not at this bank) | ... >= 7 years | car (new) | own | car or other, not in attribute Savings account/bonds | none | 24 | 1 |
14 | 28 | none | ... < 100 DM | 1403 | 2 | ... < 0 DM | existing credits paid back duly till now | 1 <= ... < 4 years | car (new) | rent | car or other, not in attribute Savings account/bonds | none | 15 | 0 |
15 | 32 | none | 100 <= ... < 500 DM | 1282 | 4 | ... < 0 DM | existing credits paid back duly till now | 1 <= ... < 4 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 24 | 1 |
16 | 53 | none | unknown/ no savings account | 2424 | 4 | no checking account | critical account/ other credits existing (not at this bank) | ... >= 7 years | radio/television | own | building society savings agreement/ life insurance | none | 24 | 0 |
17 | 25 | none | unknown/ no savings account | 8072 | 2 | ... < 0 DM | no credits taken/ all credits paid back duly | ... < 1 year | business | own | car or other, not in attribute Savings account/bonds | bank | 30 | 0 |
18 | 44 | none | ... < 100 DM | 12579 | 4 | 0 <= ... < 200 DM | existing credits paid back duly till now | ... >= 7 years | car (used) | for free | unknown / no property | none | 24 | 1 |
19 | 31 | none | 500 <= ... < 1000 DM | 3430 | 3 | no checking account | existing credits paid back duly till now | ... >= 7 years | radio/television | own | car or other, not in attribute Savings account/bonds | none | 24 | 0 |
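The three thresholds that var_filter is described as applying can be sketched in plain Python. This is only an illustration of the rule, not the scorecardpy implementation: the helper name `removal_reasons` and the toy column are hypothetical, the IV is passed in rather than computed, and taking the identical-value rate over non-null values is an assumption based on the "excluding nulls" wording.

```python
from collections import Counter


def removal_reasons(values, iv, iv_limit=0.02, missing_limit=0.95,
                    identical_limit=0.95):
    """Return the reasons a variable would be dropped (empty list = keep)."""
    non_null = [v for v in values if v is not None]
    missing_rate = 1 - len(non_null) / len(values)
    # Assumption: identical rate = share of the most frequent non-null value
    # among the non-null values.
    if non_null:
        identical_rate = Counter(non_null).most_common(1)[0][1] / len(non_null)
    else:
        identical_rate = 0.0

    reasons = []
    if iv < iv_limit:
        reasons.append(f"iv < {iv_limit}")
    if missing_rate > missing_limit:
        reasons.append(f"missing rate > {missing_limit}")
    if identical_rate > identical_limit:
        reasons.append(f"identical rate > {identical_limit}")
    return reasons


# Hypothetical column: 97% of rows share one value and the IV is tiny.
toy_column = ["yes"] * 97 + ["no"] * 3
print(removal_reasons(toy_column, iv=0.01))
print(removal_reasons([1, 2, 3, 4], iv=0.5))
```

A variable with an empty reason list is kept, which matches the 13 predictors plus creditability that survive in the table above.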
# 2.2 WOE binning of the variables
# T1. Automatic binning: using the woebin() function
woebin(dt, y, x=None,
var_skip=None, breaks_list=None, special_values=None,
stop_limit=0.1, count_distr_limit=0.05, bin_num_limit=8,
# min_perc_fine_bin=0.02, min_perc_coarse_bin=0.05, max_num_bin=8,
positive="bad|1", no_cores=None, print_step=0, method="tree",
ignore_const_cols=True, ignore_datetime_cols=True,
check_cate_num=True, replace_blank=True,
save_breaks_list=None, **kwargs)
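'''
Purpose: generates optimal binning for both numeric and categorical variables; method="tree" selects decision-tree binning and method="chimerge" selects chi-square binning.
Parameters (see the function's own documentation for details):
var_skip: variables to skip during binning;
breaks_list: list of break points, default None. If provided, binning uses these break points;
special_values: values that should be placed in their own bin, default None;
count_distr_limit: minimum share of observations per bin; the usual accepted range is 0.01-0.2, default 0.05;
stop_limit: binning stops when the IV growth rate falls below stop_limit, or when the chi-square value falls below qchisq(1-stop_limit, 1). The usual accepted range is 0-0.5, default 0.1;
bin_num_limit: an integer, the maximum number of bins;
positive: the label of the positive (bad) class in the sample, default "bad|1";
no_cores: number of CPU cores used for parallel computation;
print_step: a non-negative integer (the signature above sets 0). If print_step>0, variable names are printed at every print_step-th iteration. If print_step=0 or no_cores>1, nothing is printed;
method: the binning method, either "tree" (decision tree) or "chimerge" (chi-square), default "tree";
ignore_const_cols: whether to ignore constant columns, default True;
ignore_datetime_cols: whether to ignore datetime columns, default True;
check_cate_num: whether to check if a categorical variable has more than 50 distinct values, default True. Too many distinct values slow down binning;
replace_blank: whether to replace blank values with None, default True.
'''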
data_df_woebin['age.in.years']
 | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
0 | age.in.years | [-inf,26.0) | 190 | 0.19 | 110 | 80 | 0.421052632 | 0.528844129 | 0.057921024 | 0.130498542 | 26 | FALSE |
1 | age.in.years | [26.0,28.0) | 101 | 0.101 | 74 | 27 | 0.267326733 | -0.160930367 | 0.002528906 | 0.130498542 | 28 | FALSE |
2 | age.in.years | [28.0,35.0) | 257 | 0.257 | 172 | 85 | 0.3307393 | 0.14245464 | 0.005359008 | 0.130498542 | 35 | FALSE |
3 | age.in.years | [35.0,37.0) | 79 | 0.079 | 67 | 12 | 0.151898734 | -0.872488109 | 0.048610052 | 0.130498542 | 37 | FALSE |
4 | age.in.years | [37.0,inf) | 373 | 0.373 | 277 | 96 | 0.257372654 | -0.212371454 | 0.016079553 | 0.130498542 | inf | FALSE |
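The woe, bin_iv and total_iv columns in the table above can be recomputed from the good and bad counts alone. The sketch below uses the usual definitions, WOE = ln(bad share / good share) and bin IV = (bad share - good share) × WOE; the function name `woe_iv` is hypothetical, and the counts are copied from the age.in.years rows.

```python
import math

# (good, bad) counts per bin, copied from the automatic binning table.
age_bins = {
    "[-inf,26.0)": (110, 80),
    "[26.0,28.0)": (74, 27),
    "[28.0,35.0)": (172, 85),
    "[35.0,37.0)": (67, 12),
    "[37.0,inf)": (277, 96),
}


def woe_iv(bins):
    """Return ({bin: (woe, bin_iv)}, total_iv) from per-bin good/bad counts."""
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    rows = {}
    for label, (good, bad) in bins.items():
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(bad_share / good_share)
        rows[label] = (woe, (bad_share - good_share) * woe)
    total_iv = sum(bin_iv for _, bin_iv in rows.values())
    return rows, total_iv


rows, total_iv = woe_iv(age_bins)
for label, (woe, bin_iv) in rows.items():
    print(f"{label:>12}  woe={woe: .6f}  bin_iv={bin_iv:.6f}")
print(f"total_iv={total_iv:.6f}")
```

With this sign convention a positive WOE marks a bin with a higher-than-average share of bad customers, which is why the youngest bin has the largest positive value.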
# T2. Manual binning: simply pass a custom breaks_list parameter
data_df_woebin_DIY['age.in.years']
 | variable | bin | count | count_distr | good | bad | badprob | woe | bin_iv | total_iv | breaks | is_special_values |
0 | age.in.years | [-inf,25.0) | 149 | 0.149 | 88 | 61 | 0.409395973 | 0.48083491 | 0.037321948 | 0.086291678 | 25 | FALSE |
1 | age.in.years | [25.0,35.0) | 399 | 0.399 | 268 | 131 | 0.328320802 | 0.131508203 | 0.007076394 | 0.086291678 | 35 | FALSE |
2 | age.in.years | [35.0,45.0) | 251 | 0.251 | 193 | 58 | 0.231075697 | -0.354949318 | 0.029241063 | 0.086291678 | 45 | FALSE |
3 | age.in.years | [45.0,inf) | 201 | 0.201 | 151 | 50 | 0.248756219 | -0.257958971 | 0.012652273 | 0.086291678 | inf | FALSE |
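How a list of break points turns into the left-closed bins shown above can be sketched with the standard library. The break values 25, 35 and 45 are read off the table; the dictionary form of `breaks_list` (variable name mapped to its break points) and the helper names are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Assumed shape of a custom breaks_list: variable name -> sorted break points.
breaks_list = {"age.in.years": [25, 35, 45]}


def bin_labels(breaks):
    """Build left-closed, right-open interval labels from sorted breaks."""
    edges = [float("-inf")] + [float(b) for b in breaks] + [float("inf")]
    return [f"[{lo},{hi})" for lo, hi in zip(edges, edges[1:])]


def assign_bin(value, breaks):
    """Return the label of the bin that contains value."""
    return bin_labels(breaks)[bisect_right(breaks, value)]


# Ages of the first 20 rows shown in the data preview.
ages = [67, 22, 49, 45, 53, 35, 53, 35, 61, 28,
        25, 24, 22, 60, 28, 32, 53, 25, 44, 31]
age_breaks = breaks_list["age.in.years"]
print(bin_labels(age_breaks))
print(Counter(assign_bin(a, age_breaks) for a in ages))
```

Because the intervals are closed on the left, an age of exactly 25 falls into [25.0,35.0) rather than [-inf,25.0).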
# 2.3 Visualize the binned variables and check for monotonicity
Plot the count distribution and bad probability of each variable's bins
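Besides reading the plots, the monotonicity check can be done numerically on the badprob column. The sketch below is a hypothetical helper, not part of scorecardpy, applied to the age.in.years values from the two binning tables above.

```python
def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# badprob per bin for age.in.years, copied from the two tables.
auto_badprob = [0.421052632, 0.267326733, 0.3307393, 0.151898734, 0.257372654]
manual_badprob = [0.409395973, 0.328320802, 0.231075697, 0.248756219]

print("automatic bins monotonic:", is_monotonic(auto_badprob))
print("manual bins monotonic:", is_monotonic(manual_badprob))
print("manual bins without the last one:", is_monotonic(manual_badprob[:3]))
```

Neither binning is strictly monotonic for age: the manual bins decline steadily until the bad probability turns up slightly in the 45+ bin.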
# 2.4 Apply the WOE binning transformation to the variables
 | creditability | savings.account.and.bonds_woe | housing_woe | age.in.years_woe | other.debtors.or.guarantors_woe | purpose_woe | credit.amount_woe | credit.history_woe | installment.rate.in.percentage.of.disposable.income_woe | other.installment.plans_woe | present.employment.since_woe | property_woe | status.of.existing.checking.account_woe | duration.in.month_woe |
0 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | 0.033661283 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | -0.461034959 | 0.614203978 | -1.312186389 |
1 | 1 | 0.271357844 | -0.194156014 | 0.48083491 | -0.000525072 | -0.410062817 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | -0.461034959 | 0.614203978 | 1.134979933 |
2 | 0 | 0.271357844 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | -0.258307464 | -0.733740578 | -0.190472769 | -0.121178625 | -0.394415272 | -0.461034959 | -1.176263223 | -0.346624608 |
3 | 0 | 0.271357844 | 0.472604411 | -0.257958971 | 0.005115101 | 0.279920067 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | -0.394415272 | 0.028573372 | 0.614203978 | 0.524524468 |
4 | 1 | 0.271357844 | 0.472604411 | -0.257958971 | -0.000525072 | 0.279920067 | 0.390539458 | 0.085157808 | -0.064538521 | -0.121178625 | 0.032103245 | 0.586082361 | 0.614203978 | 0.108688306 |
5 | 0 | -0.762140052 | 0.472604411 | -0.354949318 | -0.000525072 | 0.279920067 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.586082361 | -1.176263223 | 0.524524468 |
6 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | -0.258307464 | 0.088318617 | -0.064538521 | -0.121178625 | -0.235566071 | 0.028573372 | -1.176263223 | 0.108688306 |
7 | 0 | 0.271357844 | 0.40444522 | -0.354949318 | -0.000525072 | -0.805625164 | 0.390539458 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | 0.524524468 |
8 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | -0.258307464 | 0.088318617 | -0.190472769 | -0.121178625 | -0.394415272 | -0.461034959 | -1.176263223 | -0.346624608 |
9 | 1 | 0.271357844 | -0.194156014 | 0.131508203 | -0.000525072 | 0.279920067 | 0.390539458 | -0.733740578 | 0.157300289 | -0.121178625 | 0.431137463 | 0.034191365 | 0.614203978 | 0.108688306 |
10 | 1 | 0.271357844 | 0.40444522 | 0.131508203 | -0.000525072 | 0.279920067 | 0.033661283 | 0.088318617 | -0.064538521 | -0.121178625 | 0.431137463 | 0.034191365 | 0.614203978 | -0.346624608 |
11 | 1 | 0.271357844 | 0.40444522 | 0.48083491 | -0.000525072 | 0.279920067 | 0.390539458 | 0.088318617 | -0.064538521 | -0.121178625 | 0.431137463 | 0.028573372 | 0.614203978 | 1.134979933 |
12 | 0 | 0.271357844 | -0.194156014 | 0.48083491 | -0.000525072 | -0.410062817 | -0.7282385 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | -0.346624608 |
13 | 1 | 0.271357844 | -0.194156014 | -0.257958971 | -0.000525072 | 0.279920067 | 0.033661283 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | 0.034191365 | 0.614203978 | 0.108688306 |
14 | 0 | 0.271357844 | 0.40444522 | 0.131508203 | -0.000525072 | 0.279920067 | -0.7282385 | 0.088318617 | -0.190472769 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | -0.346624608 |
15 | 1 | 0.13955188 | -0.194156014 | 0.131508203 | -0.000525072 | -0.410062817 | 0.033661283 | 0.088318617 | 0.157300289 | -0.121178625 | 0.032103245 | 0.034191365 | 0.614203978 | 0.108688306 |
16 | 0 | -0.762140052 | -0.194156014 | -0.257958971 | -0.000525072 | -0.410062817 | -0.258307464 | -0.733740578 | 0.157300289 | -0.121178625 | -0.235566071 | 0.028573372 | -1.176263223 | 0.108688306 |
17 | 0 | -0.762140052 | -0.194156014 | 0.131508203 | -0.000525072 | 0.279920067 | 0.390539458 | 1.234070835 | -0.190472769 | 0.477550835 | 0.431137463 | 0.034191365 | 0.614203978 | 0.108688306 |
18 | 1 | 0.271357844 | 0.472604411 | -0.354949318 | -0.000525072 | -0.805625164 | 1.170071253 | 0.088318617 | 0.157300289 | -0.121178625 | -0.235566071 | 0.586082361 | 0.614203978 | 0.108688306 |
19 | 0 | -0.762140052 | -0.194156014 | 0.131508203 | -0.000525072 | -0.410062817 | -0.258307464 | 0.088318617 | -0.064538521 | -0.121178625 | -0.235566071 | 0.034191365 | -1.176263223 | 0.108688306 |
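The WOE transformation replaces each raw value with the WOE of the bin it falls into. A minimal sketch for age.in.years follows, using the manual breaks and WOE values from section 2.2; the helper name `to_woe` is hypothetical. The results can be compared with the age.in.years_woe column of the first six rows above.

```python
from bisect import bisect_right

# Manual breaks for age.in.years and the WOE of each resulting bin,
# copied from the manual binning table.
age_breaks = [25, 35, 45]
age_woe = [0.48083491, 0.131508203, -0.354949318, -0.257958971]


def to_woe(value, breaks, woe_by_bin):
    """Look up the WOE of the left-closed bin that contains value."""
    return woe_by_bin[bisect_right(breaks, value)]


# Ages of rows 0-5 in the filtered dataset.
ages = [67, 22, 49, 45, 53, 35]
age_woe_column = [to_woe(a, age_breaks, age_woe) for a in ages]
print(age_woe_column)
```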
# 3. Model training
# 3.1 Split the dataset
The output of train2woe is shown below
 | age.in.years_woe | credit.amount_woe | credit.history_woe | creditability | duration.in.month_woe | housing_woe | installment.rate.in.percentage.of.disposable.income_woe | other.debtors.or.guarantors_woe | other.installment.plans_woe | present.employment.since_woe | property_woe | purpose_woe | savings.account.and.bonds_woe | status.of.existing.checking.account_woe |
0 | -0.257958971 | 0.033661283 | -0.733740578 | 0 | -1.312186389 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | -0.461034959 | -0.410062817 | -0.762140052 | 0.614203978 |
1 | 0.48083491 | 0.390539458 | 0.088318617 | 1 | 1.134979933 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | 0.032103245 | -0.461034959 | -0.410062817 | 0.271357844 | 0.614203978 |
2 | -0.257958971 | -0.258307464 | -0.733740578 | 0 | -0.346624608 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | -0.394415272 | -0.461034959 | 0.279920067 | 0.271357844 | -1.176263223 |
6 | -0.257958971 | -0.258307464 | 0.088318617 | 0 | 0.108688306 | -0.194156014 | -0.064538521 | -0.000525072 | -0.121178625 | -0.235566071 | 0.028573372 | 0.279920067 | -0.762140052 | -1.176263223 |
7 | -0.354949318 | 0.390539458 | 0.088318617 | 0 | 0.524524468 | 0.40444522 | -0.190472769 | -0.000525072 | -0.121178625 | 0.032103245 | 0.034191365 | -0.805625164 | 0.271357844 | 0.614203978 |
8 | -0.257958971 | -0.258307464 | 0.088318617 | 0 | -0.346624608 | -0.194156014 | -0.190472769 | -0.000525072 | -0.121178625 | -0.394415272 | -0.461034959 | -0.410062817 | -0.762140052 | -1.176263223 |
11 | 0.48083491 | 0.390539458 | 0.088318617 | 1 | 1.134979933 | 0.40444522 | -0.064538521 | -0.000525072 | -0.121178625 | 0.431137463 | 0.028573372 | 0.279920067 | 0.271357844 | 0.614203978 |
13 | -0.257958971 | 0.033661283 | -0.733740578 | 1 | 0.108688306 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.034191365 | 0.279920067 | 0.271357844 | 0.614203978 |
16 | -0.257958971 | -0.258307464 | -0.733740578 | 0 | 0.108688306 | -0.194156014 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.028573372 | -0.410062817 | -0.762140052 | -1.176263223 |
18 | -0.354949318 | 1.170071253 | 0.088318617 | 1 | 0.108688306 | 0.472604411 | 0.157300289 | -0.000525072 | -0.121178625 | -0.235566071 | 0.586082361 | -0.805625164 | 0.271357844 | 0.614203978 |
19 | 0.131508203 | -0.258307464 | 0.088318617 | 0 | 0.108688306 | -0.194156014 | -0.064538521 | -0.000525072 | -0.121178625 | -0.235566071 | 0.034191365 | -0.410062817 | -0.762140052 | -1.176263223 |
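The gaps in the row index above (0, 1, 2, 6, 7, ...) show that the training set is a random subset of the 1000 rows. The post does not state the split ratio or seed, so the sketch below is only a generic seeded split; `ratio=0.7` and `seed=186` are placeholder assumptions and will not reproduce the exact rows shown.

```python
import random


def split_indices(n_rows, ratio=0.7, seed=186):
    """Randomly split row positions into train and test lists."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    positions = list(range(n_rows))
    rng.shuffle(positions)
    cut = round(n_rows * ratio)
    return sorted(positions[:cut]), sorted(positions[cut:])


train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))
print(train_idx[:10])
```

The same WOE bins should be applied to both parts so that train and test are encoded identically.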
# 3.2 Separate the independent and dependent variables
# 3.3 Build, train, and predict: fit the logistic regression model
coef_: [[0.34206044 0.78274222 0.57196834 0.89780668 0.67956772 1.06219811
0. 0.23090027 0.7965086 0.22792681 1.07066195 0.83836441
0.72843684]]
intercept_: [-0.83437247]
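With the fitted coefficients and intercept, the predicted bad probability of one row is the logistic function of the linear score. The sketch below applies this to row 0 of train2woe. It assumes the 13 coefficients follow the column order of that table with creditability removed; this is not stated in the post, although the single zero coefficient then lines up with other.debtors.or.guarantors_woe, whose WOE values are close to zero.

```python
import math

# coef_ and intercept_ as printed above.
coef = [0.34206044, 0.78274222, 0.57196834, 0.89780668, 0.67956772,
        1.06219811, 0.0, 0.23090027, 0.7965086, 0.22792681,
        1.07066195, 0.83836441, 0.72843684]
intercept = -0.83437247

# WOE values of row 0 in train2woe, in table order without creditability
# (assumed to match the order of coef).
row0 = [-0.257958971, 0.033661283, -0.733740578, -1.312186389, -0.194156014,
        0.157300289, -0.000525072, -0.121178625, -0.235566071, -0.461034959,
        -0.410062817, -0.762140052, 0.614203978]


def predict_bad_probability(woe_row):
    """Logistic regression prediction: sigmoid(intercept + coef . woe)."""
    z = intercept + sum(c * x for c, x in zip(coef, woe_row))
    return 1.0 / (1.0 + math.exp(-z))


p0 = predict_bad_probability(row0)
print(f"predicted bad probability for row 0: {p0:.4f}")
```

Row 0 is labeled 0 (good) in the table, and the predicted bad probability is correspondingly low.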
# 3.4 Model evaluation
Evaluate the model with the perf_eva function
perf_eva(label, pred, title=None, groupnum=None, plot_type=["ks", "roc"],
show_plot=True, positive="bad|1", seed=186)
'''
Purpose: evaluates model performance with KS, AUC, the Lift curve, and the PR curve. plot_type = ks, lift, roc, pr
perf_eva(label, pred, title=None, groupnum=None, plot_type=["ks", "roc"], show_plot=True, positive="bad|1", seed=186)
The perf_eva() function can, from ...
'''
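The two headline metrics that perf_eva reports, KS and AUC, can be computed by hand on a small sample to see what they measure. The sketch below is a plain-Python illustration, not the perf_eva implementation; the labels and predicted probabilities are made-up toy values.

```python
def ks_and_auc(labels, scores):
    """labels: 1 = bad, 0 = good; scores: predicted probability of bad."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad

    # KS: largest gap between the cumulative bad share and the cumulative
    # good share, scanning from the highest score to the lowest.
    ks, cum_bad, cum_good = 0.0, 0, 0
    ordered = sorted(zip(scores, labels), reverse=True)
    for i, (score, label) in enumerate(ordered):
        cum_bad += label
        cum_good += 1 - label
        # Evaluate only after all rows tied at this score are counted.
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))

    # AUC: probability that a random bad row outscores a random good row
    # (ties count as half).
    bad_scores = [s for s, y in zip(scores, labels) if y == 1]
    good_scores = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g)
               for b in bad_scores for g in good_scores)
    return ks, wins / (n_bad * n_good)


# Toy data for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
ks, auc = ks_and_auc(toy_labels, toy_scores)
print(f"KS={ks:.4f}  AUC={auc:.4f}")
```

The pairwise AUC loop is quadratic in the sample size, which is fine for a demonstration but not for large datasets.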
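# 4. Model deployment and monitoring
# 4.1 Model inference: compute credit scores
Use the scorecard function to map probabilities to scorecard points. The scores include each customer's total score and the score of each individual variable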
scorecard(bins, model, xcolumns, points0=600, odds0=1/19, pdo=50, basepoints_eq0=False)
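'''
Purpose: maps probabilities to scorecard points
Parameters:
bins: binning information, as returned by woebin().
model: the fitted model object.
points0: base points, default 600.
odds0: target odds, default 1/19.
pdo: points to double the odds, default 50.
basepoints_eq0: if True, the base points are spread across the variables.
'''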
print('card_dict_age.in.years \n',card_dict['age.in.years'])
print('card_dict_credit.amount \n',card_dict['credit.amount'])
print('card_dict_credit.historyt \n',card_dict['credit.history'])
print('card_dict_duration.in.month \n',card_dict['duration.in.month'])
print('card_dict_housing \n',card_dict['housing'])
card_dict_age.in.years
variable bin points
10 age.in.years [-inf,25.0) -12.0
11 age.in.years [25.0,35.0) -3.0
12 age.in.years [35.0,45.0) 9.0
13 age.in.years [45.0,inf) 6.0
card_dict_credit.amount
variable bin points
31 credit.amount [-inf,1400.0) -2.0
32 credit.amount [1400.0,1800.0) 41.0
33 credit.amount [1800.0,4000.0) 15.0
34 credit.amount [4000.0,9200.0) -22.0
35 credit.amount [9200.0,inf) -66.0
card_dict_credit.historyt
variable bin points
17 credit.history no credits taken/ all credits paid back duly%,... -51.0
18 credit.history existing credits paid back duly till now -4.0
19 credit.history delay in paying off in the past -4.0
20 credit.history critical account/ other credits existing (not ... 30.0
card_dict_duration.in.month
variable bin points
23 duration.in.month [-inf,8.0) 85.0
24 duration.in.month [8.0,16.0) 22.0
25 duration.in.month [16.0,34.0) -7.0
26 duration.in.month [34.0,44.0) -34.0
27 duration.in.month [44.0,inf) -74.0
card_dict_housing
variable bin points
42 housing rent -20.0
43 housing own 10.0
44 housing for free -23.0
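The point values printed above are consistent with the standard scorecard scaling: factor = pdo / ln(2), offset = points0 + factor × ln(odds0), and points per bin = round(-factor × coefficient × WOE). The sketch below checks this for age.in.years. Treating this as what scorecard() computes is an assumption, verified here only against the printed age points; the base points value is not shown in the post and is included purely as an illustration of the formula.

```python
import math


def scaling(points0=600, odds0=1 / 19, pdo=50):
    """Return (offset, factor) of the log-odds to points mapping."""
    factor = pdo / math.log(2)
    offset = points0 + factor * math.log(odds0)
    return offset, factor


offset, factor = scaling()

# First coefficient and intercept from section 3.3; the first coefficient is
# assumed to belong to age.in.years_woe.
age_coef = 0.34206044
intercept = -0.83437247

# WOE per manual age bin, from section 2.2.
age_woe = {
    "[-inf,25.0)": 0.48083491,
    "[25.0,35.0)": 0.131508203,
    "[35.0,45.0)": -0.354949318,
    "[45.0,inf)": -0.257958971,
}

age_points = {label: round(-factor * age_coef * woe)
              for label, woe in age_woe.items()}
base_points = offset - factor * intercept  # illustrative, not in the post

print(age_points)
print(f"factor={factor:.4f}  base_points={base_points:.2f}")
```

A customer's total score under this scheme is the base points plus the points of the bin they fall into for each variable, so riskier bins (positive WOE) subtract points.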
# 4.2 Online model evaluation: score stability assessment with PSI
# Use the scorecard_ply() function to compute credit scores for the train and test datasets
scorecard_ply(dt, card, only_total_score=True, print_step=0, replace_blank_na=True,
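var_kp=None)
'''
Purpose: computes credit scores from the output of `scorecard`, converting each record into scorecard points.
dt: the original data.
card: the scorecard generated by `scorecard`.
only_total_score: boolean, default True. If True, the output contains only the total credit score; if False, it contains the total score and the score of each variable.
print_step: a non-negative integer (the signature above sets 0). If print_step>0, variable names are printed at every print_step-th iteration. If print_step=0, nothing is printed.
replace_blank_na: boolean. Replaces blank values with NA. Default True. This argument should match the one used in woebin.
var_kp: names of variables to force-keep, such as an id column. Default None.
'''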
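Once train and test scores are available, the population stability index compares their distributions across score bands: PSI = Σ (actual share - expected share) × ln(actual share / expected share). The sketch below is a plain-Python illustration; the scores and band edges are made-up values, and the small `eps` floor for empty bands is an assumption added to avoid taking the log of zero.

```python
import math
from bisect import bisect_right


def psi(expected_scores, actual_scores, edges, eps=1e-6):
    """Population stability index over left-closed score bands."""
    def shares(scores):
        counts = [0] * (len(edges) + 1)
        for s in scores:
            counts[bisect_right(edges, s)] += 1
        # Floor empty bands at eps so the logarithm stays defined.
        return [max(c / len(scores), eps) for c in counts]

    expected = shares(expected_scores)
    actual = shares(actual_scores)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Toy scores for illustration only.
train_scores = [420, 455, 470, 480, 505, 515, 530, 560]
test_scores = [410, 445, 465, 500, 510, 525, 545, 575]
band_edges = [450, 500, 550]

psi_value = psi(train_scores, test_scores, band_edges)
print(f"PSI={psi_value:.4f}")
```

A PSI of zero means the two score distributions are identical across the bands; larger values indicate a bigger shift between the training population and the one being scored.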