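Purpose
This article explains the difference between decision_function and score_samples in scikit-learn's IsolationForest and what each one is for. The example below illustrates both.
Example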
# -*- coding: utf-8 -*-
# Author: qianhangbaihang
# Time : 2020/12/24 11:57 AM
# IDE : PyCharm
import numpy as np
from sklearn.ensemble import IsolationForest

# Training data: standard normal; test data: shifted far away from it
train_data = np.random.standard_normal((300, 5))
test_data = np.random.standard_normal((50, 5)) + 40

# Note: behaviour='new' is omitted below; that argument was removed in scikit-learn 0.24
rng1 = np.random.RandomState(55)  # make the isolation forest reproducible
outliers_fraction1 = 0.01
clf1 = IsolationForest(
    contamination=outliers_fraction1,
    random_state=rng1,
    n_jobs=-1
)
clf1.fit(train_data)
decision_function1 = clf1.decision_function(test_data)
score_samples1 = clf1.score_samples(test_data)

rng2 = np.random.RandomState(55)  # same seed, so both forests are built identically
outliers_fraction2 = 0.2
clf2 = IsolationForest(
    contamination=outliers_fraction2,
    random_state=rng2,
    n_jobs=-1
)
clf2.fit(train_data)
decision_function2 = clf2.decision_function(test_data)
score_samples2 = clf2.score_samples(test_data)
>>> np.all(decision_function1==decision_function2)
False
>>> np.all(score_samples1==score_samples2)
True
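Difference
The example shows that different contamination settings (outliers_fraction1 = 0.01 and outliers_fraction2 = 0.2) produce different decision_function values but identical score_samples values. The scores are identical because both forests are built from the same training data with the same seed; contamination does not affect how the trees are grown.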
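To make the difference concrete, here is a minimal sketch. It assumes a recent scikit-learn (where IsolationForest no longer accepts behaviour), an integer seed instead of a RandomState object, and the illustrative names clf_low and clf_high, none of which come from the original example. It checks that the two decision_function outputs differ by the same constant for every sample, while score_samples agrees.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative data, generated the same way as in the example above
rng = np.random.RandomState(0)
train = rng.standard_normal((300, 5))
test = rng.standard_normal((50, 5)) + 40

# Two forests that differ only in contamination (same data, same seed)
clf_low = IsolationForest(contamination=0.01, random_state=55).fit(train)
clf_high = IsolationForest(contamination=0.2, random_state=55).fit(train)

# score_samples does not depend on contamination
same_scores = np.allclose(clf_low.score_samples(test), clf_high.score_samples(test))

# decision_function differs, but by one constant shared by all samples
shift = clf_low.decision_function(test) - clf_high.decision_function(test)
constant_shift = np.allclose(shift, shift[0])

print(same_scores, constant_shift)
```

In other words, contamination only moves the zero point of decision_function; it does not change how samples rank against each other.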
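Relationship
decision_function = score_samples - offset_
offset_ depends on the contamination setting. When contamination is a number, offset_ = np.percentile(score_samples(X), 100. * contamination), where X is the training data passed to fit (see the note on _iforest.py under Reference). When contamination is 'auto', offset_ is fixed at -0.5.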
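Both relations can be checked numerically. This is a sketch under the same assumptions as before (recent scikit-learn, integer seed); the variable names are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
train = rng.standard_normal((300, 5))
test = rng.standard_normal((50, 5)) + 40

contamination = 0.2
clf = IsolationForest(contamination=contamination, random_state=55).fit(train)

# decision_function is score_samples shifted by the fitted offset_
identity_holds = np.allclose(
    clf.decision_function(test),
    clf.score_samples(test) - clf.offset_,
)

# offset_ is the contamination-percentile of the training scores
expected_offset = np.percentile(clf.score_samples(train), 100.0 * contamination)
offset_matches = bool(np.isclose(clf.offset_, expected_offset))

# With contamination='auto', offset_ is a fixed constant instead
clf_auto = IsolationForest(contamination="auto", random_state=55).fit(train)

print(identity_holds, offset_matches, clf_auto.offset_)
```

Because offset_ is a percentile of the training scores, roughly a fraction contamination of the training samples ends up with a negative decision_function.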
Reference:
scikit-learn documentation: decision_function
scikit-learn documentation: score_samples
Source code in the file _iforest.py: search for self.offset_ to find where the offset is set.