Differences and relationship between decision_function and score_samples in scikit-learn's IsolationForest

Purpose

This post explains how decision_function and score_samples differ and what each one is for, using an example.

Example

# -*- coding: utf-8 -*-
# Author: qianhangbaihang
# Time  : 2020/12/24 11:57 AM
# IDE   : PyCharm
import numpy as np
from sklearn.ensemble import IsolationForest

train_data = np.random.standard_normal((300, 5))
test_data = np.random.standard_normal((50, 5)) + 40

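rng1 = np.random.RandomState(55)  # fixed seed so the isolation forest is reproducible
outliers_fraction1 = 0.01
# Note: the old behaviour='new' argument was removed in scikit-learn 0.24 and is no longer passed.
clf1 = IsolationForest(
    contamination=outliers_fraction1,
    random_state=rng1,
    n_jobs=-1
)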
clf1.fit(train_data)
decision_function1 = clf1.decision_function(test_data)
score_samples1 = clf1.score_samples(test_data)

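rng2 = np.random.RandomState(55)  # same seed as rng1, so both forests grow the same trees
outliers_fraction2 = 0.2
# Note: the old behaviour='new' argument was removed in scikit-learn 0.24 and is no longer passed.
clf2 = IsolationForest(
    contamination=outliers_fraction2,
    random_state=rng2,
    n_jobs=-1
)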
clf2.fit(train_data)
decision_function2 = clf2.decision_function(test_data)
score_samples2 = clf2.score_samples(test_data)
>>> np.all(decision_function1==decision_function2)
False
>>> np.all(score_samples1==score_samples2)
True

Difference

The example shows that different contamination values (0.01 for clf1 and 0.2 for clf2) produce different decision_function values, while the score_samples values are identical.
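
To make the nature of that difference concrete, here is a minimal sketch. The data, seeds, and variable names below are illustrative assumptions, not the ones from the example above. It checks that the raw scores match across the two models and that the two decision functions differ by one constant value for every sample.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative data and seeds (assumptions, not taken from the example above).
data_rng = np.random.RandomState(0)
X_train = data_rng.standard_normal((300, 5))
# Evaluate on a mix of ordinary points and far-away points.
X_eval = np.vstack([X_train[:20], data_rng.standard_normal((20, 5)) + 40])

# Same integer seed, so both forests grow the same trees.
clf_low = IsolationForest(contamination=0.01, random_state=55).fit(X_train)
clf_high = IsolationForest(contamination=0.2, random_state=55).fit(X_train)

# The raw anomaly scores do not depend on contamination.
scores_match = np.allclose(
    clf_low.score_samples(X_eval), clf_high.score_samples(X_eval)
)

# The two decision functions differ by the same amount for every sample.
gap = clf_low.decision_function(X_eval) - clf_high.decision_function(X_eval)
gap_is_constant = np.allclose(gap, gap[0])
gap_is_nonzero = not np.isclose(gap[0], 0.0)

print(scores_match, gap_is_constant, gap_is_nonzero)
```

In other words, changing contamination does not change how anomalous a sample looks to the forest; it only moves the zero point of decision_function.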

Relationship

decision_function = score_samples - offset_
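offset_ depends on the contamination setting [1]. When contamination is a float, offset_ = np.percentile(score_samples(X), 100. * contamination), where X is the training data [2]. When contamination='auto', offset_ is fixed at -0.5.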
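
Both statements can be checked numerically. The following is a small sketch under assumed data and seeds (the names X_train, X_test, and clf are illustrative): it compares decision_function with score_samples shifted by the fitted offset_, and compares offset_ with the percentile of the training scores.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative data and seeds (assumptions made for this sketch).
rng = np.random.RandomState(0)
X_train = rng.standard_normal((300, 5))
X_test = rng.standard_normal((50, 5)) + 40

contamination = 0.2
clf = IsolationForest(contamination=contamination, random_state=55).fit(X_train)

# decision_function should equal score_samples shifted by the fitted offset_.
shifted = clf.score_samples(X_test) - clf.offset_
identity_holds = np.allclose(clf.decision_function(X_test), shifted)

# offset_ should equal the contamination percentile of the training scores.
expected_offset = np.percentile(clf.score_samples(X_train), 100.0 * contamination)
offset_matches = np.isclose(clf.offset_, expected_offset)

print(identity_holds, offset_matches)
```

Because offset_ is a percentile of the training scores, roughly a contamination-sized fraction of the training samples ends up with a negative decision_function value.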

Reference:

scikit-learn official documentation: decision_function
scikit-learn official documentation: score_samples


  1. sklearn.ensemble.IsolationForest → Attributes → offset_

  2. See the source file _iforest.py and search for self.offset_.


Reposted from blog.csdn.net/shiyuzuxiaqianli/article/details/111632533