Data Preprocessing: Estimator Pipelines

Model prototype

class sklearn.pipeline.Pipeline(steps)
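Parameters

  • steps: a list of (name, transform) tuples, where name is the step's name, used in output and logs, and transform is the estimator for that step. Every step except the last must provide fit and transform methods; the last step only needs fit.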
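The shape of the steps list can be sketched as below. The step names ('scale', 'reduce', 'clf') and the choice of StandardScaler, PCA and LogisticRegression are assumptions made for illustration only; they are not part of the original example.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each entry is a (name, estimator) tuple; the names are arbitrary labels
# chosen for this sketch.
steps = [
    ('scale', StandardScaler()),      # intermediate step: provides fit and transform
    ('reduce', PCA(n_components=2)),  # intermediate step: provides fit and transform
    ('clf', LogisticRegression()),    # final step: only needs fit
]
pipeline = Pipeline(steps)

# The pipeline keeps the steps in the order they were given.
step_names = [name for name, _ in pipeline.steps]
print(step_names)
```

Building the pipeline does not fit anything yet; the estimators are only stored until fit is called.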

Attributes

  • named_steps: dictionary-like access to the steps, keyed by the name given to each step
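As a rough illustration of named_steps, the sketch below fits a small pipeline and then looks up each fitted step by name. The step names 'scale' and 'clf' and the max_iter value are assumptions made for this example.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Look up the fitted steps by the names given in the steps list.
scaler = pipeline.named_steps['scale']
clf = pipeline.named_steps['clf']

print(type(scaler).__name__)  # class of the first step
print(scaler.mean_.shape)     # one mean per input feature
print(clf.coef_.shape)        # one coefficient row per class
```

This is handy for inspecting what an individual step learned, such as the coefficients of the final classifier.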

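Methods

  • fit(X[,y]): runs the pipeline in training mode: fits each intermediate step in turn, transforms the data with it, and then fits the final estimator on the transformed data
  • transform(X): runs the fitted pipeline: applies the transform of each step in order; available only when the final step also provides transform
  • fit_transform(X[,y]): fits every step and returns the data as transformed by the final step
  • get_support([indices]): not defined on Pipeline itself; it belongs to feature-selection steps and can be called on such a step through named_steps
  • inverse_transform(X): applies inverse_transform of each step in reverse order; every step must provide it
  • predict(X)/predict_log_proba(X)/predict_proba(X): transforms X through all intermediate steps (feature selection in the example here), then uses the final estimator to predict
  • score(X,y): transforms X through all intermediate steps, then uses the final estimator to compute the score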
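The methods above can be tried out with a sketch like the following. It assumes two hypothetical pipelines on the digits data: a transformer-only one (reducer) to show transform and inverse_transform, and one ending in a classifier (model) to show predict and score. The step names, n_components=16 and max_iter=1000 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Transformer-only pipeline: every step has transform and inverse_transform.
reducer = Pipeline([('scale', StandardScaler()), ('pca', PCA(n_components=16))])
X_reduced = reducer.fit_transform(X_train)    # fit all steps, return transformed data
X_test_reduced = reducer.transform(X_test)    # reuse the already fitted steps
X_restored = reducer.inverse_transform(X_test_reduced)  # approximate: PCA is lossy

# Pipeline ending in a classifier: predict and score go through the scaler first.
model = Pipeline([('scale', StandardScaler()),
                  ('clf', LogisticRegression(max_iter=1000))])
model.fit(X_train, y_train)
labels = model.predict(X_test)
proba = model.predict_proba(X_test)
accuracy = model.score(X_test, y_test)        # mean accuracy of the final classifier

print(X_reduced.shape, X_restored.shape, proba.shape)
print('accuracy:', accuracy)
```

Calling transform on model would fail, because its final step is a classifier without a transform method; that is why the two cases are kept in separate pipelines here.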

Example

from sklearn.svm import LinearSVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def test_Pipeline(data):
    X_train,X_test,y_train,y_test=data
    # Intermediate steps must provide transform. LinearSVC has none, so it is
    # wrapped in SelectFromModel, which keeps the features with non-zero L1 weights.
    steps=[('Linear_SVM',SelectFromModel(LinearSVC(C=1.0,penalty='l1',dual=False))),
           ('LogisticRegression',LogisticRegression(C=1,max_iter=1000))]
    pipeline=Pipeline(steps)
    pipeline.fit(X_train,y_train)
    print('Named steps:',pipeline.named_steps)
    print('Pipeline Score:',pipeline.score(X_test,y_test))

if __name__=='__main__':
    data=load_digits()
    X=data.data
    y=data.target
    # train_test_split lives in sklearn.model_selection; the old cross_validation module was removed
    test_Pipeline(train_test_split(X,y,test_size=0.25,random_state=0,stratify=y))

Reposted from blog.csdn.net/weixin_39777626/article/details/79936260