Machine Learning with scikit-learn

1. Types of machine learning: supervised learning, unsupervised learning, semi-supervised learning (only a small amount of labeled data), reinforcement learning, genetic algorithms.
2. Installation: pip install scikit-learn. Using Anaconda directly is recommended (do not install and use both at the same time, since mixing them easily causes problems).
Note on installation issues: if you use pip instead of Anaconda, this article is worth reading: https://bbs.csdn.net/topics/391850435
3. The sklearn package covers four main areas: classification and regression correspond to supervised learning; clustering and dimensionality reduction correspond to unsupervised learning.
4. Machine learning workflow: data (load or generate), preprocess the data (train/test split, cross-validation, normalization, ...), model (fit and validate).

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris=datasets.load_iris()
iris_X=iris.data #iris feature data
iris_y=iris.target #iris class labels

# print(iris_X[:2,:])
# print(iris_y)

X_train,X_test,y_train,y_test=train_test_split(iris_X,iris_y,test_size=0.3)
knn=KNeighborsClassifier()
knn.fit(X_train,y_train)
print(knn.predict(X_test))
print(y_test)


5. Model attributes and methods (fit, predict)
coef_, intercept_: for a fitted line y=0.1x+0.3, coef_ holds the coefficient 0.1 and intercept_ holds the 0.3.
get_params(): returns the parameters the model was configured with (the defaults unless you changed them).
score(): scores the model; the scoring rule differs between regression and classification.
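
A minimal sketch of these attributes using LinearRegression on a generated dataset (the use of make_regression here is my own illustrative choice, not part of the original notes):

from sklearn import datasets
from sklearn.linear_model import LinearRegression

X, y = datasets.make_regression(n_samples=100, n_features=1, noise=10)  # simple 1-feature regression data
model = LinearRegression()
model.fit(X, y)

print(model.coef_)        # fitted coefficient(s), the "0.1" in y=0.1x+0.3
print(model.intercept_)   # fitted intercept, the "0.3" in y=0.1x+0.3
print(model.get_params()) # the parameters this model was constructed with
print(model.score(X, y))  # R^2 for regression models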

6. Normalization: standardizing the data (the processed data is usually better suited to machine learning)
Step 1: import the module: from sklearn import preprocessing
Step 2: standardize with preprocessing.scale(...)
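
A small sketch of preprocessing.scale; the sample array below is made up for illustration:

from sklearn import preprocessing
import numpy as np

a = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)

# scale standardizes each column to zero mean and unit variance
a_scaled = preprocessing.scale(a)
print(a_scaled)
print(a_scaled.mean(axis=0))  # approximately 0 for each column
print(a_scaled.std(axis=0))   # 1 for each column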

7. Training data, test data, model evaluation (metrics such as F1 and R²), overfitting.
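
As a hedged illustration of the metrics named above, sklearn.metrics provides f1_score and r2_score; the tiny label and prediction arrays below are invented for the example:

from sklearn.metrics import f1_score, r2_score

# classification: F1 combines precision and recall of the predicted labels
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
print(f1_score(y_true_cls, y_pred_cls))

# regression: R^2 compares predictions against the true continuous targets
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.0, 8.0]
print(r2_score(y_true_reg, y_pred_reg))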

8. Cross-validation (scoring: for regression use mean squared error, i.e. scoring="neg_mean_squared_error", which returns negative values, so put a minus sign in front; for classification use scoring="accuracy")
First import: from sklearn.model_selection import cross_val_score (the old sklearn.cross_validation module has been removed)

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris=datasets.load_iris()
iris_X=iris.data #iris feature data
iris_y=iris.target #iris class labels
knn=KNeighborsClassifier()
scores=cross_val_score(knn,iris_X,iris_y,cv=5,scoring="accuracy") #cv=5 means 5-fold cross-validation
print(scores)
print(scores.mean())
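
For the regression scoring mentioned in item 8, a sketch along the same lines (the KNeighborsRegressor and the generated dataset are my own choices for illustration; recent sklearn versions call the scorer "neg_mean_squared_error"):

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = datasets.make_regression(n_samples=200, n_features=5, noise=10)

knn_reg = KNeighborsRegressor()
# the scorer returns negative MSE, so add a minus sign to get the usual positive loss
loss = -cross_val_score(knn_reg, X, y, cv=5, scoring="neg_mean_squared_error")
print(loss)
print(loss.mean())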

Testing different values of k (plotted in the sketch after the loop below):

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris=datasets.load_iris()
iris_X=iris.data #iris feature data
iris_y=iris.target #iris class labels
k_range=range(1,31)
k_scores=[]
for k in k_range:
    knn=KNeighborsClassifier(n_neighbors=k)
    scores=cross_val_score(knn,iris_X,iris_y,cv=5,scoring="accuracy") #5-fold cross-validation for each k
    k_scores.append(scores.mean())
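
To pick the best k, the accumulated k_scores can be plotted against k_range; this continues directly from the loop above, and the matplotlib plotting lines are an illustrative addition:

import matplotlib.pyplot as plt

plt.plot(k_range, k_scores)  # mean cross-validated accuracy for each k
plt.xlabel("Value of k for KNN")
plt.ylabel("Cross-validated accuracy")
plt.show()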

9. Visualizing the learning curve to check how well the model is learning
First import: from sklearn.model_selection import learning_curve

from sklearn.model_selection import learning_curve
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC
import matplotlib.pyplot as plt

digits=datasets.load_digits()
X=digits.data
y=digits.target
# learning_curve records training and cross-validation scores at several training-set sizes
train_sizes,train_loss,test_loss=learning_curve(SVC(gamma=0.001),X,y,cv=10,scoring="neg_mean_squared_error",train_sizes=[0.1,0.25,0.5,0.75,1])
# the scorer returns negative MSE, so negate it to get a positive loss
train_loss_mean=-np.mean(train_loss,axis=1)
test_loss_mean=-np.mean(test_loss,axis=1)

plt.plot(train_sizes,train_loss_mean,"ro-",label="training")
plt.plot(train_sizes,test_loss_mean,"bo-",label="testing")
plt.legend(loc="best") #where to place the legend; "best" chooses automatically
plt.show()

10. Similar to item 9, but here we check how the size of gamma affects overfitting.
The code is also similar to item 9; just replace learning_curve with validation_curve.

from sklearn.model_selection import validation_curve
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC
import matplotlib.pyplot as plt

digits=datasets.load_digits()
X=digits.data
y=digits.target
param_range=np.logspace(-6,-2.3,5) #the range of gamma values to test
train_loss,test_loss=validation_curve(SVC(),X,y,param_name="gamma",param_range=param_range,cv=10,scoring="neg_mean_squared_error") #X and y must come right after the estimator; putting them later raises an error
train_loss_mean=-np.mean(train_loss,axis=1)
test_loss_mean=-np.mean(test_loss,axis=1)

plt.plot(param_range,train_loss_mean,"ro-",label="training")
plt.plot(param_range,test_loss_mean,"bo-",label="testing")
plt.legend(loc="best") #where to place the legend; "best" chooses automatically
plt.show()

11. Saving a trained model
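The original screenshots for this step are not reproduced here. As a hedged sketch, a fitted model can be saved and reloaded with pickle or joblib (the file names are made up; in older sklearn versions joblib was imported as from sklearn.externals import joblib):

import pickle
import joblib
from sklearn import datasets, svm

iris = datasets.load_iris()
clf = svm.SVC(gamma="auto")
clf.fit(iris.data, iris.target)

# method 1: pickle
with open("clf.pickle", "wb") as f:
    pickle.dump(clf, f)
with open("clf.pickle", "rb") as f:
    clf2 = pickle.load(f)
print(clf2.predict(iris.data[0:1]))

# method 2: joblib (faster for models containing large numpy arrays)
joblib.dump(clf, "clf.joblib")
clf3 = joblib.load("clf.joblib")
print(clf3.predict(iris.data[0:1]))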


Reposted from blog.csdn.net/weixin_42357472/article/details/84237493