sklearn manual (continuously updating...)

This article is a handbook to consult whenever you use sklearn. I already have a working knowledge of sklearn. For a systematic walk through the package based on the official sklearn tutorial documentation, see the article "sklearn user tutorial"; this article therefore does not introduce the basics.

sklearn official website: scikit-learn: machine learning in Python (scikit-learn 0.16.1 documentation)

Last update time: 2023.3.27
Earliest update time: 2023.3.6

1. Classification

1.1 KNN

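from sklearn.neighbors import KNeighborsClassifier

# x: training features, y: training labels
neigh = KNeighborsClassifier()
neigh.fit(x, y)

# Test
result = neigh.predict(test_x)

KNeighborsClassifier input parameters:

  • n_neighbors: the number of neighbors to use (the K in KNN); the default is 5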
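To make the effect of n_neighbors concrete, here is a minimal sketch on made-up data. The arrays x, y and test_x are illustrative assumptions, not data from the original post.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups on a line
x = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Each prediction is a majority vote among the 3 nearest training points
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(x, y)

test_x = np.array([[1.5], [10.5]])
result = neigh.predict(test_x)
print(result)  # [0 1]
```

The three nearest neighbors of 1.5 all carry label 0 and those of 10.5 all carry label 1, so the vote is unanimous in both cases.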

1.2 SVM

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
y = np.array([0, 1, 1, 2, 2, 2])

# Standardize the features first, then fit the SVM classifier
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X, y)

predict_result = clf.predict(np.array([[2, 3], [2, 5], [5, 5]]))
print(predict_result)

2. Clustering

2.1 KMeans

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

from sklearn.cluster import KMeans

kmeans = KMeans()    # Create a KMeans object, passing in its parameters here
kmeans.fit(case_feature)  # Train

kmeans.labels_  # Cluster labels of all training samples

Input parameters of KMeans():

  • n_clusters: the number of clusters
  • init: the method used to initialize the centroids. It accepts either a string (the name of an initialization algorithm) or a matrix (the initial centroids themselves).
  • n_init: the number of times the K-means algorithm is rerun with different random seeds.
    Note that if init is given as an explicit matrix of centroids while n_init is greater than 1, init overrides n_init: K-means runs only once. (Once the initial centroids are fixed, everything that follows is fixed as well. This belongs to the underlying principle, which this article does not cover.) In that case
    this warning is reported:
    [image: the warning message]
    (Reference for this point: python - k-means with selected initial centers - Stack Overflow)
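A minimal sketch of passing explicit initial centroids through init, assuming made-up data. The names case_feature and init_centers and their values are illustrative only. Setting n_init=1 states explicitly that a single run is intended.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two tight groups of 2-D points
case_feature = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                         [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# Explicit initial centroids, one row per cluster
init_centers = np.array([[0.0, 0.0], [10.0, 10.0]])

# With fixed starting centroids there is only one run to do, so n_init=1
kmeans = KMeans(n_clusters=2, init=init_centers, n_init=1)
kmeans.fit(case_feature)

print(kmeans.labels_)           # cluster index of each training sample
print(kmeans.cluster_centers_)  # final centroids after convergence
```

The number of rows in the init matrix has to match n_clusters, and each cluster index corresponds to the row of the initial centroid it grew from.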

3. Metrics

3.1 Common classification metrics

from sklearn.metrics import accuracy_score,precision_score,recall_score,f1_score

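# y holds the true labels, result holds the predictions (the numeric output of predict)
# Worth noting: this works for both single-label and multi-label data. For single-label data, y and result each need only one column.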
print(accuracy_score(y,result))
print(precision_score(y,result,average='macro'))
print(recall_score(y,result,average='macro'))
print(f1_score(y,result,average='macro'))
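As a small worked example of what average='macro' computes, the sketch below uses made-up labels. The values of y and result are assumptions for illustration. Macro averaging takes the unweighted mean of the per-class scores.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: one sample of class 1 is misclassified as class 2
y = np.array([0, 1, 2, 2, 1, 0])
result = np.array([0, 2, 2, 2, 1, 0])

acc = accuracy_score(y, result)  # 5 of 6 predictions are correct

# average=None returns one score per class, here [1, 1, 2/3]
per_class_precision = precision_score(y, result, average=None)

# average='macro' is the unweighted mean of the per-class scores
macro_precision = precision_score(y, result, average='macro')
macro_recall = recall_score(y, result, average='macro')
macro_f1 = f1_score(y, result, average='macro')

print(acc, per_class_precision, macro_precision, macro_recall, macro_f1)
```

Because every class counts equally under macro averaging, a poorly predicted rare class pulls the score down as much as a poorly predicted common one.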

4. Anomaly detection

4.1 local outlier factor (LOF)

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LocalOutlierFactor.html

Principle explanation: Machine Learning - Anomaly Detection Algorithm (2): Local Outlier Factor - Zhihu
Generally speaking, a point that lies very far from the other points is regarded as an outlier.
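A minimal sketch of LocalOutlierFactor on made-up data, to show the "distant point" idea in practice. The array X and the choice n_neighbors=3 are assumptions for illustration. fit_predict returns 1 for inliers and -1 for outliers.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical data: five points packed closely together and one far away
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]])

lof = LocalOutlierFactor(n_neighbors=3)
labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier

# Negated LOF scores: the more negative, the more abnormal the sample
scores = lof.negative_outlier_factor_

print(labels)
print(scores)
```

The distant point has a much lower local density than its neighbors, so its score is far more negative than the scores of the clustered points.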

Origin blog.csdn.net/PolarisRisingWar/article/details/129364193