Easy-to-understand machine learning: implementing PCA dimensionality reduction with covariance and singular value decomposition


Principle analysis

In previous posts on dimensionality reduction, we discussed the principles and formula derivations of the covariance and singular value decomposition methods: see "Mathematical concepts and practices of dimensionality reduction" and "Easy-to-understand machine learning: the derivation and explanation of the mathematical principles of gradient ascent principal component analysis". This article focuses on the code implementation.

Data selection

For data, we use the iris dataset and the Swiss roll dataset to implement the code and show the results.

Why these two datasets? Iris is one of the most common datasets for machine learning beginners, so most readers are familiar with it, and its samples are four-dimensional, which makes it a natural candidate for dimensionality reduction. However, since we cannot directly visualize the distribution of four-dimensional data in a plot, we also introduce the Swiss roll dataset, which is three-dimensional. A quick shape check follows below.
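As a quick sanity check on the dimensionality claim, here is a minimal sketch (assuming scikit-learn is installed; not part of the original post) that prints the shapes of both datasets:

from sklearn import datasets
from sklearn.datasets import make_swiss_roll

iris = datasets.load_iris()
print(iris.data.shape)   # (150, 4): 150 samples, 4 features
X, t = make_swiss_roll(n_samples=2000, noise=0.1)
print(X.shape)           # (2000, 3): a 2-D surface rolled up in 3-D space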

Iris data observation

If four-dimensional iris data cannot be visualized directly, why plot it at all? First, it is a good habit to inspect a dataset before working with it; second, we can select a subset of the features to get a rough view of the data's distribution. Below we plot only the petal length and width.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)

iris = datasets.load_iris()  # load the iris dataset
X = iris.data[:, 2:]  # of sepal length/width and petal length/width, keep only petal length/width
y = iris.target  # iris class labels

plt.scatter(X[y == 0, 0], X[y == 0, 1])
plt.scatter(X[y == 1, 0], X[y == 1, 1])
plt.scatter(X[y == 2, 0], X[y == 2, 1])
plt.xlabel('petal length')
plt.ylabel('petal width')
plt.grid()
plt.show()

(Figure: scatter plot of petal length vs. petal width, colored by iris class)

Swiss roll data observation

from sklearn.datasets import make_swiss_roll
import numpy as np
import matplotlib.pyplot as plt

X, t = make_swiss_roll(n_samples=2000, noise=0.1)  # t encodes each point's position along the roll
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) is deprecated in recent matplotlib
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=t, cmap=plt.cm.Spectral, edgecolors='black')
plt.show()

(Figure: 3-D scatter plot of the Swiss roll, colored by position along the roll)

From the figure we can see that the Swiss roll cannot be reduced in dimension by simply dropping one coordinate axis; instead, the data should first be rotated, and only then should an axis be dropped.
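To make "rotate, then drop an axis" concrete, here is a minimal 2-D sketch (the angle and data are made up for illustration): projecting onto the rotated axes and keeping only the first coordinate is exactly what PCA does with its leading eigenvector.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2)) * [3, 0.3]   # elongated 2-D cloud, axis-aligned
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
data = data.dot(R.T)        # tilt the cloud so no single axis captures its spread
rotated = data.dot(R)       # rotate back into the axis-aligned frame
reduced = rotated[:, :1]    # drop the second axis: 2-D -> 1-D
print(reduced.shape)        # (100, 1)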

Code

Singular value decomposition

Xc = X - X.mean(axis=0)                       # center the data first
gram = Xc.dot(Xc.T)                           # Gram matrix X·X^T; its eigenvectors are the left singular vectors of X
eig, featureVector = np.linalg.eigh(gram)     # eigh suits symmetric matrices; eigenvalues come back in ascending order
x1 = gram.dot(featureVector[:, ::-1][:, :2])  # project onto the two leading components
plt.scatter(x1[:, 0], x1[:, 1], c=t)          # use c=y for the iris data
plt.show()

The effect of the iris dataset after dimensionality reduction

(Figure: iris data after SVD-based reduction to 2-D)

The effect of dimensionality reduction on the Swiss roll dataset

(Figure: Swiss roll data after SVD-based reduction to 2-D)
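For larger sample counts, eigendecomposing the n×n Gram matrix gets expensive. An equivalent and cheaper route is to call np.linalg.svd on the centered data directly; here is a minimal sketch (not from the original post):

U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
x1 = U[:, :2] * s[:2]                 # principal-component scores U·Σ for the top two components
plt.scatter(x1[:, 0], x1[:, 1], c=t)  # use c=y for the iris data
plt.show()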

Covariance method

import matplotlib.pyplot as plt
import numpy as np

X1cov = np.cov(X.T)                           # covariance matrix of the features (np.cov centers the data internally)
eig, featureVector = np.linalg.eigh(X1cov)    # eigh: symmetric input, eigenvalues in ascending order
X1new = X.dot(featureVector[:, ::-1][:, :2])  # project onto the two leading eigenvectors
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)    # use c=y for the iris data
plt.show()

The effect of the iris dataset after dimensionality reduction

(Figure: iris data after covariance-based reduction to 2-D)

The effect of the Swiss roll dataset after dimensionality reduction

(Figure: Swiss roll data after covariance-based reduction to 2-D)
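The eigenvalues also tell us how much variance each component retains. A short check, reusing the eig variable from the block above (this is an added illustration, not from the original post):

ratios = eig[::-1] / eig.sum()       # variance ratio per component, largest first
print(ratios[:2], ratios[:2].sum())  # how much variance the 2-D projection keeps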

PCA method in sklearn

from sklearn.decomposition import PCA

pca = PCA(n_components=2)   # keep the two leading components
pca.fit(X)                  # learns the mean and the principal axes
X1new = pca.transform(X)    # equivalently: X1new = pca.fit_transform(X)
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)  # use c=y for the iris data
plt.show()

The effect of the iris dataset after dimensionality reduction

(Figure: iris data after sklearn PCA reduction to 2-D)

The effect of the Swiss roll dataset after dimensionality reduction

(Figure: Swiss roll data after sklearn PCA reduction to 2-D)
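sklearn's PCA also exposes the retained variance and an approximate inverse mapping. A short usage sketch with the fitted pca object from above (added here for completeness):

print(pca.explained_variance_ratio_)        # variance ratio of each kept component
print(pca.explained_variance_ratio_.sum())  # total variance retained by the 2-D projection
X_back = pca.inverse_transform(X1new)       # map the 2-D points back to the original space
print(X_back.shape)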


Origin juejin.im/post/7078085141720989704