Principle analysis
In previous blog posts on dimensionality reduction, we have already covered the principles and formula derivations of the covariance and singular value decomposition methods: "Mathematical concepts and practices of dimensionality reduction" and "Easy-to-understand machine learning: deriving and explaining the mathematical principles of gradient ascent principal component analysis".
Data selection
For the data, we use the iris dataset and the Swiss roll dataset to implement the code and show the results.
Why these two datasets? The iris dataset is one of the most common datasets for machine learning beginners, so most readers are familiar with it, and its data is four-dimensional, which meets the requirement for dimensionality reduction. However, since we cannot directly visualize the distribution of four-dimensional data in a plot, we also introduce the Swiss roll dataset.
iris data observation
If the iris data cannot be fully visualized, why plot it at all? First, as a matter of habit, we should always inspect a dataset before working with it; second, we can select a subset of the features to get a rough view of how the data is distributed.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False  # display minus signs correctly
iris = datasets.load_iris()  # load the iris dataset
X = iris.data[:, 2:]  # the features are sepal length/width and petal length/width; keep only petal length/width
y = iris.target  # iris class labels
plt.scatter(X[y == 0, 0], X[y == 0, 1])
plt.scatter(X[y == 1, 0], X[y == 1, 1])
plt.scatter(X[y == 2, 0], X[y == 2, 1])
plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.grid()
plt.show()
Swiss Roll Data Observation
from sklearn.datasets import make_swiss_roll
import numpy as np
import matplotlib.pyplot as plt
X, t = make_swiss_roll(n_samples=2000, noise=0.1)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) is deprecated in recent matplotlib
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=t, cmap=plt.cm.Spectral, edgecolors='black')
plt.show()
From the plot, we can see that the Swiss roll cannot be reduced in dimension by simply dropping one coordinate axis; instead, the data should first be rotated, and only then can an axis be dropped.
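The point about rotating first can be checked numerically. The following is a minimal sketch with hypothetical 2-D data (an elongated cloud tilted at 45 degrees, standing in for the Swiss roll): dropping a raw axis keeps only about half of the variance, while rotating back first concentrates almost all of the variance in a single axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: stretched along one direction, then rotated 45 degrees,
# so that neither raw axis alone captures most of the variance.
base = rng.normal(size=(1000, 2)) * [3.0, 0.3]
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X2 = base.dot(R.T)

# Share of total variance kept by the first raw axis (about one half) ...
var_raw = X2.var(axis=0)[0] / X2.var(axis=0).sum()
# ... versus the first axis after rotating back (nearly all of it).
X2rot = X2.dot(R)
var_rot = X2rot.var(axis=0)[0] / X2rot.var(axis=0).sum()
print(var_raw, var_rot)
```

Finding that rotation automatically is exactly what the decomposition methods below do.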
Code
Singular value decomposition
# Singular value decomposition: center the data, then project it
# onto the top-2 right singular vectors of X = U S Vt
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X1new = Xc.dot(Vt[:2].T)
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)
plt.show()
The effect of the iris dataset after dimensionality reduction
The effect of dimensionality reduction on the Swiss roll dataset
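As a sanity check on the derivation from the earlier posts, the singular values of the centered data and the eigenvalues of the covariance matrix describe the same variances: sigma_i^2 / (n - 1) equals lambda_i. A minimal sketch with hypothetical 3-D data (standing in for the Swiss roll points):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3-D sample; any real-valued data matrix works here.
A = rng.normal(size=(200, 3))
Ac = A - A.mean(axis=0)

# Singular values of the centered data ...
_, s, _ = np.linalg.svd(Ac, full_matrices=False)
# ... and eigenvalues of the covariance matrix, sorted descending.
lam = np.sort(np.linalg.eigvalsh(np.cov(A.T)))[::-1]
print(s**2 / (len(A) - 1), lam)  # these two arrays should match
```

This is why the SVD route and the covariance route below yield the same projection (up to the sign of each axis).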
Covariance method
import matplotlib.pyplot as plt
import numpy as np
# Covariance method: eigendecompose the covariance matrix of the features
X1cov = np.cov(X.T)
eig, featureVector = np.linalg.eig(X1cov)
order = np.argsort(eig)[::-1]  # np.linalg.eig does not sort the eigenvalues
X1new = X.dot(featureVector[:, order[:2]])  # project onto the top-2 eigenvectors
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)
plt.show()
The effect of the iris dataset after dimensionality reduction
The effect of the Swiss roll dataset after dimensionality reduction
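The covariance eigenvalues also answer a practical question this post has so far fixed by hand: how many components to keep. Each eigenvalue is that component's share of the total variance. A minimal sketch with hypothetical 4-D data (standing in for the iris features; the per-column scales are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical 4-D data with deliberately unequal spread per feature.
A = rng.normal(size=(150, 4)) * [2.0, 1.0, 0.5, 0.1]

# Eigenvalues of the covariance matrix, sorted descending,
# normalized into each component's share of the total variance.
lam = np.sort(np.linalg.eigvalsh(np.cov(A.T)))[::-1]
ratio = lam / lam.sum()
print(ratio.cumsum())  # cumulative variance captured by the top components
```

Reading off where the cumulative sum passes, say, 0.95 gives a principled choice of `n_components` for the sklearn version below.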
PCA in sklearn
from sklearn.decomposition import PCA
pca = PCA(n_components=2)  # keep the top-2 principal components
pca.fit(X)
X1new = pca.transform(X)
plt.scatter(X1new[:, 0], X1new[:, 1], c=t)
plt.show()
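sklearn also exposes the per-component variance shares directly via `explained_variance_ratio_`, which the manual methods above had to derive from the eigenvalues. A minimal sketch with hypothetical 3-D data (the scales are made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Hypothetical 3-D data with one dominant and one near-degenerate direction.
A = rng.normal(size=(100, 3)) * [3.0, 1.0, 0.2]

pca = PCA(n_components=2)
Anew = pca.fit_transform(A)  # fit and transform in one call

# Each entry is that component's share of the total variance.
print(Anew.shape, pca.explained_variance_ratio_)
```

Checking this ratio is a quick way to confirm that two components really are enough before trusting a 2-D scatter plot of the result.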