Implementing the PCA (Principal Component Analysis) algorithm in Python

Algorithm

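1. Compute the mean of each attribute (column) of the dataset.
2. Subtract the corresponding column mean from every value of the original matrix.
3. Compute the covariance matrix of the mean-centered matrix.
4. Compute the eigenvalues and eigenvectors of the covariance matrix.
5. Given the target dimension k, take the k eigenvectors with the largest eigenvalues and stack them as columns to form the feature matrix.
6. The reduced matrix is: mean-centered matrix * feature matrix.
7. The reconstructed original data is: reduced matrix * transpose of the feature matrix + mean.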
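The seven steps can be written compactly as follows. The notation is our own, not the original post's: the data matrix $X$ is assumed to have $m$ rows (samples) and $n$ columns (attributes), $\mu \in \mathbb{R}^n$ is the column-mean vector, and $\mathbf{1}$ is the $m$-vector of ones. The $1/(m-1)$ factor matches NumPy's default for `cov`; scaling does not change the eigenvectors.

```latex
\begin{aligned}
\mu_j &= \frac{1}{m}\sum_{i=1}^{m} X_{ij}, \qquad j = 1,\dots,n \\
\tilde{X} &= X - \mathbf{1}\mu^{\top} \\
C &= \frac{1}{m-1}\,\tilde{X}^{\top}\tilde{X} \\
C\,w_j &= \lambda_j w_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \\
W_k &= [\,w_1 \;\; w_2 \;\; \cdots \;\; w_k\,] \in \mathbb{R}^{n \times k} \\
Z &= \tilde{X} W_k \in \mathbb{R}^{m \times k} \\
\hat{X} &= Z W_k^{\top} + \mathbf{1}\mu^{\top}
\end{aligned}
```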
The code is as follows:

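import numpy as np


def load_data_set(file_name):
    # Hypothetical helper (not defined in the original post): assumes a
    # whitespace-separated numeric text file with one sample per row.
    return np.loadtxt(file_name, ndmin=2)


def pca(data_set, feature_amount):
    data_set = np.asarray(data_set, dtype=float)
    # Steps 1-2: per-column means, then center the data
    data_set_mean = data_set.mean(axis=0)
    data_set_mean_removed = data_set - data_set_mean
    # Step 3: covariance matrix; rowvar=False means each column is a variable
    cov_mat = np.atleast_2d(np.cov(data_set_mean_removed, rowvar=False))
    # Step 4: eigh is meant for symmetric matrices and returns real eigenvalues
    eig_values, eig_vectors = np.linalg.eigh(cov_mat)
    # Step 5: indices of the k largest eigenvalues; eigenvectors are columns
    top_k = np.argsort(eig_values)[::-1][:feature_amount]
    feature = eig_vectors[:, top_k]
    # Step 6: project the centered data onto the feature matrix (m x k)
    low_data_set = data_set_mean_removed @ feature
    # Step 7: map back to the original space and add the mean back
    re_data_set = low_data_set @ feature.T + data_set_mean
    return low_data_set, re_data_set


def main():
    data_set = load_data_set('data.txt')
    low_data_set, re_data_set = pca(data_set, 1)


if __name__ == '__main__':
    main()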
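As a cross-check, the same projection can also be obtained from a singular value decomposition (SVD) of the centered data instead of an eigendecomposition of the covariance matrix. This is a swapped-in technique, not the post's method, and `pca_svd` and the synthetic data below are hypothetical, for illustration only. The right singular vectors play the role of the feature matrix, and each squared singular value divided by m - 1 equals the corresponding covariance eigenvalue.

```python
import numpy as np


def pca_svd(data_set, feature_amount):
    """Hypothetical PCA variant: SVD of the centered data instead of eig(cov)."""
    data_set = np.asarray(data_set, dtype=float)
    data_set_mean = data_set.mean(axis=0)
    centered = data_set - data_set_mean
    # Rows of vt are right singular vectors, ordered by decreasing singular value
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    feature = vt[:feature_amount].T                          # n x k feature matrix
    low_data_set = centered @ feature                        # step 6: m x k
    re_data_set = low_data_set @ feature.T + data_set_mean   # step 7
    # Covariance eigenvalues recovered as sigma^2 / (m - 1)
    explained_variance = singular_values[:feature_amount] ** 2 / (len(data_set) - 1)
    return low_data_set, re_data_set, explained_variance


# Synthetic data for illustration only: 200 points close to a line in 3-D
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
data = t @ np.array([[2.0, -1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3)) + 10.0

low, rec, var = pca_svd(data, 1)

# Compare with the covariance/eigenvalue route described in the steps above
eig_values = np.linalg.eigh(np.cov(data, rowvar=False))[0]
print(low.shape, rec.shape)                  # (200, 1) (200, 3)
print(np.isclose(var[0], eig_values.max()))  # True
print(np.abs(data - rec).max())              # small, since most variance lies on one axis
```

The SVD route avoids forming the covariance matrix explicitly, which is usually better conditioned numerically. Eigenvectors are only defined up to sign, so the projected coordinates from the two routes should be compared up to a sign flip.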


Reposted from blog.csdn.net/weixin_43793472/article/details/88568986