Experiment Report on Airline Customer Value Analysis

Notes

Reference book: Zhang Liangjun et al., "Python Data Analysis and Mining Practice".

Data file: the data set that ships with the textbook.

Software used: PyCharm.

Category: Experiment.

Reminder: This experiment accompanies Zhang Liangjun's book, and the code was run in PyCharm.

1. Purpose of the experiment

1. Understand how to use clustering algorithms in Python.

2. Draw a vector chart (radar chart) of the cluster centers.

2. Experimental environment

1. Operating system: Windows 10.

2. Code running environment: Jupyter Notebook or PyCharm.

3. Experimental principle

1. Build the Python development platform.

2. Get started with Python.

3. Python data analysis tools.

4. K-means clustering algorithm.
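
As a rough illustration of item 4, the loop below is a minimal NumPy sketch of the standard K-means iteration (assign each point to its nearest center, then move each center to the mean of its points). It is not the textbook's code or scikit-learn's implementation; the function name kmeans_sketch, its parameters, and the demo points are all made up for this example.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Minimal K-means (Lloyd's iteration) sketch: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs of made-up 2-D points.
rng = np.random.default_rng(1)
demo_points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
demo_centers, demo_labels = kmeans_sketch(demo_points, k=2)
print(demo_centers)
```

In the experiment itself this work is done by sklearn.cluster.KMeans, which also handles better initialization and repeated restarts.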

4. Experimental procedures and experimental results

1. Experimental steps:

[Image: experimental steps]

2. Experimental results:

(1) Question 1:

【1】Reminder:
[Image: task description]

【2】Code:

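# -*- coding: utf-8 -*-
# Z-score (standard deviation) standardization
import pandas as pd

# (1) Data file to be standardized
data_file = 'zscoredata.xls'
# (2) Output file for the standardized data
# Note: pandas 2.0 and later cannot write .xls files; use an .xlsx name there.
zscore_file = 'new_zscoredata.xls'

# 1. Read the data
data = pd.read_excel(data_file)
# 2. Standardize every column in one concise statement; other
#    transformations can be implemented the same way.
data = (data - data.mean(axis=0)) / data.std(axis=0)
# 3. Rename the column headers
data.columns = ['Z' + i for i in data.columns]
# 4. Write the result
data.to_excel(zscore_file, index=False)
'''
After standardization, the data has five attributes: ZL, ZR, ZF, ZM and ZC.
'''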

【3】Run results:
[Image: run results]
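
To see what the standardization in Question 1 does, here is a small self-contained sketch on made-up numbers. The column names L and F stand in for two of the five attributes, and the values are invented for illustration; they are not from the textbook data file.

```python
import pandas as pd

# Made-up values for two of the five columns (illustration only).
sample = pd.DataFrame({
    "L": [10.0, 20.0, 30.0, 40.0],
    "F": [1.0, 3.0, 5.0, 7.0],
})

# Same transformation as in the report: subtract the column mean and
# divide by the column standard deviation (pandas uses ddof=1 by default).
zscored = (sample - sample.mean(axis=0)) / sample.std(axis=0)
zscored.columns = ["Z" + name for name in zscored.columns]

# After standardization every column has mean 0 and standard deviation 1.
column_means = zscored.mean(axis=0)
column_stds = zscored.std(axis=0)
print(zscored)
```

Because every column ends up on the same scale, no single attribute dominates the distance computation in the clustering step that follows.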

(2) Question 2:

【1】Reminder:
[Images: task description]

【2】Code:

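# -*- coding: utf-8 -*-
# K-Means clustering

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
# SimHei is only needed when the plot contains Chinese text
matplotlib.rcParams['font.family'] = 'SimHei'
# Import the K-means clustering algorithm
from sklearn.cluster import KMeans

# Data file to cluster
input_file = 'preprocesseddata.xls'
# Number of clusters
k = 5

# 1. Read the data (the five standardized columns)
data = pd.read_excel(input_file, usecols=[16, 17, 18, 19, 20])
# print(data)
# 2. Set up the k-means model.
# Older scikit-learn versions accepted n_jobs (number of parallel jobs,
# e.g. n_jobs=8); the parameter has since been removed from KMeans.
kModel = KMeans(n_clusters=k)
# 3. Train the model
kModel.fit(data)

# 4. Show the cluster centers
print("1. Cluster centers:")
print(kModel.cluster_centers_)
# print(kModel.cluster_centers_[0])
# print(type(kModel.cluster_centers_)) # <class 'numpy.ndarray'>
# print(type(kModel.labels_)) # <class 'numpy.ndarray'>

# 5. Show the cluster assigned to each sample
print("2. Cluster label of each sample:")
print(kModel.labels_)

# 6. Draw the radar chart of the customer cluster centers
# Axis labels; the first one is repeated so the outline can be closed
labels = np.array(['ZL', 'ZR', 'ZF', 'ZM', 'ZC', 'ZL'])


# Cluster centers to plot, one row per cluster
plot_data = kModel.cluster_centers_
# One color per cluster
color = ['b', 'g', 'r', 'c', 'y']

# One angle per feature (the number of features, not the number of clusters)
angles = np.linspace(0, 2*np.pi, plot_data.shape[1], endpoint=False)
# Close each outline by repeating its first value
plot_data = np.concatenate((plot_data, plot_data[:, [0]]), axis=1)
# Close the angles in the same way
angles = np.concatenate((angles, [angles[0]]))

fig = plt.figure()
# Note the polar argument
ax = fig.add_subplot(111, polar=True)
for i in range(len(plot_data)):
    # Draw one outline per cluster; label is the legend entry
    ax.plot(angles, plot_data[i], 'o-', color=color[i], label='Customer group ' + str(i), linewidth=2)

# Radial grid positions and the labels shown for them
ax.set_rgrids(np.arange(0.01, 3.5, 0.5), np.arange(-1, 2.5, 0.5))
ax.set_thetagrids(angles * 180/np.pi, labels)
plt.legend(loc=4)
# Save as a .jpg file
plt.savefig('customer_cluster_centers.jpg')
plt.show()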

【3】Run results:
[Images: run results]
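
When reading the output of Question 2, it usually helps to see the size of each customer group next to its center. The sketch below shows one way to build such a table. It runs on random stand-in data rather than the textbook file, and the names demo, model and centers, as well as the n_init and random_state settings, are choices made for this example only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

feature_names = ["ZL", "ZR", "ZF", "ZM", "ZC"]
rng = np.random.default_rng(0)
# Random stand-in for the standardized customer table (not the textbook data).
demo = pd.DataFrame(rng.normal(size=(200, 5)), columns=feature_names)

# random_state makes the run repeatable; n_init=10 restarts the algorithm 10 times.
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(demo)

# One row per cluster: the center coordinates plus the number of members.
centers = pd.DataFrame(model.cluster_centers_, columns=feature_names)
centers["size"] = pd.Series(model.labels_).value_counts().sort_index()
print(centers.round(2))
```

A group with a large ZF and ZM center but few members, for example, would be read differently from an equally extreme group that contains most of the customers.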

5. Experimental summary

[1] Through this experiment I learned to understand and use the K-means clustering algorithm. Along the way I ran into two errors:
(1) ValueError: The number of FixedLocator locations, usually from a call to set_ticks, does not match the number of ticklabels. After a long search I traced it to the two lines data = np.concatenate((data, [data[0]])) and angles = np.concatenate((angles, [angles[0]])). They append the first element to the end so that the outline closes, which leaves one more tick position than there are tick labels. Some people online suggest commenting these lines out, but then the radar chart is no longer closed. The correct fix is to repeat the first element in radar_labels as well.
(2) AttributeError: 'Text' object has no property 'frac'. After some digging I found that newer versions of matplotlib no longer support 'frac'. Solution: remove the frac argument.
[2] In short, data analysis is not easy to master. It is a broad subject that takes continual exploration and practice.
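
The fix for the first error above can be sketched as follows: close the angles, the values and the labels with the same helper so the three always have equal length. The helper name close_polygon and the center values are made up for this illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def close_polygon(values):
    """Append the first element to the end so the outline returns to its start."""
    values = np.asarray(values)
    return np.concatenate((values, values[:1]))

feature_labels = ["ZL", "ZR", "ZF", "ZM", "ZC"]
center = np.array([0.5, 1.2, 0.8, 0.3, 1.0])  # made-up cluster center

angles = np.linspace(0, 2 * np.pi, len(feature_labels), endpoint=False)
closed_angles = close_polygon(angles)
closed_values = close_polygon(center)
closed_labels = close_polygon(feature_labels)

fig = plt.figure()
ax = fig.add_subplot(111, polar=True)
ax.plot(closed_angles, closed_values, "o-")
# Tick positions and tick labels now have the same length (6 and 6).
ax.set_thetagrids(np.degrees(closed_angles), closed_labels)
```

An alternative that should work equally well is to pass the unclosed angles and the five original labels to set_thetagrids, and to close only the values and angles used for plotting.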

Origin blog.csdn.net/xu_yushu/article/details/124554506