Converting non-numeric features to numeric values with pandas (one-hot encoding)

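import numpy as np
import pandas as pd

# Each feature is a 1 x 9 row vector; 'name' is the non-numeric feature.
name = np.array([['jack', 'ross', 'john', 'blues', 'frank', 'bitch', 'haha', 'asd', 'loubin']])
age = np.array([[12, 32, 23, 4, 32, 45, 65, 23, 65]])
married = np.array([[1, 0, 1, 1, 0, 1, 0, 0, 0]])
gender = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 1]])

# Stack the four rows, then transpose so that each row is one sample.
# Note: mixing strings and integers makes NumPy store every value as a string.
matrix = np.concatenate((name, age, married, gender), axis=0)
matrix = matrix.T

data = pd.DataFrame(data=matrix, columns=['name', 'age', 'married', 'gender'])
print(data)

# One-hot encode the 'name' column; each new column is named prefix + '_' + value.
# dtype=int keeps the 0/1 output on pandas versions that default to True/False.
print(pd.get_dummies(data=data['name'], prefix='name', dtype=int))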

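The output is shown below. Each column of the new table is named after one value of the encoded column, and a prefix for these names can be specified.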

     name age married gender
0    jack  12       1      0
1    ross  32       0      0
2    john  23       1      0
3   blues   4       1      0
4   frank  32       0      1
5   bitch  45       1      1
6    haha  65       0      1
7     asd  23       0      1
8  loubin  65       0      1
   name_asd  name_bitch  name_blues  ...  name_john  name_loubin  name_ross
0         0           0           0  ...          0            0          0
1         0           0           0  ...          0            0          1
2         0           0           0  ...          1            0          0
3         0           0           1  ...          0            0          0
4         0           0           0  ...          0            0          0
5         0           1           0  ...          0            0          0
6         0           0           0  ...          0            0          0
7         1           0           0  ...          0            0          0
8         0           0           0  ...          0            1          0

[9 rows x 9 columns]
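
The call above encodes only the name column and returns it as a separate table. A minimal sketch of keeping the encoded columns together with the other features follows, assuming the goal is a single all-numeric table. It uses only the first four rows of the sample data and builds the DataFrame from a dict (an assumption, not the original construction) so that age, married and gender keep an integer dtype.

```python
import pandas as pd

# First four rows of the sample data, built from a dict so that the
# numeric columns are stored as integers rather than strings.
data = pd.DataFrame({
    'name': ['jack', 'ross', 'john', 'blues'],
    'age': [12, 32, 23, 4],
    'married': [1, 0, 1, 1],
    'gender': [0, 0, 0, 0],
})

# columns=['name'] encodes only that column; the remaining columns are
# kept as they are and the new indicator columns are appended after them.
encoded = pd.get_dummies(data, columns=['name'], prefix='name', dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

The original name column is dropped from the result, so every remaining column is numeric.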
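
In the printed result, pandas replaces the middle columns with "..." because the table is wider than its display limit. The sketch below shows two possible ways to print every column, using the first five names of the sample data; whether the default print truncates depends on the local display settings, so treat this as an illustration rather than a required step.

```python
import pandas as pd

# First five names from the sample data.
names = pd.Series(['jack', 'ross', 'john', 'blues', 'frank'], name='name')
dummies = pd.get_dummies(names, prefix='name', dtype=int)

# Option 1: to_string() renders every row and column of the table.
full_text = dummies.to_string()
print(full_text)

# Option 2: lift the column limit only inside this block.
# Very wide tables may still be wrapped across several lines.
with pd.option_context('display.max_columns', None):
    print(dummies)
```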

Reposted from www.cnblogs.com/loubin/p/11919777.html