Pandas: Vectorizing categorical variables (one-hot encoding) with get_dummies

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

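1. Vectorization (one-hot encoding)

get_dummies turns a categorical column into one indicator column per distinct value: each row gets 1 in the column of its own category and 0 in all the others.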

# 'data1' is listed first so the printed column order matches on every pandas version
df = DataFrame({'data1': range(6),
               'key': ['b', 'b', 'a', 'c', 'a', 'b']})
print(df)
   data1 key
0      0   b
1      1   b
2      2   a
3      3   c
4      4   a
5      5   b
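# dtype=int keeps the 0/1 display; since pandas 2.0 the default dtype is bool (True/False)
print(pd.get_dummies(df['key'], dtype=int))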
   a  b  c
0  0  1  0
1  0  1  0
2  1  0  0
3  0  0  1
4  1  0  0
5  0  1  0
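To see what get_dummies computes, the sketch below rebuilds the same indicator matrix by hand with NumPy broadcasting. It assumes the default behaviour shown above (one column per distinct value, in sorted order); the helper name `manual_dummies` is hypothetical and is not how pandas implements it internally.

```python
import numpy as np
import pandas as pd


def manual_dummies(s: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode a Series by hand (illustrative sketch only)."""
    values = s.to_numpy()
    # Distinct categories, sorted, matching get_dummies' default column order
    categories = np.unique(values)
    # Compare every row with every category -> (n_rows, n_categories) boolean matrix
    indicator = values[:, None] == categories[None, :]
    return pd.DataFrame(indicator.astype(int), index=s.index, columns=categories)


df = pd.DataFrame({'data1': range(6), 'key': ['b', 'b', 'a', 'c', 'a', 'b']})
manual = manual_dummies(df['key'])
print(manual)

# Compare with get_dummies, using the same integer dtype on both sides
print(manual.equals(pd.get_dummies(df['key'], dtype=int)))
```

Each row contains exactly one 1, because every value belongs to exactly one category.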

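2. Merging with the original data

The prefix argument gives the dummy columns distinguishable names (key_a, key_b, ...) once they are joined back onto the remaining columns.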

dummies = pd.get_dummies(df['key'], prefix='key', dtype=int)
df_with_dummy = df[['data1']].join(dummies)
print(df_with_dummy)
   data1  key_a  key_b  key_c
0      0      0      1      0
1      1      0      1      0
2      2      1      0      0
3      3      0      0      1
4      4      1      0      0
5      5      0      1      0
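The select-then-join pattern above can likely be collapsed into one call. When get_dummies receives a whole DataFrame plus a `columns` list, it encodes only those columns, uses each column's name as the prefix, and appends the dummies after the untouched columns. This is a minimal sketch, assuming a pandas version that supports the `dtype` parameter (0.23 or later) so the 0/1 display matches the output above:

```python
import pandas as pd

df = pd.DataFrame({'data1': range(6), 'key': ['b', 'b', 'a', 'c', 'a', 'b']})

# Route from the text: encode one column, then join it onto the rest
dummies = pd.get_dummies(df['key'], prefix='key', dtype=int)
df_with_dummy = df[['data1']].join(dummies)

# One-call route: encode only 'key'; its column name becomes the prefix
df_one_call = pd.get_dummies(df, columns=['key'], dtype=int)

print(df_one_call)
print(df_one_call.equals(df_with_dummy))
```

The explicit join remains useful when you want full control over which columns are kept or where the dummies go.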


Reposted from blog.csdn.net/bqw18744018044/article/details/79964741