Data Preprocessing, Nominal Data Handling: Converting Nominal Discrete Features to One-Hot Encoding

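Nominal data stored as numeric codes can be converted into category labels.

For example, sex is coded as 0 and 1, where 1 is male and 0 is female.

These codes can be converted to male and female.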
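As a minimal sketch of this idea, the snippet below builds a tiny made-up DataFrame (the name toy and its values are illustrative assumptions, not rows from the real dataset) and maps the integer codes in the sex column to labels with Series.map. Note that any code missing from the dictionary would become NaN.

```python
import pandas as pd

# Toy data for illustration only; not rows from the real dataset
toy = pd.DataFrame({"age": [63, 41, 57], "sex": [1, 0, 1]})

# Map each integer code to its label: 1 is male, 0 is female
toy["sex"] = toy["sex"].map({0: "female", 1: "male"})

print(toy["sex"].tolist())  # ['male', 'female', 'male']
```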

This article uses the UCI Heart Disease dataset.

Convert the integer codes to the strings they actually represent:

# Map codes to labels in one step; chained assignment like df['sex'][mask] = ... is unreliable
df['sex'] = df['sex'].map({0: 'female', 1: 'male'})

df['chest_pain_type'] = df['chest_pain_type'].map({
    0: 'typical angina',
    1: 'atypical angina',
    2: 'non-anginal pain',
    3: 'asymptomatic',
})

# The dataset's threshold is 120 mg/dl
df['fasting_blood_sugar'] = df['fasting_blood_sugar'].map({0: 'lower than 120mg/dl', 1: 'greater than 120mg/dl'})

df['resting_electrocardiographic'] = df['resting_electrocardiographic'].map({
    0: 'normal',
    1: 'ST-T wave abnormality',
    2: 'left ventricular hypertrophy',
})

df['exercise_induced_angina'] = df['exercise_induced_angina'].map({0: 'no', 1: 'yes'})

df['ST_slope'] = df['ST_slope'].map({0: 'upsloping', 1: 'flat', 2: 'downsloping'})

df['thal'] = df['thal'].map({
    0: 'unknown',
    1: 'normal',
    2: 'fixed defect',
    3: 'reversible defect',
})

Result

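In pandas,

discrete nominal and ordinal feature columns should have the object dtype,

and continuous interval and ratio feature columns should have a numeric dtype, either int64 or float64.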
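One way to check this split is to ask pandas which columns it considers numeric. The sketch below uses a small made-up DataFrame (toy, numeric_cols and label_cols are illustrative names, not part of the original code) and select_dtypes to separate the two groups. Depending on the pandas version, the label column may be reported as object or as a dedicated string dtype.

```python
import pandas as pd

# Toy frame for illustration: one numeric column, one label column
toy = pd.DataFrame({
    "age": [63, 41, 57],
    "sex": ["male", "female", "male"],
})

# Columns pandas treats as numeric (int64, float64, ...)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
# Everything else is treated here as categorical labels
label_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(toy.dtypes)
print(numeric_cols, label_cols)
```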

Convert the nominal and ordinal feature columns to one-hot encoding:

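# One-hot encode the string-labeled columns; numeric columns are left as-is
df = pd.get_dummies(df)
# Inspect the resulting column names
df.columns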
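To make the effect concrete, here is a small sketch on made-up data (toy and encoded are illustrative names, not from the original code). It assumes the label column already holds strings. Passing dtype=int is an optional choice made here so the indicator columns hold 0/1 integers; recent pandas versions return booleans by default.

```python
import pandas as pd

# Toy frame for illustration only; not rows from the real dataset
toy = pd.DataFrame({
    "age": [63, 41, 57],
    "sex": ["male", "female", "male"],
})

# Only the label column is expanded; the numeric column passes through unchanged.
# dtype=int requests 0/1 integers instead of booleans.
encoded = pd.get_dummies(toy, dtype=int)

print(encoded.columns.tolist())  # ['age', 'sex_female', 'sex_male']
```

Each distinct label becomes its own indicator column named after the original column and the label, which is why fixing the label strings before encoding matters.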

Reposted from blog.csdn.net/FishBean/article/details/122219005