Learning pandas: the get_dummies, drop, and join functions

get_dummies

Converts categorical variables into dummy/indicator variables.

pd.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False)

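data is a DataFrame or a Series. If you do not say which columns to one-hot encode (via columns), get_dummies automatically picks the string-like columns of data (object, string, or category dtype) and one-hot encodes them. prefix sets the prefix used to name the columns produced by the encoding, and dummy_na controls whether NaN is treated as a category of its own and given its own indicator column.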

# No column is specified for one-hot encoding
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': [1, 2, 3],
                   'C': [1, 2, 3]})
pd.get_dummies(df, prefix=['A'])
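To make the effect of prefix, dummy_na, and drop_first more concrete, here is a minimal sketch. The small frame with a missing value is made up for illustration and is not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: A is a string column with one missing value
df = pd.DataFrame({"A": ["a", "b", np.nan], "B": [1, 2, 3]})

# Only the string column A is encoded; B passes through unchanged
default = pd.get_dummies(df, prefix=["A"])
print(list(default.columns))   # ['B', 'A_a', 'A_b']

# dummy_na=True adds an indicator column for the missing value
with_na = pd.get_dummies(df, prefix=["A"], dummy_na=True)
print(list(with_na.columns))   # ['B', 'A_a', 'A_b', 'A_nan']

# drop_first=True drops the first level, leaving k-1 indicator columns
reduced = pd.get_dummies(df, prefix=["A"], drop_first=True)
print(list(reduced.columns))   # ['B', 'A_b']
```

Note that the non-encoded columns come first in the result, followed by the new indicator columns.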

When none of the columns in data is a string column and no column is specified for encoding, get_dummies does nothing and returns the data unchanged.

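df = pd.DataFrame({'A': [1, 2, 2], 'B': [1, 2, 3],
                   'C': [1, 2, 3]})
# prefix is omitted: no column is encoded here, and a prefix list whose
# length differs from the number of encoded columns raises a ValueError
pd.get_dummies(df)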

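In that case, to one-hot encode the first column of data you have to select that column explicitly and then join the remaining columns back on.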

df = pd.DataFrame({'A': [1, 2, 2], 'B': [1, 2, 3],
                   'C': [1, 2, 3]})
a_hot = pd.get_dummies(df['A'])   # encode column A only
a_hot.join(df[['B', 'C']])        # re-attach the untouched columns by index
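As an alternative to encoding the column separately and joining, the columns parameter from the signature above can name the column to encode directly. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3], "C": [1, 2, 3]})

# columns=['A'] forces the numeric column A to be encoded;
# the column name is used as the default prefix
encoded = pd.get_dummies(df, columns=["A"])
print(list(encoded.columns))   # ['B', 'C', 'A_1', 'A_2']
```

The column order differs from the join approach: here the untouched columns B and C come first, and the indicator columns carry the A_ prefix.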


df.drop

drop(labels, axis=0, level=None, inplace=False, errors='raise')

axis=0 drops rows (labels are matched against the index); axis=1 drops columns.
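A short sketch of how axis, inplace, and errors behave. The frame and its labels are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

rows_dropped = df.drop("y", axis=0)   # removes the row labelled 'y'
cols_dropped = df.drop("B", axis=1)   # removes the column 'B'

# errors='ignore' silently skips labels that do not exist
# (the default errors='raise' would raise a KeyError here)
ignored = df.drop("missing", axis=1, errors="ignore")

# inplace defaults to False, so df itself is left untouched
print(df.shape)   # (3, 2)
```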

df.join

join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
join is a DataFrame method, so the object it is called on must be a DataFrame.

Parameters and usage

other: DataFrame, Series with its name field set, or list of DataFrames. Its index should be similar to one of the columns in the calling frame. If a Series is passed, its name attribute must be set, and that name is used as the column name in the resulting joined DataFrame.
on: column name, tuple/list of column names, or array-like.
how: {'left', 'right', 'outer', 'inner'}, default 'left'. How to handle the indexes of the two objects.
    left: use the calling frame's index
    right: use the input frame's index
    outer: form the union of the indexes
    inner: use the intersection of the indexes
lsuffix: string. Suffix to use for the left frame's overlapping columns.
rsuffix: string. Suffix to use for the right frame's overlapping columns.
sort: boolean, default False. Order the result DataFrame lexicographically by the join key. If False, the index order of the calling (left) DataFrame is preserved.
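The parameters above can be sketched with two small frames. The names left and right, the key column, and the _l/_r suffixes are all made up for illustration.

```python
import pandas as pd

# Hypothetical frames: left has a 'key' column, right is indexed by those keys
left = pd.DataFrame({"key": ["a", "b", "c"], "v": [1, 2, 3]})
right = pd.DataFrame({"v": [10, 20]}, index=["a", "b"])

# on='key' matches left['key'] against right's index.
# Both frames have a column 'v', so suffixes are needed to tell them apart.
joined = left.join(right, on="key", how="left", lsuffix="_l", rsuffix="_r")
print(list(joined.columns))   # ['key', 'v_l', 'v_r']

# how='inner' keeps only the keys present in both frames
inner = left.join(right, on="key", how="inner", lsuffix="_l", rsuffix="_r")
print(len(inner))             # 2
```

With how='left', the row whose key is 'c' has no match in right, so its v_r value is NaN.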


Reposted from blog.csdn.net/qq_39446239/article/details/89350060