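Using one-hot encoding in pandas (theory)

Most of the code in this section comes from Professor Zhang Jiang, to whom I am grateful.

The most commonly used tool for this is the pandas.get_dummies() function.

Here is the official API: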

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None) → DataFrame

Convert categorical variable into dummy/indicator variables.

Parameters

    data : array-like, Series, or DataFrame
        Data of which to get dummy indicators.

    prefix : str, list of str, or dict of str, default None
        String to append DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, prefix can be a dictionary mapping column names to prefixes.

    prefix_sep : str, default '_'
        If appending prefix, separator/delimiter to use. Or pass a list or dictionary as with prefix.

    dummy_na : bool, default False
        Add a column to indicate NaNs, if False NaNs are ignored.

    columns : list-like, default None
        Column names in the DataFrame to be encoded. If columns is None then all the columns with object or category dtype will be converted.

    sparse : bool, default False
        Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).

    drop_first : bool, default False
        Whether to get k-1 dummies out of k categorical levels by removing the first level.

    dtype : dtype, default np.uint8
        Data type for new columns. Only a single dtype is allowed.

        New in version 0.23.0.
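As a rough illustration of the columns and dummy_na parameters described above, here is a minimal sketch. The DataFrame and its column names (color, size, price) are made up for this example and do not come from the original post. dtype=int is passed explicitly because the default dtype of the new columns is not the same in every pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the original post.
df = pd.DataFrame({
    "color": ["red", "green", np.nan],
    "size": ["S", "M", "S"],
    "price": [10, 20, 30],
})

# Encode only the "color" column; "size" and "price" are passed through unchanged.
# dummy_na=True adds an extra indicator column for the missing value.
# dtype=int is set explicitly so the output does not depend on the version default.
encoded = pd.get_dummies(df, columns=["color"], dummy_na=True, dtype=int)

print(encoded)
```

If columns were left as None, the size column would be encoded as well, since it also holds strings.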

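You probably don't want to read through that whole list, so here are the points that matter most:

  1. data: the data you want to convert.
  2. prefix: the string put in front of each generated column name, identifying which column it came from.
  3. drop_first: whether to drop the first category, so that k categories produce k-1 columns instead of k. Defaults to False.
  4. Return value: a DataFrame.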
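The points above can be tried out with a short sketch like the one below. The Series values and the prefix "letter" are assumptions chosen for illustration, and dtype=int is passed only to keep the output the same across pandas versions.

```python
import pandas as pd

# Hypothetical input with three distinct categories.
s = pd.Series(["a", "b", "c", "a"])

# One column per category, each named "<prefix>_<category>".
full = pd.get_dummies(s, prefix="letter", dtype=int)

# drop_first=True removes the first category ("a"), leaving k-1 = 2 columns.
reduced = pd.get_dummies(s, prefix="letter", drop_first=True, dtype=int)

print(type(full).__name__)    # DataFrame
print(list(full.columns))     # ['letter_a', 'letter_b', 'letter_c']
print(list(reduced.columns))  # ['letter_b', 'letter_c']
```

With drop_first=True, a row whose remaining columns are all 0 stands for the dropped category, so no information is lost.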
Reposted from blog.csdn.net/weixin_43914889/article/details/104473950