-
Import the required packages
```python
import pandas as pd
import numpy as np
```
-
Generate test data
```python
dates = pd.date_range('20200220', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan
'''
             A     B     C   D
2020-02-20   0   NaN   2.0   3
2020-02-21   4   5.0   NaN   7
2020-02-22   8   9.0  10.0  11
2020-02-23  12  13.0  14.0  15
2020-02-24  16  17.0  18.0  19
2020-02-25  20  21.0  22.0  23
'''
```
-
Drop missing data
```python
df2 = df.dropna(
    axis=0,    # 0: operate on rows (drop rows); 1: operate on columns (drop columns)
    how='any'  # 'any': drop if it contains any NaN; 'all': drop only if every value is NaN
)
#              A     B     C   D
# 2020-02-22   8   9.0  10.0  11
# 2020-02-23  12  13.0  14.0  15
# 2020-02-24  16  17.0  18.0  19
# 2020-02-25  20  21.0  22.0  23
```
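The `axis` and `how` arguments can be combined in other ways. The sketch below rebuilds the same test data and tries three variants: dropping columns instead of rows, `how='all'`, and the `subset` parameter (not used in the original code), which restricts the NaN check to the listed columns. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the same test DataFrame used in this post
dates = pd.date_range('20200220', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# axis=1: drop every column that contains at least one NaN (B and C here)
cols_kept = df.dropna(axis=1, how='any')

# how='all': a row is dropped only if all of its values are NaN;
# no row here is entirely NaN, so all 6 rows survive
all_rows = df.dropna(axis=0, how='all')

# subset: only look for NaN in column B, so only 2020-02-20 is dropped
subset_b = df.dropna(subset=['B'])

print(list(cols_kept.columns))  # ['A', 'D']
print(len(all_rows))            # 6
print(len(subset_b))            # 5
```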
-
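Fill in missing values
```python
df3 = df.fillna(value=0)
#              A     B     C   D
# 2020-02-20   0   0.0   2.0   3
# 2020-02-21   4   5.0   0.0   7
# 2020-02-22   8   9.0  10.0  11
# 2020-02-23  12  13.0  14.0  15
# 2020-02-24  16  17.0  18.0  19
# 2020-02-25  20  21.0  22.0  23
```
Incidentally, in machine learning the values most often used to fill gaps are the mean, the median, or the mode (most frequent value).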
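Below is a sketch of those common fill strategies. It assumes per-column statistics are what you want. `df.mean()` and `df.median()` skip NaN by default and return one value per column, and `fillna` matches a Series of fill values to the DataFrame's columns by label. `df.mode()` returns a DataFrame because a column can have several modes, so the first row is taken. In this toy data every value is unique, so every value counts as a mode and `iloc[0]` simply picks the smallest one. The mode strategy is more meaningful on categorical or repeated data.

```python
import numpy as np
import pandas as pd

# Same test data as above
dates = pd.date_range('20200220', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

# Mean of each column (NaN ignored): B -> 13.0, C -> 13.2
filled_mean = df.fillna(df.mean())

# Median of each column (NaN ignored): B -> 13.0, C -> 14.0
filled_median = df.fillna(df.median())

# First row of mode(): one fill value per column
filled_mode = df.fillna(df.mode().iloc[0])

print(filled_mean.iloc[0, 1], filled_mean.iloc[1, 2])  # 13.0 13.2
print(filled_median.iloc[1, 2])                        # 14.0
print(filled_mode.iloc[0, 1], filled_mode.iloc[1, 2])  # 5.0 2.0
```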
-
Check whether each element of df is missing
```python
df4 = df.isnull()  # True where a value is missing, False where it is present
#                 A      B      C      D
# 2020-02-20  False   True  False  False
# 2020-02-21  False  False   True  False
# 2020-02-22  False  False  False  False
# 2020-02-23  False  False  False  False
# 2020-02-24  False  False  False  False
# 2020-02-25  False  False  False  False
```
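The boolean frame from `isnull()` is usually aggregated next. Summing it counts missing values, because True counts as 1. The short sketch below counts per column, per row, and overall. The variable names are illustrative. `isna()` is an alias of `isnull()` and returns the same result.

```python
import numpy as np
import pandas as pd

# Same test data as above
dates = pd.date_range('20200220', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

null_counts = df.isnull().sum()             # per column: A 0, B 1, C 1, D 0
row_null_counts = df.isnull().sum(axis=1)   # per row: 1, 1, 0, 0, 0, 0
total_nulls = int(df.isnull().sum().sum())  # whole frame: 2

print(null_counts.tolist())      # [0, 1, 1, 0]
print(row_null_counts.tolist())  # [1, 1, 0, 0, 0, 0]
print(total_nulls)               # 2
```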
-
Check whether df contains any missing values
```python
# np.any(obj) is True if any element of obj is True, and False only if all elements are False
nullres = np.any(df.isnull())  # True
```
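`np.any` works on the DataFrame, but you can write the same check with pandas methods alone. The same boolean mask can also show where the gaps are. Here is a sketch with illustrative variable names:

```python
import numpy as np
import pandas as pd

# Same test data as above
dates = pd.date_range('20200220', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

mask = df.isnull()

# Whole-frame check without NumPy's top-level function
has_null = bool(mask.values.any())

# Columns that contain at least one NaN
cols_with_null = df.columns[mask.any().to_numpy()].tolist()

# Rows that contain at least one NaN
rows_with_null = df[mask.any(axis=1)]

print(has_null)              # True
print(cols_with_null)        # ['B', 'C']
print(len(rows_with_null))   # 2
```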
-
References
Most of the code comes from "Pandas 处理丢失数据" (Pandas: handling missing data)
"Pandas空数据的处理" (Handling empty data in Pandas)
Reposted from blog.csdn.net/BBJG_001/article/details/104490780