Advanced pandas operations

A collection of handy pandas tricks that come up in day-to-day work.

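Count how many times each distinct value occurs in a column (NaN is excluded by default); the result is sorted in descending order by default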

  • loans_2007['loan_status'].value_counts()
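
As a rough illustration of what value_counts returns, here is a minimal sketch. The small loans_2007 frame is made-up stand-in data, not the real dataset; dropna and normalize are standard value_counts parameters.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the loans_2007 data (assumption, not the real dataset).
loans_2007 = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", np.nan, "Fully Paid"]
})

# Occurrences of each distinct value, most frequent first; NaN is skipped.
counts = loans_2007["loan_status"].value_counts()

# Pass dropna=False to count missing values as their own entry.
counts_with_nan = loans_2007["loan_status"].value_counts(dropna=False)

# normalize=True returns proportions of the non-null values instead of counts.
shares = loans_2007["loan_status"].value_counts(normalize=True)

print(counts)
```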

Replace the status labels with 1 and 0

  • status_replace = { "loan_status" : { "Fully Paid": 1, "Charged Off": 0, } }
  • loans_2007 = loans_2007.replace(status_replace)  
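
If only one column needs encoding, Series.map is a possible alternative to replace. The sketch below uses made-up data, including a hypothetical "Current" status, to show the main difference: replace leaves values missing from the mapping unchanged, while map turns them into NaN, which makes them easy to spot.

```python
import pandas as pd

# Made-up stand-in data; "Current" is a hypothetical status absent from the mapping.
loans_2007 = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Current"]
})

mapping = {"Fully Paid": 1, "Charged Off": 0}

# map looks every value up in the dict; values without an entry become NaN.
encoded = loans_2007["loan_status"].map(mapping)

# List the labels the mapping did not cover.
unmapped = loans_2007.loc[encoded.isna(), "loan_status"].unique()

print(encoded)
print(unmapped)
```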

Select columns of a particular dtype

  • object_columns_df = loans.select_dtypes(include=["object"])
  • print(object_columns_df.iloc[0])  
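
select_dtypes also accepts an exclude argument and the generic "number" selector. A small sketch with made-up loans data (the column names are assumptions) that splits a frame into numeric and non-numeric parts:

```python
import pandas as pd

# Made-up stand-in for the loans data; column names are illustrative only.
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "int_rate": [10.65, 15.27, 15.96],
    "grade": ["B", "C", "C"],
})

# "number" matches every numeric dtype (integers and floats).
numeric_df = loans.select_dtypes(include="number")

# exclude keeps everything that is NOT of the given dtype.
non_numeric_df = loans.select_dtypes(exclude="number")

print(numeric_df.columns.tolist())
print(non_numeric_df.columns.tolist())
```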

Sort by several columns

  • data.sort_values(['Fare','Age'],ascending=False)
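
ascending can also be a list with one flag per sort column. A minimal sketch on made-up Titanic-style data, sorting Fare from high to low and breaking ties by Age from low to high:

```python
import pandas as pd

# Made-up stand-in data with the Fare and Age columns used above.
data = pd.DataFrame({
    "Fare": [7.25, 71.28, 7.25, 53.10],
    "Age": [22, 38, 19, 35],
})

# One ascending flag per column: Fare descending, Age ascending within equal fares.
sorted_data = data.sort_values(["Fare", "Age"], ascending=[False, True])

print(sorted_data)
```

sort_values returns a new DataFrame and keeps the original index labels; chain reset_index(drop=True) if a fresh 0..n-1 index is wanted.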

Group by a column with groupby

  • data.groupby(by='Sex').count()
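
count() reports the non-null entries per column in each group, which is not always the same as the group size. The sketch below, on made-up data, contrasts count() with size() and shows named aggregation; mean_age and max_fare are illustrative output names.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data; one Age value is missing on purpose.
data = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22, 38, np.nan, 35, 27],
    "Fare": [7.25, 71.28, 7.92, 8.05, 11.13],
})

# Non-null entries per column, per group (NaN in Age is not counted).
counts = data.groupby("Sex").count()

# Number of rows per group, regardless of missing values.
sizes = data.groupby("Sex").size()

# Named aggregation: output column = (input column, aggregation).
summary = data.groupby("Sex").agg(mean_age=("Age", "mean"), max_fare=("Fare", "max"))

print(counts)
print(sizes)
print(summary)
```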

Pivot tables; when aggfunc is not set, the mean is used

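  • data.pivot_table(index='Pclass',values='Age',aggfunc=[len,'mean','median'])
  • After creating a pivot table, you usually need to reset the index (drop=True discards the old index instead of turning it into a column)
  • data_reindexed = new_data.reset_index(drop = True)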
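
To make the effect of resetting the index concrete, here is a small sketch on made-up data. It assumes the goal is to keep Pclass as an ordinary column, in which case reset_index() without drop=True is the variant to reach for.

```python
import pandas as pd

# Made-up stand-in data with the Pclass and Age columns used above.
data = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Age": [38, 54, 14, 22, 26, 30],
})

# No aggfunc given, so each cell is the mean Age of that Pclass.
pivot = data.pivot_table(index="Pclass", values="Age")

# reset_index() moves Pclass from the index back into a regular column.
flat = pivot.reset_index()

# drop=True throws the old index away, so the Pclass labels are lost.
dropped = pivot.reset_index(drop=True)

# Several aggregations at once produce two-level column labels.
pivot_multi = data.pivot_table(index="Pclass", values="Age", aggfunc=["mean", "median"])

print(flat)
print(pivot_multi)
```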

Create dummy variables

  • dummy_df = pd.get_dummies(loans[cat_columns])
  • loans = pd.concat([loans, dummy_df], axis=1)
  • loans = loans.drop(cat_columns, axis=1)
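
The three steps above can likely be collapsed into one call by passing the columns argument to get_dummies, which encodes the listed columns and removes the originals. A minimal sketch on made-up data; the column names and values are assumptions.

```python
import pandas as pd

# Made-up stand-in for the loans data; cat_columns lists the categorical columns.
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "home_ownership": ["RENT", "OWN", "RENT"],
    "term": ["36 months", "60 months", "36 months"],
})
cat_columns = ["home_ownership", "term"]

# Encode only the listed columns; they are replaced by <column>_<value> indicators.
encoded = pd.get_dummies(loans, columns=cat_columns)

print(encoded.columns.tolist())
```

For linear models, drop_first=True can be added to avoid one redundant indicator per category.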

Drop columns that contain only one distinct value

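orig_columns = loans_2007.columns
drop_columns = []
for col in orig_columns:
    # Ignore NaN, then collect the distinct values left in the column
    col_series = loans_2007[col].dropna().unique()
    # A single distinct value carries no information, so mark the column for removal
    if len(col_series) == 1:
        drop_columns.append(col)
loans_2007 = loans_2007.drop(drop_columns, axis=1)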
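
The same filter can probably be written without an explicit loop using DataFrame.nunique, which counts distinct values per column and skips NaN by default. A sketch on made-up data; the column names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for loans_2007; pymnt_plan and tax_liens each hold one distinct value.
loans_2007 = pd.DataFrame({
    "id": [1, 2, 3],
    "pymnt_plan": ["n", "n", "n"],
    "tax_liens": [0.0, np.nan, 0.0],
    "grade": ["A", "B", "A"],
})

# Distinct non-null values per column (dropna=True is the default).
n_unique = loans_2007.nunique()

# Columns with exactly one distinct value, matching the loop's condition.
drop_columns = n_unique[n_unique == 1].index.tolist()

loans_2007 = loans_2007.drop(columns=drop_columns)

print(drop_columns)
print(loans_2007.columns.tolist())
```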

Reposted from blog.csdn.net/qq_38923076/article/details/82942525