Common pandas functions for data mining competitions

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/winycg/article/details/82803106

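Remove duplicate rows (drop_duplicates() drops every row that repeats an earlier row in all columns, keeping the first occurrence by default):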

>>> import pandas as pd
>>> a = pd.DataFrame({'a':[1,1,2], 'b':[1,1,3]})
>>> a
   a  b
0  1  1
1  1  1
2  2  3
>>> a.drop_duplicates()
   a  b
0  1  1
2  2  3
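drop_duplicates() also takes subset= (which columns to compare) and keep= ('first', 'last', or False to drop every repeated row), and the companion duplicated() returns a boolean mask you can count. A minimal sketch using those documented parameters; the frame df is a made-up example, not from the original post:

```python
import pandas as pd

# Hypothetical example frame: rows 0 and 1 share 'a' but differ in 'b'
df = pd.DataFrame({'a': [1, 1, 2], 'b': [1, 2, 3]})

# Default: a row is a duplicate only if every column matches an earlier row
all_cols = df.drop_duplicates()

# subset=: compare only column 'a'; keep='last' retains the final occurrence
by_a_last = df.drop_duplicates(subset=['a'], keep='last')

# keep=False drops every row whose 'a' value appears more than once
unique_a_only = df.drop_duplicates(subset=['a'], keep=False)

# duplicated() marks repeats (after the first) as True; sum() counts them
n_dupes = df.duplicated(subset=['a']).sum()

print(by_a_last)
print(n_dupes)
```

In competitions this is handy for checking whether an ID column is truly unique before merging tables.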

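Summary statistics (describe() reports count, mean, sample standard deviation, min, quartiles and max for each numeric column; the output below is for the deduplicated frame):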

>>> a.drop_duplicates().describe()
              a         b
count  2.000000  2.000000
mean   1.500000  2.000000
std    0.707107  1.414214
min    1.000000  1.000000
25%    1.250000  1.500000
50%    1.500000  2.000000
75%    1.750000  2.500000
max    2.000000  3.000000
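By default describe() summarizes only numeric columns. include='all' adds count/unique/top/freq for non-numeric columns, and percentiles= chooses which quantiles appear (the median is always added). A short sketch; the 'city' column is a hypothetical addition for illustration:

```python
import pandas as pd

# Hypothetical frame with two numeric columns and one string column
df = pd.DataFrame({'a': [1, 2], 'b': [1, 3], 'city': ['x', 'y']})

# Default: numeric columns only; std is the sample standard deviation (ddof=1)
numeric_summary = df.describe()

# include='all' also reports unique/top/freq for the object column
full_summary = df.describe(include='all')

# Custom quantiles: 10% and 90%, with 50% inserted automatically
custom = df.describe(percentiles=[0.1, 0.9])

print(custom)
```

Quantiles are linearly interpolated, so the 10% value of column a ([1, 2]) is 1.1.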
