Differences between stack, unstack, groupby, and pivot_table

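stack() "stacks" the data: it moves the column labels into the innermost level of the row index, giving a long Series with a hierarchical index (much like a nested dict) and no column labels. unstack() does the opposite: it moves an index level back into the columns, giving the usual table form with labels on both rows and columns. groupby() splits the rows into groups by one or more keys. pivot_table() provides groupby-style aggregation, laid out as a pivot table.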

In [100]:
 
import numpy as np
import pandas as pd

data=pd.DataFrame(np.arange(6).reshape((2,3)),index=pd.Index(['A','B']),columns=pd.Index(['one','two','three']))

count() and list comprehensions
# Count how many times the letter 'a' occurs in each element of the list
words=['apple','pare','banana','and','peach','Anda']
for word in words:
    print(word.lower().count('a'))  # lower() makes the count case-insensitive
1
1
3
1
1
2
[word for word in words if word.lower().count('a')>=2]
['banana', 'Anda']
strings=['a','bv','tit','apple','ctr']
[x.title() for x in strings if len(x)>2]
['Tit', 'Apple', 'Ctr']
list(map(len,strings))
[1, 2, 3, 5, 3]
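The filtering comprehension above can also be written with `filter` and `map`. The following is only a small sketch for comparison, reusing the `strings` list from above; the names `by_comprehension` and `by_map` are introduced here purely for illustration.

```python
strings = ['a', 'bv', 'tit', 'apple', 'ctr']

# List comprehension: keep strings longer than 2 characters, then title-case them
by_comprehension = [x.title() for x in strings if len(x) > 2]

# The same result with filter() + map(); the comprehension is usually easier to read
by_map = list(map(str.title, filter(lambda x: len(x) > 2, strings)))

print(by_comprehension)
print(by_map)
```

Both forms build the same list; the comprehension is generally preferred when a filter and a transformation are combined.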
transpose(): permuting the axes of an array
import numpy as np
three=np.arange(18).reshape(2,3,3)
three
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],
       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]]])
three.transpose(2,1,0)
array([[[ 0,  9],
        [ 3, 12],
        [ 6, 15]],
       [[ 1, 10],
        [ 4, 13],
        [ 7, 16]],
       [[ 2, 11],
        [ 5, 14],
        [ 8, 17]]])
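To make the axis permutation above easier to follow, here is a minimal sketch that checks the rule behind `transpose(2,1,0)`: the element at position `[i, j, k]` of the result is the element at `[k, j, i]` of the original. The name `swapped` is only used for this illustration.

```python
import numpy as np

three = np.arange(18).reshape(2, 3, 3)

# transpose(2, 1, 0) reverses the axis order: new axis 0 is old axis 2, and so on
swapped = three.transpose(2, 1, 0)

# The shape is permuted the same way: (2, 3, 3) becomes (3, 3, 2)
print(three.shape, swapped.shape)

# Element-wise rule: swapped[i, j, k] == three[k, j, i]
print(swapped[2, 1, 0] == three[0, 1, 2])

# Called without arguments, transpose() (or .T) also reverses the axes
print(np.array_equal(swapped, three.T))
```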
 
 
In [101]:
data
 
 
Out[101]:
   one  two  three
A    0    1      2
B    3    4      5
In [106]:
 
dd=data.stack()
dd
 
 
Out[106]:
A  one      0
   two      1
   three    2
B  one      3
   two      4
   three    5
dtype: int64
In [107]:
dd.unstack()
 
 
Out[107]:
   one  two  three
A    0    1      2
B    3    4      5
In [112]:
dd.unstack(level=0)  # move the outermost index level (A, B) into the columns
 
 
Out[112]:
       A  B
one    0  3
two    1  4
three  2  5
In [113]:
 
dd.unstack(level=-1)  # move the innermost index level into the columns (the default)
 
 
Out[113]:
   one  two  three
A    0    1      2
B    3    4      5
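The stack/unstack outputs above suggest two relationships: unstacking the inner level restores the original table, and unstacking the outer level gives its transpose. A minimal sketch that checks both, rebuilding the same `data` frame so the block runs on its own:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['A', 'B']),
                    columns=pd.Index(['one', 'two', 'three']))

dd = data.stack()  # Series whose row index has two levels: (A/B, one/two/three)

# Default level=-1: the inner level goes back to the columns, restoring the table
restored = dd.unstack()
print(restored.equals(data))

# level=0: the outer level (A, B) becomes the columns, i.e. the transposed table
outer = dd.unstack(level=0)
print(outer.equals(data.T))
```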
In [212]:
 
df = pd.DataFrame({'key1':['a','a','b','b','a'],'key2':['one','two','one','two','one'],'data1':np.random.randn(5),'data2':np.random.randn(5)})
 
In [115]:
df
 
 
Out[115]:
  key1 key2     data1     data2
0    a  one -0.343458 -0.173529
1    a  two -0.753353  0.068864
2    b  one -0.554884 -0.147296
3    b  two -0.064841  1.483495
4    a  one  0.237470 -0.107894
In [120]:
grouped1=df.groupby('key1')
 
 
In [121]:
 
grouped2=df.groupby(['key1','key2'])
 
 
In [122]:
[x for x in grouped1]
 
 
Out[122]:
[('a',   key1 key2     data1     data2
  0    a  one -0.343458 -0.173529
  1    a  two -0.753353  0.068864
  4    a  one  0.237470 -0.107894), ('b',   key1 key2     data1     data2
  2    b  one -0.554884 -0.147296
  3    b  two -0.064841  1.483495)]
In [123]:
[x for x in grouped2]
 
 
Out[123]:
[(('a', 'one'),   key1 key2     data1     data2
  0    a  one -0.343458 -0.173529
  4    a  one  0.237470 -0.107894),
 (('a', 'two'),   key1 key2     data1     data2
  1    a  two -0.753353  0.068864),
 (('b', 'one'),   key1 key2     data1     data2
  2    b  one -0.554884 -0.147296),
 (('b', 'two'),   key1 key2     data1     data2
  3    b  two -0.064841  1.483495)]
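Iterating over a grouped object, as above, yields (key, sub-DataFrame) pairs. In practice an aggregation is usually applied to the groups instead. The sketch below is one way to do that; it hard-codes the values printed for `df` above (as displayed, so rounded) instead of calling `np.random.randn`, so that the result is reproducible. The name `means` is only illustrative.

```python
import pandas as pd

# Values copied from the df printed above so the result is reproducible
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [-0.343458, -0.753353, -0.554884, -0.064841, 0.237470],
                   'data2': [-0.173529, 0.068864, -0.147296, 1.483495, -0.107894]})

# Iterating a GroupBy yields (key, sub-DataFrame) pairs
for key, group in df.groupby('key1'):
    print(key, len(group))

# Aggregating collapses each group to one row; the keys become the row index
means = df.groupby(['key1', 'key2'])[['data1', 'data2']].mean()
print(means)

# size() counts the rows in each group
print(df.groupby('key1').size())
```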
In [124]:
 
# pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None,
#                    margins=False, dropna=True, margins_name='All')
# The default aggfunc of pivot_table is 'mean', i.e. each cell holds the average of its group.
 
 
In [126]:
 
pd.pivot_table(df,index='key2',columns='key1')
 
 
Out[126]:
         data1               data2
key1         a         b         a         b
key2
one  -0.052994 -0.554884 -0.140712 -0.147296
two  -0.753353 -0.064841  0.068864  1.483495
In [127]:
 
pd.pivot_table(df,index=['key1','key2'])
 
 
Out[127]:
              data1     data2
key1 key2
a    one  -0.052994 -0.140712
     two  -0.753353  0.068864
b    one  -0.554884 -0.147296
     two  -0.064841  1.483495
In [130]:
df.pivot_table('data1',columns='key2')
 
 
Out[130]:
key2        one       two
data1 -0.220291 -0.409097
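To illustrate the claim that pivot_table implements groupby functionality, the sketch below compares a pivot table with the equivalent groupby, mean, and unstack chain. It again hard-codes the values printed for `df` above so the numbers are reproducible; the names `pivoted` and `grouped` are only illustrative.

```python
import pandas as pd

# Values copied from the df printed above so the result is reproducible
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [-0.343458, -0.753353, -0.554884, -0.064841, 0.237470],
                   'data2': [-0.173529, 0.068864, -0.147296, 1.483495, -0.107894]})

# pivot_table with the default aggfunc='mean'
pivoted = pd.pivot_table(df, values='data1', index='key2', columns='key1')

# The same table built by hand: group by both keys, average, then move key1 into the columns
grouped = df.groupby(['key2', 'key1'])['data1'].mean().unstack('key1')

print(pivoted)
print(grouped)
```

Both calls should print the same table, with key2 on the rows and key1 on the columns.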

Reposted from www.cnblogs.com/liyun1/p/11261872.html