Differences between stack, unstack, groupby, and pivot_table

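stack() "stacks" the column labels into the innermost level of the row index: the result is a Series with a nested (brace-like) hierarchical index and no column index. unstack() does the opposite: it spreads an index level back out into columns, giving a table with both a row index and a column index. groupby() splits the rows into groups by key, and pivot_table() uses a pivot table to provide the same grouping-and-aggregating functionality as groupby.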

In [100]:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['A', 'B']),
                    columns=pd.Index(['one', 'two', 'three']))

Side note: the count() function and list comprehensions

# Count how many times a given character occurs in each element of the list
words = ['apple', 'pare', 'banana', 'and', 'peach', 'Anda']
for word in words:
    print(word.lower().count('a'))  # lower() makes the count case-insensitive
1
1
3
1
1
2
[word for word in words if word.lower().count('a') >= 2]
['banana', 'Anda']
strings = ['a', 'bv', 'tit', 'apple', 'ctr']
[x.title() for x in strings if len(x) > 2]
['Tit', 'Apple', 'Ctr']
list(map(len, strings))
[1, 2, 3, 5, 3]

Side note: transpose()

import numpy as np
three = np.arange(18).reshape(2, 3, 3)
three
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]]])
three.transpose(2, 1, 0)
array([[[ 0,  9],
        [ 3, 12],
        [ 6, 15]],

       [[ 1, 10],
        [ 4, 13],
        [ 7, 16]],

       [[ 2, 11],
        [ 5, 14],
        [ 8, 17]]])

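The axis order passed to transpose(2, 1, 0) is easier to follow on an array whose axes have different lengths. The sketch below assumes a made-up array `arr` of shape (2, 3, 4), which is not part of the original notes. Position i of the argument names the old axis that becomes new axis i, so element [i, j, k] of the result is element [k, j, i] of the original.

```python
import numpy as np

# Hypothetical example array; the axes have different lengths on purpose.
arr = np.arange(24).reshape(2, 3, 4)

# New axis 0 is old axis 2, new axis 1 is old axis 1, new axis 2 is old axis 0.
swapped = arr.transpose(2, 1, 0)

print(swapped.shape)                      # shape goes from (2, 3, 4) to (4, 3, 2)
print(swapped[3, 1, 0] == arr[0, 1, 3])   # same element, indices reversed
print(np.array_equal(swapped, arr.T))     # .T also reverses all axes
```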
In [101]:

data

Out[101]:

	one	two	three
A	0	1	2
B	3	4	5

In [106]:

         
dd = data.stack()
dd

Out[106]:

A  one      0
   two      1
   three    2
B  one      3
   two      4
   three    5
dtype: int64

In [107]:

dd.unstack()

Out[107]:

	one	two	three
A	0	1	2
B	3	4	5

In [112]:

         
dd.unstack(level=0)  # unstack the outermost index level (level 0): A and B become the columns

Out[112]:

	A	B
one	0	3
two	1	4
three	2	5

In [113]:

         
dd.unstack(level=-1)  # unstack the innermost index level (the default)

Out[113]:

	one	two	three
A	0	1	2
B	3	4	5
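To make the relationship between the calls above explicit, here is a minimal sketch that rebuilds the same `data` frame and checks two things: unstack() undoes stack(), and unstacking level 0 gives the transpose of the original table. The names `stacked`, `restored`, and `by_outer` are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['A', 'B']),
                    columns=pd.Index(['one', 'two', 'three']))

stacked = data.stack()                # Series with a two-level row index
restored = stacked.unstack()          # default level=-1: inner level back to columns
by_outer = stacked.unstack(level=0)   # outer level (A, B) becomes the columns

print(stacked.index.nlevels)          # number of levels in the stacked row index
print(restored.values.tolist() == data.values.tolist())
print(by_outer.values.tolist() == data.T.values.tolist())
```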

In [212]:

         
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})

In [115]:

df

Out[115]:

	key1	key2	data1	data2
0	a	one	-0.343458	-0.173529
1	a	two	-0.753353	0.068864
2	b	one	-0.554884	-0.147296
3	b	two	-0.064841	1.483495
4	a	one	0.237470	-0.107894

In [120]:

 
grouped1 = df.groupby('key1')

In [121]:

         
grouped2 = df.groupby(['key1', 'key2'])

In [122]:

 
[x for x in grouped1]

Out[122]:

[('a',   key1 key2     data1     data2
  0    a  one -0.343458 -0.173529
  1    a  two -0.753353  0.068864
  4    a  one  0.237470 -0.107894), ('b',   key1 key2     data1     data2
  2    b  one -0.554884 -0.147296
  3    b  two -0.064841  1.483495)]

In [123]:

 
[x for x in grouped2]

Out[123]:

[(('a', 'one'),   key1 key2     data1     data2
  0    a  one -0.343458 -0.173529
  4    a  one  0.237470 -0.107894),
 (('a', 'two'),   key1 key2     data1     data2
  1    a  two -0.753353  0.068864),
 (('b', 'one'),   key1 key2     data1     data2
  2    b  one -0.554884 -0.147296),
 (('b', 'two'),   key1 key2     data1     data2
  3    b  two -0.064841  1.483495)]
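Iterating over a GroupBy object, as above, only shows how the rows are split; nothing is aggregated until a reduction such as mean() or size() is applied. A minimal sketch follows. It keeps the key layout of `df` but swaps the random columns for small made-up integers so the results can be checked by hand.

```python
import pandas as pd

# Same key layout as df above; the numeric values are made up for illustration.
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [1, 2, 3, 4, 5],
                   'data2': [10, 20, 30, 40, 50]})

# Each iteration yields a (group key, sub-DataFrame) pair.
for key, group in df.groupby('key1'):
    print(key, len(group))

# Reductions collapse each group to a single value.
mean_by_key1 = df.groupby('key1')['data1'].mean()   # one mean per key1 value
sizes = df.groupby(['key1', 'key2']).size()         # row count per (key1, key2) pair

print(mean_by_key1)
print(sizes)
```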

In [124]:

         
# pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
#                    fill_value=None, margins=False, dropna=True, margins_name='All')
# The default aggregation function of pivot_table is 'mean', i.e. it averages the values.
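Because aggfunc defaults to 'mean', any other summary has to be requested explicitly. The sketch below passes aggfunc='sum' and margins=True, both taken from the signature quoted above. The frame `df_demo` and its integer values are hypothetical, chosen so the totals are easy to verify.

```python
import pandas as pd

# Hypothetical data with the same key layout as the df used in this post.
df_demo = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                        'key2': ['one', 'two', 'one', 'two', 'one'],
                        'data1': [1, 2, 3, 4, 5]})

# Default behaviour: each cell is the mean of data1 for that (key1, key2) pair.
means = pd.pivot_table(df_demo, values='data1', index='key1', columns='key2')

# Explicit sum, plus an 'All' row and column holding the totals.
sums = pd.pivot_table(df_demo, values='data1', index='key1', columns='key2',
                      aggfunc='sum', margins=True)

print(means)
print(sums)
```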

In [126]:

         
pd.pivot_table(df, index='key2', columns='key1')

Out[126]:

	data1		data2
key1	a	b	a	b
key2
one	-0.052994	-0.554884	-0.140712	-0.147296
two	-0.753353	-0.064841	0.068864	1.483495

In [127]:

         
pd.pivot_table(df, index=['key1', 'key2'])

Out[127]:

		data1	data2
key1	key2
a	one	-0.052994	-0.140712
a	two	-0.753353	0.068864
b	one	-0.554884	-0.147296
b	two	-0.064841	1.483495
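Both pivot tables above can also be reproduced with groupby, which is the sense in which pivot_table implements groupby functionality. A minimal sketch on made-up data (`df_demo` is hypothetical): grouping by both keys and taking the mean matches pivot_table(index=[...]), and unstacking one key moves it into the columns the way the columns= argument does.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same key layout as the df used in this post.
df_demo = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                        'key2': ['one', 'two', 'one', 'two', 'one'],
                        'data1': [1, 2, 3, 4, 5],
                        'data2': [10, 20, 30, 40, 50]})

# Long form: both keys stay in the row index.
pivot_rows = pd.pivot_table(df_demo, index=['key1', 'key2'])
grouped_rows = df_demo.groupby(['key1', 'key2'])[['data1', 'data2']].mean()

# Wide form: key1 is moved from the row index into the columns.
pivot_wide = pd.pivot_table(df_demo, index='key2', columns='key1')
grouped_wide = (df_demo.groupby(['key2', 'key1'])[['data1', 'data2']]
                .mean()
                .unstack('key1'))

print(np.allclose(pivot_rows.values, grouped_rows.values))
print(np.allclose(pivot_wide.values, grouped_wide.values))
```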

In [130]:

         
df.pivot_table('data1', columns='key2')

Out[130]:

key2	one	two
data1	-0.220291	-0.409097
