pivot_table and crosstab in pandas

(1) pandas.pivot_table: creates a spreadsheet-style pivot table as a DataFrame.
Signature (from the pandas reference documentation):
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
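Before the full example, here is a minimal sketch of two defaults in this signature: aggfunc='mean' and margins_name='All'. The frame `toy` and its columns `k` and `v` are hypothetical, made up only for illustration, and the sketch assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the defaults
toy = pd.DataFrame({"k": ["x", "x", "y"], "v": [1, 3, 5]})

# aggfunc is left at its default, 'mean':
#   x -> (1 + 3) / 2 = 2.0,  y -> 5 / 1 = 5.0
# margins=True appends a row labelled margins_name ('All' by default)
# holding the same aggregation over all rows: (1 + 3 + 5) / 3 = 3.0
table = pd.pivot_table(toy, values="v", index="k", margins=True)
print(table)
```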
Example:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
                   "F": [3, 5, 5, 6, 7, 8, 8, 6, 4]})
# Mean of D and sum of E per (A, B) group; the strings 'mean'/'sum' avoid the
# FutureWarning that recent pandas versions emit for np.mean/np.sum here
print(pd.pivot_table(df, values=['D', 'E'], index=['A', 'B'],
                     aggfunc={'D': 'mean', 'E': 'sum'}))
print('*****' * 3)
# Count the rows for each value of A against every (B, C) combination
print(pd.crosstab(df['A'], [df['B'], df['C']], rownames=['A'], colnames=['B', 'C']))

Output:

                    D   E
A   B                
bar one  4.500000  14
    two  6.500000  18
foo one  1.666667  11
    two  3.000000  11
***************
B     one         two      
C   large small large small
A                          
bar     1     1     1     1
foo     2     1     0     2
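To see how the two calls above relate, here is a sketch that rebuilds the same counts with pivot_table: aggfunc='count' counts the non-null D values in each group, and fill_value=0 replaces the NaN left by the missing combination (foo, two, large). Counting column D is an arbitrary choice made for this sketch; any column without missing values gives the same counts.

```python
import pandas as pd

# Columns A-D of the DataFrame from the example above (E and F are not needed)
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Frequency table: number of rows in each (A, (B, C)) cell
counts = pd.crosstab(df["A"], [df["B"], df["C"]])

# Same counts via pivot_table: count non-null D per group, empty cell -> 0
via_pivot = pd.pivot_table(df, values="D", index="A", columns=["B", "C"],
                           aggfunc="count", fill_value=0)

print(counts)
print(via_pivot)
print(counts.values.tolist() == via_pivot.values.tolist())  # True
```

In other words, a plain crosstab is a pivot_table whose aggregation is a count, which is why its cells hold frequencies rather than aggregated values.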

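(2) pandas.crosstab: computes a simple cross tabulation of two (or more) factors. By default it computes a frequency table of the factors, unless an array of values and an aggregation function are passed.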
Signature (from the pandas reference documentation):
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
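The description above says crosstab only switches from counting to aggregating when both values and aggfunc are given (pandas raises a ValueError if only one of the two is passed). A minimal sketch, reusing columns A, B and E of the DataFrame from example (1): summing E over A × B reproduces the E column of the pivot_table output shown there.

```python
import pandas as pd

# Columns A, B and E of the DataFrame from example (1)
df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# values + aggfunc: each cell is sum(E) for that (A, B) pair, not a row count
sums = pd.crosstab(df["A"], df["B"], values=df["E"], aggfunc="sum")
print(sums)
```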
Example:

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
              "shiny", "dull", "shiny", "shiny", "shiny"],
             dtype=object)
print(a)
print(b)
print(c)
# Rows: values of a; columns: every (b, c) combination; cells: counts
print(pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c']))

The output is:

['foo' 'foo' 'foo' 'foo' 'bar' 'bar' 'bar' 'bar' 'foo' 'foo' 'foo']
['one' 'one' 'one' 'two' 'one' 'one' 'one' 'two' 'two' 'two' 'one']
['dull' 'dull' 'shiny' 'dull' 'dull' 'shiny' 'shiny' 'dull' 'shiny' 'shiny'
 'shiny']
b    one        two      
c   dull shiny dull shiny
a                        
bar    1     2    1     0
foo    2     2    1     2

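Note that the cells now hold counts (how many rows fall into each combination) rather than the result of an aggregation function applied to a values column, which is what pivot_table() produces. This should make the difference clear. Passing normalize to pd.crosstab converts the counts into proportions: normalize=True (or 'all') divides by the grand total, 'index' normalizes each row, and 'columns' normalizes each column. Passing margins=True adds a row and a column of totals, labelled with margins_name ('All' by default).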
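A short sketch of both parameters on the arrays from the last example. To keep the table small it crosses only a with b (dropping c); that simplification is a choice made for this illustration.

```python
import numpy as np
import pandas as pd

# Arrays a and b from the example above
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)

# margins=True: an extra 'All' row and 'All' column with the totals
totals = pd.crosstab(a, b, rownames=["a"], colnames=["b"], margins=True)

# normalize='index': each row is divided by its own total, so rows sum to 1
row_share = pd.crosstab(a, b, rownames=["a"], colnames=["b"], normalize="index")

print(totals)     # bar: 3, 1 (total 4); foo: 4, 3 (total 7); grand total 11
print(row_share)  # bar: 0.75, 0.25; foo: 4/7, 3/7
```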
