(1) pandas.pivot_table
Reference documentation
pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')
Example:
import pandas as pd
import numpy as np  # used by the crosstab example in (2)

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
                   "F": [3, 5, 5, 6, 7, 8, 8, 6, 4]})
# Mean of D and sum of E for each (A, B) pair
print(pd.pivot_table(df, values=['D', 'E'], index=['A', 'B'],
                     aggfunc={'D': 'mean', 'E': 'sum'}))
print('*****' * 3)
# Count how often each (B, C) combination occurs for each value of A
pd.crosstab(df['A'], [df['B'], df['C']], rownames=['A'], colnames=['B', 'C'])
Output:
                D   E
A   B
bar one  4.500000  14
    two  6.500000  18
foo one  1.666667  11
    two  3.000000  11
***************
Out[30]:
B     one         two
C   large small large small
A
bar     1     1     1     1
foo     2     1     0     2
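The signature above also lists columns, fill_value and margins, which the example does not use. Here is a minimal sketch of what they do. The small DataFrame is made up for illustration and is not the df from the example.

```python
import pandas as pd

# Small illustrative frame (an assumption, not the df used above).
small_df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar", "bar"],
    "C": ["small", "large", "large", "small", "small"],
    "D": [1, 2, 2, 5, 6],
})

# columns= spreads the values of C across the columns.
# fill_value=0 replaces the NaN left by the missing (bar, large) combination.
# margins=True appends an "All" row and column computed with the same aggfunc.
table = pd.pivot_table(small_df, values="D", index="A", columns="C",
                       aggfunc="sum", fill_value=0, margins=True)
print(table)
```

Without fill_value, the (bar, large) cell would be NaN because no row of the input has that combination.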
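(2) pandas.crosstab computes a simple cross tabulation of two (or more) factors. By default it builds a frequency table of the factors, unless an array of values and an aggregation function are passed.
Reference documentation
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
Example: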
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
              "bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
              "shiny", "dull", "shiny", "shiny", "shiny"],
             dtype=object)
print(a)
print(b)
print(c)
pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
The output is as follows:
['foo' 'foo' 'foo' 'foo' 'bar' 'bar' 'bar' 'bar' 'foo' 'foo' 'foo']
['one' 'one' 'one' 'two' 'one' 'one' 'one' 'two' 'two' 'two' 'one']
['dull' 'dull' 'shiny' 'dull' 'dull' 'shiny' 'shiny' 'dull' 'shiny' 'shiny'
 'shiny']
Out[12]:
b    one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2
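Note that the result is no longer an aggregation function applied to a values column, as it was with pivot_table(). Each cell is simply the number of times that combination of factors occurs. That should make the difference clear. Passing normalize to pd.crosstab converts the counts into proportions, and the margins parameter adds row and column totals, either of the counts or of the proportions.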
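A short sketch of those options, plus the values/aggfunc case mentioned at the top of this section. The arrays here are shorter, made-up ones (not a, b, c from the example), so the numbers are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Illustrative arrays (assumptions, not the a/b/c arrays above).
rows = np.array(["foo", "foo", "bar", "bar", "foo"], dtype=object)
cols = np.array(["one", "two", "one", "one", "one"], dtype=object)
vals = np.array([1, 2, 3, 4, 5])

# margins=True adds an "All" row and column holding the count totals.
counts = pd.crosstab(rows, cols, rownames=["a"], colnames=["b"], margins=True)

# normalize="index" divides each row by its row total, so every row sums to 1.
# normalize="columns" and normalize="all" work the same way per column / overall.
row_share = pd.crosstab(rows, cols, rownames=["a"], colnames=["b"],
                        normalize="index")

# With values= and aggfunc=, the cells hold an aggregate instead of a count.
sums = pd.crosstab(rows, cols, values=vals, aggfunc="sum",
                   rownames=["a"], colnames=["b"])

print(counts)
print(row_share)
print(sums)
```

In sums, the (bar, two) cell is NaN because that combination never occurs, whereas the plain count table shows 0 there.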