After importing pandas and NumPy, create a DataFrame:
import numpy as np
import pandas as pd
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
Output:
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
1. Viewing a single column
data.three
or, equivalently:
data['three']
Viewing multiple columns:
data[['one', 'three']]
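Attribute-style column access (as in `.three` above) is a convenience that only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below uses a hypothetical frame `pop_df` with a column named `pop` (the same name used in the table later in this post) to show the clash: `pop_df.pop` is the `DataFrame.pop` method, not the column, while bracket access always returns the column.

```python
import pandas as pd

# hypothetical frame whose column name clashes with the DataFrame.pop method
pop_df = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

col_by_key = pop_df['pop']   # bracket access returns the column as a Series
attr = pop_df.pop            # attribute access returns the bound method instead

single = pop_df[['pop']]     # a list of names returns a DataFrame, even for one column
print(type(col_by_key).__name__, callable(attr), type(single).__name__)
```

Bracket access is therefore the safer default in scripts; attribute access is mostly handy for quick interactive work.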
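2. Selecting data with loc and iloc
loc and iloc let us pick the rows and columns we want from a DataFrame using NumPy-like syntax, either by axis labels (loc) or by integer positions (iloc).
Select a single row and several columns by label:
data.loc['Ohio', ['two', 'three']]
Output: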
two 1
three 2
Name: Ohio, dtype: int64
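The shape of the result depends on what is passed on each axis. A minimal sketch, reusing the `data` frame from the start of this post: a scalar row label with a list of columns gives a Series (as above), lists on both axes give a DataFrame, and scalars on both axes give a single value.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# scalar row label + list of columns -> Series
row = data.loc['Ohio', ['two', 'three']]

# lists on both axes -> DataFrame
sub = data.loc[['Ohio', 'Utah'], ['two', 'three']]

# scalar on both axes -> a single value
value = data.loc['Utah', 'three']
print(sub)
```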
Select by integer position with iloc (row 2, then columns 3, 0 and 1 in that order):
data.iloc[2, [3, 0, 1]]
Output:
four 11
one 8
two 9
Name: Utah, dtype: int64
A single integer selects an entire row:
data.iloc[2]
Output:
one 8
two 9
three 10
four 11
Name: Utah, dtype: int64
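Since `iloc` works on positions, it behaves like indexing a Python list: position 2 is the third row, and negative positions count from the end. A short sketch with the same `data` frame:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

by_pos = data.iloc[2]         # third row by position
by_label = data.loc['Utah']   # the same row by label
last_row = data.iloc[-1]      # negative positions count from the end
cell = data.iloc[2, 3]        # scalar positions on both axes -> single value
```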
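Both indexers also accept slices. With loc, a label slice includes its end label, so 'Utah' is part of the result:
data.loc[:'Utah', 'two']
Output: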
Ohio 1
Colorado 5
Utah 9
Name: two, dtype: int64
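To see the two slicing rules side by side: a label slice with `loc` includes its end label, while a position slice with `iloc` excludes its end position, as with ordinary Python slices. A minimal sketch:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# label slice: the end label 'Utah' IS included
label_slice = data.loc[:'Utah', 'two']

# position slice: the end position 2 is NOT included
pos_slice = data.iloc[:2, 1]

print(list(label_slice.index))
print(list(pos_slice.index))
```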
Select every row and the first three columns by position:
data.iloc[:, :3]
Output:
one two three
Ohio 0 1 2
Colorado 4 5 6
Utah 8 9 10
New York 12 13 14
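The same columns can also be selected by label. A sketch showing that `iloc[:, :3]` (positions 0, 1, 2) and `loc[:, 'one':'three']` (labels 'one' through 'three', inclusive) return identical frames:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

by_pos = data.iloc[:, :3]               # columns at positions 0, 1, 2
by_label = data.loc[:, 'one':'three']   # columns 'one'..'three', end included
```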
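Slicing can be combined with a boolean condition, here keeping only the rows where column three is greater than 5:
data.iloc[:, :3][data.three > 5]
Output: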
one two three
Colorado 4 5 6
Utah 8 9 10
New York 12 13 14
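The expression above chains two steps: `iloc` takes the columns, then `[]` filters the rows with a boolean Series aligned on the row labels. The same result can be written as a single `loc` call with a boolean mask for the rows and a label slice for the columns. The single-step form is the one the pandas documentation recommends, particularly for assignment, because an assignment through a chained expression may modify a temporary copy instead of `data`. A sketch:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

mask = data.three > 5                        # boolean Series indexed by row label
chained = data.iloc[:, :3][data.three > 5]   # two steps, as in the text
single = data.loc[mask, 'one':'three']       # one step: row mask + column slice

# assigning through a single loc call updates the frame itself
out = data.copy()
out.loc[mask, 'four'] = 0
```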
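The DataFrame indexing options are summarized in the figure below:
3. Filtering on multiple conditions
Combine conditions with &, wrapping each condition in parentheses:
data[(data.one > 4) & (data.four == 11)]
Output: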
one two three four
Utah 8 9 10 11
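The parentheses are required because `&` and `|` bind more tightly than comparison operators in Python, and the element-wise operators `&`, `|` and `~` have to be used instead of `and`, `or` and `not`, which raise an error when applied to a Series. A sketch of OR and NOT with the same `data`, plus `DataFrame.query`, which expresses the AND filter as a string:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

both = data[(data.one > 4) & (data.four == 11)]         # AND
either = data[(data.one < 4) | (data.four == 15)]       # OR
neither = data[~((data.one > 4) & (data.four == 11))]   # NOT
via_query = data.query('one > 4 and four == 11')        # same filter as `both`
```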
4. Filtering with special conditions
Create a new table df with the following structure:
state year pop
0 Ohio 2000 1.5
1 Ohio 2001 1.7
2 Ohio 2002 3.6
3 Nevada 2001 2.4
4 Nevada 2002 2.9
5 Nevada 2003 3.2
6 Oland 2004 3.2
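The post does not show how `df` is created. One way to build the same table is from a dict of equal-length column lists; the construction method is an assumption, while the values are copied from the table above. Without an explicit index, pandas assigns the default 0..6 row labels shown there.

```python
import pandas as pd

# the example table from the text, built from a dict of columns
df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada', 'Oland'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2],
})
print(df)
```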
Select the rows whose state starts with 'O':
df[df.state.str.startswith('O')]
Output:
state year pop
0 Ohio 2000 1.5
1 Ohio 2001 1.7
2 Ohio 2002 3.6
6 Oland 2004 3.2
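`str.startswith` returns a boolean Series, so it can be negated with `~` to get the complement. If the column may contain missing values, passing `na=False` turns them into `False` so the result can still be used as a mask. A sketch; the series `s` with a missing entry is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada', 'Oland'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2],
})

starts_o = df.state.str.startswith('O')   # boolean Series
not_o = df[~starts_o]                     # states that do NOT start with 'O'

# hypothetical series with a missing value
s = pd.Series(['Ohio', None, 'Nevada'])
safe_mask = s.str.startswith('O', na=False)   # missing entry -> False
```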
Select the state and pop columns for Ohio and Nevada:
df.loc[df.state.isin(['Nevada', 'Ohio']), ['state', 'pop']]
Output:
state pop
0 Ohio 1.5
1 Ohio 1.7
2 Ohio 3.6
3 Nevada 2.4
4 Nevada 2.9
5 Nevada 3.2
In summary: loc selects data by row and column labels, while iloc selects data by integer row and column positions.
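The distinction matters most when the row labels are themselves integers. In the sketch below (a hypothetical frame whose integer labels run in reverse order), `loc[0, ...]` finds the row labelled 0, while `iloc[0, ...]` takes the first row by position.

```python
import pandas as pd

# hypothetical frame whose integer row labels do not match positions
frame = pd.DataFrame({'x': [10, 20, 30]}, index=[2, 1, 0])

by_label = frame.loc[0, 'x']     # row whose label is 0 -> the last row
by_position = frame.iloc[0, 0]   # first row by position, first column
```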