Pandas Basics (3): Filtering Data

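After importing pandas and NumPy, create a DataFrame:

import numpy as np
import pandas as pd

# 4x4 table of the integers 0..15 with labelled rows and columns
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])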

Output:

          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

1. Viewing columns

data.three

Or:

data['three']

Viewing multiple columns:

data[['one','three']]
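As a minimal sketch of how these forms relate, the snippet below rebuilds the same table and compares them. The column name 'my col' is a hypothetical one added only for illustration; it is not part of the original data.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Attribute access and bracket access return the same Series.
by_attr = data.three
by_key = data['three']
print(by_attr.equals(by_key))

# A single label gives a Series; a list of labels gives a DataFrame.
print(type(data['three']).__name__)
print(type(data[['one', 'three']]).__name__)

# Bracket access also works for names that are not valid Python
# identifiers. 'my col' is a hypothetical column for this example only.
demo = data.assign(**{'my col': 0})
print(demo['my col'].tolist())
```

Bracket access is usually the safer habit, since attribute access only works when the column name is a valid identifier and does not collide with an existing DataFrame attribute.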

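2. Selecting data with loc and iloc
loc and iloc let us select a subset of a DataFrame with NumPy-style syntax, using axis labels (loc) or integer positions (iloc).

Select a single row and multiple columns by label: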

data.loc['Ohio',['two','three']]

Output:

two      1
three    2
Name: Ohio, dtype: int64

Select data by integer position with iloc:

data.iloc[2,[3,0,1]]

Output:

four    11
one      8
two      9
Name: Utah, dtype: int64

Select an entire row by position:

data.iloc[2]

Output:

one       8
two       9
three    10
four     11
Name: Utah, dtype: int64

Both indexers also support slicing. Note that a label slice with loc includes its end label:

data.loc[:'Utah','two']

Output:

Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int64

data.iloc[:,:3]

Output:

          one  two  three
Ohio        0    1      2
Colorado    4    5      6
Utah        8    9     10
New York   12   13     14
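The two kinds of slice treat their end point differently, which is easy to miss. The sketch below, which rebuilds the same table, shows that a label slice with loc includes the end label while a positional slice with iloc excludes the end position.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Label slice: 'Utah' itself is included in the result.
label_slice = data.loc[:'Utah', 'two']
print(list(label_slice.index))

# Positional slice: position 2 (the 'Utah' row) is excluded.
pos_slice = data.iloc[:2, 1]
print(list(pos_slice.index))

# The first three columns, selected once by label and once by position.
by_label = data.loc[:, 'one':'three']
by_position = data.iloc[:, :3]
print(by_label.equals(by_position))
```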

A slice can be combined with a boolean condition on the rows:

data.iloc[:,:3][data.three > 5]

Output:

          one  two  three
Colorado    4    5      6
Utah        8    9     10
New York   12   13     14
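The expression above indexes twice in a row. As an alternative sketch (not the original author's approach), the same selection can be written as a single loc call, or as a single iloc call if the boolean mask is first converted to a NumPy array.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Chained form, as in the text: slice the columns, then filter the rows.
chained = data.iloc[:, :3][data.three > 5]

# Single-step form with loc: row mask and column label slice together.
single_step = data.loc[data['three'] > 5, 'one':'three']

# Single-step form with iloc: iloc expects a plain boolean array,
# so the mask is converted with to_numpy() first.
by_position = data.iloc[(data['three'] > 5).to_numpy(), :3]

print(single_step)
print(chained.equals(single_step), by_position.equals(single_step))
```

The single-step forms matter mostly when assigning values, where chained indexing may end up modifying a temporary copy rather than the original DataFrame.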

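The indexing options for a DataFrame are summarized in the figure below:
[Figure: DataFrame indexing options (image not available)]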

3. Filtering on multiple conditions

data[(data.one > 4) & (data.four == 11)]

Output:

      one  two  three  four
Utah    8    9     10    11
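A short sketch of the other ways conditions can be combined follows. Each condition needs its own parentheses because & and | bind more tightly than comparison operators in Python. The query() call at the end is an alternative spelling of the same filter, not something the original text uses.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# AND: both conditions must hold.
both = data[(data.one > 4) & (data.four == 11)]

# OR: at least one condition must hold.
either = data[(data.one > 4) | (data.four == 3)]

# NOT: invert a condition with ~.
negated = data[~(data.one > 4)]

# The AND filter again, written as a query string.
via_query = data.query('one > 4 and four == 11')

print(list(both.index))
print(list(either.index))
print(list(negated.index))
print(both.equals(via_query))
```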

4. Filtering on special conditions
Create a new table, df, with the following structure:

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2
6   Oland  2004  3.2
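The original does not show how df is created. One possible way to build a table with exactly these values, assuming a default integer index, is:

```python
import pandas as pd

# Values copied from the table above; the index defaults to 0..6.
df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada', 'Oland'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2],
})

print(df)
```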

Select the rows whose state starts with 'O':

df[df.state.str.startswith('O')]

Output:

   state  year  pop
0   Ohio  2000  1.5
1   Ohio  2001  1.7
2   Ohio  2002  3.6
6  Oland  2004  3.2
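If the column may contain missing values, the mask produced by str.startswith may not be purely boolean, depending on the pandas version. A minimal sketch of one way to guard against that is to pass na=False so that missing entries count as non-matches. The Series below is hypothetical sample data, not the df from the text.

```python
import pandas as pd

# Hypothetical sample with one missing state name.
states = pd.Series(['Ohio', None, 'Oland', 'Nevada'])

# na=False makes missing entries evaluate to False in the mask.
mask = states.str.startswith('O', na=False)

print(mask.tolist())
print(states[mask].tolist())
```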

Select the state and pop columns for Ohio and Nevada:

df.loc[df.state.isin(['Nevada','Ohio']),['state','pop']]

Output:

    state  pop
0    Ohio  1.5
1    Ohio  1.7
2    Ohio  3.6
3  Nevada  2.4
4  Nevada  2.9
5  Nevada  3.2
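Two related details are sketched below, using a df rebuilt from the values in the table. First, an isin mask can be inverted with ~ to keep everything not in the list. Second, the pop column has to be read with brackets: DataFrame already has a method named pop, so df.pop refers to that method rather than the column.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada', 'Oland'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2, 3.2],
})

# Rows whose state is NOT one of the listed values.
not_listed = df.loc[~df.state.isin(['Nevada', 'Ohio']), ['state', 'pop']]
print(not_listed)

# df.pop is the DataFrame.pop method, not the 'pop' column.
print(callable(df.pop))

# Bracket access returns the column itself.
pop_column = df['pop']
print(pop_column.max())
```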

In short: loc selects data by row and column labels, while iloc selects data by the integer positions of rows and columns.
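The distinction is clearest when the index labels are themselves integers. In the sketch below, the frame and its shuffled index are hypothetical, chosen only so that label 0 and position 0 refer to different rows.

```python
import pandas as pd

# Hypothetical frame whose integer labels are not in positional order.
frame = pd.DataFrame({'value': [10, 20, 30]}, index=[2, 0, 1])

# loc looks up the row labelled 0, which is the second row.
by_label = frame.loc[0, 'value']

# iloc looks up the row at position 0, which is the first row.
by_position = frame.iloc[0, 0]

print(by_label, by_position)
```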

Reposted from blog.csdn.net/opp003/article/details/83177748