Preface
An earlier post introduced the pandas data structures.
I. Reindexing
1. reindex
reindex creates a new object whose data is rearranged to conform to a new index.
Example:
import pandas as pd
data = pd.Series([4.4, 5.2, 1.5, 6.7, 3.1], index=['e', 'a', 'c', 'b', 'd'])
print(data)
# Reindex into alphabetical order
data_2 = data.reindex(['a', 'b', 'c', 'd', 'e'])
print(data_2)
Output:
e 4.4
a 5.2
c 1.5
b 6.7
d 3.1
dtype: float64
a 5.2
b 6.7
c 1.5
d 3.1
e 4.4
dtype: float64
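The example above only reorders labels that already exist. As a minimal sketch with made-up values, the following shows what reindex does when the new index contains a label the original lacks: the new entry holds a missing value (NaN).

```python
import pandas as pd

# Illustrative data, not taken from the example above
data = pd.Series([4.4, 5.2, 1.5], index=['c', 'a', 'b'])

# 'd' is not in the original index, so its value is missing (NaN) in the result
expanded = data.reindex(['a', 'b', 'c', 'd'])
print(expanded)

# The original Series keeps its own index
print(data.index.tolist())
```

Because NaN is a floating-point value, an integer Series is converted to float64 when reindexing introduces missing entries.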
2. reindex + method
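For ordered data such as time series, you may want to fill in missing values while reindexing. The optional method argument does this: ffill fills forward from the previous value, and bfill fills backward from the next one.
Example: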
import pandas as pd
data = pd.Series([4.4, 5.2, 1.5, 6.7, 3.1], index=[2001, 2003, 2005, 2006, 2008])
print(data)
# Reindex over every year from 2001 to 2008
years = range(2001, 2009)
# Forward fill
data_2 = data.reindex(years, method='ffill')
print(data_2)
# Backward fill
data_3 = data.reindex(years, method='bfill')
print(data_3)
Output:
2001 4.4
2003 5.2
2005 1.5
2006 6.7
2008 3.1
dtype: float64
2001 4.4
2002 4.4
2003 5.2
2004 5.2
2005 1.5
2006 6.7
2007 6.7
2008 3.1
dtype: float64
2001 4.4
2002 5.2
2003 5.2
2004 1.5
2005 1.5
2006 6.7
2007 3.1
2008 3.1
dtype: float64
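The same idea applies to a real time series with a DatetimeIndex. The sketch below assumes hypothetical readings taken every other day (the names readings and daily are illustrative) and forward-fills them onto a daily calendar.

```python
import pandas as pd

# Hypothetical readings taken every other day
dates = pd.date_range('2024-01-01', periods=3, freq='2D')
readings = pd.Series([1.0, 2.0, 3.0], index=dates)

# Reindex to a daily calendar; each gap takes the last known reading
daily = pd.date_range('2024-01-01', '2024-01-05', freq='D')
filled = readings.reindex(daily, method='ffill')
print(filled)
```

Note that method only works when the original index is sorted (monotonically increasing or decreasing).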
3. Reindexing rows and columns
Example:
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(15).reshape((5, 3)), index=[2001, 2005, 2003, 2006, 2004], columns=['MZY', 'MGA', 'JYL'])
print(data)
# Reindex the rows
data1 = data.reindex([2001, 2003, 2004, 2005, 2006])
print(data1)
# Reindex the columns
data2 = data.reindex(columns=['MGA', 'JYL', 'MZY'])
print(data2)
# Reindex rows and columns at the same time
data3 = data.reindex([2006, 2005, 2004, 2003, 2001], columns=['JYL', 'MGA', 'MZY'])
print(data3)
Output:
MZY MGA JYL
2001 0 1 2
2005 3 4 5
2003 6 7 8
2006 9 10 11
2004 12 13 14
MZY MGA JYL
2001 0 1 2
2003 6 7 8
2004 12 13 14
2005 3 4 5
2006 9 10 11
MGA JYL MZY
2001 1 2 0
2005 4 5 3
2003 7 8 6
2006 10 11 9
2004 13 14 12
JYL MGA MZY
2006 11 10 9
2005 5 4 3
2004 14 13 12
2003 8 7 6
2001 2 1 0
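Many users prefer to do this with loc, which selects by label. Unlike reindex, loc raises a KeyError if a requested label is missing.
Example: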
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(15).reshape((5, 3)), index=[2001, 2005, 2003, 2006, 2004], columns=['MZY', 'MGA', 'JYL'])
print(data)
# Select rows and columns in a new order by label
data3 = data.loc[[2006, 2005, 2004, 2003, 2001], ['JYL', 'MGA', 'MZY']]
print(data3)
Output:
MZY MGA JYL
2001 0 1 2
2005 3 4 5
2003 6 7 8
2006 9 10 11
2004 12 13 14
JYL MGA MZY
2006 11 10 9
2005 5 4 3
2004 14 13 12
2003 8 7 6
2001 2 1 0
Note:
loc is used with square brackets, not parentheses.
Keep this in mind!
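loc and reindex are interchangeable only when every requested label exists. The sketch below, which assumes pandas 1.0 or later and uses a small made-up frame, contrasts the two when a label (2002) is absent.

```python
import pandas as pd
import numpy as np

data = pd.DataFrame(np.arange(6).reshape((3, 2)),
                    index=[2001, 2003, 2005], columns=['MZY', 'MGA'])

# reindex accepts a label that does not exist and fills the row with NaN
with_gap = data.reindex([2001, 2002, 2003])
print(with_gap)

# loc with a list containing a missing label raises KeyError instead
try:
    data.loc[[2001, 2002, 2003]]
    raised = False
except KeyError:
    raised = True
print(raised)
```

So loc is a good fit for reordering existing labels, while reindex is the tool when new labels may need to be introduced.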
4. Parameters of the reindex method
The parameters are listed in the table:
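Two reindex parameters that have not appeared so far are fill_value and limit. The following is a small sketch with made-up data, not a full tour of the parameters.

```python
import pandas as pd

data = pd.Series([4.4, 5.2, 3.1], index=[2001, 2003, 2008])
years = range(2001, 2009)

# fill_value: constant used for labels that the original index lacks
zero_filled = data.reindex(years, fill_value=0.0)
print(zero_filled)

# limit: maximum number of consecutive labels to fill when method is given
limited = data.reindex(years, method='ffill', limit=1)
print(limited)
```

With limit=1, only 2002 and 2004 are filled; 2005 through 2007 stay NaN because they are more than one step away from the last known value.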
II. Dropping entries from an axis
The drop method returns a new object with the specified labels removed from an axis.
1. Dropping from a Series
Example:
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(5), index = ['a', 'b', 'c', 'd', 'e'])
new_obj1 = obj.drop('c')
print(new_obj1)
new_obj2 = obj.drop(['c', 'd'])
print(new_obj2)
Output:
a 0
b 1
d 3
e 4
dtype: int32
a 0
b 1
e 4
dtype: int32
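The second call still finds 'c' in obj, so drop returns a new object and does not modify the original data.
2. Dropping from a DataFrame
Code: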
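To check this directly, the short sketch below compares the original with the returned object. It also shows the errors='ignore' option of drop, which skips labels that do not exist instead of raising KeyError.

```python
import pandas as pd
import numpy as np

obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

dropped = obj.drop('c')
print('c' in obj.index)      # the original still has 'c'
print('c' in dropped.index)  # the returned object does not

# 'z' does not exist; errors='ignore' skips it rather than raising KeyError
safe = obj.drop(['c', 'z'], errors='ignore')
print(safe.index.tolist())
```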
import pandas as pd
import numpy as np
obj = pd.DataFrame(np.arange(16).reshape((4, 4)),
index=['MZY', 'MGA', 'JYL', 'FSQ'],
columns=['one', 'two', 'three', 'four'])
print(obj)
# Drop rows
obj1 = obj.drop(['MZY', 'MGA'])
print(obj1)
# Drop columns
obj2 = obj.drop(['two', 'four'], axis=1)
print(obj2)
obj3 = obj.drop('two', axis='columns')
print(obj3)
# axis=1 and axis='columns' are equivalent
Output:
one two three four
MZY 0 1 2 3
MGA 4 5 6 7
JYL 8 9 10 11
FSQ 12 13 14 15
one two three four
JYL 8 9 10 11
FSQ 12 13 14 15
one three
MZY 0 2
MGA 4 6
JYL 8 10
FSQ 12 14
one three four
MZY 0 2 3
MGA 4 6 7
JYL 8 10 11
FSQ 12 14 15
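As an alternative to the axis argument, drop also accepts index= and columns= keywords, which some readers find easier to read. A brief sketch using the same frame:

```python
import pandas as pd
import numpy as np

obj = pd.DataFrame(np.arange(16).reshape((4, 4)),
                   index=['MZY', 'MGA', 'JYL', 'FSQ'],
                   columns=['one', 'two', 'three', 'four'])

# Name the axis through the keyword instead of passing axis=
rows_dropped = obj.drop(index=['MZY', 'MGA'])
cols_dropped = obj.drop(columns=['two', 'four'])

# Drop from both axes in a single call
both = obj.drop(index='FSQ', columns='one')
print(both)
```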
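III. Indexing, selection, and filtering
1. Indexing a Series
Indexing a Series works much like indexing a NumPy array, except that you can use the index labels as well as integer positions.
Example: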
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
print(obj)
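# Select a single element, by label or by position
print(obj['b'])
print(obj[1])  # positional [] access is deprecated in recent pandas; prefer obj.iloc[1]
# Select several elements: a positional slice or a list of labels
print(obj[2:4])
print(obj[['b', 'c', 'd']])
Ordinary Python slices are half-open: the end position is excluded. Slicing a Series by label is different, because the end label is included.
Example: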
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
print(obj)
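# Slicing: note how the two forms below differ
print(obj['b':'d'])  # label slice: the end label is included
print(obj[1:3])  # positional slice: the end position is excluded
Output: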
a 0.0
b 1.0
c 2.0
d 3.0
dtype: float64
b 1.0
c 2.0
d 3.0
dtype: float64
b 1.0
c 2.0
dtype: float64
Modifying a Series:
Example:
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
print(obj)
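# Assignment 1: label slice (end label included)
obj['b':'d'] = 100
print(obj)
# Assignment 2: list of labels
obj[['b', 'c']] = 50
print(obj)
# Assignment 3: integer position and positional slice
obj[1] = 0  # positional [] access is deprecated in recent pandas; prefer obj.iloc[1] = 0
print(obj)
obj[1:3] = 4
print(obj)
# An element-wise comparison returns a boolean Series
print(obj[1:3] == 0)
Output: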
a 0.0
b 1.0
c 2.0
d 3.0
dtype: float64
a 0.0
b 100.0
c 100.0
d 100.0
dtype: float64
a 0.0
b 50.0
c 50.0
d 100.0
dtype: float64
a 0.0
b 0.0
c 50.0
d 100.0
dtype: float64
a 0.0
b 4.0
c 4.0
d 100.0
dtype: float64
b False
c False
dtype: bool
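A boolean Series like the one above can itself be used as a filter. The sketch below starts from a fresh Series and keeps only the elements where the condition holds.

```python
import pandas as pd
import numpy as np

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

mask = obj < 2            # boolean Series aligned with obj
small = obj[mask]         # keeps 'a' and 'b'
print(small)

# Conditions combine with & and |; each one needs its own parentheses
middle = obj[(obj > 0) & (obj < 3)]
print(middle)
```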
2. Indexing a DataFrame
1. Selecting columns
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
index=['MZY', 'JLY', 'MGA', 'FSQ'],
columns=['one', 'two', 'three', 'four'])
print(data)
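# Select one column (returns a Series) or a list of columns (returns a DataFrame)
print(data['two'])
print(data[['one', 'two']])
# An element-wise comparison returns a boolean DataFrame
print(data < 5)
# Use the boolean DataFrame as a mask to set every value below 5 to 0
data[data < 5] = 0
print(data)
Output: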
one two three four
MZY 0 1 2 3
JLY 4 5 6 7
MGA 8 9 10 11
FSQ 12 13 14 15
MZY 1
JLY 5
MGA 9
FSQ 13
Name: two, dtype: int32
one two
MZY 0 1
JLY 4 5
MGA 8 9
FSQ 12 13
one two three four
MZY True True True True
JLY True False False False
MGA False False False False
FSQ False False False False
one two three four
MZY 0 0 0 0
JLY 0 5 6 7
MGA 8 9 10 11
FSQ 12 13 14 15
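Besides masking individual values, a boolean Series built from one column can filter whole rows. A minimal sketch on the same frame (rebuilt here so the values are the original ones):

```python
import pandas as pd
import numpy as np

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['MZY', 'JLY', 'MGA', 'FSQ'],
                    columns=['one', 'two', 'three', 'four'])

# Keep only the rows whose value in column 'three' is greater than 5
rows = data[data['three'] > 5]
print(rows)
```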
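2. Slicing columns, selecting rows, slicing rows, and mixing slices with selections
These operations rely on the iloc and loc indexers.
(1) iloc
Example: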
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
index=['MZY', 'JLY', 'MGA', 'FSQ'],
columns=['one', 'two', 'three', 'four'])
print(data)
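# Note: with iloc, both arguments must be integer positions, not row or column labels
# Select by row and column position
print(data.iloc[2])  # a single argument selects a row
print(data.iloc[2, 2])
print(data.iloc[[0, 1], [1, 2]])
# Slice rows and columns together
print(data.iloc[:1, :3])
print(data.iloc[2:, 1:])
print(data.iloc[:, :])
# Mix a row slice with a list of column positions
print(data.iloc[:1, [0, 1, 2]])
Output: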
one two three four
MZY 0 1 2 3
JLY 4 5 6 7
MGA 8 9 10 11
FSQ 12 13 14 15
one 8
two 9
three 10
four 11
Name: MGA, dtype: int32
10
two three
MZY 1 2
JLY 5 6
one two three
MZY 0 1 2
two three four
MGA 9 10 11
FSQ 13 14 15
one two three four
MZY 0 1 2 3
JLY 4 5 6 7
MGA 8 9 10 11
FSQ 12 13 14 15
one two three
MZY 0 1 2
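Note:
With iloc, both arguments must be integer positions, not row or column labels.
(2) loc
loc is the counterpart of iloc: both arguments must be row or column labels, not integer positions.
Example: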
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
index=['MZY', 'JLY', 'MGA', 'FSQ'],
columns=['one', 'two', 'three', 'four'])
print(data)
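# Note: with loc, both arguments must be row or column labels, not integer positions
# Mix single labels and label slices
print(data.loc['JLY', 'two'])
print(data.loc[:'JLY', 'two'])
print(data.loc[:'JLY', :'two'])
Output (the initial print(data) output is omitted):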
5
MZY 1
JLY 5
Name: two, dtype: int32
one two
MZY 0 1
JLY 4 5
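loc also accepts a boolean Series for the rows and a list or slice of labels for the columns, which combines filtering and selection in one step. A short sketch on the same frame:

```python
import pandas as pd
import numpy as np

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['MZY', 'JLY', 'MGA', 'FSQ'],
                    columns=['one', 'two', 'three', 'four'])

# Rows where 'three' > 5, restricted to columns 'one' and 'two'
subset = data.loc[data['three'] > 5, ['one', 'two']]
print(subset)

# All rows, columns from 'two' through 'three' (end label included)
middle_cols = data.loc[:, 'two':'three']
print(middle_cols)
```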
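See the figure below for a detailed breakdown:
Summary:
IV. Integer indexes
For consistency, when an axis index contains integers, selection with [] is always label-oriented. To state the intent explicitly, use loc for labels or iloc for integer positions.
Example: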
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
print(obj)
print(obj.loc[:'c'])
print(obj.loc['c'])
print(obj.iloc[:3])
print(obj.iloc[-1])
Output:
a 0.0
b 1.0
c 2.0
d 3.0
dtype: float64
a 0.0
b 1.0
c 2.0
dtype: float64
2.0
a 0.0
b 1.0
c 2.0
dtype: float64
3.0
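The example above uses string labels, so labels and positions cannot be confused. The ambiguity only appears when the index itself holds integers. The sketch below uses a made-up Series whose integer labels are deliberately out of order.

```python
import pandas as pd
import numpy as np

# Integer labels that do not match the positions
ser = pd.Series(np.arange(3.), index=[2, 0, 1])

by_bracket = ser[0]        # treated as the label 0, not the first position
by_label = ser.loc[0]      # label 0
by_position = ser.iloc[0]  # first position

print(by_bracket, by_label, by_position)  # 1.0 1.0 0.0
```

Using loc and iloc makes the intent visible in the code, so the reader does not have to know the index dtype.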
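Summary
Indexing and slicing do not need to be strictly distinguished. For newcomers to pandas, the current recommendation is to do both through loc or iloc. Their usage is summarized as follows: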
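One way to recap the two indexers in code, as a sketch on the frame used earlier: each loc selection is paired with the iloc selection that returns the same data.

```python
import pandas as pd
import numpy as np

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['MZY', 'JLY', 'MGA', 'FSQ'],
                    columns=['one', 'two', 'three', 'four'])

# loc: labels; slices include the end label
single = data.loc['JLY', 'two']
by_label = data.loc['JLY':'MGA', ['one', 'four']]

# iloc: integer positions; slices exclude the end position
same_single = data.iloc[1, 1]
by_position = data.iloc[1:3, [0, 3]]

print(single == same_single)
print(by_label.equals(by_position))
```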