Python 3 pandas Series usage (a basic summary)

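Three ways to construct/initialize a Series:

(1) Build a Series from a list
(1.2) By default pandas uses 0 to n-1 as the Series index, but you can also specify the index yourself; think of the index as the keys of a dict
(2) Build a Series from a dict, since a Series is essentially a key-value structure
(3) Build a Series from a numpy array

Selecting data

(1) A Series can be treated like a list, including all the usual slicing operations
(2) A Series also behaves like a dict: the index defined earlier is what you use to select data
(3) Boolean indexing, much like numpy

Assigning to Series elements

(1) Assign directly through an index label
(2) Don't forget the boolean indexing above; it works for assignment too

Arithmetic operations

Missing data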


Three ways to construct/initialize a Series:

(1) Build a Series from a list

import pandas as pd
my_list=[7,'Beijing','19大',3.1415,-10000,'Happy']
s=pd.Series(my_list)
print(type(s))
print(s)
<class 'pandas.core.series.Series'>
0           7
1     Beijing
2         19大
3      3.1415
4      -10000
5       Happy
dtype: object

(1.2) By default pandas uses 0 to n-1 as the Series index, but you can also specify the index yourself; think of the index as the keys of a dict

s=pd.Series([7,'Beijing','19大',3.1415,-10000,'Happy'],
index=['A','B','C','D','E','F'])
print(s)
A           7
B     Beijing
C         19大
D      3.1415
E      -10000
F       Happy
dtype: object
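A custom index has to supply exactly one label per value. The following is a minimal sketch of that rule, assuming a reasonably recent pandas; the names `values`, `labels`, and `error_message` are illustrative and not part of the original post.

```python
import pandas as pd

values = [7, 'Beijing', 3.1415]
labels = ['A', 'B', 'C']

# One label per value: each label then works like a dict key.
s = pd.Series(values, index=labels)
print(s['B'])            # look up a value by its label
print(list(s.index))     # the labels, in order

# A label list of the wrong length is rejected with a ValueError.
try:
    pd.Series(values, index=['A', 'B'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording of the error message varies between pandas versions, so the sketch only captures it rather than comparing it.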

(2) Build a Series from a dict, since a Series is essentially a key-value structure

cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
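Name: income, dtype: float64

Note: this output comes from an older pandas release that sorted dict keys alphabetically. pandas 0.23 and later (on Python 3.6+) keep the dict's insertion order, so the row order, and any result that depends on position, may differ when you run the code yourself.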
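Two details of the dict constructor are easy to miss: a `None` value is stored as NaN, and an explicit `index` argument selects and orders the keys. A small sketch, using a shortened version of the `cities` dict and a hypothetical `picked` Series for illustration:

```python
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'Suzhou': None}

# None is stored as NaN, which forces a floating-point dtype.
apts = pd.Series(cities, name='income')
print(apts.dtype)                  # float64
print(apts.isnull()['Suzhou'])     # True

# With an explicit index, only the listed keys are kept, in that order;
# a key missing from the dict ('Tianjin' here) becomes NaN.
picked = pd.Series(cities, index=['Shanghai', 'Tianjin', 'Beijing'])
print(picked)
```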

(3) Build a Series from a numpy array

import numpy as np
d=pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
print(d)
a   -0.329401
b   -0.435921
c   -0.232267
d   -0.846713
e   -0.406585
dtype: float64
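Because `np.random.randn` draws new numbers on every run, the values above will not repeat. One possible way to get repeatable values is a seeded generator; this is a sketch assuming NumPy 1.17 or later, and the names `rng` and `arr` are illustrative.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" values repeatable between runs.
rng = np.random.default_rng(seed=0)
d = pd.Series(rng.standard_normal(5), index=['a', 'b', 'c', 'd', 'e'])
print(d)

# The underlying values can be handed back to NumPy at any time.
arr = d.to_numpy()
print(type(arr), arr.shape)
```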

Selecting data

(1) A Series can be treated like a list, including all the usual slicing operations

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
print(apts.iloc[3])  # select by position; plain apts[3] is deprecated in recent pandas
60000.0
print(apts.iloc[[3,4,1]])  # several positions at once
Shanghai     60000.0
Suzhou           NaN
Guangzhou    45000.0
Name: income, dtype: float64
print(apts[1:])
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
print(apts[:-2])
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Name: income, dtype: float64
print(apts[1:]+apts[:-1])
Beijing           NaN
Guangzhou     90000.0
Hangzhou      40000.0
Shanghai     120000.0
Suzhou            NaN
shenzhen          NaN
Name: income, dtype: float64
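The NaN rows in the last result appear because arithmetic between two Series matches values by index label, not by position. A reduced sketch of the same effect with three cities; `tail`, `head`, and `total` are names chosen here for illustration.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Guangzhou': 45000, 'Hangzhou': 20000})

tail = apts.iloc[1:]     # Guangzhou, Hangzhou
head = apts.iloc[:-1]    # Beijing, Guangzhou

# Only 'Guangzhou' exists on both sides, so only that row gets a number;
# labels present on one side only come out as NaN.
total = tail + head
print(total)
```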

(2) A Series also behaves like a dict: the index defined earlier is what you use to select data

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts['Shanghai']) ###
60000.0
print('Hangzhou' in apts)
True
print('Chongqing' in apts)
False
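The dict analogy extends to lookups of labels that may be absent. As a sketch, `Series.get` can be used the way `dict.get` is; the variable names and the default value of -1 are arbitrary choices for this example.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000}, name='income')

# Like dict.get, Series.get returns a default instead of raising KeyError.
known = apts.get('Shanghai')
unknown = apts.get('Chongqing', default=-1)
print(known, unknown)

# Plain [] lookup on a missing label raises KeyError.
try:
    apts['Chongqing']
    raised = False
except KeyError:
    raised = True
print(raised)
```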

(3) Boolean indexing, much like numpy

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
at_most_50000=(apts<=50000) ### <= also keeps values equal to 50000
print(apts[at_most_50000])
Guangzhou    45000.0
Hangzhou     20000.0
shenzhen     50000.0
Name: income, dtype: float64

Note: numpy-style aggregate methods such as mean, median, max, and min are available on a Series; they skip NaN by default

print(apts.mean()) 
46000.0
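The aggregates ignore the missing 'Suzhou' entry, which is why the mean above is taken over five values. The sketch below checks that and also shows how two boolean conditions might be combined; `mid_range` and its thresholds are made up for the example.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
                  'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None})

# NaN ('Suzhou') is skipped by default, so 5 values are summarised.
print(apts.mean(), apts.median(), apts.max(), apts.min())

# Conditions can be combined with & and |; each side needs parentheses.
mid_range = apts[(apts >= 45000) & (apts < 60000)]
print(mid_range.sort_index())
```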

Assigning to Series elements

(1) Assign directly through an index label

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64

Old income of shenzhen:50000.0
apts['shenzhen']=70000  ###
print(apts)
print('New income of shenzhen:{}'.format(apts['shenzhen']))
Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64

New income of shenzhen:70000.0
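Label assignment can also be written with `.loc`, and assigning to a label that is not in the index yet adds a new entry. A short sketch; the 'Nanjing' row is a hypothetical addition that does not appear in the original data.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000.0, 'shenzhen': 50000.0}, name='income')

# .loc makes it explicit that the key is a label, not a position.
apts.loc['shenzhen'] = 70000

# Assigning to a label that does not exist yet appends a new entry.
apts['Nanjing'] = 30000   # 'Nanjing' is a made-up extra row
print(apts)
```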

(2) Don't forget the boolean indexing above; it works for assignment too

import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000)  ###
print(less_than_50000)
apts[less_than_50000]=40000  ###
print(apts)
New income of shenzhen:70000.0
Beijing      False
Guangzhou     True
Hangzhou      True
Shanghai     False
Suzhou       False
shenzhen     False
Name: income, dtype: bool

Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
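Boolean assignment changes the Series in place. If the original values are still needed, `Series.mask` is one alternative that returns a modified copy; this sketch uses a smaller Series and an illustrative name, `capped`.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000.0, 'Guangzhou': 45000.0,
                  'Hangzhou': 20000.0, 'Suzhou': float('nan')}, name='income')

# mask() replaces values where the condition is True and returns a new
# Series; the original is left untouched.
capped = apts.mask(apts < 50000, 40000)
print(capped)
print(apts['Hangzhou'])   # still the original value

# NaN compares as False, so 'Suzhou' is never selected and stays NaN.
print(capped.isnull()['Suzhou'])
```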

Arithmetic operations

import pandas as pd
import numpy as np
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000=(apts<50000)  
apts[less_than_50000]=40000  
print(apts)

print(apts/2)   ###
print(apts**1.5)   ###
print(np.log(apts))   ###
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
print(apts2)
print(apts+apts2)   ###
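The listing above prints no results, so here is a smaller sketch with round numbers that are easy to verify by hand. It also shows `Series.add` with `fill_value`, one way to avoid NaN for labels found on only one side; the Series `a` and `b` are invented for the example.

```python
import numpy as np
import pandas as pd

a = pd.Series({'Beijing': 100.0, 'Shanghai': 400.0})
b = pd.Series({'Beijing': 10.0, 'Tianjin': 40.0})

# Scalar and NumPy operations are applied element by element.
print(a / 2)        # 50.0, 200.0
print(np.sqrt(a))   # 10.0, 20.0

# a + b leaves NaN for labels found on one side only ...
plain = a + b
# ... while add(fill_value=0) treats the missing side as 0.
filled = a.add(b, fill_value=0)
print(plain)
print(filled)
```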

Missing data

cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print(apts)
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
print(apts2)
Beijing      10000
Chongqing    30000
Guangzhou     7000
Shanghai      8000
Tianjin      40000
shenzhen      6000
dtype: int64
print('Hangzhou' in apts)   ###
print('Hangzhou' in apts2)
True
False
print(apts.notnull()) # boolean condition   ###
Beijing       True
Guangzhou     True
Hangzhou      True
Shanghai      True
Suzhou       False
shenzhen      True
Name: income, dtype: bool
print(apts.isnull())   ###
Beijing      False
Guangzhou    False
Hangzhou     False
Shanghai     False
Suzhou        True
shenzhen     False
Name: income, dtype: bool
print(apts[apts.isnull()])   # use the boolean mask of missing values to select elements
Suzhou   NaN
Name: income, dtype: float64
apts=apts+apts2   # add two Series whose indexes only partly overlap
print(apts)
Beijing      65000.0
Chongqing        NaN
Guangzhou    47000.0
Hangzhou         NaN
Shanghai     68000.0
Suzhou           NaN
Tianjin          NaN
shenzhen     76000.0
dtype: float64
apts[apts.isnull()]=apts.mean() # fill the missing positions with the mean (not the median)
print(apts)
Beijing      65000.0
Chongqing    64000.0
Guangzhou    47000.0
Hangzhou     64000.0
Shanghai     68000.0
Suzhou       64000.0
Tianjin      64000.0
shenzhen     76000.0
dtype: float64
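The same fill can be written without a boolean mask. The sketch below uses `fillna` and `dropna` on a reduced version of the data; `filled` and `present` are illustrative names, and both calls return new Series rather than modifying `apts`.

```python
import pandas as pd

apts = pd.Series({'Beijing': 65000.0, 'Chongqing': float('nan'),
                  'Guangzhou': 47000.0, 'Hangzhou': float('nan')})

# fillna returns a new Series with every NaN replaced; mean() ignores NaN.
filled = apts.fillna(apts.mean())
print(filled)

# dropna discards the missing entries instead of filling them.
present = apts.dropna()
print(list(present.index))
```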


Reposted from blog.csdn.net/kwame211/article/details/80421923