Cris's Python Data Analysis Notes 07: The Series Data Structure in Pandas

Copyright notice: please credit the source when reposting~ (and give the blogger a pat on the head) https://blog.csdn.net/cris_zz/article/details/84336171

# Series (Collection of values)
# DataFrame (Collection of Series Objects)
'''
    When Pandas reads a csv file it builds a DataFrame; each column is a Series, and a single row taken out of the DataFrame is returned as a Series too
'''

1. The relationship between DataFrame and Series

import pandas as pd

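# read_csv returns a DataFrame
fandago = pd.read_csv('fandango_score_comparison.csv')
print(type(fandago))
# Selecting a single column returns a Series
series_film = fandago['FILM']
print(type(series_film))
# Slice the first five entries of each column
print(series_film[0:5])
print('---------------')
print(fandago['RottenTomatoes'][0:5])
# Output: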
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
0    Avengers: Age of Ultron (2015)
1                 Cinderella (2015)
2                    Ant-Man (2015)
3            Do You Believe? (2015)
4     Hot Tub Time Machine 2 (2015)
Name: FILM, dtype: object
---------------
0    74
1    85
2    80
3    18
4    14
Name: RottenTomatoes, dtype: int64
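
To make the column/row point concrete without the csv file, here is a minimal sketch on a tiny hand-typed DataFrame. The three rows are copied from the output above; the name `df` and the use of `iloc` to pick a row are my own choices for illustration, not part of the original notes.

```python
import pandas as pd

# Three rows copied from the output above, standing in for the csv file
df = pd.DataFrame({
    'FILM': ['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)'],
    'RottenTomatoes': [74, 85, 80],
})

column = df['RottenTomatoes']   # selecting one column gives a Series
row = df.iloc[0]                # selecting one row by position also gives a Series

print(type(column).__name__)
print(type(row).__name__)
print(column.name)              # the column label becomes the Series name
print(list(row.index))          # the column labels become the index of the row Series
```

Note that a row Series mixes strings and integers, so its dtype falls back to `object`, while the column keeps `int64`.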

2. Creating a new Series (keys and values)

from pandas import Series

# Get the values of the Series (an ndarray)
values = series_film.values
print(values[0:5])
print(type(values))

# Get the index object
index = series_film.index
print(index)

# Build a new Series with each film's name as the index and its Rotten Tomatoes score as the value
series_score = fandago['RottenTomatoes']
new_series = Series(series_score.values, index=series_film)
print(new_series.head())
print(new_series.index[0:3])
# Look up values by index label (strings); a list of labels returns a Series
result = new_series[['Cinderella (2015)', 'Avengers: Age of Ultron (2015)']]
print(result)
# Output:
['Avengers: Age of Ultron (2015)' 'Cinderella (2015)' 'Ant-Man (2015)'
 'Do You Believe? (2015)' 'Hot Tub Time Machine 2 (2015)']
<class 'numpy.ndarray'>
RangeIndex(start=0, stop=146, step=1)
FILM
Avengers: Age of Ultron (2015)    74
Cinderella (2015)                 85
Ant-Man (2015)                    80
Do You Believe? (2015)            18
Hot Tub Time Machine 2 (2015)     14
dtype: int64
Index(['Avengers: Age of Ultron (2015)', 'Cinderella (2015)',
       'Ant-Man (2015)'],
      dtype='object', name='FILM')
FILM
Cinderella (2015)                 85
Avengers: Age of Ultron (2015)    74
dtype: int64
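
As a side note, the same title-indexed Series can probably be built more directly with `set_index`, and label versus position access can be spelled out with `loc` and `iloc`. This is only a sketch on three rows copied from the output above; `df` and `scores` are names I made up for the example.

```python
import pandas as pd

# Three rows copied from the output above; the real data comes from the csv file
df = pd.DataFrame({
    'FILM': ['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)'],
    'RottenTomatoes': [74, 85, 80],
})

# Alternative to Series(values, index=...): make FILM the index, then pick the column
scores = df.set_index('FILM')['RottenTomatoes']

print(scores.loc['Cinderella (2015)'])                      # single label -> scalar
print(scores.loc[['Ant-Man (2015)', 'Cinderella (2015)']])  # list of labels -> Series
print(scores.iloc[0])                                       # position-based access
```

Using `loc` for labels and `iloc` for positions avoids any ambiguity about what a bare `scores[...]` means.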

3. Sorting a Series

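# Sort by index label (film title); returns a new Series
result = new_series.sort_index()
print(result.head())
# Sort by value (score) in ascending order, so tail() shows the highest scores
result = new_series.sort_values()
print(result.tail())
# Output: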
FILM
'71 (2015)                    97
5 Flights Up (2015)           52
A Little Chaos (2015)         40
A Most Violent Year (2014)    90
About Elly (2015)             97
dtype: int64
FILM
Song of the Sea (2014)                        99
Phoenix (2015)                                99
Selma (2014)                                  99
Seymour: An Introduction (2015)              100
Gett: The Trial of Viviane Amsalem (2015)    100
dtype: int64
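
If the highest scores are wanted first, `sort_values` takes `ascending=False`. A small sketch, using the five films and scores from the head() output in section 2 (the variable names are mine):

```python
import pandas as pd

# Five films and scores copied from the head() output in section 2
scores = pd.Series(
    [74, 85, 80, 18, 14],
    index=['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)',
           'Do You Believe? (2015)', 'Hot Tub Time Machine 2 (2015)'],
)

top = scores.sort_values(ascending=False)   # highest score first
by_name = scores.sort_index()               # labels in lexicographic order

print(top.head(2))
print(by_name.index[0])
print(scores.iloc[0])   # still 74: both methods return a new Series
```

Neither method modifies `scores` in place, which is why the notes above assign the result to a new variable.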

4. Selecting values in a range

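print(new_series.index[0:3])
# Boolean mask: keep films scoring above 90
print(new_series[new_series.values > 90])
# Combine two masks with & to keep scores strictly between 80 and 90
above_80 = new_series.values > 80
below_90 = new_series.values < 90
print(new_series[above_80 & below_90])
# Output: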
Index(['Avengers: Age of Ultron (2015)', 'Cinderella (2015)',
       'Ant-Man (2015)'],
      dtype='object', name='FILM')
FILM
Shaun the Sheep Movie (2015)                    99
Leviathan (2014)                                99
Selma (2014)                                    99
Ex Machina (2015)                               92
Wild Tales (2014)                               96
The End of the Tour (2015)                      92
Red Army (2015)                                 96
The Hunting Ground (2015)                       92
I'll See You In My Dreams (2015)                94
Timbuktu (2015)                                 99
About Elly (2015)                               97
The Diary of a Teenage Girl (2015)              95
Birdman (2014)                                  92
The Gift (2015)                                 93
Monkey Kingdom (2015)                           94
Mr. Turner (2014)                               98
Seymour: An Introduction (2015)                100
The Wrecking Crew (2015)                        93
Mad Max: Fury Road (2015)                       97
Spy (2015)                                      93
Paddington (2015)                               98
What We Do in the Shadows (2015)                96
The Salt of the Earth (2015)                    96
Song of the Sea (2014)                          99
It Follows (2015)                               96
Phoenix (2015)                                  99
Tangerine (2015)                                95
Mission: Impossible – Rogue Nation (2015)       92
Amy (2015)                                      97
Inside Out (2015)                               98
'71 (2015)                                      97
Two Days, One Night (2014)                      97
Gett: The Trial of Viviane Amsalem (2015)      100
dtype: int64
FILM
Cinderella (2015)                        85
Top Five (2014)                          86
Love & Mercy (2015)                      89
Far From The Madding Crowd (2015)        84
Black Sea (2015)                         82
Trainwreck (2015)                        85
Still Alice (2015)                       88
When Marnie Was There (2015)             89
Furious 7 (2015)                         81
Me and Earl and The Dying Girl (2015)    81
Dope (2015)                              87
The Overnight (2015)                     82
While We're Young (2015)                 83
Clouds of Sils Maria (2015)              89
Testament of Youth (2015)                81
The Wolfpack (2015)                      84
The Stanford Prison Experiment (2015)    84
Mr. Holmes (2015)                        87
Kumiko, The Treasure Hunter (2015)       87
dtype: int64
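
The masks can also be built by comparing the Series itself rather than its `.values`. The sketch below shows that on the five films from section 2, and additionally uses `between`, which includes both endpoints by default, so for integer scores 81 to 89 covers the same range. Variable names are my own.

```python
import pandas as pd

# Five films and scores copied from the head() output in section 2
scores = pd.Series(
    [74, 85, 80, 18, 14],
    index=['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)',
           'Do You Believe? (2015)', 'Hot Tub Time Machine 2 (2015)'],
)

# Each comparison needs its own parentheses because & binds tighter than > and <
in_range = (scores > 80) & (scores < 90)
print(in_range.tolist())
print(scores[in_range])

# For integer scores, strictly between 80 and 90 is the same as 81..89 inclusive
print(scores[scores.between(81, 89)])
```

Python's `and` does not work here; element-wise combination of masks needs `&` (or `|` for "or").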

5. Averaging the critic and user scores of the same film by index

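# Both Series share the film title as their index
score_critics = Series(fandago['RottenTomatoes'].values, index=fandago['FILM'])
score_users = Series(fandago['RottenTomatoes_User'].values, index=fandago['FILM'])
# Arithmetic between Series is aligned on index labels, so each film's two scores are paired
avg_score = (score_critics + score_users) / 2
print(avg_score.head())
# Output: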
FILM
Avengers: Age of Ultron (2015)    80.0
Cinderella (2015)                 82.5
Ant-Man (2015)                    85.0
Do You Believe? (2015)            51.0
Hot Tub Time Machine 2 (2015)     21.0
dtype: float64
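
The averaging above works because pandas pairs values by index label, not by position. A small sketch to show this: the two Series below are in different orders and one has an extra film. The critic scores come from section 1, and the user scores are back-computed from the averages printed above; the names `critics`, `users` and `avg` are mine.

```python
import pandas as pd

critics = pd.Series(
    [74, 85],
    index=['Avengers: Age of Ultron (2015)', 'Cinderella (2015)'],
)
# Different order on purpose, plus one film with no critic score in this toy sample
users = pd.Series(
    [80, 86, 90],
    index=['Cinderella (2015)', 'Avengers: Age of Ultron (2015)', 'Ant-Man (2015)'],
)

avg = (critics + users) / 2   # values are matched by label before adding
print(avg)                    # the unmatched film shows up as NaN
print(avg.dropna())           # keep only films present in both Series
```

A label that exists in only one Series yields NaN rather than an error, so it is worth checking for missing values after this kind of arithmetic.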
