1. Define a DataFrame object
import pandas as pd
data={
    'id':[100,101,102,103],
    'name':['a','b','c','g'],
    'score':[78,90,66,24]
}
frame = pd.DataFrame(data=data)  [use all fields]
frame = pd.DataFrame(data=data,index=['one','two','three','four'])  [set the index labels]
frame = pd.DataFrame(data=data,columns=['name','score'])  [keep only the specified fields]
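The three constructor calls above can be checked side by side. This is a minimal sketch that reuses the same dictionary; the variable names all_fields, labeled and subset are made up for illustration.

```python
import pandas as pd

data = {
    'id': [100, 101, 102, 103],
    'name': ['a', 'b', 'c', 'g'],
    'score': [78, 90, 66, 24],
}

# All fields, default integer index 0..3
all_fields = pd.DataFrame(data=data)
# Same data, but with custom row labels
labeled = pd.DataFrame(data=data, index=['one', 'two', 'three', 'four'])
# Only the listed fields are kept, in the listed order
subset = pd.DataFrame(data=data, columns=['name', 'score'])

print(all_fields.shape)        # (4, 3)
print(list(labeled.index))     # ['one', 'two', 'three', 'four']
print(list(subset.columns))    # ['name', 'score']
```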
2. Inspect the DataFrame
-[index]:frame.index
-[field (column) names]:frame.columns
-[data as an array]:frame.values
-[shape as (rows, columns)]:frame.shape
-[data type of each column]:frame.dtypes
-[unique values of one column]:frame[attrName].unique()
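A short sketch of the inspection attributes listed above, run on a small frame. The sample values (including the repeated name 'a') are assumptions chosen so that unique() has something to collapse.

```python
import pandas as pd

# Sample data made up for this example; 'a' appears twice on purpose
frame = pd.DataFrame({
    'id': [100, 101, 102, 103],
    'name': ['a', 'b', 'c', 'a'],
    'score': [78, 90, 66, 24],
})

print(frame.index)               # RangeIndex(start=0, stop=4, step=1)
print(frame.columns.tolist())    # ['id', 'name', 'score']
print(frame.shape)               # (4, 3)
print(frame.dtypes)              # one dtype per column
unique_names = list(frame['name'].unique())
print(unique_names)              # ['a', 'b', 'c']
```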
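3. Get the first (last) 10 rows
-[first 10 rows]:frame.head(10)  (head() with no argument returns only 5 rows)
-[last 10 rows]:frame.tail(10)  (tail() with no argument returns only 5 rows)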
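The row count passed to head() and tail() matters, because both default to 5 rows. A minimal sketch, assuming a made-up 20-row frame with a single column n:

```python
import pandas as pd

# Hypothetical 20-row frame, values 0..19
frame = pd.DataFrame({'n': range(20)})

default_head = frame.head()      # no argument: first 5 rows
first_ten = frame.head(10)       # first 10 rows
last_ten = frame.tail(10)        # last 10 rows

print(len(default_head))         # 5
print(len(first_ten))            # 10
print(last_ten['n'].tolist())    # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```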
4. Clean the data
-frame[attrName]=frame[attrName].fillna(value=0)[fill missing values with 0]
-frame[attrName]=frame[attrName].fillna(frame[attrName].mean())[fill missing values with the column mean]
-frame[attrName]=frame[attrName].str.strip()[strip leading and trailing whitespace]
-frame[attrName]=frame[attrName].str.lower()/frame[attrName]=frame[attrName].str.upper()[convert strings to lowercase / uppercase]
-frame[attrName]=frame[attrName].replace(str1,str2)[replace values equal to str1 with str2]
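The cleaning calls above can be tried on a tiny frame. This is only a sketch: the column names and the messy sample values are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing score and untidy names
frame = pd.DataFrame({
    'name': ['  Alice ', 'BOB', ' carol'],
    'score': [80.0, np.nan, 60.0],
})

# Fill the missing score with 0, or with the mean of the non-missing scores
zero_filled = frame['score'].fillna(value=0)
mean_filled = frame['score'].fillna(frame['score'].mean())

# Strip whitespace on both sides, then lowercase
cleaned = frame['name'].str.strip().str.lower()

# replace() swaps whole values that equal 'bob', not substrings
replaced = cleaned.replace('bob', 'robert')

print(zero_filled.tolist())   # [80.0, 0.0, 60.0]
print(mean_filled.tolist())   # [80.0, 70.0, 60.0]
print(cleaned.tolist())       # ['alice', 'bob', 'carol']
print(replaced.tolist())      # ['alice', 'robert', 'carol']
```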
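5. Merge tables
aa -->      id  name  score
        0  100    m1     66
        1  102    m2     78
        2  108    m3     90
bb -->     name  rank
        0    m1     a
        1    m2     b
        2    m3     c
data = pd.merge(aa,bb,how='inner')  [inner join on the shared field name: intersection of keys]
data = pd.merge(aa,bb,how='outer')  [outer join on the shared field name: union of keys]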
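With identical keys in both tables, inner and outer joins return the same rows. The sketch below changes one key in bb to 'm4' (an assumption, not the data above) so the difference becomes visible, and passes on='name' explicitly instead of relying on the shared column being detected.

```python
import pandas as pd

aa = pd.DataFrame({
    'id': [100, 102, 108],
    'name': ['m1', 'm2', 'm3'],
    'score': [66, 78, 90],
})
# 'm4' is a made-up key that has no match in aa
bb = pd.DataFrame({
    'name': ['m1', 'm2', 'm4'],
    'rank': ['a', 'b', 'c'],
})

inner = pd.merge(aa, bb, how='inner', on='name')   # keys present in both
outer = pd.merge(aa, bb, how='outer', on='name')   # keys present in either

print(sorted(inner['name']))            # ['m1', 'm2']
print(sorted(outer['name']))            # ['m1', 'm2', 'm3', 'm4']
# Unmatched rows get missing values in the other table's fields
print(int(outer['rank'].isna().sum()))  # 1
print(int(outer['id'].isna().sum()))    # 1
```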
6. Set the index column
frame =     id  name  score
        0  100    m1     66
        1  102    m2     78
        2  108    m3     90
data = frame.set_index('id')  [use the selected field as the index column]
data =       name  score
        100    m1     66
        102    m2     78
        108    m3     90
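A runnable sketch of the same step. It also shows that set_index returns a new DataFrame and leaves the original untouched, and that rows can then be looked up by id with .loc.

```python
import pandas as pd

frame = pd.DataFrame({
    'id': [100, 102, 108],
    'name': ['m1', 'm2', 'm3'],
    'score': [66, 78, 90],
})

# 'id' moves out of the fields and becomes the row index
data = frame.set_index('id')

print(data.index.tolist())      # [100, 102, 108]
print(list(data.columns))       # ['name', 'score']
print(data.loc[102, 'name'])    # m2
print(list(frame.columns))      # ['id', 'name', 'score'] (original unchanged)
```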
7. Sort by a specific field
-frame.sort_values(by=[attrName])[ascending by default; returns a new DataFrame]
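A small sketch of sorting in both directions. The scores are deliberately given out of order (an assumption for illustration) so the effect is visible; note that the original row labels travel with the rows.

```python
import pandas as pd

# Made-up sample with unsorted scores
frame = pd.DataFrame({
    'id': [100, 102, 108],
    'name': ['m1', 'm2', 'm3'],
    'score': [78, 90, 66],
})

ascending = frame.sort_values(by=['score'])
descending = frame.sort_values(by=['score'], ascending=False)

print(ascending['score'].tolist())    # [66, 78, 90]
print(ascending.index.tolist())       # [2, 0, 1]
print(descending['score'].tolist())   # [90, 78, 66]
```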
8. Add a field based on a condition
import numpy as np
frame =     id  name  score
        0  100    m1     66
        1  102    m2     78
        2  108    m3     90
frame['rank']=np.where(frame['score']>85,'excellent','average')
frame =     id  name  score       rank
        0  100    m1     66    average
        1  102    m2     78    average
        2  108    m3     90  excellent
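The same step as a self-contained sketch. np.where evaluates the condition row by row and picks the first label where it holds and the second where it does not; the English labels 'excellent' and 'average' are stand-ins chosen for this example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'id': [100, 102, 108],
    'name': ['m1', 'm2', 'm3'],
    'score': [66, 78, 90],
})

# score > 85 -> 'excellent', otherwise 'average'
frame['rank'] = np.where(frame['score'] > 85, 'excellent', 'average')

print(frame['rank'].tolist())   # ['average', 'average', 'excellent']
```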
9. Split strings to build a DataFrame
frame =     id  name  score
        0  100   m-1     66
        1  102   m-2     78
        2  108   m-3     90
data = pd.DataFrame((s.split('-') for s in frame['name']),columns=['user','id'])  [every name must contain the '-' separator]
data = user id
m 1
m 2
m 3
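A runnable sketch of the split, assuming the names carry a hyphen such as 'm-1'. A name without the separator would split into a single piece and would not fit two columns. The variable names data and alt are illustrative; the second form uses the vectorized Series.str.split with expand=True as an alternative.

```python
import pandas as pd

# Assumed sample: names contain '-' so each splits into two parts
frame = pd.DataFrame({
    'id': [100, 102, 108],
    'name': ['m-1', 'm-2', 'm-3'],
    'score': [66, 78, 90],
})

# One row per name, one column per piece
data = pd.DataFrame([s.split('-') for s in frame['name']],
                    columns=['user', 'id'])

# Alternative: let pandas split and expand into columns directly
alt = frame['name'].str.split('-', expand=True)
alt.columns = ['user', 'id']

print(data.values.tolist())   # [['m', '1'], ['m', '2'], ['m', '3']]
print(alt.values.tolist())    # [['m', '1'], ['m', '2'], ['m', '3']]
```

The pieces stay strings; converting the id part to integers would need an explicit astype(int).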