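1. NumPy

Logical operations:

import numpy as np

stock_change = np.random.standard_normal((8,5))

# Element-wise comparison: True where the change is greater than 0.5, False otherwise
stock_change > 0.5

# Boolean indexing
temp = stock_change > 0.5
stock_change[temp]   # select the elements where temp is True

# Check whether every value in stock_change[0:2, 0:5] is a rise
np.all(stock_change[0:2, 0:5] > 0)

# Check whether any of the first five stocks rose during this period
np.any(stock_change[:5, :] > 0)

# First five stocks, first four days: 1 where the change is greater than 0, otherwise 0
temp = stock_change[:5, :4] > 0
np.where(temp, 1, 0)

# First five stocks, first four days: 1 where the change is greater than 0.5 and less than 1, otherwise 0
temp = np.logical_and(stock_change[:5, :4] > 0.5, stock_change[:5, :4] < 1)
np.where(temp, 1, 0)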
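
Because the data above is random, the results change on every run. The following is a minimal sketch on a small fixed array (the values are made up for illustration) so that each result can be checked by hand. It also uses the `&` operator, which for boolean arrays behaves like np.logical_and.

```python
import numpy as np

# Hypothetical fixed data: 3 stocks (rows) x 4 days (columns) of daily changes.
change = np.array([[ 0.6, -0.2,  0.9,  1.3],
                   [-0.7,  0.1,  0.55, 0.0],
                   [ 0.8,  0.7, -1.1,  0.4]])

# Element-wise AND of two conditions; the parentheses are required
# because & binds more tightly than the comparison operators.
mask = (change > 0.5) & (change < 1)

# 1 where the mask is True, 0 elsewhere (same shape as change).
flags = np.where(mask, 1, 0)

# Boolean indexing returns the selected values as a flat 1-D array.
selected = change[mask]

# np.all / np.any reduce a boolean array to a single True/False.
first_stock_always_up = np.all(change[0] > 0)
second_stock_ever_above_half = np.any(change[1] > 0.5)

print(flags)
print(selected)
print(first_stock_always_up, second_stock_ever_above_half)
```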

Statistical operations:

import numpy as np

stock_change = np.random.standard_normal((8,5))
temp = stock_change[:4, :4]

# Minimum of each row
np.min(temp, axis=1)

# Maximum of each row
np.max(temp, axis=1)

# Median of each row
np.median(temp, axis=1)

# Index of the maximum value in each row
np.argmax(temp, axis=1)

# Mean of each row
np.mean(temp, axis=1)

# Standard deviation of each row
np.std(temp, axis=1)

# Variance of each row
np.var(temp, axis=1)
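
The axis argument is easy to get backwards, so here is a small sketch on a hand-checkable array (the values are illustrative). axis=1 collapses the columns and yields one result per row; axis=0 yields one result per column. Note that np.var and np.std compute the population statistic by default (ddof=0); passing ddof=1 gives the sample statistic.

```python
import numpy as np

a = np.array([[1, 5, 3],
              [4, 2, 6]])

row_max = np.max(a, axis=1)        # one value per row
col_max = np.max(a, axis=0)        # one value per column
row_argmax = np.argmax(a, axis=1)  # column position of each row's maximum

# Population variance (default, ddof=0) versus sample variance (ddof=1).
pop_var = np.var(a, axis=1)
sample_var = np.var(a, axis=1, ddof=1)

print(row_max, col_max, row_argmax)
print(pop_var, sample_var)
```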

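Array arithmetic:

import numpy as np

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])

# Array and scalar: the scalar is applied to every element
arr + 1

# Element-wise +, -, *, / between arrays of the same shape
# (the second array must also be 2 x 5 to match arr)
arr_2 = np.linspace(1,10,10).reshape((2,5))
arr + arr_2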
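
As a small illustration of same-shape arithmetic, the sketch below (with made-up values) applies the four operators element by element, then shows that two arrays whose shapes differ and cannot be broadcast, for example 2 x 3 and 2 x 4, raise a ValueError.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

total = a + b       # element-wise sum
diff = b - a        # element-wise difference
product = a * b     # element-wise product (not a matrix product)
ratio = a / b       # element-wise true division, result is float

# Shapes (2, 3) and (2, 4) are incompatible, so the addition fails.
c = np.ones((2, 4))
try:
    a + c
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

print(total)
print(product)
print(shapes_compatible)
```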

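Matrix multiplication:

import numpy as np

# Each row holds one student's two scores
score = [[80, 86],
        [86, 78],
        [90, 90],
        [79, 95],
        [86, 88],
        [96, 98]]
score = np.array(score)
# Weights of the two scores, as a 2 x 1 column
w = np.array([[0.3],[0.7]])
np.matmul(score, w)
np.dot(score, w)   # same result here; unlike np.matmul, np.dot also accepts a scalar operand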
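
The difference between the two functions can be sketched as follows, using two of the score rows from above. The `@` operator is equivalent to np.matmul for 2-D arrays. The exception types caught for the scalar case are an assumption intended to cover different NumPy versions.

```python
import numpy as np

score = np.array([[80, 86],
                  [90, 90]])
w = np.array([[0.3],
              [0.7]])

# (2 x 2) @ (2 x 1) -> (2 x 1): one weighted score per row.
final = score @ w

# np.dot with a scalar simply scales every element.
scaled = np.dot(score, 2)

# np.matmul does not accept a scalar operand.
try:
    np.matmul(score, 2)
    matmul_accepts_scalar = True
except (ValueError, TypeError):
    matmul_accepts_scalar = False

print(final)
print(scaled)
print(matmul_accepts_scalar)
```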

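2. Pandas

Why use Pandas:

Convenient data-processing capabilities;

Easy file reading;

Wraps Matplotlib for plotting and NumPy for computation.

Generating data:

import numpy as np
import pandas as pd

stock_change = np.random.standard_normal((8,10))
data = pd.DataFrame(stock_change)

# Build the list of stock names ('股票' means "stock")
codes = ['股票' + str(i) for i in range(1,11)]
# Build the dates: 8 consecutive business days
date = pd.date_range('20191203', periods=8, freq='B')
# Use them as the row and column labels
data = pd.DataFrame(stock_change, index=date, columns=codes)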

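DataFrame attributes:

# Get the underlying data as a NumPy array
data.values

# Get the row index
data.index

# Get the column index
data.columns

# Get the shape
data.shape

# Transpose
data.T

# Show the first 3 rows
data.head(3)

# Show the last 5 rows (the default)
data.tail()

# Look up an index label by position
data.index[3]

# Replace the old index (the whole index must be assigned at once)
new_index = pd.date_range('20200103', periods=8, freq='B')
data.index = new_index

# Reset the index; drop=True discards the old index instead of keeping it as a column
data.reset_index(drop=True)

# Use one or more columns as the new index (two columns produce a MultiIndex);
# drop=False also keeps them as ordinary columns
df = pd.DataFrame({'month':[1,4,7,10],
                  'year':[2015,2016,2017,2018],
                  'sale':[10,20,30,40]})
res = df.set_index(['month', 'year'], drop=False)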
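
To make the effect of set_index more concrete, the sketch below reuses the same small table and compares a single-column index with a two-column one. It uses the default drop=True, so the index columns are removed from the data; reset_index turns them back into ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2015, 2016, 2017, 2018],
                   'sale': [10, 20, 30, 40]})

# One column -> a plain Index; 'month' is no longer a data column.
single = df.set_index('month')

# Two columns -> a MultiIndex with levels named 'month' and 'year'.
multi = df.set_index(['month', 'year'])

# Rows of a MultiIndex are addressed with a tuple of level values.
sale_april_2016 = multi.loc[(4, 2016), 'sale']

# reset_index moves the index levels back into columns.
restored = multi.reset_index()

print(single.columns.tolist())
print(type(multi.index).__name__)
print(sale_april_2016)
```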

Creating a Series:

pd.Series([1,2,3,4,5], index=[1,2,3,4,5])

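Basic data operations:

# Bracket indexing: column first, then row
data['股票1']['2020-01-03']

# loc: index by label (row label, column label)
data.loc['2020-01-03','股票3']

# iloc: index by integer position
data.iloc[2,2:]

# Assignment: set every value in the column
data['股票3'] = 5
data['股票6'] = 8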
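
One detail worth illustrating is how the two indexers slice. In the sketch below (a small made-up frame with hypothetical column names 'a' and 'b'), a label slice with loc includes its end label, while a positional slice with iloc excludes its end position. Assigning through a single loc call is also generally preferable to chained brackets.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [10, 20, 30, 40]},
                  index=pd.date_range('2020-01-06', periods=4, freq='D'))

# Label slice: both endpoints are included -> 3 rows.
by_label = df.loc['2020-01-06':'2020-01-08', 'a']

# Positional slice: the end position is excluded -> 2 rows.
by_position = df.iloc[0:2, 0]

# Assign a single cell with one loc call (row label, column label).
df.loc[df.index[0], 'b'] = 99

print(len(by_label), len(by_position))
print(df.iloc[0, 1])
```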

Sorting:

# Sort by the values of a column
data.sort_values(by='股票1', ascending=True)

# Sort by the index
data.sort_index(ascending=False)

Arithmetic operations:

# Add two columns element by element
data['股票1'].add(data['股票6'])

# Add a scalar to every element of a column
data['股票1'].add(1)

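Logical operations:

# Boolean indexing
temp = data['股票1'] > 0
data[temp]

# Combine several conditions: keep rows where 股票1 > 0 and 股票7 < 0.5
data[(data['股票1'] > 0) &  (data['股票7'] < 0.5)]

# Use query to select the rows that satisfy a condition
data.query("股票1 > 0 & 股票3 < 0")

# Select all rows where 股票3 equals 5 or 8
# (isin alone returns a boolean mask, so use it to index the DataFrame)
data[data['股票3'].isin([5,8])]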
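
The relationship between these three filtering styles can be sketched on a small fixed frame. The column names 's1' and 's3' and the values are hypothetical, chosen so that the selected rows are easy to verify.

```python
import pandas as pd

df = pd.DataFrame({'s1': [0.4, -0.2, 1.1, 0.3],
                   's3': [5, 8, 2, 5]})

# isin builds a boolean mask; indexing with it keeps the matching rows.
mask = df['s3'].isin([5, 8])
picked = df[mask]

# The same two-condition filter written with query and with boolean indexing.
by_query = df.query('s1 > 0 and s3 == 5')
by_mask = df[(df['s1'] > 0) & (df['s3'] == 5)]

print(mask.tolist())
print(picked.index.tolist())
print(by_query.index.tolist())
```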

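Statistical operations:

import matplotlib.pyplot as plt

# Summary statistics (count, mean, std, min, quartiles, max) for every column
data.describe()

# Maximum of one column
data['股票1'].max()

# Cumulative statistics
# Cumulative change of one stock
res = data['股票7'].cumsum()

# Draw a line chart
res.plot()
plt.title('Stock price change')
plt.show()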
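
For readers who want to see what the cumulative functions return, here is a minimal sketch on a short hand-checkable Series (the values are illustrative). cummax and cummin are the running-maximum and running-minimum counterparts of cumsum.

```python
import pandas as pd

change = pd.Series([1.0, -2.0, 3.0, 0.5])

running_total = change.cumsum()  # sum of all values up to each position
running_max = change.cummax()    # largest value seen so far
running_min = change.cummin()    # smallest value seen so far

print(running_total.tolist())
print(running_max.tolist())
print(running_min.tolist())
```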

Custom operations:

# Range (max minus min) of a column
def fun_1(x):
    return x.max() - x.min()

# axis=0 applies the function to each column
data[['股票1','股票2']].apply(fun_1,axis=0)
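
The effect of the axis argument in apply can be sketched with a small fixed frame. The function name value_range and the column names are hypothetical. With axis=0 the function receives each column as a Series; with axis=1 it receives each row.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 4, 2],
                   'b': [10, 30, 20]})

def value_range(x):
    """Return max minus min of a Series."""
    return x.max() - x.min()

per_column = df.apply(value_range, axis=0)  # one result per column
per_row = df.apply(value_range, axis=1)     # one result per row

print(per_column.tolist())
print(per_row.tolist())
```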

Reading and saving files:

import pandas as pd

# Read CSV data, loading only the listed columns
data_csv= pd.read_csv('./data/stock_day.csv',usecols=['volume','low'])
# Save CSV data
data_csv.to_csv('./test_csv.csv',index=False)   # index=False: do not write the index

# Read HDF5 data
data_hdf = pd.read_hdf('./data/stock_data/day/day_eps_ttm.h5')
# Save HDF5 data; key names the dataset inside the file
data_hdf.to_hdf('./test_h5.h5',key='test')

# Read a JSON file; lines=True means one JSON object per line
data_json = pd.read_json('./data/Sarcasm_Headlines_Dataset.json',orient='records',lines=True)
# Save JSON data; lines=False writes a single JSON array
data_json.to_json('./test_json.json',orient='records',lines=False)
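
The snippets above depend on data files that are not included here. As a self-contained alternative, the sketch below writes a small made-up frame to a temporary directory and reads it back, for CSV and for line-delimited JSON. The file names are hypothetical, and HDF5 is left out because it needs the optional PyTables dependency.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'volume': [100, 200],
                   'low': [9.5, 10.1]})

with tempfile.TemporaryDirectory() as tmp:
    # CSV round trip; index=False keeps the row index out of the file.
    csv_path = os.path.join(tmp, 'demo.csv')
    df.to_csv(csv_path, index=False)
    back_csv = pd.read_csv(csv_path, usecols=['volume', 'low'])

    # JSON Lines round trip: one record per line on both sides.
    json_path = os.path.join(tmp, 'demo.json')
    df.to_json(json_path, orient='records', lines=True)
    back_json = pd.read_json(json_path, orient='records', lines=True)

print(back_csv)
print(back_json)
```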

马尔盖云
