Common NumPy Functions in Python 3

1. TXT file

(1) The identity matrix is a square matrix whose main-diagonal elements are all 1 and whose other elements are all 0.
In NumPy, the eye function creates such a two-dimensional array. We only need to pass one argument: the number of rows, which is also the number of columns and the number of 1s on the diagonal.
For example, to create a 3-by-3 identity matrix:

import numpy as np
I2 = np.eye(3)
print(I2)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
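The eye function accepts more than a single size argument. The sketch below shows a few of its parameters (M for the number of columns, k for the diagonal offset, dtype) and the closely related np.identity; the variable names are illustrative only.

```python
import numpy as np

I3 = np.eye(3)                  # 3x3 identity matrix, float64 by default
same = np.identity(3)           # np.identity(n) builds the same square matrix
rect = np.eye(2, 3)             # N=2 rows, M=3 columns; ones on the main diagonal only
shifted = np.eye(3, k=1)        # k=1 moves the ones to the diagonal just above the main one
int_eye = np.eye(3, dtype=int)  # integer dtype instead of float

print(rect)
print(shifted)
```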

(2) The savetxt function stores an array in a text file; we need to specify the file name and the array to save.

np.savetxt('eye.txt', I2)  # create eye.txt and write the data of I2 to it
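A minimal round-trip sketch: write the identity matrix with savetxt, read it back with loadtxt, and use the fmt parameter to control how values are written. The temporary directory and the extra file name eye_int.txt are illustrative choices, not part of the original example.

```python
import os
import tempfile

import numpy as np

I2 = np.eye(3)

# Work in a temporary directory so the example leaves no stray files behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "eye.txt")
    np.savetxt(path, I2)            # default fmt is '%.18e' (scientific notation)
    loaded = np.loadtxt(path)       # read the whitespace-separated text back into an array

    # fmt controls how each value is written; '%d' writes integers
    int_path = os.path.join(tmpdir, "eye_int.txt")
    np.savetxt(int_path, I2, fmt="%d")
    with open(int_path) as f:
        int_text = f.read()

print(np.array_equal(I2, loaded))
print(int_text)
```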

2. CSV file

CSV (Comma-Separated Values) is a common file format.
Database dumps are often in CSV format, with each field in the file corresponding to a column in the database table.
Spreadsheet software such as Microsoft Excel can open and edit CSV files.

Note: NumPy's loadtxt function reads CSV files easily: it splits the fields automatically and loads the data into NumPy arrays.

Data content of data.csv:

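c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6,7), unpack=True)
# usecols takes a tuple of column indices; (6,7) selects the 7th and 8th fields (indices start at 0)
# unpack=True returns each selected column as a separate array, so the closing prices go to c and the trading volumes go to v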

print(c)

[336.1  339.32 345.03 344.32 343.44 346.5  351.88 355.2  358.16 354.54
 356.85 359.18 359.9  363.13 358.3  350.56 338.61 342.62 342.88 348.16
 353.21 349.31 352.12 359.56 360.   355.36 355.76 352.47 346.67 351.99]

print(v)

[21144800. 13473000. 15236800.  9242600. 14064100. 11494200. 17322100.
 13608500. 17240800. 33162400. 13127500. 11086200. 10149000. 17184100.
 18949000. 29144500. 31162200. 23994700. 17853500. 13572000. 14395400.
 16290300. 21521000. 17885200. 16188000. 19504300. 12718000. 16192700.
 18138800. 16824200.]

print(type(c))
print(type(v))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
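Since the contents of data.csv are not reproduced here, the sketch below runs the same loadtxt call on a small in-memory CSV. The rows and their numbers are made up purely for illustration.

```python
import io

import numpy as np

# Hypothetical CSV with made-up numbers, standing in for data.csv:
# a text field, a day number, a price, and a volume
csv_text = io.StringIO(
    "X,1,10.5,100\n"
    "X,2,11.0,200\n"
    "X,3,10.0,300\n"
)

# usecols=(2, 3) selects the 3rd and 4th fields; unlisted fields (such as the text one) are skipped
# unpack=True transposes the result so each selected column becomes its own 1-D array
price, volume = np.loadtxt(csv_text, delimiter=",", usecols=(2, 3), unpack=True)

print(price.shape, volume.shape)
print(price, volume)
```

Without unpack=True, the same call would return a single array of shape (3, 2), one row per line of the file.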

3. Volume-weighted average price: the average() function

VWAP overview:
VWAP (Volume-Weighted Average Price) is a very important economic quantity that represents the "average" price of a financial asset.
The higher the trading volume at a given price, the more weight that price carries.
VWAP is a weighted average that uses trading volume as the weight, and it is often used in algorithmic trading.

vwap = np.average(c,weights=v)
print('Volume-weighted average price vwap =', vwap)

Volume-weighted average price vwap = 350.5895493532009
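To make the weighting explicit, VWAP can also be written out by hand as sum(price × volume) / sum(volume). The sketch below checks that this matches np.average on a few made-up prices and volumes.

```python
import numpy as np

# Made-up prices and volumes, for illustration only
prices = np.array([10.0, 11.0, 12.0])
volumes = np.array([100.0, 300.0, 100.0])

# VWAP = sum(price_i * volume_i) / sum(volume_i) = 5500 / 500
vwap_manual = np.sum(prices * volumes) / np.sum(volumes)
vwap_numpy = np.average(prices, weights=volumes)
print(vwap_manual, vwap_numpy)  # 11.0 11.0
```

The heavy volume at 11.0 pulls the result toward that price, even though 10.0 and 12.0 are equally far away.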

4. Arithmetic mean: the mean() function

The mean function in NumPy calculates the arithmetic mean of the elements of an array.

print('Arithmetic mean of the elements of c: {}'.format(np.mean(c)))

Arithmetic mean of the elements of c: 351.0376666666667
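A short sketch on made-up values showing that mean is simply the sum divided by the number of elements, and that np.average without weights gives the same result.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # made-up data

mean = np.mean(values)                   # arithmetic mean
by_hand = values.sum() / values.size     # sum divided by the number of elements
unweighted = np.average(values)          # np.average with no weights is also the plain mean
print(mean, by_hand, unweighted)         # 2.5 2.5 2.5
```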

5. Time-weighted average price

TWAP overview:
In economics, TWAP (Time-Weighted Average Price) is another measure of the "average" price. Now that we have calculated VWAP, let's also calculate TWAP. TWAP is really a variant of the same idea: recent prices matter more, so they should receive higher weights. The simplest approach is to use the arange function to create a sequence of natural numbers that increases from 0, with as many numbers as there are closing prices. Note that this gives the first price a weight of 0. Of course, this is not necessarily the correct way to calculate TWAP.

t = np.arange(len(c))  # weights 0, 1, ..., len(c)-1
print('Time-weighted average price twap =', np.average(c, weights=t))

Time-weighted average price twap = 352.4283218390804
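A small sketch of the arange weighting on made-up prices. It confirms the hand formula and shows a side effect of starting at 0: the first price gets no weight at all. Starting the weights at 1 is a hypothetical alternative, not the method used above.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])  # made-up closing prices
t = np.arange(len(prices))            # weights 0, 1, 2: later prices count more

twap = np.average(prices, weights=t)
by_hand = np.sum(prices * t) / np.sum(t)   # (0*10 + 1*20 + 2*30) / (0 + 1 + 2) = 80 / 3

# Because the first weight is 0, changing the first price does not change the result
changed = prices.copy()
changed[0] = 1000.0
print(twap, np.average(changed, weights=t) == twap)

# Hypothetical alternative: start the weights at 1 so that every price contributes
twap_from_one = np.average(prices, weights=np.arange(1, len(prices) + 1))
print(twap_from_one)
```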

6. Maximum and minimum values

h, l = np.loadtxt('data.csv', delimiter=',', usecols=(4,5), unpack=True)  # fields 5 and 6: the daily high and low prices
print('h data:\n{}'.format(h))
print('-'*10)
print('l data:\n{}'.format(l))

h data:
[344.4  340.04 345.65 345.25 344.24 346.7  353.25 355.52 359.   360.
 357.8  359.48 359.97 364.9  360.27 359.5  345.4  344.64 345.15 348.43
 355.05 355.72 354.35 359.79 360.29 361.67 357.4  354.76 349.77 352.32]
----------
l data:
[333.53 334.3  340.98 343.55 338.55 343.51 347.64 352.15 354.87 348.
 353.54 356.71 357.55 360.5  356.52 349.52 337.72 338.61 338.37 344.8
 351.12 347.68 348.4  355.92 357.75 351.31 352.25 350.6  344.9  345.  ]

print('Maximum of h: {}'.format(np.max(h)))
print('Minimum of l: {}'.format(np.min(l)))

Maximum of h: 364.9
Minimum of l: 333.53

NumPy's ptp ("peak to peak") function calculates the range of an array's values.
It returns the difference between the maximum and minimum values of the array elements,
that is, max(array) - min(array).

print('Range (max - min) of h:\n{}'.format(np.ptp(h)))
print('Range (max - min) of l:\n{}'.format(np.ptp(l)))

Range (max - min) of h:
24.859999999999957
Range (max - min) of l:
26.970000000000027
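A sketch confirming that ptp is exactly max minus min, using the maximum and minimum values from the h output above. The long tail of digits in 24.859999999999957 comes from binary floating-point arithmetic, so rounding is the usual fix for display. The 2-D table is made up to illustrate the axis parameter.

```python
import numpy as np

h = np.array([344.4, 340.04, 364.9])  # a few values copied from the h output above

spread = np.ptp(h)
print(spread == np.max(h) - np.min(h))  # True: ptp is max minus min
print(round(float(spread), 2))         # 24.86 after rounding for display

# For 2-D data, axis selects the direction: axis=0 gives one range per column
table = np.array([[1.0, 5.0],
                  [4.0, 2.0]])
print(np.ptp(table, axis=0))           # [3. 3.]
```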

7. Statistical analysis

Median:
We could use a threshold to remove outliers, but there is a better way: the median.
Sort the values of a variable by magnitude; the value in the middle of the sorted sequence is the median. With an even number of values, the median is the mean of the two middle values.
For example, for the five numbers 1, 2, 3, 4, and 5, the median is the middle number, 3.

m = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)  # field 7: the closing prices
print('Median of m: {}'.format(np.median(m)))

Median of m: 352.055

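# Sort the array, then locate the median by hand
sorted_m = np.sort(m)  # np.msort is deprecated (removed in NumPy 2.0); np.sort sorts a 1-D array the same way
print('Sorted m:\n{}'.format(sorted_m))
N = len(m)
# For even N, average the two middle elements; for odd N, both indices point to the same element
print('Median of m: {}'.format((sorted_m[N//2] + sorted_m[(N-1)//2]) / 2))

Sorted m:
[336.1  338.61 339.32 342.62 342.88 343.44 344.32 345.03 346.5  346.67
 348.16 349.31 350.56 351.88 351.99 352.12 352.47 353.21 354.54 355.2
 355.36 355.76 356.85 358.16 358.3  359.18 359.56 359.9  360.   363.13]
Median of m: 352.055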
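Why the median is "a better way" than thresholds: a single extreme value drags the mean far away but barely moves the median. A sketch on made-up data:

```python
import numpy as np

prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0])  # made-up data, odd count
with_outlier = np.append(prices, 1000.0)           # add one extreme value, even count

print(np.mean(prices), np.median(prices))          # 12.0 12.0
# The outlier pulls the mean above 176, while the median only moves to (12 + 13) / 2
print(np.mean(with_outlier), np.median(with_outlier))
```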

Variance:
Variance is the sum of the squared deviations of each value from the arithmetic mean of all values, divided by the number of values.

print('variance =', np.var(m))

variance = 50.126517888888884

var_hand = np.mean((m-m.mean())**2)
print('var =', var_hand)

var = 50.126517888888884

Note the difference between population variance and sample variance. The population variance divides the sum of squared deviations by the number of data points n, while the sample variance divides it by n - 1, which is called the degrees of freedom. Dividing by n - 1 makes the sample variance an unbiased estimator of the population variance.
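In NumPy, np.var divides by n by default; its ddof ("delta degrees of freedom") parameter changes the divisor to n - ddof, so ddof=1 gives the sample variance. A sketch on a made-up sample whose sum of squared deviations is 32:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample, mean 5
n = data.size
ss = np.sum((data - data.mean()) ** 2)   # sum of squared deviations = 32

pop_var = np.var(data)              # ddof=0 (default): divide by n
sample_var = np.var(data, ddof=1)   # divide by n - 1
print(pop_var, ss / n)              # 4.0 4.0
print(sample_var, ss / (n - 1))
```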

8. Stock returns

In the academic literature, analysis of closing prices is often based on stock returns and logarithmic returns.
The simple return is the rate of change between two adjacent prices, while the logarithmic return is the difference between the logarithms of two adjacent prices.
Recall from high school that log(a) - log(b) = log(a/b), so the logarithmic return also measures the rate of change in price.
Note that returns are dimensionless because they are ratios: for example, dollars divided by dollars (or any other currency unit).
Investors are mostly interested in the variance or standard deviation of returns, because it represents the magnitude of investment risk.

(1) First, let's calculate the simple return. NumPy's diff function returns an array of the differences between adjacent array elements, somewhat like differentiation in calculus. To get the returns, we also need to divide each difference by the previous day's price. Note that the array returned by diff has one element fewer than the closing price array:
returns = np.diff(c) / c[:-1]
Note that we do not divide by the last value in the closing price array. Next, use the std function to calculate the standard deviation:
print("Standard deviation =", np.std(returns))
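A sketch on four made-up prices that makes the length mismatch visible: diff produces three differences, and c[:-1] supplies the three matching previous-day prices.

```python
import numpy as np

prices = np.array([100.0, 102.0, 99.0, 101.0])  # made-up closing prices

diffs = np.diff(prices)         # [2., -3., 2.]: one element shorter than prices
returns = diffs / prices[:-1]   # divide by the previous day's price; the last price is never a divisor
print(len(prices), len(diffs))  # 4 3
print("Standard deviation =", np.std(returns))
```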

(2) The logarithmic return is even simpler to calculate. We first use the log function to take the logarithm of each closing price, then apply the diff function to the result:
logreturns = np.diff(np.log(c))
In general, we should check that the input array contains no zeros or negative numbers; otherwise np.log returns -inf or nan (with a RuntimeWarning) and the results are meaningless. Stock prices are always positive, so the check can be omitted here.
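A sketch of the positivity check described above, wrapped in a helper; the name log_returns is hypothetical, not a NumPy function. It also verifies the log(a) - log(b) = log(a/b) identity from earlier: each log return equals log(1 + simple return), and log returns add up across days.

```python
import numpy as np

def log_returns(prices):
    """Hypothetical helper: log returns of a 1-D price array, rejecting non-positive prices."""
    prices = np.asarray(prices, dtype=float)
    # np.log gives -inf for 0 and nan for negatives (with a RuntimeWarning), so check up front
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    return np.diff(np.log(prices))

prices = np.array([100.0, 102.0, 99.0, 101.0])  # made-up closing prices
lr = log_returns(prices)

# log(a) - log(b) == log(a / b), so each log return equals log(1 + simple return)
simple = np.diff(prices) / prices[:-1]
print(np.allclose(lr, np.log1p(simple)))                    # True
print(np.isclose(lr.sum(), np.log(prices[-1] / prices[0]))) # True: log returns add up
```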

(3) We may well be interested in which days have positive returns.
With the previous steps done, the where function handles this: it returns the indices of all array elements that satisfy the given condition.
Enter the following code:
posretindices = np.where(returns > 0)
print("Indices with positive returns", posretindices)
This prints the indices of all positive elements in the array:
Indices with positive returns (array([ 0, 1, 4, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23, 25, 28]),)
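The output above is a tuple because np.where with a single condition returns one index array per dimension. A sketch on made-up returns showing how to unwrap it, plus np.flatnonzero and boolean masking as alternatives:

```python
import numpy as np

returns = np.array([0.02, -0.03, 0.0, 0.01])  # made-up returns; 0.0 is not > 0

idx = np.where(returns > 0)  # a tuple holding one index array per dimension
positions = idx[0]           # for 1-D input, take the single index array: [0, 3]
print(positions)

# np.flatnonzero gives the same indices without the tuple wrapper
print(np.flatnonzero(returns > 0))

# A boolean mask selects the positive values themselves
print(returns[returns > 0])
```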

(4) In investment science, volatility is a measure of price changes. Historical volatility is calculated from historical price data, and it uses logarithmic returns, for example when computing annual or monthly volatility. In this tutorial, annual volatility is defined as the standard deviation of the logarithmic returns divided by their mean, then divided by the square root of the reciprocal of the number of trading days in a year (usually 252). Note that a more common convention annualizes the standard deviation alone, np.std(logreturns) * np.sqrt(252), without dividing by the mean.
Use the std and mean functions to calculate it:
annual_volatility = np.std(logreturns)/np.mean(logreturns)
annual_volatility = annual_volatility / np.sqrt(1./252.)

(5) Note the division inside the sqrt function. In Python 2, dividing one integer by another performs integer division (1/252 evaluates to 0), so floating-point literals such as 1./252. were needed to get the correct result; Python 3 changed / to always perform true division. Monthly volatility is calculated in the same way as annual volatility:
monthly_volatility = annual_volatility * np.sqrt(1./12.)
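The std/mean formula follows the source; it behaves more like a coefficient of variation and becomes unstable when the mean daily return is close to zero. The sketch below compares it with the widely used square-root-of-time convention on synthetic data. The seed, mean, and standard deviation are arbitrary assumptions, not values from data.csv. Note that dividing by sqrt(1/252) is the same as multiplying by sqrt(252).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily log returns (made up): mean 0.05%, standard deviation 1%
logreturns = rng.normal(loc=0.0005, scale=0.01, size=252)

# Formula used in this tutorial: std / mean, divided by sqrt(1/252)
tutorial_style = np.std(logreturns) / np.mean(logreturns) / np.sqrt(1.0 / 252.0)

# A common convention: scale the daily standard deviation alone by sqrt(252)
annual_vol = np.std(logreturns) * np.sqrt(252)
monthly_vol = annual_vol * np.sqrt(1.0 / 12.0)  # equals daily std * sqrt(21)

print(tutorial_style, annual_vol, monthly_vol)
```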

import numpy as np

c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)  # closing prices

returns = np.diff(c)/c[:-1]
print('Standard deviation of returns: {}'.format(np.std(returns)))
logreturns = np.diff(np.log(c))
posretindices = np.where(returns>0)
print('Indices of positive elements in returns:\n{}'.format(posretindices))
annual_volatility = np.std(logreturns)/np.mean(logreturns)
annual_volatility = annual_volatility/np.sqrt(1/252)
print('Annual volatility: {}'.format(annual_volatility))
print('Monthly volatility: {}'.format(annual_volatility*np.sqrt(1/12)))

Standard deviation of returns: 0.012922134436826306
Indices of positive elements in returns:
(array([ 0,  1,  4,  5,  6,  7,  9, 10, 11, 12, 16, 17, 18, 19, 21, 22, 23,
       25, 28], dtype=int64),)
Annual volatility: 129.27478991115132
Monthly volatility: 37.318417377317765

This article refers to "Python Data Analysis Basic Tutorial: NumPy Learning Guide"