One, numpy module
1.1 About numpy
numpy is an open-source numerical computing extension library for Python. It is used to store and process large arrays.
numpy has two main strengths:
1, unlike the built-in list, it provides array operations and array arithmetic, along with statistical distributions and simple mathematical models
2, it is fast on large arrays, even faster than Python's built-in arithmetic, which is why libraries such as pandas and sklearn all depend on it. Array operations in higher-level frameworks such as TensorFlow and PyTorch are also very similar to numpy's.
1.2 Creating numpy arrays
A numpy array is an ndarray object. To create one from a list, pass the list to np.array().
import numpy as np
# create a one-dimensional ndarray object
arr = np.array([1,2,3])
print(arr,type(arr)) # [1 2 3] <class 'numpy.ndarray'>
# create a two-dimensional ndarray object
print(np.array([[1,2,3],[4,5,6]]))
--------------------------------------------------------------------------------
[[1 2 3]
[4 5 6]]
1.3 Common properties of numpy arrays
Attribute | Explanation |
---|---|
T | Transpose of the array (for arrays of two or more dimensions) |
dtype | Data type of the array elements |
size | Number of elements in the array |
ndim | Number of dimensions of the array |
shape | Size of each dimension of the array (as a tuple) |
astype | Type conversion (a method, called as arr.astype(...)) |
arr = np.array([[1,2,3],[4,5,6]])
print(arr.T) # swap rows and columns
--------------------------------------------------------------------------------
[[1 4]
[2 5]
[3 6]]
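The remaining properties in the table can be checked the same way. The following is a minimal sketch that reuses the same 2x3 array; the name arr_float is only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.dtype)   # an integer type; the exact width depends on the platform
print(arr.size)    # 6
print(arr.ndim)    # 2
print(arr.shape)   # (2, 3)

# astype returns a new array with the requested element type
arr_float = arr.astype(np.float64)
print(arr_float.dtype)  # float64

# transposing swaps the two dimensions
print(arr.T.shape)      # (3, 2)
```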
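1.4 Slicing
arr = np.array([[1,2,3],[4,5,6]])
print(arr[:]) # all elements of the array
print(arr[:,:]) # all elements of the array
print(arr[0,:]) # row 0 as a one-dimensional array: [1 2 3]
print(arr[0:1,:]) # rows 0 up to but not including 1 (start included, end excluded): [[1 2 3]]
print(arr[0:1,0:1]) # rows 0 to 1 and columns 0 to 1, end excluded: [[1]]
print(arr[0, 0],type(arr[0, 0])) # the element at row 0, column 0, and its type (a numpy integer scalar)
print(arr[0, [0,2]]) # elements 0 and 2 of row 0: [1 3]
print(arr[0, 0] + 1) # the element at row 0, column 0, plus 1: 2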
1.5 Assigning values
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr[0, :] = 0 # set every element of row 0 to 0
print(arr)
--------------------------------------------
[[0 0 0]
[4 5 6]]
arr[1, 1] = 1 # set the element at row 1, column 1 to 1
print(arr)
--------------------------------------------------------------------------------
[[0 0 0]
[4 1 6]]
arr[arr < 3] = 3 # boolean indexing: set every element smaller than 3 to 3
print(arr)
--------------------------------------------------------------------------------
[[3 3 3]
[4 3 6]]
1.6 Merging
arr1 = np.array([[1, 2, 3], [4, 5, 6]]) # mutable data type
arr2 = np.array([[7, 8, 9], [10, 11, 12]]) # mutable data type
print(arr1)
print(arr2)
-------------------------------------------------------
[[1 2 3]
[4 5 6]]
[[ 7 8 9]
[10 11 12]]
print(np.hstack((arr1,arr2))) # stack horizontally: arr2's columns go to the right of arr1's
------------------------------------------------------------
[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]
print(np.vstack((arr1,arr2))) # stack vertically: arr2's rows go below arr1's
------------------------------------------------
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
print(np.concatenate((arr1, arr2))) # default axis=0, same result as vstack
print(np.concatenate((arr1, arr2),axis=1)) # axis=1 joins side by side (like hstack); axis=0 stacks rows (like vstack)
-----------------------------------------------------------
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]
1.7 Creating numpy arrays with functions
Function | Explanation |
---|---|
array() | Convert a list to an array; the dtype can optionally be specified explicitly |
arange() | The numpy version of range; also supports floating-point steps |
linspace() | Similar to arange(), but the third argument is the number of elements in the array |
zeros() | Create an array filled with 0 for a given shape and dtype, e.g. np.zeros((5, 5)) |
ones() | Create an array filled with 1 for a given shape and dtype, e.g. np.ones((5, 5)) |
eye() | Create an identity matrix (1 on the diagonal, 0 elsewhere) |
empty() | Create an array without initializing its elements, so the contents are arbitrary |
reshape() | Change the shape of an array without changing its data |
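As a short illustration of the functions in the table, the sketch below calls each one with small arguments. The variable names are arbitrary, and the commented results are what these particular arguments produce.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.float64)  # list -> array with an explicit dtype
r = np.arange(0, 1, 0.25)                  # [0.   0.25 0.5  0.75], stop value excluded
l = np.linspace(0, 1, 5)                   # 5 evenly spaced values from 0 to 1, both ends included
z = np.zeros((2, 3))                       # 2x3 array of 0.0
o = np.ones((2, 3))                        # 2x3 array of 1.0
e = np.eye(3)                              # 3x3 identity matrix
m = np.arange(6).reshape(2, 3)             # [[0 1 2] [3 4 5]]

print(a.dtype, r, l)
print(z.shape, o.shape, e.shape, m.shape)
```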
1.8 numpy array operations
Operator | Explanation |
---|---|
+ | Add the corresponding elements of two numpy arrays |
- | Subtract the corresponding elements of two numpy arrays |
* | Multiply the corresponding elements of two numpy arrays |
/ | Divide the corresponding elements of two numpy arrays; in Python 3 the result is a float even for integer inputs (use // for the integer quotient) |
% | Remainder after dividing the corresponding elements of two numpy arrays |
**n | Raise each element of a numpy array to the n-th power, e.g. **2 squares each element |
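The operators above all work element by element. A minimal sketch with two small integer arrays (the values are chosen only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 2, 2, 2])

print(a + b)   # [3 4 5 6]
print(a - b)   # [-1  0  1  2]
print(a * b)   # [2 4 6 8]
print(a / b)   # [0.5 1.  1.5 2. ]  true division, float result
print(a // b)  # [0 1 1 2]          integer quotient
print(a % b)   # [1 0 1 0]
print(a ** 2)  # [ 1  4  9 16]
```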
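1.9 Additional notes
numpy random numbers
print(np.random.rand(3,4)) # 3x4 array of random floats in [0, 1)
print(np.random.randint(1,10,(3,4))) # random integers from 1 (inclusive) to 10 (exclusive), shape 3x4
print(np.random.choice([1,2,3,4,5],3)) # array of 3 elements drawn at random from [1,2,3,4,5]
Key point
Random seed: every random number is generated from a random seed.
A seed taken from the current time stays the same over a short interval (within one second here) and changes as time passes.
import time
np.random.seed(int(time.time()))
np.random.seed(1) # with a fixed seed, the generated numbers are the same on every run
arr1 = np.random.rand(3,4) # mutable data type
print(arr1)
rs = np.random.RandomState(1) # create a random state object with seed 1
print(rs.rand(3,4))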
print(rs.rand(3,4))
---------------------------------------------------------
[[4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01]
[1.46755891e-01 9.23385948e-02 1.86260211e-01 3.45560727e-01]
[3.96767474e-01 5.38816734e-01 4.19194514e-01 6.85219500e-01]]
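To make the effect of a fixed seed concrete, here is a small sketch that draws the same 3x4 array three times and compares the results. The names first, second and third are illustrative only.

```python
import numpy as np

# Re-seeding the global generator with the same value repeats the sequence
np.random.seed(1)
first = np.random.rand(3, 4)
np.random.seed(1)
second = np.random.rand(3, 4)
print(np.array_equal(first, second))  # True

# A separate RandomState seeded with 1 yields the same numbers
# without touching the global generator
rs = np.random.RandomState(1)
third = rs.rand(3, 4)
print(np.array_equal(first, third))   # True
```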
Two, pandas module
1, Importing
import pandas as pd
2, Purpose
pandas is used for file processing, most often Excel files. It is a layer of encapsulation built on top of numpy and xlrd.
3, pandas data types
3.1 Series()
One-dimensional; not used much on its own nowadays.
df = pd.Series(np.array([1,2,3,4]))
print(df)
3.2 DataFrame() (two-dimensional)
3.2.1 date_range
dates = pd.date_range('20190101', periods=6, freq='M')
print(dates) # periods=6, freq='M' gives the month-end dates of the first six months; newer pandas versions spell this frequency 'ME'
Parameter | Explanation |
---|---|
start | Start time |
end | End time |
periods | Number of time points to generate |
freq | Time frequency, default 'D'; options include H(our), W(eek), B(usiness), S(emi-)M(onth), (min)T(es), S(econd), A(year), ... |
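The start, end and periods parameters can be combined in different ways. The sketch below uses the default daily frequency so that it behaves the same across pandas versions; the variable names are illustrative.

```python
import pandas as pd

# start + periods: six consecutive days
daily = pd.date_range(start='2019-01-01', periods=6, freq='D')
print(daily[0], daily[-1])  # 2019-01-01 00:00:00 2019-01-06 00:00:00

# start + end: freq defaults to 'D', so this is the same range
by_end = pd.date_range(start='2019-01-01', end='2019-01-06')
print(daily.equals(by_end))  # True
```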
3.2.2 Properties
Attribute | Explanation |
---|---|
dtypes | View the data type of each column |
index | View the row index |
columns | View the label of each column |
values | View the data in the DataFrame as an array, i.e. without the header and index |
describe | View the extremes, mean, and median of each column; applies only to numeric data |
transpose | Transpose; the T attribute can also be used |
sort_index | Sort by row or column index |
sort_values | Sort by the data values |
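A small sketch of these attributes and methods on a hand-built DataFrame follows. The data, row labels and column labels are made up for illustration and are not from the original notes.

```python
import pandas as pd

# Deliberately unsorted index and columns so the sorting calls have an effect
df = pd.DataFrame(
    [[3.0, 1.0], [1.0, 2.0], [2.0, 3.0]],
    index=['r2', 'r0', 'r1'],
    columns=['b', 'a'],
)

print(df.dtypes)       # data type of each column
print(df.index)        # row labels
print(df.columns)      # column labels
print(df.values)       # underlying data as a numpy array, no labels
print(df.describe())   # count, mean, std, min, quartiles, max per column
print(df.T)            # same as df.transpose()

by_row = df.sort_index()          # rows ordered r0, r1, r2
by_col = df.sort_index(axis=1)    # columns ordered a, b
by_val = df.sort_values(by='b')   # rows ordered by the values in column b
print(by_row, by_col, by_val, sep='\n')
```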
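3.2.3 Accessing values
# build the data
dates = pd.date_range('20190101', periods=6, freq='M')
print(dates)
values = np.random.rand(6, 4) * 10
print(values)
columns = ['c4','c2','c3','c1']
df = pd.DataFrame(values, index=dates, columns=columns) # assemble the DataFrame used below
# the main things to master
df.values[1,1] # get the value at row 1, column 1 (zero-based)
df.iloc[1,1] = 1 # set the value at row 1, column 1 to 1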
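3.2.4 Table operations
1, Missing values
df = df.dropna(axis = 0) # drop rows that contain missing values
df
df = df.dropna(thresh = 4) # keep only rows with at least 4 non-missing values; 5 would drop every row, since there are only 4 columns
df = df.dropna(axis=0) # axis=1 means columns, axis=0 means rows
df # rows with missing values have been dropped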
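The DataFrame built earlier contains no missing values, so the calls above leave it unchanged. The sketch below uses a small made-up table with some NaN entries to show what axis and thresh actually do.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 has one missing value, row 2 has two
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [1.0, np.nan, 3.0, 4.0],
     [np.nan, np.nan, 3.0, 4.0]],
    columns=['c1', 'c2', 'c3', 'c4'],
)

rows_complete = df.dropna(axis=0)  # only row 0 has no missing values
cols_complete = df.dropna(axis=1)  # only c3 and c4 have no missing values
at_least_3 = df.dropna(thresh=3)   # rows 0 and 1 have at least 3 non-missing values

print(rows_complete)
print(cols_complete)
print(at_least_3)
```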
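2, Merging data
df1 = pd.DataFrame(np.zeros((2,3))) # 2x3 DataFrame filled with 0
df2 = pd.DataFrame(np.ones((2,3))) # 2x3 DataFrame filled with 1
pd.concat((df1,df2)) # default axis=0: df2's rows go below df1's
pd.concat((df1,df2),axis=1) # axis=1: df2's columns go to the right of df1's; axis=0 stacks rows
pd.concat((df1,df2)) # append df2 after df1; df1.append(df2) did this in older pandas but was removed in pandas 2.0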
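The shapes and index labels that result from each merge are easy to check. A minimal sketch using the same two DataFrames; ignore_index is an extra option not used in the notes above, shown here because stacking rows otherwise repeats the index labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((2, 3)))
df2 = pd.DataFrame(np.ones((2, 3)))

stacked = pd.concat((df1, df2))                        # rows stacked, index labels 0, 1, 0, 1
renumbered = pd.concat((df1, df2), ignore_index=True)  # rows stacked, index labels 0..3
side = pd.concat((df1, df2), axis=1)                   # columns placed side by side

print(stacked.shape, renumbered.shape, side.shape)     # (4, 3) (4, 3) (2, 6)
```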
- Importing data and reading JSON files: a basic understanding is enough.