20 Python data processing: NumPy
Introduction
Storing large amounts of numeric data in Python lists wastes memory and time, because every element is a separate Python object. NumPy provides the ndarray object: a multidimensional array whose elements all share a single data type.
Basic operations on ndarray arrays
- The N-dimensional array object ndarray stores multidimensional arrays whose elements all have the same type
- Each element in an ndarray occupies a block of memory of the same size
- The type of the elements is described by a data type object (called dtype) attached to the ndarray
- Like other container objects in Python, an ndarray can be indexed and sliced
- The contents of an ndarray can be accessed and modified through its methods and attributes
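The properties listed above can be checked directly on a small array. The following is a minimal sketch; the sample values and variable names are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
arr = np.array([1, 2, 3, 4], dtype=np.int32)

print(arr.dtype)     # the single data type shared by every element
print(arr.itemsize)  # bytes occupied by each element
print(arr.nbytes)    # total bytes = itemsize * number of elements

# Elements can be read and modified by index or slice.
arr[0] = 10
first_two = arr[:2]

# Mixing integers and floats still yields one dtype:
# NumPy upcasts every element to a floating-point type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```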
Creating an ndarray
The array function accepts any sequence-like object and produces a new NumPy array containing the data passed in. A nested sequence (such as a list of equal-length lists) is converted into a multidimensional array.
numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
object: an array or nested sequence
dtype: data type of the array elements (optional)
copy: whether the object is copied (optional)
order: memory layout of the array; 'C' is row-major, 'F' is column-major, 'A' is any order, 'K' keeps the layout of the input (default)
subok: if True, subclasses are passed through; by default the returned array is forced to be a base-class ndarray
ndmin: minimum number of dimensions of the resulting array
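To make the parameters above more concrete, here is a small sketch showing dtype, ndmin, and order in use. The input list and variable names are assumptions made for this example.

```python
import numpy as np

# Hypothetical nested list of equal-length rows.
nested = [[1, 2, 3], [4, 5, 6]]

matrix = np.array(nested)                    # nested sequence -> 2-D array
as_float = np.array(nested, dtype=float)     # force a floating-point dtype
at_least_3d = np.array([1, 2, 3], ndmin=3)   # pad the shape to 3 dimensions
col_major = np.array(nested, order="F")      # request column-major layout

print(matrix.shape)        # (2, 3)
print(as_float.dtype)      # float64
print(at_least_3d.shape)   # (1, 1, 3)
print(col_major.flags["F_CONTIGUOUS"])
```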
The array function
import numpy as np
a = [1, 2, 3, 4]
b = np.array(a)  # convert the list into an array
c = np.array([[1, 2], [3, 4]])  # create a two-dimensional array from a nested list
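ones and zeros
np.zeros(3)  # one-dimensional array of three zeros
np.ones(3)  # one-dimensional array of three ones
np.zeros((3, 1))  # 3*1 two-dimensional array of zeros
np.identity(3)  # 3*3 identity matrix
np.arange(10)  # array of the 10 integers from 0 to 9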
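The difference between a one-dimensional shape such as (3,) and a two-dimensional shape such as (3, 1) is easy to miss. The sketch below prints the shapes produced by these constructors; the variable names are illustrative assumptions.

```python
import numpy as np

flat = np.zeros(3)                     # 1-D, shape (3,)
column = np.zeros((3, 1))              # 2-D, shape (3, 1)
ones_int = np.ones((2, 2), dtype=int)  # dtype can be set explicitly
eye = np.identity(3)                   # ones on the diagonal, zeros elsewhere
evens = np.arange(0, 10, 2)            # start, stop (exclusive), step

print(flat.shape, column.shape)
print(ones_int)
print(eye)
print(evens)  # [0 2 4 6 8]
```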
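Random array creation
Uniform distribution
np.random.rand(10, 10)  # 10*10 two-dimensional array of random numbers in [0, 1)
np.random.uniform(0, 100)  # a single random number in the given range [0, 100)
np.random.randint(0, 10, 5)  # five random integers in [0, 10)
np.random.randint(0, 10, (3, 5))  # 3*5 matrix of 15 random integers in [0, 10)
Normal distribution
np.random.normal(1.75, 0.1, (2, 3))  # normal distribution; the arguments are the mean, the standard deviation, and the shape
np.random.standard_normal(5)  # draw 5 samples from the standard normal distribution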
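Because the output changes on every run, it helps to check shapes and value ranges rather than specific numbers. The following sketch does that; fixing the seed with np.random.seed is an assumption added here for repeatability and is not part of the calls above.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so repeated runs give the same numbers

uniform_grid = np.random.rand(2, 3)            # floats in [0, 1)
single = np.random.uniform(0, 100)             # one float in [0, 100)
ints = np.random.randint(0, 10, (3, 5))        # integers in [0, 10), upper bound excluded
heights = np.random.normal(1.75, 0.1, (2, 3))  # mean 1.75, standard deviation 0.1

print(uniform_grid.shape, ints.shape, heights.shape)
print(ints.min() >= 0 and ints.max() < 10)
```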
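ndarray array attributes
b.size  # number of elements
b.shape  # shape of the array
b.ndim  # number of dimensions
b.dtype  # data type of the elements
b.itemsize  # size of each element in bytes
b.reshape(2, 2)  # returns the array reshaped to 2*2; the new shape must hold the same number of elements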
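As a quick illustration of these attributes, the sketch below inspects a six-element array before and after reshaping. The array named data is an assumption made for this example; note that reshape returns a reshaped array and leaves the original shape untouched.

```python
import numpy as np

# Hypothetical six-element array: 0, 1, 2, 3, 4, 5.
data = np.arange(6)
grid = data.reshape(2, 3)  # same six elements arranged as 2 rows, 3 columns

print(data.size, data.shape, data.ndim)  # 6 (6,) 1
print(grid.size, grid.shape, grid.ndim)  # 6 (2, 3) 2
print(data.dtype, data.itemsize)         # integer dtype and its size in bytes

# The element count must match the new shape, otherwise NumPy raises ValueError.
try:
    data.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```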
Operations between arrays and scalars
Arrays are important because they let us operate on whole batches of data without writing loops; this is called vectorization.
- Any arithmetic operation between arrays of equal size is applied element by element
- Arithmetic operations between an array and a scalar propagate (broadcast) the scalar to every element
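The two rules above can be sketched as follows. The arrays prices and quantities are hypothetical data invented for this example; the list comprehension at the end shows the explicit loop that vectorization replaces.

```python
import numpy as np

# Hypothetical data for illustration.
prices = np.array([1.0, 2.0, 3.0])
quantities = np.array([10.0, 20.0, 30.0])

totals = prices * quantities  # element by element: [10. 40. 90.]
doubled = prices * 2          # scalar applied to every element: [2. 4. 6.]
shifted = prices + 0.5        # [1.5 2.5 3.5]

# Equivalent explicit loop over plain Python values.
loop_totals = [p * q for p, q in zip(prices.tolist(), quantities.tolist())]

print(totals, doubled, shifted)
print(loop_totals)
```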
Indexing and slicing
One-dimensional arrays look much like lists. The most important difference is that an array slice is a view of the original data: the data is not copied, so any change made to the view is reflected directly in the original array.
When a scalar is assigned to a slice, the value is automatically propagated to the entire selection.
arr[5:8] = 12  # positions 5, 6, and 7 all become 12
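To see the view behaviour next to the list behaviour, consider this small sketch. The ten-element array and the variable names are assumptions for illustration; calling copy() is the usual way to get independent data when a view is not wanted.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]   # a view: shares memory with arr
view[:] = 12      # the scalar is propagated to the whole slice
print(arr)        # [ 0  1  2  3  4 12 12 12  8  9]

# A list slice is a copy, so the original list is unaffected.
lst = list(range(10))
lst_slice = lst[5:8]
lst_slice[0] = 12
print(lst)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# An explicit copy of an array slice is independent of the original array.
independent = arr[5:8].copy()
independent[:] = 0
print(arr[5:8])   # [12 12 12]
```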
In a two-dimensional array, the element at each index position is no longer a scalar but a one-dimensional array. Individual elements can be accessed recursively (for example arr[0][2]), but this is a bit cumbersome; a better way is to pass a comma-separated list of indices to select a single element (for example arr[0, 2]).
In a multidimensional array, if the trailing indices are omitted, the returned object is an ndarray with fewer dimensions.
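A short sketch of both access styles, and of omitting trailing indices, is given below. The arrays arr2d and arr3d are hypothetical examples.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2d[0]           # a one-dimensional array: [1 2 3]
recursive = arr2d[0][2]  # index the row, then the element: 3
direct = arr2d[0, 2]     # comma-separated indices, same element: 3

# Omitting trailing indices returns an array with fewer dimensions.
arr3d = np.arange(12).reshape(2, 2, 3)
plane = arr3d[0]         # shape (2, 3)
line = arr3d[0, 1]       # shape (3,): [3 4 5]

print(row, recursive, direct)
print(plane.shape, line)
```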
Mathematical Statistics
sum: sum of all elements in the array or along an axis; the sum of a zero-length array is 0
mean: arithmetic mean
std, var: standard deviation and variance
min, max: minimum and maximum values
argmin, argmax: indices of the smallest and largest elements
cumsum: cumulative sum of the elements
cumprod: cumulative product of the elements
Each of these can be called in either of two ways:
arr.mean()
np.mean(arr)
Functions such as mean and sum accept an axis parameter, which computes the statistic along that axis, for example arr.mean(axis=1). For a two-dimensional array, axis=0 computes down the rows, giving one value per column, and axis=1 computes across the columns, giving one value per row.
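The effect of the axis parameter is easiest to see on a small matrix. The following sketch uses a hypothetical 2*3 array; the values in the comments follow from its contents.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()             # all elements: 21
overall_mean = np.mean(arr)   # same result as arr.mean(): 3.5
col_sums = arr.sum(axis=0)    # down the rows, one value per column: [5 7 9]
row_means = arr.mean(axis=1)  # across the columns, one value per row: [2. 5.]
largest_at = arr.argmax()     # index into the flattened array: 5
empty_sum = np.array([]).sum()  # sum of a zero-length array: 0.0

print(total, overall_mean, col_sums, row_means, largest_at, empty_sum)
```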
cumsum returns the running (cumulative) sum of the elements along the given axis: axis=0 accumulates down the rows, within each column, and axis=1 accumulates across the columns, within each row. cumprod works the same way but accumulates a product instead of a sum.
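As a small worked example of the cumulative functions, assuming a hypothetical 2*3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

flat_cumsum = arr.cumsum()         # no axis: flattened, [1 3 6 10 15 21]
down_rows = arr.cumsum(axis=0)     # [[1 2 3], [5 7 9]]
across_cols = arr.cumsum(axis=1)   # [[1 3 6], [4 9 15]]
running_prod = arr.cumprod(axis=1) # [[1 2 6], [4 20 120]]

print(flat_cumsum)
print(down_rows)
print(across_cols)
print(running_prod)
```

The last entry of each accumulation equals the plain sum or product along the same axis.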