20 Python data processing: NumPy

Introduction

Plain Python lists are wasteful in both memory and time for numerical work, because every element is stored as a separate Python object. NumPy provides the ndarray object instead: a multidimensional array whose elements all share a single data type.

Basic properties of ndarray

  • The N-dimensional ndarray object stores a multidimensional array of elements of the same type
  • Each element of an ndarray occupies a block of memory of the same size
  • The data type of the elements is described by a data-type object (called dtype)
  • Like other container objects in Python, an ndarray can be indexed and sliced
  • The contents of an ndarray can be accessed and modified through its methods and attributes
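As a small illustration of the points above, the sketch below builds an array from mixed integers and floats and shows that NumPy settles on one dtype for every element. The variable name `mixed` is only illustrative.

```python
import numpy as np

# Integers and floats are stored under one common dtype (float64 here).
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# Elements can be read and modified by index, as with a list.
mixed[0] = 10
print(mixed[0])      # 10.0
```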

Creating an ndarray

The array function accepts any sequence-like object and builds a NumPy array containing the data passed in. A nested sequence (such as a list of equal-length lists) is converted into a multidimensional array

numpy.array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)

object  An array or nested sequence
dtype  Data type of the array elements (optional)
copy  Whether the object is copied (optional)
order  Memory layout of the array: C is row-major, F is column-major, A lets NumPy choose either (optional; the default is A in older NumPy releases and K in current ones)
subok  If True, subclasses of ndarray are passed through; by default the result is a base-class array
ndmin  Minimum number of dimensions of the resulting array

array function

import numpy as np
a = [1, 2, 3, 4]
b = np.array(a)  # convert the list into an array
c = np.array([[1, 2], [3, 4]])  # build a 2-D array from a nested list
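The dtype and ndmin parameters described earlier can be tried out in the same way. This is a minimal sketch; the variable names are made up for the example.

```python
import numpy as np

# dtype forces the element type instead of letting NumPy infer it.
as_float = np.array([1, 2, 3], dtype=float)
print(as_float.dtype)      # float64

# ndmin pads the shape with leading axes until it has at least 2 dimensions.
at_least_2d = np.array([1, 2, 3], ndmin=2)
print(at_least_2d.shape)   # (1, 3)

# A list of equal-length lists becomes a 2-D array.
nested = np.array([[1, 2], [3, 4]])
print(nested.shape)        # (2, 2)
```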

ones and zeros

np.zeros(3)  # 1-D array of all zeros
np.ones(3)  # 1-D array of all ones
np.zeros((3, 1))  # 3x1 2-D array of all zeros
np.identity(3)  # 3x3 identity matrix
np.arange(10)  # array of the 10 integers 0 through 9

Random array creation

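Uniform distribution
np.random.rand(10, 10)  # 10x10 array of random floats in [0, 1)
np.random.uniform(0, 100)  # a single random float in the given range
np.random.randint(0, 10, 5)  # 5 random integers in [0, 10)
np.random.randint(0, 10, (3, 5))  # 3x5 matrix of 15 random integers in [0, 10)
Normal distribution
np.random.normal(1.75, 0.1, (2, 3))  # arguments: mean, standard deviation, shape
np.random.standard_normal(5)  # draw 5 samples from the standard normal distribution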
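To check what these calls return, the sketch below draws a few small arrays and inspects their shapes and value ranges. Fixing the seed is an assumption added here only so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

uniform = np.random.rand(2, 3)                   # floats in [0, 1)
ints = np.random.randint(0, 10, (3, 5))          # integers in [0, 10)
heights = np.random.normal(1.75, 0.1, (2, 3))    # mean 1.75, std 0.1

print(uniform.shape, ints.shape, heights.shape)  # (2, 3) (3, 5) (2, 3)
print(ints.min() >= 0 and ints.max() < 10)       # True
```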

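ndarray attributes

b.size  # number of elements
b.shape  # shape of the array
b.ndim  # number of dimensions
b.dtype  # data type of the elements
b.itemsize  # size of each element in bytes
b.reshape(2, 3)  # return the array reshaped to 2x3 (it must hold exactly 6 elements)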
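A short sketch of these attributes on a six-element array follows. The name `data` is hypothetical, and six elements are used so that a 2x3 reshape is valid.

```python
import numpy as np

data = np.arange(6)          # the integers 0..5 in a 1-D array
print(data.size)             # 6
print(data.shape)            # (6,)
print(data.ndim)             # 1

# reshape returns a 2x3 result; data itself keeps its original shape.
table = data.reshape(2, 3)
print(table.shape)           # (2, 3)
print(table.ndim)            # 2

# Total bytes equal element count times bytes per element.
print(data.nbytes == data.size * data.itemsize)   # True
```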

Operations between arrays and scalars

Arrays matter because they let us run batch operations on data without writing loops; this is called vectorization

  • Any arithmetic operation between arrays of equal size is applied element by element
  • An arithmetic operation between an array and a scalar applies the scalar to every element
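Both rules can be seen in a few lines. This is a minimal sketch with made-up values.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

# Equal-size arrays: the operation pairs up corresponding elements.
elementwise_sum = x + y      # 11, 22, 33
elementwise_prod = x * y     # 10, 40, 90

# Array and scalar: the scalar is applied to every element.
scaled = x * 2               # 2, 4, 6
shifted = x + 0.5            # 1.5, 2.5, 3.5
```

No explicit loop appears anywhere; NumPy performs the iteration internally.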

Indexing and slicing

The most important difference between one-dimensional arrays and lists is that an array slice is a view of the original data: the data is not copied, so any change made through the view is reflected directly in the original array.
When a scalar is assigned to a slice, the value is automatically propagated to the entire selection

arr[5:8] = 12  # positions 5, 6 and 7 all become 12
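The view behaviour is easy to verify. The sketch below assumes `arr` is a ten-element integer array; the names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.arange(10)
arr[5:8] = 12                # the scalar fills positions 5, 6 and 7

# A slice is a view: writing through it changes the original array.
view = arr[5:8]
view[0] = 99
print(arr[5])                # 99

# An explicit copy is independent of the original.
independent = arr[5:8].copy()
independent[1] = -1
print(arr[6])                # 12
```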

In a two-dimensional array, the element at each index position is no longer a scalar but a one-dimensional array. Elements can be reached by indexing recursively, but that is a bit cumbersome; a better way is to pass a comma-separated list of indices to select a single element.
In a multidimensional array, if the trailing indices are omitted, the returned object is an ndarray of lower dimension
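The two ways of reaching a single element, and the effect of omitting a trailing index, can be sketched as follows. The array `grid` is a made-up example.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Omitting the trailing index returns a lower-dimensional array (a row).
row = grid[1]                # 1-D array holding 4, 5, 6
print(row.ndim)              # 1

# Recursive indexing and comma-separated indexing select the same element.
recursive = grid[1][2]
direct = grid[1, 2]
print(recursive, direct)     # 6 6
```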

Mathematical Statistics

sum  Sum of all elements in the array or along an axis; the sum of a zero-length array is 0
mean  Arithmetic mean
std, var  Standard deviation and variance
min, max  Minimum and maximum
argmin, argmax  Index of the smallest and the largest element
cumsum  Cumulative sum of all elements
cumprod  Cumulative product of all elements
Each of these can be called in either of two ways:
arr.mean()
np.mean(arr)
mean and sum accept an axis parameter, which computes the statistic along that axis, for example arr.mean(axis=1). For a two-dimensional array, axis=0 aggregates down the rows (one result per column) and axis=1 aggregates across the columns (one result per row).
cumsum returns the cumulative sum of the elements along the given axis: axis=0 accumulates down the rows and axis=1 accumulates across the columns. cumprod works the same way but accumulates a product instead of a sum.
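A small worked example makes the axis argument concrete. The 2x3 array below is made up for illustration, and the values in the comments follow from it.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()                  # 21
col_means = arr.mean(axis=0)       # one mean per column: 2.5, 3.5, 4.5
row_means = np.mean(arr, axis=1)   # one mean per row: 2.0, 5.0

running_sum = arr.cumsum(axis=0)   # rows: [1, 2, 3] and [5, 7, 9]
running_prod = arr.cumprod(axis=1) # rows: [1, 2, 6] and [4, 20, 120]

largest_at = arr.argmax()          # 5, an index into the flattened array
empty_sum = np.array([]).sum()     # 0.0 for a zero-length array
```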

Origin blog.csdn.net/bj_zhb/article/details/105429298