[Machine Learning] Chapter 2: NumPy

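I have recently been taking a machine learning course and want to record my notes for the whole learning process here, so that readers can use them as a place to look up basic concepts. My knowledge is still limited, so there are bound to be some mistakes; please point them out in the comments so we can discuss them. NumPy is an essential foundation library for machine learning, so I have summarized its basic operations in the 19 points below.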

1.Creating a numpy array

#Create a 1-D array
in:import numpy as np
   nparr = np.array([i for i in range(10)])
   nparr
out:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

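#Return the element type (dtype) of the array; the default integer type is platform-dependent (int32 here, int64 on many other systems)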
in:nparr.dtype
out:dtype('int32')

#Create a 1-D array filled with zeros
in:np.zeros(10)
out:array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

#Create a matrix filled with zeros
in:np.zeros((3,5))
out:array([[0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0.]])

#Create a matrix filled with ones
in:np.ones((3,5))
out:array([[1., 1., 1., 1., 1.],
           [1., 1., 1., 1., 1.],
           [1., 1., 1., 1., 1.]])

#Fill a matrix with a single value
in:np.full((3,5),fill_value=666)
out:array([[666, 666, 666, 666, 666],
           [666, 666, 666, 666, 666],
           [666, 666, 666, 666, 666]])
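
As a small sketch of how the element type is decided at creation time (the variable names are only illustrative and do not come from the notes above): np.zeros and np.ones default to floats, while np.full appears to take its type from fill_value unless dtype is given.

```python
import numpy as np

zeros_float = np.zeros(3)               # default dtype is float64
zeros_int = np.zeros(3, dtype=int)      # ask for integers explicitly
full_int = np.full((2, 2), 666)         # integer fill value -> integer array
full_float = np.full((2, 2), 666.0)     # float fill value -> float array

print(zeros_float.dtype, zeros_int.dtype, full_int.dtype, full_float.dtype)
```

Passing dtype explicitly is usually the safer choice when the type matters later, for example before integer indexing.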

2.arange

#Generate numbers from 0 up to (but not including) 20, with a step of 2
in:np.arange(0,20,2)
out:array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

#Generate the integers 0 through 9 (the stop value 10 is excluded)
in:np.arange(0,10)
out:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

3.linspace

#Generate 10 evenly spaced numbers from 0 to 20, including both endpoints
in:np.linspace(0,20,10)
out:array([ 0.        ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
           11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.        ])

#Same as above, but generating 11 numbers
in:np.linspace(0,20,11)
out:array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18., 20.])
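
The difference between the two functions is easy to mix up, so here is a minimal sketch (variable names are illustrative): arange takes a step and excludes the stop value, while linspace takes a count and includes the stop value unless endpoint=False is passed.

```python
import numpy as np

a = np.arange(0, 20, 2)                     # step of 2, stop value 20 is excluded
b = np.linspace(0, 20, 11)                  # 11 points, stop value 20 is included
c = np.linspace(0, 20, 10, endpoint=False)  # 10 points, stop excluded -> step of 2

print(a)
print(b)
print(c)
```

With endpoint=False the linspace result matches the arange result, only as floats instead of integers.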

4.random

#Generate one random integer in [0, 10) (10 itself is excluded)
in:np.random.randint(0,10)
out:2

#Generate 10 random integers in [0, 10)
in:np.random.randint(0,10,size=10)
out:array([9, 2, 2, 4, 0, 8, 6, 7, 6, 6])

#Generate a 3x5 matrix of random integers in [0, 10)
in:np.random.randint(0,10,size=(3,5))
out:array([[4, 7, 9, 3, 6],
            [2, 2, 3, 8, 1],
            [6, 5, 5, 9, 0]])

#Setting a random seed makes the generated random matrix the same on every run
in:np.random.seed(666)
   np.random.randint(0,10,size=(3,5))
out:array([[2, 6, 9, 4, 3],
           [1, 0, 8, 7, 5],
           [2, 5, 5, 4, 8]])

#Generate a random float in [0, 1)
in:np.random.random()
out:0.7315955468480113

#Generate a 3x5 matrix drawn from a normal distribution with mean 0 and standard deviation 1
in:np.random.normal(0,1,size=(3,5))
out:array([[-1.62879004,  1.23174866, -0.91360034, -0.27084407,  1.42024914],
           [-0.98226439,  0.80976498,  1.85205227,  1.67819021, -0.98076924],
           [ 0.47031082,  0.18226991, -0.84388249,  0.20996833,  0.22958666]])
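
To check the seed behaviour, a short sketch (the names first, second and samples are illustrative): resetting the same seed before each call should reproduce the same matrix, and np.random.random should stay inside the half-open interval [0, 1).

```python
import numpy as np

np.random.seed(666)
first = np.random.randint(0, 10, size=(3, 5))

np.random.seed(666)                       # reset to the same seed
second = np.random.randint(0, 10, size=(3, 5))

print(np.array_equal(first, second))      # the two matrices are identical

samples = np.random.random(1000)          # floats in [0, 1)
print(samples.min() >= 0.0, samples.max() < 1.0)
```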

5.reshape

in:x = np.arange(10)
   x
out:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

in:X = np.arange(15).reshape(3,5)
   X
out:array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])

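#reshape accepts -1 for one dimension, and NumPy infers that dimension from the number of elements; here 2 rows are requested, so the 10 elements give 5 columns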
in:x.reshape(2,-1)
out:array([[0, 1, 2, 3, 4],
            [5, 6, 7, 8, 9]])
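
A minimal sketch of how the inferred dimension behaves (variable names are illustrative): -1 can stand in for either the rows or the columns, and reshape raises a ValueError when the element count cannot be divided evenly.

```python
import numpy as np

x = np.arange(10)
two_rows = x.reshape(2, -1)     # columns inferred: 10 / 2 = 5
five_rows = x.reshape(-1, 2)    # rows inferred: 10 / 2 = 5
column = x.reshape(-1, 1)       # a single column with 10 rows

print(two_rows.shape, five_rows.shape, column.shape)

try:
    x.reshape(3, -1)            # 10 elements cannot fill 3 equal rows
except ValueError as err:
    print("reshape failed:", err)
```

The reshape(-1, 1) form is common in machine learning code, where a single feature often has to be passed as a 2-D column.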

6.Basic attributes

#Return the number of dimensions (a single integer)
in:x.ndim
out:1
in:X.ndim
out:2

#Return a tuple with the size of each dimension
in:x.shape
out:(10,)
in:X.shape
out:(3, 5)

#Return the total number of elements
in:x.size
out:10
in:X.size
out:15

7.Accessing data

#First, recall our x and X
x=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
X=array([[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14]])

#Start with accessing a 1-D array (x)
in:x[0]
out:0

in:x[-1]
out:9

in:x[0:5]
out:array([0, 1, 2, 3, 4])

in:x[::2]
out:array([0, 2, 4, 6, 8])

in:x[::-1]
out:array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

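#These are fairly basic operations that should be easy to follow. Next, look at operations on the 2-D matrix (X)

#First two rows, first three columns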
in:X[:2,:3]
out:array([[0, 1, 2],
            [5, 6, 7]])

#First two rows, every other column
in:X[:2,::2]
out:array([[0, 2, 4],
            [5, 7, 9]])

#Reverse both the rows and the columns
in:X[::-1,::-1]
out:array([[14, 13, 12, 11, 10],
            [ 9,  8,  7,  6,  5],
            [ 4,  3,  2,  1,  0]])

#A single index in the brackets after X selects one row
in:X[0]
out:array([0, 1, 2, 3, 4])
in:X[0,:]
out:array([0, 1, 2, 3, 4])

8.Shallow copy and deep copy

#First, look at a shallow copy (a slice is a view onto the original data)
in:subX = X[:2,:3]
    subX
out:array([[0, 1, 2],
            [5, 6, 7]])

#When we modify one element
in:subX[0,0]=100 
    subX
out:array([[100,   1,   2],
            [  5,   6,   7]])
            
#Now look at X: its data changed together with subX
in:X
out:array([[100,   1,   2,   3,   4],
           [  5,   6,   7,   8,   9],
           [ 10,  11,  12,  13,  14]])

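#How do we make a deep copy?
in:subX = X[:2,:3].copy()
    subX
out:array([[100,   1,   2],
           [  5,   6,   7]])
#Now subX is a new, independent matrix (it still holds the 100 written to X earlier), so modifying it no longer affects the original X.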
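
One way to confirm which case you are in is np.shares_memory. The sketch below uses illustrative names (view, copy) that are not part of the notes above:

```python
import numpy as np

X = np.arange(15).reshape(3, 5)
view = X[:2, :3]             # basic slicing returns a view of X
copy = X[:2, :3].copy()      # .copy() allocates new memory

print(np.shares_memory(X, view))   # True
print(np.shares_memory(X, copy))   # False

copy[0, 0] = -1              # does not touch X
view[0, 0] = 100             # writes through to X
print(X[0, 0])               # 100
```

This differs from plain Python lists, where slicing already produces a new list.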

9.Merging

in:x = np.array([1,2,3])
   y = np.array([3,2,1])
   x
   y
out:array([1, 2, 3])
    array([3, 2, 1])

#First stack the arrays vertically
in:z = np.vstack([x,y])
   z
out:array([[1, 2, 3],
           [3, 2, 1]])

#Then stack them horizontally
in:w = np.hstack([x,y])
   w
out:array([1, 2, 3, 3, 2, 1])
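
Both stacking functions can be thought of as special cases of np.concatenate with an axis argument. A small sketch, with illustrative variable names:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])

flat = np.concatenate([x, y])            # same result as np.hstack for 1-D input
A = np.vstack([x, y])                    # shape (2, 3)
rows = np.concatenate([A, A], axis=0)    # join along rows    -> shape (4, 3)
cols = np.concatenate([A, A], axis=1)    # join along columns -> shape (2, 6)

print(flat)
print(rows.shape, cols.shape)
```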

10.Splitting

in:A = np.arange(16).reshape((4,4))
   A
out:array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15]])

#First split A vertically (along the rows) into an upper and a lower part
in:upper,lower = np.vsplit(A,[2])
    upper
out:array([[0, 1, 2, 3],
            [4, 5, 6, 7]])
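#The 2 passed here is the position of the cut: the split is made just before row index 2 (the third row). You can also pass -1 to cut just before the last row.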

#Then split A horizontally (along the columns) into a left and a right part
in:left,right = np.hsplit(A,[2])
    left
out:array([[ 0,  1],
            [ 4,  5],
            [ 8,  9],
            [12, 13]])
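
A short sketch of the cut positions, assuming the same 4x4 matrix A (the names left, right, top, middle and bottom are illustrative): a negative index counts from the end, and a list with two cut points gives three pieces.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)

upper, lower = np.vsplit(A, [2])             # rows 0-1 and rows 2-3
left, right = np.hsplit(A, [-1])             # all but the last column, then the last column
top, middle, bottom = np.vsplit(A, [1, 3])   # two cut points give three pieces

print(left.shape, right.shape)
print(top.shape, middle.shape, bottom.shape)
print(np.array_equal(np.vstack([upper, lower]), A))   # stacking undoes the split
```

Splitting off the last column like this is a common way to separate features from labels in a data matrix.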

11.Operations on numpy.array (also known as Universal Functions)

#First create an X
in:X= np.arange(1,16).reshape((3,5))
    X
out:array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15]])

#The following are straightforward: each operation is applied element by element
in:X +1
out:array([[ 2,  3,  4,  5,  6],
            [ 7,  8,  9, 10, 11],
            [12, 13, 14, 15, 16]])

in:X - 1
out:array([[ 0,  1,  2,  3,  4],
            [ 5,  6,  7,  8,  9],
            [10, 11, 12, 13, 14]])

in:X *2
out:array([[ 2,  4,  6,  8, 10],
            [12, 14, 16, 18, 20],
            [22, 24, 26, 28, 30]])

in:X /2
out:array([[0.5, 1. , 1.5, 2. , 2.5],
            [3. , 3.5, 4. , 4.5, 5. ],
            [5.5, 6. , 6.5, 7. , 7.5]])

#Floor division
in:X//2
out:array([[0, 1, 1, 2, 2],
            [3, 3, 4, 4, 5],
            [5, 6, 6, 7, 7]], dtype=int32)

#Power
in:X**2
out:array([[  1,   4,   9,  16,  25],
            [ 36,  49,  64,  81, 100],
            [121, 144, 169, 196, 225]], dtype=int32)

#Modulo (remainder)
in:X%2
out:array([[1, 0, 1, 0, 1],
            [0, 1, 0, 1, 0],
            [1, 0, 1, 0, 1]], dtype=int32)

#Absolute value
in:np.abs(X)
out:array([[ 1,  2,  3,  4,  5],
            [ 6,  7,  8,  9, 10],
            [11, 12, 13, 14, 15]])
          
#Sine
in:np.sin(X)
out:array([[ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427],
            [-0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849, -0.54402111],
            [-0.99999021, -0.53657292,  0.42016704,  0.99060736,  0.65028784]])

#e raised to the power X
in:np.exp(X)
out:array([[2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
            1.48413159e+02],
           [4.03428793e+02, 1.09663316e+03, 2.98095799e+03, 8.10308393e+03,
            2.20264658e+04],
           [5.98741417e+04, 1.62754791e+05, 4.42413392e+05, 1.20260428e+06,
            3.26901737e+06]])

#3 raised to the power X
in:np.power(3,X)
out:array([[       3,        9,       27,       81,      243],
            [     729,     2187,     6561,    19683,    59049],
            [  177147,   531441,  1594323,  4782969, 14348907]], dtype=int32)
            
#Natural logarithm
in:np.log(X)
out:array([[0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791],
            [1.79175947, 1.94591015, 2.07944154, 2.19722458, 2.30258509],
            [2.39789527, 2.48490665, 2.56494936, 2.63905733, 2.7080502 ]])

#Base-10 logarithm
in:np.log10(X)
out:array([[0.        , 0.30103   , 0.47712125, 0.60205999, 0.69897   ],
            [0.77815125, 0.84509804, 0.90308999, 0.95424251, 1.        ],
            [1.04139269, 1.07918125, 1.11394335, 1.14612804, 1.17609126]])

12.Matrix operations

#There are two kinds of matrix multiplication, and it is important to tell them apart
in:A = np.arange(4).reshape(2,2)
    A
out:array([[0, 1],
            [2, 3]])
in:B = np.full((2,2),10)
    B
out:array([[10, 10],
            [10, 10]])

#Element-wise multiplication of matching positions
in:A * B
out:array([[ 0, 10],
            [20, 30]])

#Matrix multiplication as defined in linear algebra
in:A.dot(B)
out:array([[10, 10],
            [50, 50]])

#Transpose
in:A.T
out:array([[0, 2],
            [1, 3]])
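
To keep the two products apart, a small sketch (the names elementwise and matrix are illustrative). It also uses the @ operator, which is not used elsewhere in these notes but gives the same matrix product as dot for 2-D arrays:

```python
import numpy as np

A = np.arange(4).reshape(2, 2)
B = np.full((2, 2), 10)

elementwise = A * B      # multiplies matching positions
matrix = A @ B           # matrix product, same as A.dot(B) for 2-D arrays

print(elementwise)
print(matrix)
print(np.array_equal(matrix, A.dot(B)))
print(np.array_equal((A @ B).T, B.T @ A.T))   # (AB)^T = B^T A^T
```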

13.Operations between vectors and matrices

#Stack v once per row of A so that it has the same shape as A
in:v = np.array([1,2])
   A = np.array([[0, 1],
                 [2, 3]])
   np.vstack([v]*A.shape[0])
out:array([[1, 2],
           [1, 2]])

#v * A is element-wise: v is applied to every row of A
in:v * A
out:array([[0, 2],
           [2, 6]])

#v.dot(A) is the vector-matrix product
in:v.dot(A)
out:array([4, 7])
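
The stacking step is what NumPy does implicitly through broadcasting. A minimal sketch, assuming the same v and A (the name stacked is illustrative):

```python
import numpy as np

A = np.arange(4).reshape(2, 2)
v = np.array([1, 2])

stacked = np.vstack([v] * A.shape[0])        # explicit copy of v for every row

print(np.array_equal(v + A, stacked + A))    # broadcasting gives the same sum
print(np.array_equal(v * A, stacked * A))    # and the same element-wise product

print(v.dot(A))     # v treated as a row vector: [4 7]
print(A.dot(v))     # v treated as a column vector: [2 8]
```

Note that dot treats a 1-D array as a row or a column depending on which side of the matrix it appears.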

14.Matrix inverse

in:A
out:array([[0, 1],
           [2, 3]])

in:np.linalg.inv(A)
out:array([[-1.5,  0.5],
           [ 1. ,  0. ]])
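
A quick way to sanity-check an inverse is to multiply it back. The sketch below also shows what happens with a matrix that has no inverse; the singular matrix and the names inv_A, raised and pinv_X are my own illustration, and np.linalg.pinv is an addition that the notes above do not cover:

```python
import numpy as np

A = np.arange(4).reshape(2, 2)
inv_A = np.linalg.inv(A)
print(np.allclose(A.dot(inv_A), np.eye(2)))   # A times its inverse is the identity

singular = np.array([[1, 2], [2, 4]])         # second row is twice the first
raised = False
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    raised = True
    print("not invertible:", err)

# Pseudo-inverse: defined even for non-square matrices
pinv_X = np.linalg.pinv(np.arange(6).reshape(2, 3))
print(pinv_X.shape)
```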

15.Aggregation

in:L = np.random.random(100)
   sum(L)
   np.sum(L)
out:47.51077024043275
    47.51077024043275

in:np.min(L)
   L.min()
out:1.827632976070248e-06
    1.827632976070248e-06
 
in:np.max(L)
   L.max()
out:0.9999993748010294
    0.9999993748010294

#Now try a 2-D matrix
in:X = np.arange(16).reshape(4,-1)
    X
out:array([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11],
            [12, 13, 14, 15]])

in:np.sum(X)
out:120

#Sum down each column (axis=0)
in:np.sum(X,axis=0)
out:array([24, 28, 32, 36])

#Sum across each row (axis=1)
in:np.sum(X,axis=1)
out:array([ 6, 22, 38, 54])

#Mean
in:np.mean(X)
out:7.5

#Median
in:np.median(X)
out:7.5

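#Variance (big_array is not defined anywhere in these notes; it is presumably a large array from np.random.random, whose variance is close to 1/12)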
in:np.var(big_array)
out:0.08341610471085403

#Standard deviation
in:np.std(big_array)
out:0.28881846324439514

#An example
in:x = np.random.normal(0,1,size=1000000)
in:np.mean(x)
out:-0.0020024202334790993 #very close to 0
in:np.std(x)
out:1.0000840912045612 #very close to 1
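
In machine learning these aggregations are often combined with the axis argument to standardize each feature. A sketch under the assumption of a hypothetical data matrix with 1000 samples and 3 features (data, col_mean and col_std are illustrative names):

```python
import numpy as np

data = np.random.normal(5, 2, size=(1000, 3))   # hypothetical data: 1000 samples, 3 features

col_mean = np.mean(data, axis=0)     # one mean per column
col_std = np.std(data, axis=0)       # one standard deviation per column
standardized = (data - col_mean) / col_std

print(np.allclose(np.var(data, axis=0), col_std ** 2))   # variance is std squared
print(np.allclose(standardized.mean(axis=0), 0))         # each column now has mean 0
print(np.allclose(standardized.std(axis=0), 1))          # and standard deviation 1
```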

16.Indices

#Return the index of the minimum value
in:np.argmin(x)
out:610261

#Return the index of the maximum value
in:np.argmax(x)
out:849782

17.Sorting and using indices

in:x = np.arange(16)
   x
out:array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

#Shuffle it
in:np.random.shuffle(x)
    x
out:array([ 9,  3, 14, 11,  8,  6,  1, 15,  5, 13, 10,  2,  7, 12,  0,  4])

#Sort it (x.sort() sorts in place and returns None, so display x afterwards)
in:x.sort()
   x
out:array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

#How do we sort a 2-D array?
in:X =np.random.randint(10,size=(4,4))
    X
out:array([[4, 0, 9, 7],
            [9, 2, 9, 8],
            [7, 5, 6, 8],
            [3, 3, 1, 7]])

#Sort within each row (axis=1)
in:np.sort(X,axis = 1)
out:array([[0, 4, 7, 9],
            [2, 8, 9, 9],
            [5, 6, 7, 8],
            [1, 3, 3, 7]])

#Sort within each column (axis=0)
in:np.sort(X,axis = 0)
out:array([[3, 0, 1, 7],
            [4, 2, 6, 7],
            [7, 3, 9, 8],
            [9, 5, 9, 8]])

#argsort returns the indices that would put the values in sorted order
in:x = np.array([12,  1, 14,  7,  9, 11,  3, 15, 10,  5,  8,  0, 13,  2,  4,  6])
   np.argsort(x)
out:array([11,  1, 13,  6, 14,  9, 15,  3, 10,  4,  8,  5,  0, 12,  2,  7],
      dtype=int64)


#argsort also works on 2-D arrays (here sorting within each row)
in:X = np.array([[4, 0, 9, 7],
                 [9, 2, 9, 8],
                 [7, 5, 6, 8],
                 [3, 3, 1, 7]])
   np.argsort(X,axis=1)
out:array([[1, 0, 3, 2],
            [1, 3, 0, 2],
            [1, 2, 0, 3],
            [2, 0, 1, 3]], dtype=int64)
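
The indices from argsort become useful when a second array has to be reordered in the same way. A minimal sketch with hypothetical data (scores and names are made up for illustration):

```python
import numpy as np

scores = np.array([12, 1, 14, 7])           # hypothetical values
names = np.array(["a", "b", "c", "d"])      # hypothetical labels, one per score

order = np.argsort(scores)          # indices that sort scores in ascending order
print(scores[order])                # the sorted scores
print(names[order])                 # labels reordered the same way
print(names[order[::-1][:2]])       # labels of the two largest scores

X = np.array([[4, 0, 9, 7], [9, 2, 9, 8]])
row_order = np.argsort(X, axis=1)
# Applying the per-row indices reproduces np.sort along the same axis
print(np.array_equal(np.take_along_axis(X, row_order, axis=1), np.sort(X, axis=1)))
```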

18.Fancy Indexing

#This one is important!
in:x = np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
   x[3:9:2]
out:array([3, 5, 7])

in:ind = [3,5,8]
   x[ind]
out:array([3, 5, 8])

in:X = x.reshape(4,-1)
   X
out:array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15]])

in:row = np.array([0,1,2])
   col = np.array([1,2,3])
   X[row,col]
out:array([ 1,  6, 11])
#See what happened? The row and column indices are paired up, so this picks the elements at (0,1), (1,2) and (2,3)

in:col = [True,False,True,True]
    X[1:3,col]
out:array([[ 4,  6,  7],
            [ 8, 10, 11]])
#Take rows 1 and 2 of X; for the columns, True keeps a column and False drops it
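
Because index arrays are paired element by element, selecting a whole block of rows and columns needs a different form. The sketch below uses np.ix_, which is not covered in the notes above (the names paired and grid are illustrative):

```python
import numpy as np

X = np.arange(16).reshape(4, 4)
row = np.array([0, 1, 2])
col = np.array([1, 2, 3])

paired = X[row, col]           # elements (0,1), (1,2), (2,3)
grid = X[np.ix_(row, col)]     # every combination of the given rows and columns

print(paired)       # [ 1  6 11]
print(grid.shape)   # (3, 3)
print(np.array_equal(grid, X[:3, 1:4]))
```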

19.Comparing numpy.array

in:x = np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
   x<3
out:array([ True,  True,  True, False, False, False, False, False, False,
       False, False, False, False, False, False, False])

in:x ==3
out:array([False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False, False, False])

in:X = np.array([[ 0,  1,  2,  3],
                 [ 4,  5,  6,  7],
                 [ 8,  9, 10, 11],
                 [12, 13, 14, 15]])
   X<6
out:array([[ True,  True,  True,  True],
           [ True,  True, False, False],
           [False, False, False, False],
           [False, False, False, False]])

in:x = np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
   np.sum(x<=3)
out:4

in:np.count_nonzero(x <=3)
out:4

#Return True if at least one element equals 0
in:np.any(x == 0)
out:True

#Return True only if every element is greater than 0 (False here, because x contains 0)
in:np.all(x>0)
out:False

#Count the elements greater than 3 and less than 6
in:x = np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
   np.sum((x>3) & (x<6))
out:2

#Count the elements greater than 3 or less than 6
in:np.sum( (x>3) | (x<6))
out:16

#Count the elements not equal to 0
in:np.sum(~(x==0))
out:15

#Using comparison results for fancy indexing (boolean indexing)
in:x[x<5]
out:array([0, 1, 2, 3, 4])

in:x[x%2==0]
out:array([ 0,  2,  4,  6,  8, 10, 12, 14])

in:X = np.array([[ 0,  1,  2,  3],
                 [ 4,  5,  6,  7],
                 [ 8,  9, 10, 11],
                 [12, 13, 14, 15]])
   X[X[:,3]%3==0,:]
out:array([[ 0,  1,  2,  3],
            [12, 13, 14, 15]])
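
Boolean masks can also be used for assignment, not only for selection. A short sketch (mask, clipped and labels are illustrative names, and np.where is an addition not covered in the notes above):

```python
import numpy as np

x = np.arange(16)

mask = (x > 3) & (x < 6)       # parentheses are required around each comparison
print(x[mask])                 # [4 5]

clipped = x.copy()
clipped[clipped > 10] = 10     # assign through a boolean mask
print(clipped.max())           # 10

labels = np.where(x % 2 == 0, "even", "odd")   # element-wise choice between two values
print(labels[:4])
```

The copy is taken first so that the assignment does not modify x itself.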

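NumPy can do far more than this, but these notes cover most of what we use day to day. There is no need to memorize everything; looking things up as you use them is a good way to make them stick.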

Reposted from blog.csdn.net/apologize_i/article/details/88661949