Andrew Ng's Deep Learning course assignment notes — 1.1 Python Basics with numpy

All related code and data have been uploaded to GitHub: 1.1-Python基础与numpy

1.1.1 About iPython Notebooks

  • After writing code in a cell, run it by pressing "SHIFT+ENTER" or by clicking "Run Cell" (marked with a play symbol) in the upper bar of the iPython Notebook.

1.1.2 Building basic functions with numpy

1.1.2.1 The sigmoid function and np.exp()

$sigmoid(x) = \frac{1}{1+e^{-x}}$, sometimes called the logistic function, is a nonlinear function used not only in logistic regression for machine learning but also throughout deep learning.

The version below implements sigmoid with math.exp(). Its limitation is that the input must be a real number (a scalar). In deep learning we mostly work with matrices and vectors, which is why numpy is so useful.

#1-basic_sigmoid.py
import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.
    
    Arguments:
    x -- A scalar
    
    Return:
    s -- sigmoid(x)
    """

    s = 1/(1+math.exp(-x))

    return s
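
A quick usage sketch (my own example values), showing the scalar case and the limitation mentioned above: calling basic_sigmoid on a list fails, because math.exp() and the unary minus only accept real numbers.

print(basic_sigmoid(3))      # ≈ 0.9525741268224334
# basic_sigmoid([1, 2, 3])   # raises TypeError: math.exp() needs a real number, not a list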

If $x = (x_1, x_2, ..., x_n)$ is a row vector, then $np.exp(x)$ applies the exponential element-wise to $x$, so the output is $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$.
The corresponding numpy implementation is:

#2-sigmoid.py
import numpy as np

def sigmoid(x):
    """
    Compute the sigmoid of x.

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """

    s = 1 / (1 + np.exp(-x))

    return s

x = np.array([1, 2, 3])
sigmoid(x)

#output
array([ 0.73105858, 0.88079708, 0.95257413])

1.1.2.2 Sigmoid gradient

We compute the gradient and use it in backpropagation to optimize the loss function. The derivative of the sigmoid function with respect to its input x is:
$$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x)(1 - \sigma(x))$$
The implementation:

#3-sigmoid_derivative.py
import numpy as np

def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """

    s = 1.0 / (1 + np.exp(-x))  # sigmoid(x), same as the sigmoid() defined above
    ds = s * (1 - s)

    return ds
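
A quick check, reusing the same example input as before:

x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
# Output (approximately): [0.19661193 0.10499359 0.04517666]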

1.1.2.3 Reshaping arrays

For example, in computer science an image is represented by a 3-dimensional array of shape (length, height, depth = 3). However, when you read an image as the input to an algorithm, you reshape it into a vector of shape (length * height * 3, 1). In other words, you "unroll", or reshape, the 3-dimensional array into a 1-dimensional vector.
The code to implement this:

#4-image2vector.py
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)

    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """

    v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))

    return v
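
A minimal usage sketch with a small made-up array (the shape (3, 3, 2) is arbitrary, chosen just for illustration):

image = np.arange(18).reshape(3, 3, 2)   # a toy 3 x 3 x 2 "image"
v = image2vector(image)
print(v.shape)   # (18, 1)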

1.1.2.4 Normalizing the rows of a matrix

Normalization often improves performance: after normalization, gradient descent converges faster. Here, normalizing means changing x to $\frac{x}{\|x\|}$, i.e. dividing each row vector of x by its norm.
For example, if
$$x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{bmatrix}$$
then
$$\|x\| = np.linalg.norm(x, axis=1, keepdims=True) = \begin{bmatrix} 5 \\ \sqrt{56} \end{bmatrix}$$
and
$$x\_normalized = \frac{x}{\|x\|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{bmatrix}$$
The implementation:

#5-normalizeRows.py
import numpy as np

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """

    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # compute the norm of each row, giving a column vector of shape (n, 1)

    # Divide x by its norm.
    x = x / x_norm  # numpy broadcasting divides the matrix by the column vector row by row

    return x
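
Checking against the worked example above:

x = np.array([[0, 3, 4],
              [2, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
# Output (approximately):
# [[0.         0.6        0.8       ]
#  [0.26726124 0.80178373 0.53452248]]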

1.1.2.5 Python broadcasting and the softmax function

  • Python broadcasting makes mathematical operations between arrays of different shapes very convenient.
  • When an algorithm needs to classify two or more classes, softmax is used as a normalizing function.
  • For $x \in \mathbb{R}^{1 \times n}$,
    $$softmax(x) = softmax([x_1 \quad x_2 \quad ... \quad x_n]) = \begin{bmatrix} \frac{e^{x_1}}{\sum_j e^{x_j}} & \frac{e^{x_2}}{\sum_j e^{x_j}} & ... & \frac{e^{x_n}}{\sum_j e^{x_j}} \end{bmatrix}$$
  • For a matrix $x \in \mathbb{R}^{m \times n}$, the same idea applies row by row:
    $$softmax(x) = softmax\begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_j e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_j e^{x_{1j}}} \\ \frac{e^{x_{21}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_j e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_j e^{x_{2j}}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_j e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_j e^{x_{mj}}} \end{bmatrix}$$
    $$= \begin{pmatrix} softmax(\text{first row of } x) \\ softmax(\text{second row of } x) \\ \vdots \\ softmax(\text{last row of } x) \end{pmatrix}$$
    The implementation:
# 6-softmax.py
import numpy as np

def softmax(x):
    """Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n,m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n,m)
    """

    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x) # (n,m)

    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)  # (n,1); axis=1 sums along each row

    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp / x_sum  # (n,m); numpy broadcasting divides each row of x_exp by its row sum

    return s
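
A quick usage sketch (the input values are my own example); each row of the result is a probability distribution that sums to 1:

x = np.array([[9, 2, 5, 0, 0],
              [7, 5, 0, 0, 0]])
s = softmax(x)
print(s.shape)              # (2, 5)
print(np.sum(s, axis=1))    # [1. 1.]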

1.1.3 Vectorization

To keep the code computationally efficient, we use vectorization.

  • Computing the dot product, outer product, and element-wise product without vectorization:
#7-un_vectorization.py
import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### 1. CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### 2. CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### 3. CLASSIC ELEMENTWISE IMPLEMENTATION (unlike the dot product, the result is a vector) ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### 4.CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
  • Computing the dot product, outer product, and element-wise product with vectorization:
#8-vectorization.py
import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### 1. VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### 2. VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### 3. VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### 4.VECTORIZED GENERAL DOT PRODUCT ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array, as in the non-vectorized version
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

np.dot() performs matrix-matrix or matrix-vector multiplication. This is different from np.multiply() and the * operator, which perform element-wise multiplication.
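
A small sketch of the difference, using my own example values:

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(np.dot(a, b))       # matrix multiplication: [[19 22], [43 50]]
print(np.multiply(a, b))  # element-wise product:  [[ 5 12], [21 32]]
print(a * b)              # same as np.multiply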

1.1.3.1 Implementing the L1 and L2 loss functions

The loss is used to evaluate the performance of your model. The larger the loss, the more the predictions $\widehat{y}$ differ from the true values $y$. In deep learning, you use optimization algorithms such as gradient descent to train your model and minimize the cost.

  • The L1 loss is defined as: $L_1(\widehat{y}, y) = \sum_{i=0}^{m} |y^{(i)} - \widehat{y}^{(i)}|$
    The implementation:
# 9-L1_loss_function.py
import numpy as np

def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L1 loss function defined above
    """

    loss = np.sum(np.abs(y - yhat))

    return loss
  • The L2 loss is defined as: $L_2(\widehat{y}, y) = \sum_{i=0}^{m} (y^{(i)} - \widehat{y}^{(i)})^2$
  • Note: if $x = [x_1, x_2, ..., x_n]$, then np.dot(x, x) computes $\sum_{j=0}^{n} x_j^2$.
    The implementation:
# 10-L2_loss_function.py
import numpy as np
def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L2 loss function defined above
    """
    
    loss = np.dot(y-yhat, y-yhat)

    return loss
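
A quick check of both loss functions with some made-up predictions and labels (these values are my own example):

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))   # ≈ 1.1
print("L2 = " + str(L2(yhat, y)))   # ≈ 0.43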

Reposted from blog.csdn.net/littlezhan_/article/details/82793865