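Annotated code with run and debugging results

Tip: many bugs in deep learning code come from mismatched matrix/vector dimensions, so check shapes carefully.

1. Preparation

import numpy as np  # fundamental package for scientific computing in Python

import matplotlib.pyplot as plt  # plotting library for Python

import h5py  # common package for interacting with datasets stored in H5 files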

import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

### Load the prepared dataset ###

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
index = 25

plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")


### train_set_x_orig is an array of shape (m_train, num_px, num_px, 3)

# For example, train_set_x_orig.shape[0] gives m_train (the number of training examples)
### Implementation ###
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]


print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

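# Reshape the training and test examples
### To flatten an array X of shape (a, b, c, d) into a matrix of shape (b*c*d, a), use X_flatten = X.reshape(X.shape[0], -1).T, where X.T is the transpose of X ###


### Implementation ###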
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T


print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))  # first five pixel values of the first image

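# To represent a color image, the red, green and blue channels (RGB) must be specified for each pixel, so each pixel value is actually a vector of three numbers ranging from 0 to 255.
# A common preprocessing step in machine learning is to center and standardize the dataset: subtract the mean of the whole numpy array from each example,
# then divide each example by the standard deviation of the whole numpy array. For picture datasets it is simpler and more convenient, and works almost as well, to divide every row of the dataset by 255 (the maximum value of a pixel channel).
# Standardize our dataset.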

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

print ("train_set_x shape: " + str(train_set_x.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
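### Common steps for preprocessing a new dataset:
### Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, ...)
### Reshape the datasets so that each example is a vector of size (num_px * num_px * 3, 1)
### "Standardize" the data

Result: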

2. Array access tips

train_set_x_orig is an array of shape (m_train, num_px, num_px, 3).

# For example, train_set_x_orig.shape[0] gives m_train (the number of training examples)
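The shape conventions above can be checked on a small synthetic array before touching the real dataset. The sketch below is only an illustration: the array `x_orig` and its size (4 images of 2x2 pixels) are made-up stand-ins for train_set_x_orig, which in the notes comes from load_dataset().

```python
import numpy as np

# Synthetic stand-in for train_set_x_orig: 4 "images" of 2x2 pixels with 3 channels.
# (Hypothetical data; the real dataset comes from load_dataset().)
rng = np.random.default_rng(0)
x_orig = rng.integers(0, 256, size=(4, 2, 2, 3), dtype=np.uint8)

m = x_orig.shape[0]        # number of examples
num_px = x_orig.shape[1]   # image height (equal to the width here)

# Flatten each image into one column: (m, num_px, num_px, 3) -> (num_px*num_px*3, m)
x_flatten = x_orig.reshape(m, -1).T

# Scale pixel values from [0, 255] to [0, 1]
x = x_flatten / 255.

print(x_orig.shape)     # (4, 2, 2, 3)
print(x_flatten.shape)  # (12, 4)
# Column j of the flattened matrix is image j unrolled in row-major order
print(np.array_equal(x_flatten[:, 1], x_orig[1].ravel()))  # True
```

Printing the shape after every reshape is a cheap way to catch the dimension mismatches mentioned in the tip at the top of these notes.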

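3. General architecture of the learning algorithm

Design a simple algorithm to distinguish cat images from non-cat images.

You will build logistic regression with a neural-network mindset. The figure below explains why logistic regression is actually a very simple neural network!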

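Mathematical expression:

For one example $x^{(i)}$, where $\sigma$ is the sigmoid function:

$$z^{(i)} = w^T x^{(i)} + b$$
$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1 - y^{(i)}) \log(1 - a^{(i)})$$

The cost function:

$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$$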

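Next, complete the following steps:

- Initialize the parameters of the model
- Learn the parameters by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyze the results and draw conclusions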

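4. Building the parts of the algorithm

The main steps for building a neural network are:

- Define the model structure (for example, the number of input features)
- Initialize the model's parameters
- Loop:
  - Compute the current loss (forward propagation)
  - Compute the current gradient (backward propagation)
  - Update the parameters (gradient descent)

These three steps are often built separately and then integrated into one function that we call model().

4.1 Helper functions

Using the code from "Python Basics", implement sigmoid(). It computes sigmoid(z) = 1 / (1 + e^(-z)), which is needed to make predictions; np.exp() is recommended.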

import numpy as np

def sigmoid(z):
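    """
    Compute the sigmoid of z.

    Arguments:
    z -- a scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### Implementation ###
    s = 1 / (1 + np.exp(-z))
    ### s = (1 + np.exp(-z))**(-1) also works

    return s

# Test the function output (passing an array evaluates several inputs at once)
print ("sigmoid([3, 0]) = " + str(sigmoid(np.array([3,0]))))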
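One practical caveat with the formula above: for strongly negative inputs, np.exp(-z) overflows and NumPy emits a RuntimeWarning, even though the final result is still 0. A minimal sketch of a variant that avoids this is shown below; `stable_sigmoid` is a hypothetical helper name, not part of the assignment, and it relies on the identity sigmoid(z) = e^z / (1 + e^z) for negative z.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never exponentiates a positive number (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))            # always in [0, 1], so it cannot overflow
    # For z >= 0 use 1 / (1 + e^-z); for z < 0 use e^z / (1 + e^z)
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))

print(stable_sigmoid(np.array([3, 0])))
print(stable_sigmoid(np.array([-1000.0, 1000.0])))
```

For the input sizes used in these notes the plain formula is sufficient; the variant only matters when activations become very large in magnitude.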

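4.2 Initializing parameters

If the input is an image, w has shape (num_px * num_px * 3, 1).

Initialize w to zeros (np.zeros() is recommended); the value of b is set as appropriate for the problem.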

import numpy as np
def initialize_with_zeros(dim):
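    """
    Create a column vector of zeros of shape (dim, 1) for w and initialize b to 0.

    Arguments:
    dim -- size of the w vector we want (the number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    ### Implementation ###
    w = np.zeros((dim, 1), dtype=float)  # dtype sets the element type; the np.float alias was removed in NumPy 1.24
    b = 0  # the bias starts at 0, as the docstring states
    # Sanity checks
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b

Verify the output: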

dim = 7
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))

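4.3 Forward and backward propagation

Now that the parameters are initialized, the forward and backward propagation steps can be used to learn them.

Implement propagate(), which computes the cost function and its gradient.

Hints:

Forward propagation:

1) Get X

2) Compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \ldots, a^{(m)})$, where $\sigma$ is the sigmoid function

3) Compute the cost function $J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$

Formulas you may need:

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$$
$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$$

# Forward propagation function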
import numpy as np

def sigmoid(z):


    ### Implementation ###
    s = 1 / (1 + np.exp(-z))
    ### s = (1 + np.exp(-z))**(-1) also works
    
    return s
def propagate(w, b, X, Y):
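    """
    Arguments:
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- "label" vector (0 if non-cat, 1 if cat) of shape (1, number of examples)

    Return:
    cost -- value of the cost function computed with the formula above
    dw -- gradient of the loss with respect to w, thus the same shape as w
    db -- gradient of the loss with respect to b, thus the same shape as b
    
    Tips:
    - np.log() and np.dot() are recommended
    """
    
    m = X.shape[1]
    
    # Forward propagation
    ### Uses NumPy built-in functions
    A = sigmoid(np.dot(w.T, X) + b)  # compute the activation
    cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # compute the cost; mind the leading minus sign and the use of A
    # Backward propagation
    ### Note the use of np.dot
    dw = 1/m * (np.dot(X, (A - Y).T))
    db = 1/m * np.sum(A - Y)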

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost

Verify the output:

w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
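A common way to gain confidence in hand-derived gradients such as dw and db is to compare them with finite differences of the cost. The sketch below does this for the test values used above. It is a minimal check under stated assumptions: `cost_fn`, `analytic_grads` and `numeric_grads` are hypothetical helper names that restate the formulas from propagate(), and the step size `eps` and the tolerance are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(w, b, X, Y):
    """Same cost as in propagate(), without the gradients."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    return -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def analytic_grads(w, b, X, Y):
    """Same gradient formulas as in propagate()."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    return 1 / m * np.dot(X, (A - Y).T), 1 / m * np.sum(A - Y)

def numeric_grads(w, b, X, Y, eps=1e-4):
    """Central finite differences (eps is an assumed step size)."""
    dw = np.zeros_like(w)
    for k in range(w.shape[0]):
        step = np.zeros_like(w)
        step[k, 0] = eps
        dw[k, 0] = (cost_fn(w + step, b, X, Y) - cost_fn(w - step, b, X, Y)) / (2 * eps)
    db = (cost_fn(w, b + eps, X, Y) - cost_fn(w, b - eps, X, Y)) / (2 * eps)
    return dw, db

w, b = np.array([[1.], [2.]]), 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1, 0, 1]])

dw, db = analytic_grads(w, b, X, Y)
dw_num, db_num = numeric_grads(w, b, X, Y)
print("dw matches:", np.allclose(dw, dw_num, atol=1e-5))
print("db matches:", abs(db - db_num) < 1e-5)
```

If the two disagree, the usual suspects are a missing minus sign in the cost or a missing transpose in np.dot.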

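4.4 Optimization

The parameters are initialized and the cost function and its gradient can be computed; what remains is to update the parameters using gradient descent.

Write the optimization function, which finds suitable values of w and b by minimizing the cost function J.

For a parameter θ, the update rule is θ = θ - α dθ, where α is the learning rate.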
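To see the update rule in isolation, it can be applied to a one-dimensional toy cost whose minimum is known. The sketch below is illustrative only: the cost J(θ) = (θ - 3)^2, the function name `gradient_descent` and the chosen learning rate are assumptions, not part of the assignment.

```python
def gradient_descent(theta, grad, learning_rate, num_iterations):
    """Repeatedly apply theta = theta - learning_rate * d(theta)."""
    for _ in range(num_iterations):
        theta = theta - learning_rate * grad(theta)
    return theta

# Toy cost J(theta) = (theta - 3)**2, whose gradient is 2 * (theta - 3)
theta = gradient_descent(theta=0.0, grad=lambda t: 2 * (t - 3),
                         learning_rate=0.1, num_iterations=100)
print(theta)  # close to 3, the minimizer of J
```

The optimize() function follows the same pattern, with propagate() supplying the gradients for w and b.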

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
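    """
    Optimize w and b by running gradient descent.

    Arguments:
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- "label" vector (0 if non-cat, 1 if cat) of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the cost every 100 steps

    Returns:
    params -- dictionary containing the weights w and the bias b
    grads -- dictionary containing the gradients of the cost with respect to the weights (dw) and the bias (db)
    costs -- list of the costs recorded during optimization (one every 100 iterations), used to plot the learning curve

    Tips:
    Iterate over the following two steps:
       1) Compute the cost and the gradient for the current parameters using propagate()
       2) Update w and b using the gradient descent rule
    """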
    
    costs = []
    
    for i in range(num_iterations):
        
        ### Compute the cost and the gradients with propagate() ###
        grads, cost = propagate(w, b, X, Y)
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # Update rule
        ### learning_rate is a scalar, so plain multiplication is enough ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs

Test the function using the same values as before:

params, grads, costs = optimize(np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]]), num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

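The function above outputs the learned w and b, which can be used to predict the labels for a dataset X. Implement predict(), which computes the predictions in two steps:

1. Compute $\hat{Y} = A = \sigma(w^T X + b)$, where $\sigma$ is the sigmoid function

2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5) and store the predictions in the vector Y_prediction (this can also be done with an if...else statement in a for loop)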

# GRADED FUNCTION: predict

def predict(w, b, X):
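    '''
    Predict whether the label is 0 or 1 using the learned logistic regression parameters (w, b).

    Arguments:
    w -- weights, a numpy array of shape (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''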
    
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)

    # Convert probabilities A[0,i] to actual predictions Y_prediction[0,i]
    for i in range(A.shape[1]):
        if A[0,i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction

Verify the output:

w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print ("predictions = " + str(predict(w, b, X)))
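The thresholding step can also be written without a Python loop by comparing the whole activation array with 0.5. The sketch below is one possible vectorized variant; `predict_vectorized` is a hypothetical name, and the function is meant to behave like predict() above on the same test values.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_vectorized(w, b, X):
    """Threshold the activations at 0.5 without a Python loop (hypothetical variant)."""
    w = w.reshape(X.shape[0], 1)
    A = sigmoid(np.dot(w.T, X) + b)
    # A > 0.5 is a boolean array; casting gives 1.0 where True and 0.0 elsewhere
    return (A > 0.5).astype(float)

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])
print("predictions = " + str(predict_vectorized(w, b, X)))
```

On large datasets the vectorized comparison avoids one Python-level iteration per example, which is the main reason to prefer it over the explicit loop.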

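5. Merging all functions into a model

Implement the model function using the following notation:

-- Y_prediction_test: the predictions on the test set

-- Y_prediction_train: the predictions on the training set

-- w, costs, grads: the outputs of optimize()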

# GRADED FUNCTION: model
def sigmoid(z):

    ### Implementation ###
    s = 1 / (1 + np.exp(-z))
    ### s = (1 + np.exp(-z))**(-1) also works
    
    return s
def initialize_with_zeros(dim):
    
    w = np.zeros((dim,1))
    b = 0  # the bias starts at 0
    
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b
def propagate(w, b, X, Y):
   
    m = X.shape[1]
    
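    # Forward propagation
    ### Uses NumPy built-in functions
    A = sigmoid(np.dot(w.T, X) + b)  # compute the activation
    cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # compute the cost; mind the leading minus sign and the use of A
    # Backward propagation
    ### Note the use of np.dot
    dw = 1/m * (np.dot(X, (A - Y).T))
    db = 1/m * np.sum(A - Y)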

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    
    costs = []
    
    for i in range(num_iterations):
         
        ### Compute the cost and the gradients with propagate() ###
        grads, cost = propagate(w, b, X, Y)
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # Update rule
        ### learning_rate is a scalar, so plain multiplication is enough ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs
def predict(w, b, X):

    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)

    # Convert probabilities A[0,i] to actual predictions Y_prediction[0,i]
    for i in range(A.shape[1]):
        if A[0,i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
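    """
    Build the logistic regression model by calling the functions implemented previously.
    
    Arguments:
    X_train -- training set of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels of shape (1, m_train)
    X_test -- test set of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels of shape (1, m_test)
    num_iterations -- hyperparameter: the number of iterations used to optimize the parameters
    learning_rate -- hyperparameter: the learning rate used in the update rule of optimize()
    print_cost -- set to True to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """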
    
    ### START CODE HERE ###
    
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

Verify the output:

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
index = 25
###plt.imshow(test_set_x_orig[index])###
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")


### train_set_x_orig is an array of shape (m_train, num_px, num_px, 3)

# For example, train_set_x_orig.shape[0] gives m_train (the number of training examples)
### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
### END CODE HERE ###

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

print ("train_set_x shape: " + str(train_set_x.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

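Output:

Analysis: the training accuracy is close to 100%. This is a good sanity check: the model is working and has enough capacity to fit the training data. The test error rate is about 40% (?), which is acceptable for this simple model, given the small dataset we used and the fact that logistic regression is a linear classifier. Next week a more accurate classifier will be tried.

In addition, the model is clearly overfitting the training data. Later you will learn how to reduce overfitting, for example by using regularization. Using the following code and changing the value of index, you can see the predictions on the test set.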
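The snippet for inspecting individual test predictions is not reproduced in these notes. A minimal sketch of what such an inspection could look like is given below. Everything in it is an assumption for illustration: the helper name `describe_prediction`, the contents of `classes`, and the small stand-in arrays that replace the real `d`, `classes` and `test_set_y` objects.

```python
import numpy as np

def describe_prediction(d, classes, test_set_y, index):
    """Return a sentence comparing the true label with the model's prediction."""
    # Predictions are stored as floats (0.0 / 1.0), so cast before indexing
    predicted = int(d["Y_prediction_test"][0, index])
    actual = int(test_set_y[0, index])
    return ("y = " + str(actual) + ", you predicted that it is a \""
            + classes[predicted].decode("utf-8") + "\" picture.")

# Hypothetical stand-ins for the objects produced earlier in the notebook
classes = np.array([b"non-cat", b"cat"])
test_set_y = np.array([[1, 0, 1]])
d = {"Y_prediction_test": np.array([[1., 1., 0.]])}

for index in range(test_set_y.shape[1]):
    print(describe_prediction(d, classes, test_set_y, index))
```

With the real dataset, the corresponding image can be displayed next to the message by reshaping one column of test_set_x back to (num_px, num_px, 3) and passing it to plt.imshow().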

Increase the number of iterations and test again:

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 3000, learning_rate = 0.005, print_cost = True)    # num_iterations changed to 3000

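Partial results:

Plot the learning curve:

Interpretation: the cost decreases steadily, which shows that the parameters are being learned. The model could be trained even further on the training set: try increasing the number of iterations in the cell above and rerun it. You may find that the training accuracy goes up while the test accuracy goes down. This is called overfitting.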
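The accuracy figures printed by model() come from the expression 100 - mean(|prediction - label|) * 100, and the gap between the training and test values is a quick indicator of overfitting. The sketch below applies the same expression to made-up labels; the arrays and the helper name `accuracy` are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def accuracy(Y_prediction, Y):
    """Percentage of matching 0/1 labels, computed as in model()."""
    return 100 - np.mean(np.abs(Y_prediction - Y)) * 100

# Hypothetical predictions and labels, shaped (1, m) like in the notebook
Y_train = np.array([[1, 0, 1, 1, 0, 0, 1, 0]])
Y_pred_train = np.array([[1., 0., 1., 1., 0., 0., 1., 0.]])
Y_test = np.array([[1, 0, 0, 1]])
Y_pred_test = np.array([[1., 1., 0., 0.]])

train_acc = accuracy(Y_pred_train, Y_train)
test_acc = accuracy(Y_pred_test, Y_test)
print("train accuracy: {} %".format(train_acc))
print("test accuracy: {} %".format(test_acc))
print("gap: {} points".format(train_acc - test_acc))
```

Because the labels are 0 or 1, the absolute difference is 1 exactly where a prediction is wrong, so its mean is the error rate.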

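6. Bonus exercise 1

Using the hints below, experiment with possible values of the learning rate α.

Hint: for gradient descent to work well, the learning rate must be chosen wisely. The learning rate α determines how rapidly the parameters are updated. If it is too large, the update may overshoot the optimal value; if it is too small, many more iterations are needed to converge to it. This is why a well-tuned learning rate is crucial.

Run the following code with different learning rates and observe the results: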

learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

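Interpretation:

1) Different learning rates give different costs and therefore different prediction results.

2) If the learning rate is too large (0.01), the cost may oscillate up and down, and it may even diverge (although in this example, using 0.01 still eventually ends up at a good value for the cost).

3) A lower cost does not mean a better model. You have to check for possible overfitting, which typically shows up as a training accuracy much higher than the test accuracy.

4) In deep learning, it is strongly recommended that you:

- Choose the learning rate that best minimizes the cost function

- If your model overfits, use other techniques to reduce overfitting (covered later)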
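The three regimes described above (overshooting, fast convergence, slow convergence) are easy to reproduce on a toy problem. The sketch below uses the cost J(θ) = θ^2 rather than the cat classifier; the function name `run_gradient_descent` and the learning rates 1.1, 0.4 and 0.001 are arbitrary choices made for this illustration.

```python
def run_gradient_descent(learning_rate, num_iterations=20, theta=1.0):
    """Minimize the toy cost J(theta) = theta**2 and return the cost history."""
    costs = []
    for _ in range(num_iterations):
        costs.append(theta ** 2)
        theta = theta - learning_rate * 2 * theta   # dJ/dtheta = 2 * theta
    return costs

for lr in [1.1, 0.4, 0.001]:
    costs = run_gradient_descent(lr)
    print("learning rate {}: first cost {:.4f}, last cost {:.4f}".format(lr, costs[0], costs[-1]))
```

With the largest rate each step overshoots the minimum and the cost grows; with the smallest rate the cost barely moves in 20 iterations; the middle rate converges quickly.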

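7. Bonus exercise 2

Add your own image and test how the model handles it:

Summary:

1) Preprocessing the dataset is important.

2) Implement each function separately, then merge them into a single model() function.

3) Tuning the learning rate (an example of a "hyperparameter") can make a big difference to the algorithm; more examples of this will come later.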

Andrew Ng Deep Learning course exercise notes: 01, week 2, answers
