Outline of the Machine Learning course created by Andrew Ng on Coursera

By the time you finish this class
* You’ll know how to apply the most advanced machine learning algorithms to problems such as anti-spam, image recognition, clustering, and building recommender systems.
* You’ll also know how to select the right algorithm for the right job, and become an expert at ‘debugging’ and figuring out how to improve a learning algorithm’s performance.

Week 1

Introduction

  • What is Machine Learning?
  • Types of machine learning:
    1. Supervised Learning: the category each training sample belongs to is already known
    2. Unsupervised Learning: the categories of the samples are not known in advance

Model and Cost Function

1. Model representation
2. Cost function

Parameter Learning

  • Gradient Descent
  • Linear Regression with One Variable
  • Gradient Descent for Linear Regression

Linear Algebra Review

Week 2

Linear Regression with Multiple Variables

  1. Multiple Features
  2. Gradient Descent for Multiple Variables
  3. Gradient Descent in Practice
    • Feature Scaling
    • Learning Rate
  4. Features and Polynomial Regression

Computing Parameters Analytically

  1. Normal Equation
  2. Noninvertibility

Submitting Programming Assignments

Octave/Matlab Tutorial

Summary & Notes

The goal of machine learning is to use existing sample data to automatically predict or classify instances outside the sample. Machine learning can be divided into supervised and unsupervised learning, and its concrete tasks include regression, which is quantitative in nature, and classification, which targets discrete categories.

Regression means constructing a function that fits all the sample data as closely as possible, so that results outside the sample can be predicted. Least squares is used here to measure the error between the function and the actual sample data, and there are two ways to minimize it: Gradient Descent and the Normal Equation. The former can be understood through calculus, the latter through vector projection in linear algebra.

Comparison of Gradient Descent and the Normal Equation

Gradient Descent

    \theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)}

    vectorization:

    \theta := \theta-\frac{\alpha}{m}X^{T}(h_{\theta}(X)-Y)

In practice, Gradient Descent evaluates the cost function and adjusts the theta parameters so that the model approaches the optimum over the iterations. To speed up fitting, you can increase the learning rate alpha, or scale the sample data, with mean normalization being a common choice. The learning rate must be chosen carefully: too small and convergence drags on; too large and the iterations may overshoot good parameter values, or the cost function may even fail to converge. The time complexity of the algorithm is O(kn^2).
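
A minimal NumPy sketch of the vectorized update above, assuming X already includes a leading column of ones (the function name is illustrative, not from the course code):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    """Batch gradient descent for linear regression (vectorized update)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        error = X @ theta - y                 # h_theta(X) - y
        theta -= (alpha / m) * (X.T @ error)  # theta := theta - (alpha/m) X^T error
    return theta
```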

Mean Normalization:

    x_{i}:=\frac{x_{i}-\mu_{i}}{s_{i}}
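
A small sketch of mean normalization, assuming X is an m×n NumPy array and taking s_i as each feature's range (the standard deviation also works):

```python
import numpy as np

def mean_normalize(X):
    # x_i := (x_i - mu_i) / s_i, applied column-wise
    mu = np.mean(X, axis=0)
    s = np.max(X, axis=0) - np.min(X, axis=0)
    return (X - mu) / s, mu, s
```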

Normal Equation

    \theta=(X^{T}X)^{-1}X^{T}y

The Normal Equation, by contrast, obtains the optimal parameters directly through a matrix computation. It is simple and direct, and its time complexity is O(n^3).
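
A one-line NumPy sketch of the Normal Equation; using pinv (the pseudo-inverse) also covers the noninvertibility case listed in the outline:

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^{-1} X^T y
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```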

When the number of features is small, the Normal Equation is the clear winner; once the number of features exceeds roughly 10,000, Gradient Descent is generally chosen without hesitation.

Note: for the derivation of the Normal Equation, David C. Lay's Linear Algebra and Its Applications has an excellent treatment.

Week 3

Logistic Regression

    0 \le h_\theta(x) \le 1
  • Hypothesis Representation:
    h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}x}} : Logistic function

    g(z)=\frac{1}{1+e^{-z}} :Sigmoid function

     h_{\theta}(x)=g(\theta^Tx)

    P(y=0|x;\theta) = 1- P(y=1|x;\theta)
  • Decision boundary
    Determines the classification threshold on h_\theta(x); for example, predict y = 1 when h_\theta(x) \ge 0.5, which corresponds to \theta^Tx \ge 0.

  • Cost Function

    Note the difference between the cost functions of linear regression and logistic regression.

    J(\theta)=\frac{1}{m}\sum_{i=1}^mCost(h_\theta(x^{(i)}),y^{(i)})

    =-\frac{1}{m}\left[\sum_{i=1}^my^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]
  • Simplified cost function and gradient descent
  • Gradient descent
    \theta_j:=\theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)
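
A rough sketch of the logistic regression hypothesis, cost, and one gradient descent step (the function names are mine, and X is again assumed to carry a leading column of ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    # J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

def gradient_step(theta, X, y, alpha):
    # Same update shape as linear regression, but h is the sigmoid hypothesis
    m = len(y)
    h = sigmoid(X @ theta)
    return theta - (alpha / m) * (X.T @ (h - y))
```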

Regularization

Multiclass Classification

Solving the Problem of Overfitting

This terminology is applied to both linear and logistic regression. There are two main options to address the issue of overfitting:

1) Reduce the number of features:
   • Manually select which features to keep.
   • Use a model selection algorithm (studied later in the course).

2) Regularization:
   • Keep all the features, but reduce the magnitude of the parameters θj.
   • Regularization works well when we have a lot of slightly useful features.

Modify the cost function so that the size of the parameters is part of the evaluation. This limits the magnitude of the higher-order coefficients, makes the decision boundary smoother, and reduces overfitting.
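
Concretely, the regularized linear regression cost takes the form below; the penalty term sums over j = 1, …, n, so the bias term \theta_0 is not regularized:

    J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]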

Recall that if m < n, then X^TX is non-invertible. However, when we add the term λ⋅L, then X^TX + λ⋅L becomes invertible.
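
A sketch of the regularized Normal Equation, where L is the identity matrix with its top-left entry zeroed (assuming the first column of X is the bias column, so θ_0 stays unregularized):

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    # theta = (X^T X + lambda * L)^{-1} X^T y
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0   # do not regularize the bias term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```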

Week 4

Neural Networks Representation

A method for regression problems where the non-linear model involves many features whose combinations are highly complex.

Model representation

a_i^{(j)} : activation of unit i in layer j

\Theta^{(j)} : matrix of weights controlling the function mapping from layer j to layer j+1

3-layer Neural network model

Notes

Neural network models can conveniently and flexibly express mathematical models such as Boolean logic operations and complex polynomial operations, and can improve the time efficiency of training.
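
As an illustration of the Boolean-logic point, here is a tiny forward-propagation sketch for a 3-layer network; the hand-picked weights implement XNOR in the spirit of the logic-gate example from the lectures (the function name is mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_3layer(x, Theta1, Theta2):
    # Theta1 maps layer 1 -> 2, Theta2 maps layer 2 -> 3 (the Theta^(j) above)
    a1 = np.concatenate(([1.0], x))                      # add bias unit x_0 = 1
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))   # hidden activations + bias
    return sigmoid(Theta2 @ a2)                          # h_Theta(x) = a3

Theta1 = np.array([[-30.0,  20.0,  20.0],    # hidden unit ~ x1 AND x2
                   [ 10.0, -20.0, -20.0]])   # hidden unit ~ (NOT x1) AND (NOT x2)
Theta2 = np.array([[-10.0,  20.0,  20.0]])   # output ~ OR of the two hidden units

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward_3layer(np.array(x, float), Theta1, Theta2).round(3))  # ≈ XNOR
```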
