Octave for Microsoft Windows
- Definition: software mainly used for numerical analysis; a good fit for machine learning beginners
- Download: Octave official website (download link)
- After downloading, run the installer and keep clicking Next to finish the installation
- PS: any version will do, but do not download Octave 4.0.0; that version has a major bug
Supervised Learning
- Definition: the machine learns from pre-labeled training examples (inputs and their expected outputs), in order to predict the output for any new input
- The output falls into two categories:
  - Regression: the output is a continuous value, for example: a price
  - Classification: the output is a class label, for example: yes or no
Unsupervised Learning
- Definition: no pre-labeled training examples are given; the input data is automatically classified or grouped
- Commonly used for clustering; there are two types of application:
  - Clustering: divides the data set into several (usually disjoint) subsets of samples, for example: grouping news stories into different categories
  - Non-clustering: for example: the cocktail party algorithm, which separates the valid data from the noise and can be used in speech recognition
Linear Regression
- hθ(x) = θ₀ + θ₁x₁ : the linear regression hypothesis
- m : the number of training examples
- x⁽ⁱ⁾ : the i-th training example
Cost Function
- The cost function: J(θ₀, θ₁) = (1/2m) Σᵢ₌₁ᵐ ( hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²
- The cost function in Octave: `function J = costFunctionJ(X, y, theta)` (see the sketch after this list)
- Find the minimum of the cost function to determine θ₀ and θ₁
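One possible body for `costFunctionJ`, assuming the usual squared-error cost above and a design matrix X whose first column is all ones; only the signature is fixed, the rest is a sketch:

```octave
function J = costFunctionJ(X, y, theta)
  % X: m x (n+1) design matrix (first column all ones)
  % y: m x 1 targets, theta: (n+1) x 1 parameters
  m = size(X, 1);                   % number of training examples
  predictions = X * theta;          % h_theta(x) for every example
  sqrErrors = (predictions - y) .^ 2;
  J = 1 / (2 * m) * sum(sqrErrors);
end
```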
Using Gradient Descent to Minimize J
- Initialize θ₀ and θ₁ (θ₀ = 0, θ₁ = 0; other values also work)
- Keep changing θ₀ and θ₁ until a minimum is found, which may only be a local minimum
- Gradient descent formula: repeat until convergence θⱼ := θⱼ − α · ∂J(θ₀, θ₁)/∂θⱼ ; θ₀ and θ₁ must be updated simultaneously
- The expression after α is just the derivative (the slope of the tangent line at that point)
- α is the learning rate
- Correct algorithm: compute temp0 := θ₀ − α·∂J/∂θ₀ and temp1 := θ₁ − α·∂J/∂θ₁ first, then assign θ₀ := temp0 and θ₁ := temp1
- Wrong algorithm: assigning θ₀ first and then using the already-updated θ₀ when computing θ₁, which is not a simultaneous update
- The gradient descent function in Octave: `function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)` (sketched below)
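A sketch of one way to fill in `gradientDescent`, assuming the vectorized batch update θ := θ − (α/m)·Xᵀ(Xθ − y) and reusing the `costFunctionJ` sketch above:

```octave
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  % Runs num_iters steps of batch gradient descent, recording J at each step
  m = size(X, 1);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    % One vectorized assignment, so every theta_j is updated simultaneously
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = costFunctionJ(X, y, theta);
  end
end
```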
Learning Rate α
- α is the learning rate; it controls how big a step each update of θ₀ and θ₁ takes
- The update already scales with the absolute value of the derivative (the larger |∂J/∂θ|, the larger the step), so the steps shrink automatically as θ approaches a minimum
- α can be tried starting from 0.001, multiplying by about 3 each time (0.001, 0.003, 0.01, ...)
- α too small: convergence is very slow
- α too large: the cost function may fail to decrease on each iteration, and may even fail to converge
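A small sketch of comparing candidate learning rates, assuming a design matrix X and target vector y are already loaded and reusing the `gradientDescent` sketch above:

```octave
% Try a few candidate learning rates and watch how J falls
alphas = [0.001 0.003 0.01 0.03 0.1 0.3];   % roughly x3 per step
theta0 = zeros(size(X, 2), 1);
hold on;
for a = alphas
  [~, J_history] = gradientDescent(X, y, theta0, a, 100);
  plot(1:100, J_history);   % a good alpha drops quickly and keeps decreasing
end
hold off;
xlabel('iteration'); ylabel('J(theta)');
```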
Combining Gradient Descent with the Cost Function
- Substituting the cost function into the gradient descent formula gives the updates θ₀ := θ₀ − α(1/m)Σᵢ₌₁ᵐ( hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) and θ₁ := θ₁ − α(1/m)Σᵢ₌₁ᵐ( hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )·x⁽ⁱ⁾
- Plugging all the samples into the gradient descent formula while searching for θ₀ and θ₁ is called batch gradient descent in machine learning
Linear Regression with Multiple Variables
- hθ(x) = θ₀x₀ + θ₁x₁ + θ₂x₂ + ... + θₙxₙ : the multivariate linear regression hypothesis, where x₀ = 1
- n : the number of features
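Since x₀ = 1, the hypothesis is the inner product θᵀx, so all m predictions can be computed at once in Octave (a sketch; `features` is a hypothetical m x n matrix of raw feature values):

```octave
X = [ones(m, 1), features];   % prepend the x0 = 1 bias column
predictions = X * theta;      % h_theta(x) for every example, an m x 1 vector
```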
Solving Multivariate Linear Regression with Gradient Descent
- Compared with univariate linear regression, the only difference is the trailing xⱼ⁽ⁱ⁾ factor
- Expanded: θⱼ := θⱼ − α(1/m)Σᵢ₌₁ᵐ( hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )·xⱼ⁽ⁱ⁾, updated simultaneously for j = 0, 1, ..., n
Feature Scaling and Mean Normalization
- Purpose: speed up gradient descent; when the value ranges of the features differ too much, gradient descent becomes slow
- xᵢ := ( xᵢ − μᵢ ) / sᵢ
- sᵢ : the feature-scaling denominator, usually the range of the values
- μᵢ : used for mean normalization, usually the average of the values
- The feature scaling and mean normalization function in Octave: `function [X_norm, mu, sigma] = featureNormalize(X)` (sketched below)
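A sketch of `featureNormalize`, assuming the standard deviation is used as the scale sᵢ (the max − min range would work as well):

```octave
function [X_norm, mu, sigma] = featureNormalize(X)
  % Normalizes each feature (column of X) to zero mean and unit spread
  mu = mean(X);                   % 1 x n row of per-feature means
  sigma = std(X);                 % 1 x n row of per-feature standard deviations
  X_norm = (X - mu) ./ sigma;     % implicit broadcasting across rows
end
```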
Polynomial Regression
- We can combine several related features to produce a new feature, for example: combining a house's length and width into its floor area
- If a linear (straight-line) function cannot fit the data well, we can also use a quadratic, cubic, or square-root function (or any other form), as in the sketch below
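A cubic fit can reuse the same linear-regression machinery by treating the powers of x as separate features (a sketch; `x` is a hypothetical input column, and feature scaling matters here because x, x², and x³ have very different ranges):

```octave
% Build polynomial features from a single input column x
X_poly = [x, x.^2, x.^3];                        % x, x squared, x cubed
[X_poly, mu, sigma] = featureNormalize(X_poly);  % the powers differ wildly in scale
X_poly = [ones(length(x), 1), X_poly];           % prepend the bias column x0 = 1
```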
Normal Equation
- X = the matrix of feature values, y = the vector of results
- Formula: θ = (XᵀX)⁻¹Xᵀy
- Octave: pinv(X' * X) * X' * y
- The normal equation function in Octave:

```octave
function [theta] = normalEqn(X, y)
  % Closed form for theta; pinv handles a non-invertible X'X (see below)
  theta = pinv(X' * X) * X' * y;
end
```
- In Octave we usually use pinv instead of inv, because pinv still returns a value for θ even when XᵀX is not invertible
- Reasons XᵀX may be non-invertible:
  - Redundant (linearly dependent) features
  - Too many features (m ≤ n); the fix is to delete some features or use regularization
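A small illustration of the difference, using deliberately rank-deficient hypothetical data:

```octave
% Two identical columns make X'X singular
X = [1 1; 2 2; 3 3];
y = [1; 2; 3];
% inv(X' * X) warns that the matrix is singular and yields Inf entries,
% while pinv returns the minimum-norm least-squares solution instead
theta = pinv(X' * X) * X' * y;
```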
Gradient Descent vs. Normal Equation
- Gradient descent
  - Advantages:
    - Works well even when the number of features is large
    - O(kn²)
  - Disadvantages:
    - Needs a choice of α
    - Needs many iterations
- Normal equation
  - Advantages:
    - No need to choose α
    - No iteration
  - Disadvantages:
    - Computing (XᵀX)⁻¹ takes a long time when the number of features is large (roughly n > 10000)
    - O(n³)