Gradient Descent Formula Derivation, Revisited: Simplifying the Derivative


The previous post laid out the basis of the derivation. Judging from the feedback, the overall idea came across, but the derivative part still left plenty of doubt.
Truth be told, what I remember of calculus is only leftovers from school, but I'll do my best and hope this supplement explains it clearly. If anything is wrong, please correct me.

Base formulas

The base formulas we need are reproduced below. If anything here is unclear, please go back to the previous post for the details.

Hypothesis function

$$ y' = h_θ(x) = \sum_{i=0}^nθ_ix_i $$

Mean squared error loss function

$$ J(θ) = \frac1{2m}\sum_{i=1}^m(h_θ(x^{(i)}) - y^{(i)})^2 $$

Solving for θ with gradient descent

$$ θ_j := θ_j - α\frac∂{∂θ_j}J(θ) $$

Picking out the part after α in the formula above, the simplification goes like this:

$$ \begin{align} \frac∂{∂θ_j}J(θ) & = \frac∂{∂θ_j}\frac1{2m}\sum_{i=1}^m(h_θ(x^{(i)}) - y^{(i)})^2 \\ & = \frac1m\sum_{i=1}^m(h_θ(x^{(i)}) - y^{(i)})x_j^{(i)} \end{align} $$
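To make these formulas concrete before diving into the calculus, here is a minimal NumPy sketch (my own illustration, not part of the original post; the names `hypothesis`, `mse_loss`, and `gradient_step` are placeholders I chose) that implements the hypothesis, the loss, and one update step exactly as written above:

```python
import numpy as np

def hypothesis(theta, X):
    # h_theta(x) = sum_{i=0..n} theta_i * x_i, evaluated for every sample at once.
    # X has shape (m, n+1); its first column is assumed to be all ones (x_0 = 1),
    # so theta[0] plays the role of the bias term.
    return X @ theta

def mse_loss(theta, X, y):
    # J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    err = hypothesis(theta, X) - y
    return (err @ err) / (2 * m)

def gradient_step(theta, X, y, alpha):
    # theta_j := theta_j - alpha * 1/m * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
    # X.T @ err computes that sum for every j simultaneously.
    m = len(y)
    err = hypothesis(theta, X) - y
    return theta - alpha * (X.T @ err) / m
```

Running `gradient_step` in a loop and watching `mse_loss` shrink is the whole algorithm; everything below is only about where that `X.T @ err / m` gradient comes from.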

And this is usually where the trouble is: a lot of people try the simplification but cannot reach the result above.

Derivative formulas

Simplifying the formula above needs a bit of calculus, namely derivatives. I transcribe the relevant rules here so they are easy to compare against:

Derivative

The point of taking the derivative is to get the tangent direction at the current point, so that each gradient descent step of size α moves in the converging direction (that is, the direction that decreases the loss function above). Plenty of tutorials cover this, so I won't ramble on.

(Figure lazily taken from an image search; I'll remove it if it infringes. The W in the figure is the θ in our formulas, and J(W) is our J(θ).)


First, the notation (\frac∂{∂θ_j}) means taking a derivative. Don't treat it as an ordinary fraction and cancel the ∂ in the numerator and denominator to get (\frac1{θ_j}). Of course most people would never do that; I've just seen it happen, so I mention it just in case.

In fact, it may be more apt to think of (\frac∂{∂θ_j}) as the familiar function notation (f(θ_j)).

Derivative of a sum of functions

For convenience, below we use the ' symbol to denote taking a derivative:

\[ (u + v)' = u' + v' \]
Generalizing the formula above: a sigma sum does not interfere with how differentiation passes through, so you can simply pull the sigma out in front:
\[ (\sum_{i=1}^mu^{(i)})' = \sum_{i=1}^m(u^{(i)})' \]
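As a tiny worked example of this rule (my own, not from the original post):

\[ (x^2 + 5x)' = (x^2)' + (5x)' = 2x + 5 \]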

Derivative of a product of functions

$$ (uv)' = u'v+uv' $$

Derivative of a power function

$$ (x^u)' = ux^{(u-1)} $$
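As a quick sanity check (again my own example), the product rule and the power rule agree with each other; write x^3 as x·x^2 and differentiate it both ways:

\[ (x·x^2)' = (x)'·x^2 + x·(x^2)' = x^2 + 2x^2 = 3x^2 = (x^3)' \]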

Derivative of a constant

This is my favorite part:


\[ (C)' = 0 \]
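This rule is exactly how the sample labels will be treated in the derivation below: for a fixed training set, each (y^{(i)}) is just a number, so

\[ \frac∂{∂θ_j}y^{(i)} = 0 \]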

Chain rule

This is my least favorite part:
Suppose we want to differentiate a variable z, where z depends on y, and y in turn depends on x. For example:


\[ z = f(y) \\ y = g(x) \]
That is:
\[ z = f(g(x)) \]
Then differentiating z follows the chain rule:
\[ z' = (f(g(x)))' = f'(g(x))·g'(x) \]
Note the extra multiplication at the end by the derivative of the inner, depended-on function. It feels like an alien rule designed against human nature, and it is easy to forget. But what can mortals who live under nature's rules do? Memorize it, that's all.
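A small worked example (mine, chosen to mirror the squared-error shape we will meet below): let z = (3x + 1)^2, so the outer function is the square and the inner function is 3x + 1. Differentiating the outer part alone would give 2(3x + 1); the chain rule says to multiply by the derivative of the inner part as well:

\[ z' = 2(3x+1)·(3x+1)' = 2(3x+1)·3 = 6(3x+1) \]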

Derivation

With the basic formulas listed, let's start the derivation:


\[ \frac∂{∂θ_j}J(θ) = \frac∂{∂θ_j}\frac1{2m}\sum_{i=1}^m(h_θ(x^{(i)}) - y^{(i)})^2 \]
Using the rule above for the derivative of a sum, the derivative moves inside the sigma:
\[ = \frac1{2m}\sum_{i=1}^m\left(\frac∂{∂θ_j}(h_θ(x^{(i)}) - y^{(i)})^2\right) \]
Don't rush to apply the power rule. Because the square depends on θ_j only through the expression inside it, the chain rule has to be handled first: differentiate the square with respect to the inner expression, then multiply by the derivative of that inner expression:
\[ = \frac1{2m}\sum_{i=1}^m\left(\frac{∂(h_θ(x^{(i)}) - y^{(i)})^2}{∂(h_θ(x^{(i)}) - y^{(i)})}\right)·\left(\frac∂{∂θ_j}(h_θ(x^{(i)}) - y^{(i)})\right) \]
Now the front factor can be handled with the power rule, and in the rear factor we expand the hypothesis function:
\[ = \frac1{2m}\sum_{i=1}^m2·(h_θ(x^{(i)}) - y^{(i)})·\left(\frac∂{∂θ_i}\left(\sum_{i=0}^nθ_ix_i - y^{(i)}\right)\right) \]
Note that the expanded hypothesis function uses i to index the i-th weight, so the derivative in front is now also written with (θ_i); it no longer refers to the i-th sample of the batch. I hadn't originally planned to expand this part, so the symbol names get a bit mixed here, but as long as the concepts are clear it should not cause any misunderstanding.


Continuing: the 2 in the front half cancels against the 1/2. That is exactly why the previous post kept multiplying the squared error by 1/2 all along. For the sigma in the rear part, we expand the derivative again using the sum rule:


\[ = \frac1{m}\sum_{i=1}^m(h_θ(x^{(i)}) - y^{(i)})·(\sum_{i=0}^n\frac∂{∂θ_i}θ_ix_i - \frac∂{∂θ_i}y^{(i)}) \]
The front half is now fully simplified. For simplicity, let's pull out just the rear part:
\[ \sum_{i=0}^n\frac∂{∂θ_i}θ_ix_i - \frac∂{∂θ_i}y^{(i)}\\ = \frac∂{∂θ_i}(θ_0x_0+θ_1x_1+...+θ_ix_i+...+θ_nx_n) - \frac∂{∂θ_i}y^{(i)} \]
Expanding with the sum rule, this equals differentiating each term separately. And while we are differentiating with respect to (θ_i), every other term is, as far as we are concerned, just a constant: they are frozen at the moment of differentiation. Remember the reminder at the end of the previous post? Each θ stays fixed within an iteration of the loop; only after all of the θ values have been computed are they substituted in at once, and they then stay unchanged throughout the next iteration.
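Here is that statement in miniature (my own example with n = 2): differentiating with respect to θ_1, the terms θ_0x_0 and θ_2x_2 are constants and vanish, leaving only the coefficient of θ_1:

\[ \frac∂{∂θ_1}(θ_0x_0 + θ_1x_1 + θ_2x_2) = 0 + x_1 + 0 = x_1 \]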



Differentiating a constant, as I just said, is my favorite part, because the result is 0. We have also been dragging (y^{(i)}) along for several lines now, and I resisted cutting it earlier: for a given sample set it is a constant, so its derivative is also 0:
\[ = 0 + 0 + ... + \frac∂{∂θ_i}θ_ix_i + ... + 0 - 0 \]
Now we need the product rule for derivatives:
\[ = \frac∂{∂θ_i}θ_i·x_i + θ_i·\frac∂{∂θ_i}x_i \]
You see, the world is not always so brutal. In the rear term, (x_i) is a constant with respect to (θ_i), so its derivative is 0, and multiplied by (θ_i) it stays 0.
In the front term, the derivative of (θ_i) is 1. The reason is simple: you can view (θ_i) as a power, namely (θ_i)^1.
\[ \begin{align} & = \frac∂{∂θ_i}(θ_i)^1·x_i + 0 \\ & = 1·θ_i^{(1-1)}·x_i \\ & = 1·1·x_i \\ & = x_i \end{align} \]
In a moment, the world is tidy again. What looked like a daunting derivative of the hypothesis function turns out to be nothing but (x_i), the coefficient of (θ_i).


Earlier we pulled two local pieces out of the equation to simplify them; now it's time to put them back:


\[ \begin{align} θ_j & := θ_j - α\frac∂{∂θ_j}J(θ) \\ & = θ_j - α\frac∂{∂θ_j}\frac1{2m}\sum_{i=1}^m(h_θ(x^{(i)}) - y^{(i)})^2 \\ & = θ_j - α\frac1m\sum_{i=1}^m(h_θ(x^{(i)}) - y^{(i)})x_j^{(i)} \end{align} \]
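If you want to convince yourself that the simplified gradient is correct, a common trick is to compare it against a numerical finite-difference gradient. Below is a minimal sketch of such a check (my own code, not from the original post; the names `analytic_grad` and `numeric_grad` are placeholders), assuming NumPy and the same x_0 = 1 bias-column convention as before:

```python
import numpy as np

def analytic_grad(theta, X, y):
    # 1/m * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), computed for every j at once
    return X.T @ (X @ theta - y) / len(y)

def numeric_grad(theta, X, y, eps=1e-6):
    # central finite differences of J(theta), one coordinate at a time
    def loss(t):
        err = X @ t - y
        return (err @ err) / (2 * len(y))
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (loss(theta + step) - loss(theta - step)) / (2 * eps)
    return grad

# random data: 5 samples, 2 features plus the x_0 = 1 bias column
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y = rng.normal(size=5)
theta = rng.normal(size=3)

# the two gradients should agree up to numerical precision
print(np.allclose(analytic_grad(theta, X, y), numeric_grad(theta, X, y)))  # True
```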


Hopefully I won't have to write a supplement to this supplement.


Source: www.cnblogs.com/andrewwang/p/11075395.html