Higher mathematics in machine learning

Derivative and gradient descent

To put it simply, the derivative is the slope of a curve at a point, which reflects how fast the function changes. The second derivative reflects how fast that slope itself changes.
We know that if the function z=f(x,y) is differentiable at the point P(x,y), then the directional derivative of the function along any direction l exists at this point, and:

$$\frac{\partial f}{\partial l} = \frac{\partial f}{\partial x}\cos\varphi + \frac{\partial f}{\partial y}\sin\varphi$$

where φ is the rotation angle from the positive x-axis to the direction l.
The above formula can be written as a dot product of two vectors:

$$\frac{\partial f}{\partial l} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)\cdot(\cos\varphi, \sin\varphi)^T$$
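As a quick sanity check, the directional-derivative formula can be compared against a finite difference along l. The function f(x, y) = x²y and the point below are my own illustrative choices, not from the original:

```python
import math

# f(x, y) = x**2 * y is an illustrative example; its gradient is (2xy, x**2).
def f(x, y):
    return x * x * y

x0, y0, phi = 1.0, 2.0, math.pi / 3    # point P and direction angle (arbitrary choices)
gx, gy = 2 * x0 * y0, x0 * x0          # partial derivatives at (x0, y0)
formula = gx * math.cos(phi) + gy * math.sin(phi)

# Finite-difference approximation of the derivative along direction l
h = 1e-6
numeric = (f(x0 + h * math.cos(phi), y0 + h * math.sin(phi)) - f(x0, y0)) / h
print(formula, numeric)  # the two values agree closely
```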

When is the dot product of two vectors largest? Since a⋅b = |a||b|cosφ, the dot product is largest when the two vectors point in the same direction. This is the idea behind a classic machine-learning algorithm, gradient descent: like walking from the top of a mountain to its foot as quickly as possible, at each step we take the partial derivatives at the current position and move in the direction opposite to the gradient, the direction of steepest descent.
The vector (∂f/∂x, ∂f/∂y) is the gradient of the function z=f(x,y) at the point P, denoted grad f(x,y).
The direction of the gradient is the direction in which the function changes fastest at the current point.
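The idea above can be sketched in a few lines. The quadratic function, learning rate, and step count below are my own illustrative choices:

```python
# Minimize f(x, y) = x**2 + 2*y**2, whose gradient is (2x, 4y),
# by repeatedly stepping opposite to the gradient.
def gradient_descent(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        gx, gy = 2 * x, 4 * y            # gradient at the current point
        x, y = x - lr * gx, y - lr * gy  # move against the gradient
    return x, y

x, y = gradient_descent(3.0, -2.0)
print(x, y)  # both coordinates approach 0, the minimum of f
```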

The secret behind the number of combinations

Let’s start with a classic probability problem: 12 genuine products and 3 defective products are packed at random into 3 boxes, with 5 items in each box. What is the probability that each box contains exactly one defective product?
First, the total number of ways to distribute the 15 products into 3 boxes of 5 is 15!/(5!5!5!).
For the favorable event, the 3 defective products go into the 3 boxes one per box, which can be done in 3! ways; the 12 genuine products then fill the boxes with 4 each, in 12!/(4!4!4!) ways.
So the probability is P(A) = (3!·12!/(4!4!4!)) / (15!/(5!5!5!)) = 25/91 ≈ 0.275.
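The result is easy to verify with exact arithmetic:

```python
from fractions import Fraction
from math import factorial

total = factorial(15) // (factorial(5) ** 3)                       # all ways to fill 3 boxes of 5
favorable = factorial(3) * (factorial(12) // (factorial(4) ** 3))  # exactly one defective per box
p = Fraction(favorable, total)
print(p)  # 25/91
```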

A more general problem: N items are divided into k groups with sizes n1, n2, …, nk (N = n1 + n2 + … + nk). The number of distinct groupings is N!/(n1!n2!…nk!).
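This multinomial count can be checked by brute force on a small case; the group sizes below are my own example:

```python
from math import factorial
from itertools import permutations

def multinomial(*sizes):
    result = factorial(sum(sizes))
    for s in sizes:
        result //= factorial(s)
    return result

# Brute-force check: assign each of 4 labeled items a group label and count
# the distinct assignments with group sizes (2, 1, 1).
sizes = (2, 1, 1)
labels = [g for g, s in enumerate(sizes) for _ in range(s)]  # [0, 0, 1, 2]
assignments = set(permutations(labels))
print(multinomial(*sizes), len(assignments))  # both equal 12
```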
When N tends to infinity, consider the following quantity:

$$H = \frac{1}{N}\ln\frac{N!}{n_1!\,n_2!\cdots n_k!}$$

As N tends to infinity, Stirling's approximation gives ln N! → N(ln N − 1), so the quantity above is equivalent to:

$$\ln N - 1 - \frac{1}{N}\sum_{i=1}^{k} n_i(\ln n_i - 1) = -\frac{1}{N}\left(\sum_{i=1}^{k} n_i\ln n_i - N\ln N\right) = -\frac{1}{N}\sum_{i=1}^{k}\left(n_i\ln n_i - n_i\ln N\right) = -\frac{1}{N}\sum_{i=1}^{k} n_i\ln\frac{n_i}{N} = -\sum_{i=1}^{k}\frac{n_i}{N}\ln\frac{n_i}{N}$$

(the −1 terms cancel in the first step because ∑ni = N).

Here ni/N is the fraction of the N items that fall into the i-th group, i.e. its frequency pi, so H finally becomes:

$$H = -\sum_{i=1}^{k} p_i \ln p_i$$

This formula should look familiar: it is exactly the expression for entropy, which can thus be derived from counting combinations.
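A quick numerical check (with an example distribution of my own choosing) shows that (1/N)·ln(N!/(n1!…nk!)) indeed approaches −∑pi ln pi for large N; using lgamma(n + 1) = ln n! avoids computing huge factorials:

```python
from math import lgamma, log

p = [0.5, 0.3, 0.2]                      # example distribution (my own choice)
entropy = -sum(pi * log(pi) for pi in p)

N = 100_000
ns = [round(N * pi) for pi in p]         # group sizes n_i = N * p_i
# lgamma(n + 1) == ln(n!), so this is (1/N) * ln(N! / (n1! * n2! * n3!))
H = (lgamma(N + 1) - sum(lgamma(n + 1) for n in ns)) / N
print(H, entropy)  # the two values are nearly equal
```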

Origin blog.csdn.net/qq_38851184/article/details/106506182