Author: Udacity
Link: https://www.zhihu.com/question/22553761/answer/233288825
Source: Zhihu. The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.
Author: Walker, Deep Learning & Machine Learning Instructor at Udacity
Inscription: "Simple" and "intuitive" are the two qualities I strive for. As for "interesting": I think math is interesting; what do you think?
What is a neural network? A neural network is a collection of simple nodes that, through simple combinations, express a complex function. Let's explain them one by one.
Linear node
A node is a simple function model with inputs and outputs.
1. The simplest linear node:
The simplest linear node I can think of is $y = x_1 + x_2$.
![](https://pic3.zhimg.com/80/v2-899336269280e676ae1b14c11b9ed67a_hd.jpg)
2. Parametric linear node:
$x_1 + x_2$ is a special linear combination, and we can generalize it to all linear combinations of $x_1$ and $x_2$, i.e. $y = w_1 x_1 + w_2 x_2$. Here $w_1, w_2$ are the parameters of this node. Different parameters let the node represent different functions, but the structure of the node stays the same.
![](https://pic4.zhimg.com/80/v2-553d86f27c20086fcf2556094c7bd46b_hd.jpg)
3. Multi-input linear node:
We further generalize the 2 inputs to any number $n$ of inputs: $y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$. Here $w_1, \dots, w_n$ are the parameters of this node. Likewise, different parameters let the node represent different functions while its structure stays the same. Note that $n$ is not a parameter of this node; nodes with different numbers of inputs have different structures.
![](https://pic2.zhimg.com/80/v2-560c13a4623fd460a94f12191aae5e3b_hd.jpg)
4. Vector representation of linear nodes:
The above formula is verbose. We use the vector $\mathbf{x} = (x_1, \dots, x_n)^T$ to represent the input and the vector $\mathbf{w} = (w_1, \dots, w_n)^T$ to represent the parameters; it is not difficult to verify that $y = \mathbf{w}^T \mathbf{x}$. Here the vector $\mathbf{w}$ is the parameter of this node, and its dimension equals that of the input vector.
![](https://pic1.zhimg.com/80/v2-21e89ad972c3c71f25d4303588e3538c_hd.jpg)
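As a sanity check, the vector form can be sketched in a few lines of Python (the function name here is illustrative, not from the original):

```python
# y = w^T x: the dot product of the parameter vector and the input vector.
def linear_node(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# With w = (1, 2, 3) and x = (1, 1, 1), y = 1 + 2 + 3 = 6.
print(linear_node([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # 6.0
```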
5. Linear node with a bias term:
Sometimes we want the node to produce a nonzero output even when the input is all zeros, so a new parameter $b$ is introduced as a bias term, increasing the expressiveness of the model: $y = \mathbf{w}^T \mathbf{x} + b$. Sometimes, for simplicity, we still write $y = \mathbf{w}^T \mathbf{x}$; at this time we append a constant component $x_0 = 1$ to the input vector and let $w_0 = b$.
![](https://pic2.zhimg.com/80/v2-0eeccdd81d838dca2c4f49264456f51d_hd.jpg)
6. Linear node with an activation function:
For a binary classification problem, the desired output is simply true or false: 1 or 0. The step function $\delta(z)$, which equals $1$ if $z \ge 0$ and $0$ otherwise, maps true propositions to 1 and false propositions to 0, giving the node $y = \delta(\mathbf{w}^T \mathbf{x} + b)$.
![](https://pic1.zhimg.com/80/v2-94f3cebbe68023e269e8f1c98de1f13c_hd.jpg)
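A minimal sketch of such a node in Python (names are illustrative), using the step function as the activation:

```python
def step(z):
    # Map a "true" proposition (z >= 0) to 1 and a "false" one to 0.
    return 1 if z >= 0 else 0

def classify(w, x, b):
    # y = step(w^T x + b): a linear node followed by the step activation.
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(classify([1.0, 1.0], [1.0, 0.0], -0.5))  # 1
print(classify([1.0, 1.0], [0.0, 0.0], -0.5))  # 0
```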
Linear node instance
1. Express the OR function with a linear node
The truth table of the OR function is as follows:
![](https://pic1.zhimg.com/80/v2-1b78873a1f82eb0e3979d5ab228de7d6_hd.jpg)
Define the node $y = \delta(x_1 + x_2 - 0.5)$; it is not difficult to verify that it is equivalent to OR.
![](https://pic4.zhimg.com/80/v2-0bf9747198f02a07806ce560c73402c5_hd.jpg)
2. Express the AND function with a linear node
The truth table of the AND function is as follows:
![](https://pic2.zhimg.com/80/v2-c23cf37eeed70fd30d1ca67ef6e4d20a_hd.jpg)
Define the node $y = \delta(x_1 + x_2 - 1.5)$; it is not difficult to verify that it is equivalent to AND.
![](https://pic1.zhimg.com/80/v2-ed3faf1b869b8ce662105d872fa32f93_hd.jpg)
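Both truth tables can be checked mechanically. A small sketch (with the thresholds used above) that verifies all four input combinations:

```python
def step(z):
    return 1 if z >= 0 else 0

def or_node(x1, x2):
    return step(x1 + x2 - 0.5)   # fires when at least one input is 1

def and_node(x1, x2):
    return step(x1 + x2 - 1.5)   # fires only when both inputs are 1

for x1 in (0, 1):
    for x2 in (0, 1):
        assert or_node(x1, x2) == (x1 | x2)
        assert and_node(x1, x2) == (x1 & x2)
print("OR and AND truth tables match")
```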
Expressive power of linear nodes
A single linear node can express all linear functions (the range of the function is $\mathbb{R}$) and all linearly separable classifiers (the range of the function is $\{0, 1\}$). The concept definitions and proofs of these propositions are not covered here. Although a single linear node is powerful, it has its limits: for linearly inseparable functions it can do nothing, such as the XOR function:
![](https://pic3.zhimg.com/80/v2-d5ef759da579d6df25e13c15543265c9_hd.jpg)
Combination of Linear Nodes
1. Multiple linear nodes are combined in the same layer:
The linear node above takes a multi-dimensional input but produces only a one-dimensional output, i.e. a single real number. If we want a multi-dimensional output, we can place $m$ nodes side by side. Let $\mathbf{w}_1, \dots, \mathbf{w}_m$ respectively be the parameters of the nodes; then the outputs are $y_i = \mathbf{w}_i^T \mathbf{x}$. The final output is $\mathbf{y} = W \mathbf{x}$, where $W$ is an $m \times n$ parameter matrix whose $i$-th row is $\mathbf{w}_i^T$.
![](https://pic1.zhimg.com/80/v2-e41b7d4bc7c82a18b5eb6b0bf3e297ae_hd.jpg)
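A layer of $m$ nodes is just a matrix-vector product. A sketch with plain lists (shapes chosen purely for illustration):

```python
def layer(W, x):
    # Each row of W is one node's parameter vector w_i; output y_i = w_i^T x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

W = [[1.0, 0.0, 2.0],   # node 1
     [0.0, 1.0, 1.0]]   # node 2  (m = 2 nodes, n = 3 inputs)
print(layer(W, [1.0, 1.0, 1.0]))  # [3.0, 2.0]
```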
2. Multi-layer linear node:
In a multi-layer network, the output of each layer of linear nodes is used as the input of the next layer. Usually the middle layers (hidden layers, the blue nodes in the figure) have an activation function to increase the expressiveness of the model. (Thinking: if the hidden layer has no activation function, why are two layers of linear nodes equivalent to one layer?)
![](https://pic3.zhimg.com/80/v2-0e697d63e17b0ee7c685f1a24e7b41e6_hd.jpg)
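The thinking question above can be checked numerically: without an activation function, composing two linear layers gives $W_2(W_1 \mathbf{x}) = (W_2 W_1)\mathbf{x}$, which is just one linear layer. A sketch (the matrices are chosen arbitrarily):

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

W1 = [[1.0, 2.0], [3.0, 4.0]]
W2 = [[0.0, 1.0], [1.0, 1.0]]
x = [5.0, 6.0]

# Two linear layers with no activation...
two_layers = matvec(W2, matvec(W1, x))
# ...equal one layer whose parameter matrix is the product W2 W1.
one_layer = matvec(matmul(W2, W1), x)
assert two_layers == one_layer
print(two_layers)  # [39.0, 56.0]
```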
Multi-layer linear node instance
1. Express the XOR function with multiple layers
![](https://pic1.zhimg.com/80/v2-4fd7236f9ff8415caf0d9a888c05b417_hd.jpg)
This is a non-linearly separable function and cannot be expressed by a linear node. But we can use multiple layers of linear nodes to accomplish this task.
One construction: $x_1 \oplus x_2 = (x_1 \lor x_2) \land \lnot(x_1 \land x_2)$. The hidden layer computes an OR node $h_1 = \delta(x_1 + x_2 - 0.5)$ and a NAND node $h_2 = \delta(-x_1 - x_2 + 1.5)$, and the output node computes their AND, $y = \delta(h_1 + h_2 - 1.5)$.
![](https://pic1.zhimg.com/80/v2-1efab58bbc188f557df80d5b4221f8c0_hd.jpg)
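Putting the pieces together, a two-layer sketch of XOR (thresholds as in the OR/AND nodes above; names are illustrative):

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # hidden node 1: x1 OR x2
    h2 = step(-x1 - x2 + 1.5)    # hidden node 2: NOT (x1 AND x2)
    return step(h1 + h2 - 1.5)   # output node: h1 AND h2

for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_net(x1, x2) == (x1 ^ x2)
print("XOR truth table matches")
```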
Expressive power of multi-layer linear nodes
It can be shown that multi-layer networks of such nodes can approximate any continuous function to arbitrary precision. The proof is involved; those interested can read: A visual proof that neural nets can compute any function
I have now covered the forward computation of a neural network, but many common nodes have not been mentioned, such as ReLU, Sigmoid, Dropout, and so on. Neural networks have not only forward computation but also backpropagation, and the design of these nodes is closely tied to backpropagation. If there is a chance, I will write an article answering: How to explain the backpropagation of neural networks in a simple and interesting way?