A Simple Explanation of Neural Networks

Author: Udacity

Link: https://www.zhihu.com/question/22553761/answer/233288825
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.

Author of this article: Walker, Deep Learning & Machine Learning Instructor at Udacity

 

Epigraph: "Simple" and "visual" are two things I strive for here. As for "interesting": I think math is interesting, what do you think?


What is a neural network? A neural network is a collection of simple nodes that, through simple combinations, express a complex function. Let's explain this piece by piece.

Linear node

A node is a simple function model with inputs and outputs.

1. The simplest linear node: x + y

The simplest linear node I can think of is x + y.
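As a minimal sketch (my own illustration, not from the original article), such a node is just a function with two inputs and one output:

```python
# A node is just a function from inputs to an output;
# the simplest linear node adds its two inputs.
def simplest_node(x, y):
    return x + y

print(simplest_node(1, 2))  # 3
```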

 

 

2. Parametric linear node: ax + by

x + y is a special linear combination; we can generalize it to all linear combinations of x and y, i.e. ax + by. Here a, b are the parameters of this node. Different parameters let the node represent different functions, but the structure of the node is the same.
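A small sketch of this idea (the names are my own): the node's structure is fixed, and the parameters a, b pick out which function it computes.

```python
# A parametric linear node ax + by: different parameters, same node structure.
def linear_node(a, b):
    return lambda x, y: a * x + b * y

f = linear_node(2.0, -1.0)  # one choice of parameters
g = linear_node(0.5, 0.5)   # another choice, same structure
print(f(1, 2), g(1, 2))     # 0.0 1.5
```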

 

 

3. Multi-input linear node: a_1x_1 + a_2x_2 + a_3x_3 + ... + a_nx_n

We further generalize from 2 inputs to any number of inputs. Here a_1, a_2, a_3, ..., a_n are the parameters of this node. Likewise, different parameters let the node represent different functions, but the structure of the node is the same. Note that n is not a parameter of this node, and nodes with different numbers of inputs have different structures.

4. Vector representation of linear nodes: a^Tx

The formula above is verbose. We use the vector x to represent the input vector (x_1, x_2, ..., x_n) and the vector a to represent the parameter vector (a_1, a_2, ..., a_n); it is not difficult to verify that a^Tx = a_1x_1 + a_2x_2 + a_3x_3 + ... + a_nx_n. Here the vector a is the parameter of this node, and its dimensionality is the same as that of the input vector.
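A quick sketch (using NumPy, my own addition) showing that the vector form a^Tx and the explicit sum agree:

```python
import numpy as np

# The dot product a^T x equals the explicit sum a_1*x_1 + ... + a_n*x_n.
a = np.array([1.0, -2.0, 0.5])
x = np.array([3.0, 1.0, 4.0])
print(a @ x)                                 # 3.0
print(sum(ai * xi for ai, xi in zip(a, x)))  # 3.0, same value
```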

 

6. Linear nodes with a constant: a^Tx + b

Sometimes we want the linear node to produce an output even when all inputs are 0, so a new parameter b is introduced as a bias term to increase the expressiveness of the model. Sometimes, for simplicity, we still write the expression as a^Tx; in that case x = (x_1, x_2, ..., x_n, 1) and a = (a_1, a_2, ..., a_n, b).
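A small check of this bias trick (my own sketch): appending a constant 1 to the input and b to the parameters gives the same value as a^Tx + b.

```python
import numpy as np

# a^T x + b equals the augmented dot product with x -> (x, 1) and a -> (a, b).
a, b = np.array([1.0, -2.0, 0.5]), 0.7
x = np.array([3.0, 1.0, 4.0])

x_aug = np.append(x, 1.0)        # (x_1, ..., x_n, 1)
a_aug = np.append(a, b)          # (a_1, ..., a_n, b)
print(a @ x + b, a_aug @ x_aug)  # 3.7 3.7
```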

 

7. Linear node with activation function: \mathbb{1}(a^Tx + b > 0)

For a binary classification problem, the output of the function is simply true or false, 1 or 0. The function \mathbb{1}: \mathbb{R} \rightarrow \{1, 0\} maps true propositions to 1 and false propositions to 0.
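A sketch of such a thresholded node (the parameter values here are arbitrary, my own choice):

```python
import numpy as np

# A linear node followed by the indicator activation 1(a^T x + b > 0):
# it outputs 1 when the proposition is true and 0 when it is false.
def threshold_node(a, b):
    return lambda x: int(np.dot(a, x) + b > 0)

node = threshold_node(np.array([2.0, -1.0]), 0.5)
print(node(np.array([1, 0])), node(np.array([0, 1])))  # 1 0
```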

 

 

Examples of linear nodes

1. Express x \vee y (the OR function) with a linear node

The truth table of the OR function is as follows:

x  y  x \vee y
0  0  0
0  1  1
1  0  1
1  1  1

Define a node \mathbb{1}(x + y - 0.5 > 0); it is not difficult to verify that it is equivalent to x \vee y.
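A two-line check (my own sketch) that this node reproduces the OR truth table:

```python
# Verify that 1(x + y - 0.5 > 0) equals x OR y on all four inputs.
OR = lambda x, y: int(x + y - 0.5 > 0)
for x in (0, 1):
    for y in (0, 1):
        print(x, y, OR(x, y))  # matches the truth table of x ∨ y
```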

 

2. Express x \wedge y (the AND function) with a linear node

The truth table of the AND function is as follows:

x  y  x \wedge y
0  0  0
0  1  0
1  0  0
1  1  1

Define a node \mathbb{1}(x + y - 1.5 > 0); it is not difficult to verify that it is equivalent to x \wedge y.
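And the analogous check for the AND node (again my own sketch):

```python
# Verify that 1(x + y - 1.5 > 0) equals x AND y on all four inputs.
AND = lambda x, y: int(x + y - 1.5 > 0)
for x in (0, 1):
    for y in (0, 1):
        print(x, y, AND(x, y))  # matches the truth table of x ∧ y
```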

 

 

Expressive power of linear nodes

A single linear node can express all linear functions (where the range of the function is the set of real numbers) and all linearly separable classifiers (where the range of the function is \{0,1\}). The definitions of these concepts and the proofs of these propositions are not covered here. Although a single linear node is powerful, it still has its limitations: it can do nothing for linearly inseparable functions, such as the XOR function x \oplus y.

 

 

Combination of Linear Nodes

1. Multiple linear nodes combined in the same layer: W^Tx

The linear node above has multi-dimensional input but only one-dimensional output, i.e. a real number. If we want multi-dimensional output, we can place multiple nodes side by side. Let a_1, a_2, ..., a_m be the parameters of the m nodes respectively; then the outputs are a_1^Tx, a_2^Tx, ..., a_m^Tx respectively. The final output is

\begin{bmatrix} a_1^Tx \\ a_2^Tx \\ \vdots \\ a_m^Tx \end{bmatrix} = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} x = \begin{bmatrix} a_1 & a_2 & \cdots & a_m \end{bmatrix}^T x = W^Tx

where W = [a_1, a_2, ..., a_m] is an n \times m parameter matrix.
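A short sketch (NumPy, my own addition) of m nodes side by side expressed as one matrix multiplication:

```python
import numpy as np

# Stack the parameter vectors a_1, ..., a_m as the columns of an n x m matrix W;
# then W^T x computes all m node outputs at once.
a1 = np.array([1.0, 0.0, -1.0])
a2 = np.array([0.5, 2.0, 1.0])
W = np.column_stack([a1, a2])  # shape (n, m) = (3, 2)

x = np.array([1.0, 2.0, 3.0])
print(W.T @ x)                 # [a1·x, a2·x] = [-2.0, 7.5]
print(a1 @ x, a2 @ x)          # same two numbers, computed node by node
```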

 

2. Multi-layer linear node:

In a multi-layer network, the output of one layer of linear nodes with activation functions is used as the input of the next layer. Usually the middle layer (or hidden layer, the blue nodes in the figure) has an activation function to increase the expressiveness of the model. (Thinking: if the hidden layer has no activation function, why are two layers of linear nodes equivalent to one layer?)
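A small numerical sketch related to the thinking question (my own addition, not the article's): composing two purely linear layers is again a single linear map, since W_2^T(W_1^Tx) = (W_1W_2)^Tx.

```python
import numpy as np

# Two linear layers with no activation collapse into one linear layer.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first layer: 4 inputs -> 3 outputs
W2 = rng.standard_normal((3, 2))  # second layer: 3 inputs -> 2 outputs
x = rng.standard_normal(4)

two_layers = W2.T @ (W1.T @ x)
one_layer = (W1 @ W2).T @ x       # a single layer with parameter matrix W1 @ W2
print(np.allclose(two_layers, one_layer))  # True
```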

 

Example of multi-layer linear nodes

1. Express the XOR function x \oplus y with multiple layers

This is a linearly inseparable function and cannot be expressed by a single linear node. But we can accomplish the task with multiple layers of linear nodes.

h_1 = \mathbb{1}(x + y - 0.5 > 0), i.e. h_1 = x \vee y

h_2 = \mathbb{1}(-x - y + 1.5 > 0), i.e. h_2 = \overline{x \wedge y}

o = \mathbb{1}(h_1 + h_2 - 1.5 > 0), i.e. o = h_1 \wedge h_2
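A compact check of this construction (my own sketch) on all four inputs:

```python
# Two-layer XOR: h1 = x OR y, h2 = NOT(x AND y), output = h1 AND h2.
step = lambda z: int(z > 0)

def xor(x, y):
    h1 = step(x + y - 0.5)      # x OR y
    h2 = step(-x - y + 1.5)     # NOT (x AND y)
    return step(h1 + h2 - 1.5)  # h1 AND h2

for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor(x, y))  # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```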

 

 

Expressive power of multi-layer linear nodes

It can be shown that multiple layers of neurons can approximate any continuous function. The proof is more involved; those who are interested can take a look at: A visual proof that neural nets can compute any function

I have basically covered how a neural network computes its output, but there are still many common nodes we have not mentioned, such as ReLU, Sigmoid, Dropout, and so on. A neural network has not only a forward computation but also backpropagation, and the appearance of these nodes is closely related to backpropagation. If there is a chance, I will write another article to answer "How to explain the backpropagation of neural networks in a simple and interesting way?"

 

 
