AI Neural Network

Introduction to Neural Networks

  There are two kinds of neural networks: biological neural networks and artificial neural networks.

  Biological neural network: generally refers to the network formed by the neurons, cells, and synapses of a biological brain, which gives rise to consciousness and supports thinking and action.

  An Artificial Neural Network (ANN), also referred to as a neural network (NN) or connectionist model, is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Such a network processes information by adjusting the interconnections among a large number of internal nodes, and its capability depends on the complexity of the system.

  Artificial neural network: a mathematical model that processes information using a structure similar to the synaptic connections of the brain. In engineering and academia it is often referred to simply as a "neural network".

       An artificial neural network is, in a simplified sense, a technical reproduction of a biological neural network. As a discipline, its main task is to build practical artificial neural network models according to the principles of biological neural networks and the needs of practical applications, to design corresponding learning algorithms that simulate certain intelligent activities of the human brain, and then to implement them technically to solve practical problems. Biological neural network research therefore mainly studies the mechanism of intelligence, while artificial neural network research mainly studies the realization of that mechanism; the two complement each other.

Research Content

The research content of neural networks is quite broad, reflecting the character of an interdisciplinary technical field. The main research work focuses on the following aspects:

Biological Prototypes

Study the structure and functional mechanisms of nerve cells, neural networks, and nervous systems from the perspectives of physiology, psychology, anatomy, brain science, and pathology.

Modeling

Based on the study of biological prototypes, theoretical models of neurons and neural networks are established, including conceptual models, knowledge models, physical-chemical models, and mathematical models.

Algorithms

On the basis of theoretical model research, construct specific neural network models to realize computer simulations or prepare hardware implementations, including research on network learning algorithms. Work in this area is also known as technical model research.

The core operation in neural networks is vector multiplication, and sign functions and their various approximations are widely used. Parallelism, fault tolerance, hardware realizability, and the ability to self-learn are basic advantages of neural networks, and they are also what distinguish neural network computing from traditional methods.

Classification

According to model structure, artificial neural networks can be roughly divided into two categories: feedforward networks (also known as multi-layer perceptron networks) and feedback networks (also known as Hopfield networks). The former can be regarded as a class of large-scale nonlinear mapping systems, the latter as a class of large-scale nonlinear dynamical systems. According to the learning method, artificial neural networks can be divided into three types: supervised, unsupervised, and semi-supervised; according to the working mode, into deterministic and stochastic; and according to time characteristics, into continuous and discrete; and so on.

Features

Whatever the type of artificial neural network, their common characteristics are massive parallel processing, distributed storage, elastic topology, high redundancy, and nonlinear operation. From these follow high computing speed, strong association ability, strong adaptability, strong fault tolerance, and self-organization. These characteristics and capabilities form the technical basis on which artificial neural networks simulate intelligent activities, and they have found important applications in a wide range of fields. In communications, for example, artificial neural networks can be used for data compression, image processing, vector coding, error control (error-correcting and error-detecting coding), adaptive signal processing, adaptive equalization, signal detection, pattern recognition, ATM flow control, routing, communication network optimization, and intelligent network management.

Perceptron

The previous post already covered some material on the perceptron, so there will be some repetition here.
First of all, the figure below shows an MP neuron.

A neuron has n inputs, and each input has a corresponding weight w. The neuron multiplies each input by its weight and sums the results; the sum is then offset by the bias, and the result is passed to an activation function. The activation function produces the final output, which is usually binary: the 0 state represents inhibition and the 1 state represents activation.

o = f(w·x + b),  where the activation f(z) = 1 if z > 0 and 0 otherwise

The output of the perceptron (above)
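As a minimal illustration (a sketch of my own, not from the original post), here is how such an MP-style neuron can be computed with NumPy; the weights, bias, and inputs are arbitrary example values:

```python
import numpy as np

def mp_neuron(x, w, b):
    """MP neuron: weighted sum of the inputs plus a bias, fed to a step activation."""
    z = np.dot(w, x) + b            # multiply inputs by weights, sum, add bias
    return 1 if z > 0 else 0        # 1 = activated, 0 = inhibited

# Arbitrary example values
x = np.array([1.0, 0.0, 1.0])       # n = 3 inputs
w = np.array([0.5, -0.2, 0.8])      # one weight per input
b = -0.6                            # bias
print(mp_neuron(x, w, b))           # -> 1, since 0.5 + 0.8 - 0.6 > 0
```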


Perceptrons can be divided into single-layer perceptrons and multi-layer perceptrons; here we mainly discuss the single-layer perceptron.
The perceptron consists of two layers of neurons: the input layer receives external input signals and passes them to the output layer, and the output layer is made up of MP neurons.

 

Perceptron (above)

The perceptron can be regarded as a hyperplane decision surface in the n-dimensional instance space: for samples on one side of the hyperplane the perceptron outputs 1, and for instances on the other side it outputs 0. The equation of the decision hyperplane is w·x = 0. Sets of positive and negative samples that can be separated by some hyperplane are called linearly separable sample sets, and these are exactly the sets a perceptron can represent.
AND, OR, and NOT are all linearly separable problems that can easily be represented by a two-input perceptron, while XOR is not linearly separable, so a single-layer perceptron cannot solve it; a multi-layer perceptron is needed for the XOR problem.

What if we want to train a perceptron?
We start with random weights and iteratively apply the perceptron to each training example, modifying the weights whenever it misclassifies an example. This process is repeated until the perceptron classifies all examples correctly. Each step modifies the weights according to the perceptron training rule, which adjusts the weight wi associated with input xi as follows:

wi ← wi + Δwi,  where  Δwi = η(t − o)xi

Here t is the target output for the current training example, o is the output of the perceptron, and η is a positive constant called the learning rate. The learning rate moderates how much the weights change at each step; it is usually set to a small value (such as 0.1) and is sometimes made to decay as the number of weight adjustments grows.
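As a concrete sketch of this procedure (my own example, using the AND function as the training set, since AND is linearly separable):

```python
import numpy as np

# Training set for AND: linearly separable, so the rule is guaranteed to converge
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 0, 0, 1])                      # target outputs t

eta = 0.1                                       # learning rate
w = np.random.randn(2) * 0.01                   # small random initial weights
b = 0.0                                         # bias, updated like a weight

for epoch in range(100):
    errors = 0
    for x, t in zip(X, T):
        o = 1 if np.dot(w, x) + b > 0 else 0    # perceptron output
        if o != t:                              # misclassified: apply the rule
            w += eta * (t - o) * x              # wi <- wi + eta * (t - o) * xi
            b += eta * (t - o)                  # bias treated as a weight whose input is 1
            errors += 1
    if errors == 0:                             # every example classified correctly
        break

print(w, b)
```

Replacing the targets with the XOR outputs [0, 1, 1, 0] never reaches zero errors within the loop, matching the linear-separability discussion above.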

A multi-layer perceptron, or multi-layer neural network, simply adds one or more hidden layers between the input layer and the output layer; later networks such as CNNs and DBNs essentially redesign the type of each layer. The perceptron can be said to be the foundation of neural networks, and the more complex networks that followed are all built on this simplest of models.

Four Neural Network Architectures

This section briefly introduces four neural network architectures: CNN, RNN, DBN, and GAN.

Convolutional Neural Network (CNN)

When it comes to machine learning, the phrase "pattern recognition" comes up constantly, but pattern recognition in real environments runs into all sorts of problems. For example:
Image segmentation: real scenes always contain a mixture of objects; it is hard to tell which parts belong to the same object, and parts of an object can be hidden behind other objects.
Object lighting: pixel intensities are strongly affected by lighting.
Image warping: objects can be deformed in various non-affine ways; for example, a handwritten character may have a large loop or just a dot.
Contextual support: the category an object belongs to is often defined by how it is used; chairs, for example, are designed to be sat on, so they come in a wide variety of physical shapes.
The difference between a convolutional neural network and an ordinary neural network is that a CNN contains a feature extractor composed of convolutional layers and subsampling layers. In a convolutional layer, a neuron is connected only to some of the neurons in the neighboring layers. A convolutional layer usually contains several feature maps (featureMap); each feature map consists of neurons arranged in a rectangle, and the neurons of the same feature map share weights. The shared weights are the convolution kernel. A kernel is generally initialized as a matrix of small random values and learns reasonable weights during training. The direct benefit of sharing weights (convolution kernels) is fewer connections between the layers of the network, and with them a reduced risk of overfitting. Subsampling, also called pooling, usually comes in two forms: mean pooling and max pooling. Subsampling can be seen as a special kind of convolution. Together, convolution and subsampling greatly simplify the model and reduce the number of its parameters.
A convolutional neural network consists of three parts: the input layer; a combination of n convolutional and pooling layers; and a fully connected multilayer perceptron classifier.
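Before the full example below, here is a small NumPy sketch of my own (not from the post) showing how a single shared 3×3 kernel slides over an image and how the resulting feature map is subsampled with max pooling:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most deep-learning libraries):
    the same kernel -- the shared weights -- slides over the whole image."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling (subsampling)."""
    H, W = fmap.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.random.rand(8, 8)           # a single-channel "image"
kernel = np.random.randn(3, 3) * 0.1   # randomly initialized convolution kernel
print(max_pool(conv2d(image, kernel)).shape)   # (3, 3)
```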
Here is an example of AlexNet:

[Figure: AlexNet architecture]

·Input: 224×224 image, 3 channels.
·First convolutional layer: 96 convolution kernels of size 11×11, 48 on each GPU.
·First max-pooling layer: 2×2 kernel.
·Second convolutional layer: 256 convolution kernels of size 5×5, 128 on each GPU.
·Second max-pooling layer: 2×2 kernel.
·Third convolutional layer: fully connected to the previous layer, with 384 convolution kernels of size 3×3, 192 on each of the two GPUs.
·Fourth convolutional layer: 384 convolution kernels of size 3×3, 192 on each of the two GPUs; this layer connects to the previous one without an intervening pooling layer.
·Fifth convolutional layer: 256 convolution kernels of size 3×3, 128 on each of the two GPUs.
·Fifth-layer max-pooling: 2×2 kernel.
·First fully connected layer: 4096 dimensions; the output of the fifth-layer max-pooling is flattened into a one-dimensional vector as the input of this layer.
·Second fully connected layer: 4096 dimensions.
·Softmax layer: 1000 outputs, where each dimension is the probability that the image belongs to that category.
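Below is a hedged PyTorch sketch of the architecture just listed. The post gives no strides or padding, so those are my assumptions (stride 4 in the first layer, as in the canonical AlexNet); the split across two GPUs is omitted, and note that the canonical AlexNet actually uses 3×3 stride-2 pooling rather than the 2×2 kernels listed above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # 96 kernels, 11x11
    nn.MaxPool2d(2),                                          # first max-pooling
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),  # 256 kernels, 5x5
    nn.MaxPool2d(2),                                          # second max-pooling
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(), # third conv layer
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(), # fourth conv layer
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(), # fifth conv layer
    nn.MaxPool2d(2),                                          # fifth-layer max-pooling
    nn.Flatten(),                                             # flatten to a 1-D vector
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),                  # first FC layer, 4096-d
    nn.Linear(4096, 4096), nn.ReLU(),                         # second FC layer, 4096-d
    nn.Linear(4096, 1000),                                    # 1000-way class scores
)

x = torch.randn(1, 3, 224, 224)                # one 224x224 RGB image
probs = F.softmax(model(x), dim=1)             # softmax layer: class probabilities
print(probs.shape)                             # torch.Size([1, 1000])
```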

Convolutional neural networks have important applications in pattern recognition. Of course, this is only the simplest sketch of a CNN; there is much more to them, such as local receptive fields, weight sharing, and multiple convolution kernels.

Convolutional neural networks follow the same principles as multilayer perceptrons (MLPs), but notably they use convolutional layers, and they are most often applied to images and video. It is important to realize that an image is just a grid of numbers, each number representing the intensity of a pixel. Once an image is seen as a grid of numbers, its patterns and features can be found by manipulating those numbers, and convolutional layers do exactly that, using filters.

Recurrent Neural Network (RNN)

Traditional neural networks struggle with many problems. For example, to predict the next word of a sentence you generally need the preceding words, because the words in a sentence are not independent of one another. An RNN is called a recurrent neural network because the current output of a sequence also depends on the previous outputs. Concretely, the network remembers earlier information and applies it when computing the current output: the nodes within the hidden layer are no longer unconnected but connected to one another, and the input to the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequences of any length.
Below is a simple RNN structure, in which the hidden layer connects back to itself.

[Figure: a simple RNN structure]

So why can the hidden layer of an RNN see the output of the hidden layer at the previous moment? It becomes clear once we unfold the network in time.

[Figure: the RNN unfolded in time]
After the network receives the input Xt at time t, the value of the hidden layer is St and the output value is Ot. The key point is that St depends not only on Xt but also on St-1.

Equation 1: Ot = g(V·St)
Equation 2: St = f(U·Xt + W·St-1)
Equation 1 is the formula for the output layer, which is a fully connected layer: each of its nodes is connected to every node of the hidden layer. V is the weight matrix of the output layer and g is its activation function. Equation 2 is the formula for the hidden layer, which is a recurrent layer: U is the weight matrix for the input x, W is the weight matrix for the previous value St-1 used as this step's input, and f is the activation function.

From these formulas we can see that the difference between the recurrent layer and the fully connected layer is that the recurrent layer has the additional weight matrix W.
If we repeatedly substitute Equation 2 into Equation 1, we get:
Ot = g(V·St)
   = g(V·f(U·Xt + W·St-1))
   = g(V·f(U·Xt + W·f(U·Xt-1 + W·St-2)))
   = g(V·f(U·Xt + W·f(U·Xt-1 + W·f(U·Xt-2 + …))))
From this we can see that the output of a recurrent neural network is influenced by all the previous inputs Xt, Xt-1, Xt-2, Xt-3, Xt-4, …, which is why a recurrent neural network can, in principle, look back over any number of input values.
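A minimal NumPy sketch of these two equations (the dimensions and activation functions are my own illustrative choices: f = tanh, g = softmax):

```python
import numpy as np

n_in, n_hidden, n_out = 4, 8, 3
U = np.random.randn(n_hidden, n_in) * 0.1       # input-to-hidden weights U
W = np.random.randn(n_hidden, n_hidden) * 0.1   # hidden-to-hidden weights W
V = np.random.randn(n_out, n_hidden) * 0.1      # hidden-to-output weights V

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

S = np.zeros(n_hidden)                  # initial hidden state S0
for X in np.random.randn(5, n_in):      # a sequence of 5 input vectors
    S = np.tanh(U @ X + W @ S)          # Equation 2: St = f(U·Xt + W·St-1)
    O = softmax(V @ S)                  # Equation 1: Ot = g(V·St)
    print(O)
```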

Deep Belief Network (DBN)

Before discussing DBNs, we need some understanding of their basic building block: the RBM, or restricted Boltzmann machine.
First of all, what is a Boltzmann machine?

[Figure: Boltzmann machine]

As shown in the figure, in a Boltzmann machine the gray nodes form the hidden layer and the white nodes form the input (visible) layer.
A Boltzmann machine differs from a recurrent neural network in the following respects:
1. A recurrent neural network in essence learns a function, so it has the notions of input and output layers; a Boltzmann machine instead learns an "intrinsic representation" of a set of data, so it has no notion of an output layer.
2. The nodes of a recurrent neural network are connected in a directed cycle, while the nodes of a Boltzmann machine are connected as an undirected complete graph.

 

And what is a restricted Boltzmann machine?
In the simplest terms, a restriction is added that turns the complete graph into a bipartite graph: the network consists of a visible layer and a hidden layer, with the neurons of the visible layer and the hidden layer fully connected in both directions and no connections within a layer.

RBM

Here h denotes the hidden layer and v the visible layer. In an RBM, any two connected neurons have a weight w between them representing the strength of the connection, and each neuron has its own bias coefficient: b for visible-layer neurons and c for hidden-layer neurons.
The detailed formula derivation is omitted here.
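Although the derivation is omitted, here is a small illustrative sketch (my own, with made-up sizes) of the conditional probabilities an RBM uses for one step of Gibbs sampling between the visible and hidden layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden = 6, 3
W = np.random.randn(n_visible, n_hidden) * 0.1   # connection weights w
b = np.zeros(n_visible)                          # visible-unit biases b
c = np.zeros(n_hidden)                           # hidden-unit biases c

v = np.random.randint(0, 2, n_visible)           # a binary visible vector

# One step of Gibbs sampling: visible -> hidden -> visible
p_h = sigmoid(v @ W + c)                                 # P(h_j = 1 | v)
h = (np.random.rand(n_hidden) < p_h).astype(int)         # sample hidden states
p_v = sigmoid(h @ W.T + b)                               # P(v_i = 1 | h)
v_recon = (np.random.rand(n_visible) < p_v).astype(int)  # reconstructed visible states
print(v, v_recon)
```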

A DBN is a probabilistic generative model. In contrast to traditional discriminative networks, a generative model establishes a joint distribution over observations and labels and evaluates both P(Observation|Label) and P(Label|Observation), while a discriminative model evaluates only the latter, P(Label|Observation).
A DBN consists of multiple layers of restricted Boltzmann machines, a typical kind of network shown in the figure. These networks are "restricted" to a visible layer and a hidden layer, with connections between the layers but none between the units within a layer. The hidden-layer units are trained to capture the higher-order correlations exhibited by the data in the visible layer.

DBN

Generative Adversarial Network (GAN)


The goal of a generative adversarial network is generation. Traditional network architectures tend to be discriminative models, that is, models that judge the authenticity of a sample. A generative model, by contrast, can produce new samples similar to those it is given; note that these samples are learned by the machine rather than copied.
A GAN generally consists of two networks: a generative model network and a discriminative model network.
The generative model G captures the distribution of the sample data and, from a noise vector z drawn from some distribution (uniform, Gaussian, etc.), generates a sample resembling the real training data; the closer to the real samples, the better. The discriminative model D is a binary classifier that estimates the probability that a sample came from the training data (rather than from the generator): if the sample comes from the real training data, D outputs a large probability; otherwise it outputs a small one.
For example: the generator network G is like a gang of counterfeiters, specializing in making counterfeit money, and the discriminator network D is like the police, specializing in detecting whether a given note is genuine or counterfeit. The goal of D is to find ways of detecting the counterfeit money that G produces.
A traditional discriminative network (below):

[Figure: discriminative network]

A generative adversarial network (below):

[Figure: generative adversarial network]
During training, one side is held fixed while the other side's weights are updated, alternating back and forth. Both sides optimize their own networks as hard as they can, forming a competitive adversarial process, until the two reach a dynamic balance (a Nash equilibrium). At that point the generative model G has recovered the distribution of the training data (it creates samples that look exactly like the real data), and the discriminative model can no longer tell the difference, so its accuracy is 50%.
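A minimal PyTorch sketch of this alternating scheme (my own toy example with fully connected networks and made-up dimensions, not the cDCGAN from the post):

```python
import torch
import torch.nn as nn

noise_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0     # stand-in "real" data
    z = torch.randn(64, noise_dim)                   # noise z

    # 1) Fix G, update D: push D(real) toward 1 and D(G(z)) toward 0
    fake = G(z).detach()                             # detach so G gets no gradient
    loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) Fix D, update G: push D(G(z)) toward 1 (fool the discriminator)
    fake = G(torch.randn(64, noise_dim))
    loss_G = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

At convergence, D's output on both real and generated batches drifts toward 0.5, the dynamic balance described above.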

The following shows an example of a cDCGAN (covered in a previous post).

The generator network (below):

[Figure: cDCGAN generator network]

The discriminator network (below):

[Figure: cDCGAN discriminator network]

For the final result, MNIST was used as the initial sample set; judging from the digits generated after learning, the model learns quite well.

[Figure: digits generated after training on MNIST]



 
