PyTorch case code (1)

Learned from Master Liu's course


The simplest linear regression
[image]

Here MSELoss is the mean squared error loss, which measures how far the prediction is from the target value.

The first argument passed to the optimizer, model.parameters(), tells the optimizer which tensors should be optimized by gradient descent. parameters() walks through all members of the model; any member that holds weights contributes its parameters to the set being trained. In this model there is only a single linear layer, so only its w and b are trained.
[image]
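The course code itself only appears as a screenshot above; a minimal sketch of the same idea, with toy data of my own, might look like this:

```python
import torch

# Toy data for y = 2x (example values of my own)
x_data = torch.tensor([[1.0], [2.0], [3.0]])
y_data = torch.tensor([[2.0], [4.0], [6.0]])

class LinearModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)  # one input feature, one output; holds w and b

    def forward(self, x):
        return self.linear(x)

model = LinearModel()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # parameters() collects w and b

for epoch in range(1000):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    optimizer.zero_grad()   # clear old gradients
    loss.backward()         # backpropagate
    optimizer.step()        # gradient-descent update of w and b

print(model.linear.weight.item(), model.linear.bias.item())
```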



The simplest logistic regression (sigmoid, binary classification)
import torch.nn.functional as F
[image]
Here BCELoss is the binary cross-entropy loss, suitable for 0/1 binary classification.
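Again the original code is only a screenshot; a minimal sketch of the same pattern (a sigmoid on top of a linear layer, trained with BCELoss; data values are mine) could be:

```python
import torch

x_data = torch.tensor([[1.0], [2.0], [3.0]])
y_data = torch.tensor([[0.0], [0.0], [1.0]])  # 0/1 labels

class LogisticRegressionModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        # the course used the functional sigmoid; newer PyTorch prefers torch.sigmoid
        return torch.sigmoid(self.linear(x))

model = LogisticRegressionModel()
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```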



Multilayer network logistic regression on multidimensional data (binary classification)
[image]
gz is a compressed file format common on Linux; np.loadtxt can read both .csv and .csv.gz files.
Here the read data type is written as np.float32 instead of double because most GPUs only support 32-bit floating point numbers.
When slicing out the y column, write [-1] instead of -1, because we want a matrix (N×1) rather than a vector. Torch operations support broadcasting, and matrix operations are much faster than element-by-element ones.
[image]
The sigmoid defined in the initialization function here is different from the torch.nn.functional.sigmoid used earlier: it is a layer that inherits from Module.
In the definition of forward, the same variable x is reused for the output of every layer by convention.
[images]
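A sketch of this section's model and training loop, assuming a file such as diabetes.csv.gz with eight feature columns and one 0/1 label column (the file name and layer sizes are assumptions):

```python
import numpy as np
import torch

# Assumed file: comma-separated, last column is the 0/1 label
xy = np.loadtxt('diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, :-1])   # every column except the last
y_data = torch.from_numpy(xy[:, [-1]])  # [-1] keeps y as an N x 1 matrix, not a vector

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(8, 6)   # 8 input features assumed
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()      # a Module, unlike torch.nn.functional.sigmoid

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))      # the same variable x is reused layer by layer
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x

model = Model()
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```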



Load the data set
In mini-batch training, batch-size means exactly what it says (the number of samples in one batch), and the number of iterations per epoch equals the total number of samples divided by the batch-size.

Two classes are needed. The first is Dataset, which is used to construct datasets and supports index (subscript) access. It is an abstract class, i.e. it cannot be instantiated directly; we have to define a class that inherits from it and instantiate that. When defining this class's __init__ there are two ways to handle the data, because data that is too large cannot be read into memory all at once. When the data (or the labels) are very large, __init__ can simply build a list holding the file name of each sample; __getitem__ then reads in the i-th file by looking up its name in that list only when the sample is needed. This keeps memory usage efficient.

DataLoader is used to draw mini-batches from a Dataset and can be instantiated directly.
[image]
Because Windows and Linux create worker processes differently, iterating over a DataLoader with multiple workers (num_workers=2) directly at the top level of a script raises an error on Windows; wrapping the loop in an if __name__ == '__main__': block or in a function avoids the error.
[image]
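A runnable skeleton of this pattern, with a small random in-memory dataset standing in for real data (names and sizes are illustrative):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):                  # must inherit from the abstract Dataset class
    def __init__(self):
        # Small in-memory example; for huge data, store only file names here
        # and read the actual files inside __getitem__.
        self.x = torch.randn(100, 8)
        self.y = torch.randint(0, 2, (100, 1)).float()

    def __getitem__(self, index):          # enables dataset[index]
        return self.x[index], self.y[index]

    def __len__(self):                     # enables len(dataset)
        return self.x.shape[0]

if __name__ == '__main__':                 # needed on Windows when num_workers > 0
    dataset = MyDataset()
    train_loader = DataLoader(dataset=dataset, batch_size=32,
                              shuffle=True, num_workers=2)
    for epoch in range(2):
        for i, (inputs, labels) in enumerate(train_loader):
            pass                           # the training step for one mini-batch goes here
```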

Now define the version of Dataset that reads all the data at once. xy.shape gives the number of rows and columns of the two-dimensional data matrix; the number of rows is taken as the number of samples.

[image]
Each iteration over train_loader yields one mini-batch of xy, which is then unpacked into inputs and labels corresponding to x and y. Each inputs is a tensor of shape batch-size × feature dimension.
[image]

The complete code for the read-everything-at-once version described above is as follows:
[image]
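A sketch of that complete version, reusing the Model class from the earlier sketch and again assuming the diabetes.csv.gz file name:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class DiabetesDataset(Dataset):
    def __init__(self, filepath):
        xy = np.loadtxt(filepath, delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]                     # number of samples = number of rows
        self.x_data = torch.from_numpy(xy[:, :-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

if __name__ == '__main__':
    dataset = DiabetesDataset('diabetes.csv.gz')   # assumed file name
    train_loader = DataLoader(dataset=dataset, batch_size=32,
                              shuffle=True, num_workers=2)

    model = Model()                                # the 8-6-4-1 model sketched earlier
    criterion = torch.nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(100):
        for i, (inputs, labels) in enumerate(train_loader, 0):
            y_pred = model(inputs)
            loss = criterion(y_pred, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```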



Built-in datasets (torchvision.datasets)
[images]
Here the built-in MNIST dataset is used directly to construct the data, so a custom Dataset class is not required.
transform=transforms.ToTensor() specifies how to transform the data. Because the images in these datasets are read with Pillow, they need to be converted to tensors; the conversion also performs certain operations, such as scaling pixel values from 0-255 down to 0-1 (other transforms can scale to -1 to 1).
shuffle is not enabled for the test loader: the model does not change during testing, so shuffling is unnecessary, and keeping the original order means each test sample's prediction lines up with the order of the sample data, which is easier to inspect.
The bottom loop omits the outer epoch loop.
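A sketch of loading the built-in MNIST dataset in this way (the root directory is my own choice):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_dataset = datasets.MNIST(root='./data', train=True,
                               transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root='./data', train=False,
                              transform=transforms.ToTensor(), download=True)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)   # no shuffle when testing

# (outer epoch loop omitted, as in the original)
for batch_idx, (inputs, targets) in enumerate(train_loader):
    pass   # training step goes here
```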




It is possible to use softmax to turn a well-trained model's outputs into class probabilities for a multi-class problem. During optimization, however, the cross entropy is computed from the logarithm of the softmax output, which measures how far the prediction is from the true label. So training needs the cross-entropy loss, which reflects the difference between the predicted value and the true label and has some nice properties. When using cross entropy, be careful not to apply an activation function on the last layer; pass the raw outputs straight into the loss.
PyTorch provides a multi-class cross-entropy loss (the BCE used earlier is for binary classification) that combines softmax, the logarithm, and NLLLoss into a single interface: CrossEntropyLoss.
The y passed in when computing the cross entropy must be a LongTensor, i.e. a long-integer tensor (torch.Tensor defaults to torch.FloatTensor, 32-bit floating point, while torch.LongTensor holds 64-bit integers).
[image]
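A small self-contained illustration of CrossEntropyLoss with a LongTensor target (the values are arbitrary):

```python
import torch

criterion = torch.nn.CrossEntropyLoss()

# Raw, un-activated outputs (logits) for a batch of 3 samples and 5 classes
z = torch.randn(3, 5)
# Targets are class indices and must be a LongTensor
y = torch.tensor([1, 0, 4], dtype=torch.long)

loss = criterion(z, y)    # softmax, log, and NLLLoss all happen inside
print(loss.item())
```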


Here we use classification of the 28×28 digit images in MNIST as an example. Some commonly used libraries are imported first.
[image]

The read-in image is converted to a tensor, which changes the layout from 28×28×1 to 1×28×28, and the pixel values are then normalized with the mean and standard deviation of MNIST, because data roughly in the 0-1 range trains neural networks more effectively.
[image]
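A sketch of that transform; the mean and standard deviation shown are the values commonly quoted for MNIST:

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),                        # PIL image (28x28, 1 channel) -> tensor 1x28x28 in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,)),   # commonly quoted MNIST mean and std
])
```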

In the network, x = x.view(-1, 784) reshapes the input. The -1 means that dimension (the number of samples) is inferred automatically; any dimension can be -1 as long as the product of all dimensions stays unchanged, because view is the tensor's reshape operation. The 784 means each 28×28 image is flattened into a 784-dimensional vector, so x becomes an N×784 matrix.
Note that the last layer returns l5(x) directly without an activation, because the raw outputs are passed to the cross-entropy loss, which applies softmax internally.
[image]
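A sketch of such a network; the intermediate layer sizes are my own choice, only the 784 input and 10 output dimensions are fixed by the problem:

```python
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)          # flatten N x 1 x 28 x 28 into N x 784
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)            # no activation: CrossEntropyLoss applies softmax itself

model = Net()
```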
Because the network is a bit larger, a better optimization setting is used here: SGD with momentum.
[image]

In order to better separate training and testing, here is a demonstration of encapsulating a round of training into a function.
[image]
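A sketch of the loss, the SGD+momentum optimizer, and a one-epoch train function (the learning rate, momentum, and print interval are assumptions):

```python
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)   # SGD + momentum

def train(epoch):
    running_loss = 0.0
    for batch_idx, (inputs, target) in enumerate(train_loader, 0):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if batch_idx % 300 == 299:   # report every 300 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
            running_loss = 0.0
```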

Below is the test function. Testing does not need backpropagation, and no gradients are computed for code inside a with torch.no_grad(): block.
After the data is passed through the model, each row of the resulting matrix contains ten values, one per predicted class; we need the index of the maximum in each row, which gives the predicted class. The torch.max function is used here with dim=1, i.e. along dimension 1, so the maximum is taken over each row (dimension 0 would go down each column). It returns two values: the maximum value of each row, which we do not need and therefore assign to _, and its index, which is assigned to predicted.
Finally, count the correctly predicted samples and the total number of samples to compute the accuracy.
[image]
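A sketch of the test function as described:

```python
def test():
    correct = 0
    total = 0
    with torch.no_grad():                                  # no gradients during evaluation
        for images, labels in test_loader:
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # index of the max in each row
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy on test set: %d %%' % (100 * correct / total))
```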
With the functions above, ten rounds of training can be written in the form below. You can also add if epoch % 10 == 9 to test only once every ten rounds.
[image]
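For example:

```python
if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        if epoch % 10 == 9:   # or simply call test() every epoch
            test()
```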

[image]



The convolutional neural network
batch represents the number of samples. The first argument (1) of the first convolutional layer is the number of input channels; the input image here is grayscale, so it is 1. The second argument is the number of output channels; here it is 10, meaning the layer has 10 convolution kernels and produces 10 feature maps. kernel_size=5 means each kernel is 5×5 in height and width; this parameter can also be passed as a tuple to use kernels with different height and width, but that is rarely done.
The third layer is a pooling layer; passing 2 means pooling with a 2×2 window, and when stride is not specified it defaults to the window size. Pooling has no weights, so there is no need to define several pooling layers; one can be reused.

Looking at forward, the batch size n is first obtained from the data. Here the order is first convolution, then pooling, then ReLU, followed by the second convolution, pooling, and ReLU, which differs slightly from the order shown in the picture on the left.
Finally, a view flattens the 20×4×4 feature maps into a 320-dimensional vector, a fully connected layer maps it down to 10 dimensions corresponding to the classes, and the result can be fed into the cross-entropy loss for training.

Replace the model definition in the earlier multi-class version with this code and the convolutional network is built; nothing else needs to change.
[image]
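A sketch of the convolutional model described above (the layer sizes follow the 1 -> 10 -> 20 channel and 320 -> 10 numbers given in the text):

```python
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)    # 1 input channel -> 10 feature maps
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)   # 10 -> 20 feature maps
        self.pooling = torch.nn.MaxPool2d(2)                  # 2x2 pooling; stride defaults to the window size
        self.fc = torch.nn.Linear(320, 10)                    # 20 x 4 x 4 = 320 after two conv+pool stages

    def forward(self, x):
        n = x.size(0)                              # batch size
        x = F.relu(self.pooling(self.conv1(x)))    # conv -> pool -> ReLU
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(n, -1)                          # flatten to N x 320
        return self.fc(x)                          # no activation; feed into CrossEntropyLoss

model = Net()
```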



TIPS: How to use GPU for training
[image]
Add these statements after instantiating the model. If there are several graphics cards you can specify which one to use with cuda:i and run different tasks on different cards. Not only the model must be moved to the GPU; the tensors involved in the computation must be moved as well. Note that the model and the data must be on the same card:
[image]
Add the same statements in the test function and it works:
[image]
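A sketch of the pattern (the device index 0 is just an example):

```python
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)                                   # move the model to the GPU

# Inside both train() and test(): move each batch to the same device as the model
inputs, target = inputs.to(device), target.to(device)
```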





Advanced convolution: GoogLeNet
GoogLeNet is a commonly used basic architecture; in practice it is often modified to complete the task at hand. The parts outlined by the red lines are all the same module, called an Inception block.
[image]
An Inception block is shown below. The idea comes from not knowing in advance which convolution kernel size suits the network best, so an Inception block uses several kernel sizes at once; during training, the weights of the branches that work well grow larger, which effectively gives the model several alternative paths.
The block splits into four paths that are spliced back together by the Concatenate, so the outputs of all four paths must have the same width and height when they are concatenated at the end. This is arranged by choosing the appropriate padding, number of kernels, and pooling type in each branch.
Why are there so many 1×1 convolutions?
[images]
Because they are used to change the number of channels of the input tensor, and a 1×1 convolution greatly reduces the amount of computation. Since the tensors are laid out as (batch, channel, width, height), the outputs of the four branches are concatenated along the channel dimension.
[image]

Finally, the above code is integrated as follows:
[image]
The 1408, 88, and similar numbers here come from how many elements and channels the data has after passing through part of the network. We do not have to work these values out by hand: to avoid errors from miscounting, you can leave off the last couple of layers when first defining the module, construct a random tensor with the same input shape as MNIST, run it through the model once, and read off the output size. These values can simply be asked of the machine.
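A sketch of one Inception block along these lines; the branch channel counts are chosen so that the concatenated output has 16 + 24 + 24 + 24 = 88 channels, matching the 88 mentioned above:

```python
import torch
import torch.nn.functional as F

class InceptionA(torch.nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.branch1x1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)

        self.branch5x5_1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = torch.nn.Conv2d(16, 24, kernel_size=5, padding=2)

        self.branch3x3_1 = torch.nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3_2 = torch.nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3_3 = torch.nn.Conv2d(24, 24, kernel_size=3, padding=1)

        self.branch_pool = torch.nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_2(self.branch5x5_1(x))

        branch3x3 = self.branch3x3_3(self.branch3x3_2(self.branch3x3_1(x)))

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        # Padding keeps width and height identical in every branch, so the four
        # outputs can be concatenated along the channel dimension
        # (dim=1 of a (batch, channel, width, height) tensor).
        return torch.cat([branch1x1, branch5x5, branch3x3, branch_pool], dim=1)
```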

Finally, when actually running the code, whenever the accuracy on the test set reaches a new high you can back up the current network, either by saving the parameters or by saving the whole model; this also guards against losing the model if the code crashes.




Advanced convolution: the residual network
Taking the gradient with respect to x, the skip connection means that 1 is added to the gradient of F, so even at its smallest the gradient stays around 1, which solves the vanishing-gradient problem.
Here the activation is applied after computing F(x) + x.
In order to add F(x) and x, the two convolutional layers in the block must keep the input and output width, height, and number of channels the same, i.e. x and F(x) have exactly the same shape.
[image]
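A sketch of such a residual block:

```python
import torch
import torch.nn.functional as F

class ResidualBlock(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Same input and output channels, kernel_size=3 with padding=1,
        # so x and F(x) have exactly the same shape and can be added.
        self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)     # activation applied after F(x) + x
```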
After constructing the ResidualBlock, you can write the network:
[image]
As above, if you run into all kinds of unusual network structures in the future, you can encapsulate them as classes and nest them in this way.



TIPS: Master Liu’s study suggestions
[image]




Origin blog.csdn.net/weixin_43739821/article/details/127234938