CS231n: Linear Classification

The k-Nearest Neighbor classifier has the following shortcomings: the
classifier must remember and store all of the training data so that future test data can be compared against it, which is inefficient in storage space since a data set can easily be gigabytes in size;
and classifying a test image requires comparing it against every training image, which is computationally expensive.
Introduction:
We are now going to develop a more powerful approach to image classification, one that extends naturally to neural networks and convolutional neural networks.

This method is composed of two main parts:
one is the score function, which maps the raw image data to class scores;
the other is the loss function, which quantifies the agreement between the predicted class scores and the true labels. The problem can then be cast as an optimization problem: during optimization, the loss is minimized by updating the parameters of the score function.

Parameterized mapping from images to label scores

The first part of the method is to define a score function that maps the pixel values of an image to confidence scores for each class; the higher a class's score, the more strongly the classifier believes the image belongs to that class.

The following is a specific example to demonstrate this method.

Now suppose there is a training set of images xi ∈ R^D, each with a corresponding label yi, where i = 1, 2, ..., N and yi ∈ {1, ..., K}. That is to say, we have N image samples, each of dimensionality D, and K distinct categories.
For example, in CIFAR-10 we have a training set of N = 50,000 images, each with D = 32 x 32 x 3 = 3072 pixels, and K = 10, because the images are divided into 10 different categories (dog, cat, car, etc.).
We now define the score function f: R^D → R^K, which maps the raw image pixels to class scores.

Introduction to linear classifiers:

In this model, we start with arguably the simplest possible function, a linear mapping:
f(xi,W,b)=Wxi+b
In the above formula, each image xi is assumed to be stretched out into a single column vector of size [D x 1]. The matrix W, of size [K x D], and the column vector b, of size [K x 1], are the parameters of the function.
Taking CIFAR-10 as an example, xi contains all the pixels of the i-th image flattened into a [3072 x 1] column vector, W is [10 x 3072], and b is [10 x 1]. The function therefore takes in 3072 numbers (the raw pixel values) and outputs 10 numbers (the class scores). The parameters in W are called the weights, and b is called the bias vector because it influences the output scores without interacting with the actual data xi. In practice, however, people often use the terms weights and parameters interchangeably.
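A minimal NumPy sketch of this score function with the CIFAR-10 shapes above (the array names and random values are illustrative stand-ins, not from the original notes):

```python
import numpy as np

def f(x, W, b):
    # x: an image stretched into a column of D pixel values, shape (3072,)
    # W: weights of shape (10, 3072); b: bias vector of shape (10,)
    return W.dot(x) + b   # 10 class scores

# Toy usage with random parameters standing in for learned ones
D, K = 3072, 10
x = np.random.rand(D) * 255          # stand-in for raw pixel values
W = 0.01 * np.random.randn(K, D)
b = np.zeros(K)
scores = f(x, W, b)                  # shape (10,): one score per class
```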

Points to note:

First, a single matrix multiplication Wxi effectively evaluates 10 separate classifiers in parallel (one per class), where each class's classifier is a row of W.
Note that we think of the input data (xi, yi) as given and fixed, while the parameters W and b are under our control. Our goal is to set these parameters so that the computed class scores match the true labels of the training images.
One advantage of this approach is that the training data is only used to learn the parameters W and b. Once training is complete, the training data can be discarded and only the learned parameters kept, because a test image can simply be fed through the function and classified based on the computed scores.
Finally, note that classifying a test image requires only a single matrix multiplication and addition, which is much faster than comparing the test image against all training images as k-NN does.
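Because classification is just one matrix multiplication plus one addition, a whole batch of test images can be scored at once. A hedged sketch, assuming the images have already been stretched into the rows of an array X_test (a name introduced here only for illustration):

```python
import numpy as np

# Stand-ins for the learned parameters and a batch of stretched test images
num_test, D, K = 500, 3072, 10
X_test = np.random.rand(num_test, D) * 255
W = 0.01 * np.random.randn(K, D)
b = np.zeros(K)

scores = X_test.dot(W.T) + b       # shape (num_test, 10): one matmul + one add
y_pred = scores.argmax(axis=1)     # predicted class index for each test image
```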

Understanding linear classifiers

A linear classifier computes a class score as a weighted sum of all pixel values across an image's three color channels.
Depending on the values we set for the weights, the function has the capacity to like or dislike certain colors at certain positions in the image (depending on the sign of each weight).

Here is an example of mapping an image to class scores. For ease of visualization, assume the image has only 4 pixels (all grayscale; RGB channels are not considered here) and that there are 3 classes (red for cat, green for dog, blue for ship; note that red, green, and blue here only mark the classes and have nothing to do with RGB channels).
[Figure: an example of mapping an image to class scores]
First, the image pixels are stretched into a column vector and multiplied by W to obtain the score for each class. Note that this particular W is not good at all: the cat score comes out very low, and according to these weights the classifier is convinced this image is a dog.
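A small numeric sketch of this 4-pixel, 3-class example (the numbers below are hypothetical, chosen only so that the cat score comes out low and the dog score high; they are not taken from the figure):

```python
import numpy as np

x = np.array([56.0, 231.0, 24.0, 2.0])      # the 4 stretched pixel values
W = np.array([[0.2, -0.5,  0.1,  2.0],      # row 0: cat classifier
              [1.5,  1.3,  2.1,  0.0],      # row 1: dog classifier
              [0.0,  0.25, 0.2, -0.3]])     # row 2: ship classifier
b = np.array([1.1, 3.2, -1.2])

scores = W.dot(x) + b      # -> [-96.8, 437.9, 60.75]
print(scores.argmax())     # 1: with this (bad) W, the image is labeled "dog"
```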

Think of images as high-dimensional points:

Since images are stretched into high-dimensional column vectors, we can interpret each image as a single point in this space (for example, each CIFAR-10 image is a point in 3072-dimensional space). The entire data set is then a labeled collection of such points.
Since the score of each class is defined as a weighted sum (a matrix product) of the image with the weights, each class score is a linear function over this space. We cannot visualize linear functions over a 3072-dimensional space, but if we imagine squashing all the dimensions down to two, we can try to visualize what the classifiers are doing:
[Figure: schematic of the image space, where each image is a point and the lines of 3 classifiers are drawn]
Taking the red car classifier as an example, the red line shows the set of points in the space that receive a car score of zero, and the red arrow shows the direction of increasing score. All points to the right of the red line have positive (and linearly increasing) car scores, and all points to the left have negative (and linearly decreasing) scores.
As you can see from the above, each row of W is a classifier for one of the classes. The geometric interpretation of these numbers is that if we change one of the rows of W, the corresponding line in the space rotates in different directions. The bias b, on the other hand, allows the classifier's line to translate. In particular, note that without the bias terms, plugging in xi = 0 would always give a score of zero regardless of the weights, so all of the classifiers' lines would be forced to pass through the origin.
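A quick check of that last remark, using toy shapes (the random W stands in for any weights at all):

```python
import numpy as np

K, D = 3, 4
W = np.random.randn(K, D)          # any weights at all
b = np.array([1.1, 3.2, -1.2])

x0 = np.zeros(D)                   # the all-zero input
print(W.dot(x0))                   # [0. 0. 0.] no matter what W is
print(W.dot(x0) + b)               # equals b: the bias shifts the lines off the origin
```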
Image data preprocessing: In the examples above, the raw pixel values (which range from 0 to 255) were used directly. In machine learning, it is a very common practice to normalize the input features; in image classification, each pixel of the image can be regarded as a feature. In particular, it is important to center the data by subtracting the mean from every feature. For these images, this means computing a mean image over all images in the training set and subtracting it from every image, so that the pixel values are roughly distributed in [-127, 127]; this is the zero-mean centering step. A further common preprocessing step is to scale every feature so that its values lie in [-1, 1].
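A minimal preprocessing sketch, assuming the training and test images have already been stretched into the rows of X_train and X_test with raw pixel values in [0, 255] (the names and random data are placeholders):

```python
import numpy as np

X_train = np.random.randint(0, 256, size=(50000, 3072)).astype(np.float64)
X_test  = np.random.randint(0, 256, size=(10000, 3072)).astype(np.float64)

mean_image = X_train.mean(axis=0)   # average image, computed on the training set only
X_train -= mean_image               # zero-center: values now roughly in [-127, 127]
X_test  -= mean_image               # the same training-set mean is used for test data
X_train /= 128.0                    # optional further scaling to roughly [-1, 1]
X_test  /= 128.0
```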

Reprinted from: https://blog.csdn.net/weixin_38278334/article/details/82831541
