Understanding contrastive learning (Contrastive Learning) in one article

This article is a summary of my own study of contrastive learning. If you have any questions, criticism and corrections are welcome.

Foreword

Some papers refer to contrastive learning as self-supervised learning (Self-supervised Learning), while others call it unsupervised learning (Unsupervised Learning, UL). Self-supervised learning is a form of unsupervised learning.

Self-supervised learning (Self-supervised Learning) avoids the need to annotate a large dataset with labels: pseudo-labels defined by the task itself serve as the training signal, and the learned representation is then used for downstream tasks.

Purpose: to learn an encoder that makes the encodings of data from the same class close together and the encodings of data from different classes as distinct as possible (the pretext task introduces additional supervision signal, yielding a more general representation).

Contrastive learning

First, it helps to distinguish supervised from unsupervised learning. The picture below shows two cats and a dog. In supervised learning the training data is labeled, and the goal is to decide whether each image is a cat or a dog. In unsupervised learning the training data has no labels; the model only needs to judge that the first and second images belong to the same class while the third image belongs to a different class. (What each image actually depicts does not matter for unsupervised learning.)

Feed these three images into a neural network to obtain the corresponding feature vectors f_{1}, f_{2}, f_{3}. In the feature space, f_{1} and f_{2} should be close, while f_{1} and f_{3} should be far apart (similar categories close together, different categories far apart). If this is unfamiliar, it helps to read about Embedding first. Why does this work? Take the leftmost cat image as an example: apply simple transformations to it (translation, rotation, etc.) to generate new images. The feature vectors of these new images, f_{1}' and f_{1}'', should be close in the feature space, i.e., their similarity should be high. So we generate new images, map them to embedding vectors with the encoder, and train so that f_{1}' and f_{1}'' move closer together while f_{1}' and f_{2} move apart. This process is contrastive learning, and it is unsupervised. Clearly the encoder is critical: it must not encode the transformed cat as a dog just because the image changed. What is this good for? It can serve as a pretext task: when the unlabeled data we have greatly outnumbers the labeled data, we can use the unlabeled data to pre-train an initial encoder with contrastive learning. Such an encoder has already captured some characteristics of the data and can achieve a clustering effect; it is then fine-tuned with the labeled data.
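To make this concrete, here is a minimal PyTorch sketch of the idea. The encoder is a toy network and the "images" are random tensors standing in for the cat and dog pictures; a real pipeline would use real images and much stronger augmentations:

```python
# Two augmented views of the same image should map to nearby embeddings;
# a different image should map far away. Everything here is a toy stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                   # f(.): any backbone could be used
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 32),
)

def augment(x):
    # stand-in "augmentation": horizontal flip plus a little noise
    return torch.flip(x, dims=[-1]) + 0.05 * torch.randn_like(x)

cat = torch.randn(1, 3, 64, 64)            # pretend this is the cat image x1
dog = torch.randn(1, 3, 64, 64)            # pretend this is the dog image x2

f1a, f1b = encoder(augment(cat)), encoder(augment(cat))  # f1', f1''
f2 = encoder(dog)                                        # f2

# Training would push cos(f1', f1'') up and cos(f1', f2) down.
print(F.cosine_similarity(f1a, f1b).item(), F.cosine_similarity(f1a, f2).item())
```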


The typical paradigm of contrastive learning is: pretext task + objective function. The picture below shows a general framework for contrastive learning proposed by Google in 2020 (paper: A Simple Framework for Contrastive Learning of Visual Representations, i.e. SimCLR).

Figure 1: Google’s general framework for contrastive learning

  • Data augmentation (pretext task). For the same sample x, data augmentation produces two views \widetilde{x_{i}} and \widetilde{x_{j}}. SimCLR is a computer-vision paper, so its augmentations include random cropping, random color distortion, and random Gaussian blur. \widetilde{x_{i}} and \widetilde{x_{j}} are called a positive sample pair.
  • Feature extraction encoder. f(⋅) is an encoder, with no restriction on which architecture is used. h_{i} and h_{j} can be understood as embedding vectors.
  • g(⋅) (MLP layer). SimCLR reports that adding this MLP projection head works better than leaving it out.
  • The objective function stage. The loss function in contrastive learning is generally the InfoNCE loss (as shown in Figure 2 below). A sketch of the full pipeline follows this list.
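The steps above can be summarized in a short PyTorch sketch. The backbone, augmentation parameters, and projection-head sizes below are illustrative assumptions rather than the exact SimCLR configuration:

```python
# Sketch of a SimCLR-style forward pass: two augmented views of each image
# go through the same encoder f(.) and projection head g(.); the loss
# (Figure 2) is then computed on z_i, z_j.
import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision.models import resnet18

augment = T.Compose([                      # pretext task: random views
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.GaussianBlur(kernel_size=23),
])

f = resnet18(weights=None)                 # encoder f(.); any backbone works
f.fc = nn.Identity()                       # expose the 512-d representation h
g = nn.Sequential(                         # projection head g(.), a small MLP
    nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128),
)

x = torch.rand(8, 3, 256, 256)             # a batch of N = 8 (toy) images
x_i = torch.stack([augment(img) for img in x])   # first view of each image
x_j = torch.stack([augment(img) for img in x])   # second view (positive pair)
h_i, h_j = f(x_i), f(x_j)                  # representations h_i, h_j
z_i, z_j = g(h_i), g(h_j)                  # projections fed to the loss
```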

Figure 2: InfoNCE loss
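The original figure is not reproduced here; the loss it shows is the NT-Xent form of InfoNCE used in SimCLR, which for a positive pair (i, j) can be written as

\ell_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_{i}, z_{j}) / \tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_{i}, z_{k}) / \tau\right)}

where \tau is a temperature hyperparameter.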

Here N is the number of samples in a batch: from N samples, data augmentation produces N positive pairs, so there are 2N samples in total. What counts as a negative sample? The approach in SimCLR is that, for a given positive pair, the remaining 2(N-1) samples are all negatives; in other words, the negatives are all generated from the data in the same batch. For sim(z_{i}, z_{j}) the paper uses cosine similarity. Negative samples appear only in the denominator, so to minimize the loss, the similarity of positive pairs must be large and the similarity of negative pairs must be small.
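Below is a compact, unoptimized sketch of this loss, assuming the z_{i}, z_{j} projections from the pipeline sketch above (the function name and temperature value are illustrative):

```python
# NT-Xent / InfoNCE loss as described above: for each of the 2N samples,
# the positive is the other view of the same image; the remaining 2(N-1)
# samples in the batch act as negatives.
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i, z_j, tau=0.5):
    N = z_i.size(0)
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)   # 2N x d, unit norm
    sim = z @ z.t() / tau                                   # cosine similarities / temperature
    mask = torch.eye(2 * N, dtype=torch.bool)               # exclude the k == i terms
    sim = sim.masked_fill(mask, float("-inf"))
    # the positive of sample k is k + N (first half) or k - N (second half)
    targets = torch.cat([torch.arange(N) + N, torch.arange(N)])
    return F.cross_entropy(sim, targets)                    # average over all 2N samples
```

Calling nt_xent_loss(z_i, z_j) on the projections from the earlier sketch returns a scalar loss that can be backpropagated through g(⋅) and f(⋅).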

Origin blog.csdn.net/qq_42018521/article/details/128867539