Machine Learning & Meta Learning in One Article

Machine Learning

The goal of machine learning is to find a function (generally represented by a neural network; that is, a neural network represents a function). For example, in a classification problem, we feed a picture of a cat into this function, and the function correctly identifies that the picture contains a cat, as shown in the upper-right corner of the figure below.

Machine Learning has three steps:

Step 1: Define a function fθ, built by a neural network (θ denotes the parameters the network has to learn, namely the weights and biases). This function contains unknown parameters: the weights (Weights) and the biases (biases). As shown in the middle-right part of the figure below.
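To make Step 1 concrete, here is a minimal sketch of fθ as a small neural network in PyTorch. The input size, hidden width, and number of classes are illustrative assumptions, not values from the article.

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    """f_theta: theta is the set of all weights and biases of these layers."""
    def __init__(self, in_dim=3 * 32 * 32, hidden=128, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, hidden),      # weights and biases to be learned
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)                  # raw class scores (logits)

f_theta = Classifier()
```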

Step 2: Define the loss function L(θ). The loss function L(θ) is a function of θ. How should it be defined? See below:

First, we need some training samples. Taking the classification problem above as an example, we feed the training samples into the network we built (the function fθ) to get outputs, then compute the error between each output and the correct answer using cross-entropy (the usual choice in classification problems). Adding up the cross-entropy of every training sample gives the loss. Since the loss depends on θ, the loss function L(θ) is the accumulated loss over all training samples. The process is shown in the right part of the figure below.
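Continuing the sketch above, here is a hedged example of Step 2: L(θ) as the summed cross-entropy over the training samples. The tensors x_train and y_train are random placeholders standing in for a labelled training set.

```python
import torch
import torch.nn.functional as F

x_train = torch.randn(64, 3, 32, 32)    # placeholder images (e.g. cat / not-cat)
y_train = torch.randint(0, 2, (64,))    # their correct labels

logits = f_theta(x_train)                                     # outputs of the network from Step 1
L_theta = F.cross_entropy(logits, y_train, reduction="sum")   # sum of cross-entropy over all samples
```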

Step 3: Find a θ that makes the loss function L(θ) as small as possible. We denote the θ that minimizes L(θ) by θ*. Gradient descent is generally used to find the optimal parameter θ*. As shown below.
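And a sketch of Step 3, continuing the same example: gradient descent on L(θ) to approach θ*. The learning rate and number of steps are illustrative choices.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(f_theta.parameters(), lr=0.01)

for step in range(100):
    optimizer.zero_grad()
    loss = F.cross_entropy(f_theta(x_train), y_train)   # L(theta) on the training samples
    loss.backward()                                     # gradient of L(theta) w.r.t. theta
    optimizer.step()                                    # theta <- theta - lr * gradient

# after training, f_theta approximates the optimal f_{theta*}
```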

The optimal parameter θ* turns the neural network (the function fθ) into the optimal neural network (the optimal function fθ*), and this optimal network fθ* is the final result we want.

Meta Learning (Learn to Learn)

Meta learning, also called learning to learn, means "learning how to learn": using previous knowledge and experience to guide the learning of new tasks, so that the model acquires the ability to learn.

In fact, learning itself can be viewed as a function. We use F to represent it (in ordinary machine learning, F is designed by people). Its input is not a picture but a data set, and its output is another function f*, the optimal classifier.
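A minimal sketch of this view, under assumed shapes and settings: F takes a labelled data set and returns a trained classifier f*. Note that everything inside F here (architecture, learning rate, number of steps) is chosen by hand, which is exactly what meta learning would like the machine to learn instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as functional

def F(dataset, steps=100, lr=0.01):
    """Learning as a function: data set in, trained classifier f* out."""
    x, y = dataset                                               # labelled inputs and targets
    f = nn.Sequential(nn.Flatten(), nn.Linear(x[0].numel(), 2))  # hand-chosen architecture
    opt = torch.optim.SGD(f.parameters(), lr=lr)                 # hand-chosen learning rate
    for _ in range(steps):
        opt.zero_grad()
        functional.cross_entropy(f(x), y).backward()
        opt.step()
    return f                                                     # f*: the learned classifier

# usage with placeholder tensors:
# f_star = F((torch.randn(64, 3, 32, 32), torch.randint(0, 2, (64,))))
```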

So can we let the machine learn this function F directly? Can we follow the same steps as machine learning to learn F, that is, learn how to learn? The answer is: yes, we can.

We can follow the same three-step recipe as machine learning to learn the function F, and learning F is exactly the goal of meta learning.

We know that machine learning finds the function fθ* through three steps; meta learning likewise finds the learning algorithm FΦ* through three steps.

Meta Learning has three steps:

Step 1: Decide what in F is to be learned. In machine learning, what is learned are the optimal weights (Weights*) and biases (biases*) of the function fθ*; in other words, they are trained inside the neural network. Meta learning is no exception: for the function F, we also have to identify something to be learned.

In machine learning, the network architecture, the initial parameters, the learning rate, and so on are all set by hand; in meta learning, we hope the machine can learn these things itself. We use Φ to denote what we want F to learn, that is, the learnable components of the learning algorithm.

Step 2: Define the loss function. The loss function tells us how good or bad a particular set of parameters is. We know that in machine learning, the loss function L(θ) comes from the training samples. So in meta learning, how should the loss function L(Φ) be defined? See below.

        Before introducing the loss function in Meta Learning, let's take a look at what is trained in Meta Learning and what is tested.

In machine learning, we only consider one task: what we train on is called the training set and what we test on is called the test set. In meta learning, what we train on are training tasks, each of which is divided into training data and test data; what we test on are test tasks, which are likewise divided into training data and test data.

Take binary classification as an example. In the figure above there are two training tasks (Training Tasks), Task1 and Task2. The goal of Task1 is to separate apples from oranges, while the goal of Task2 is to separate cars from bicycles. Each training task contains training data (Train) and test data (Test).

        Next we define the loss function L(Φ).

Given the training tasks above, how do we know whether FΦ is good or bad? (Note: FΦ denotes F before training, and FΦ* denotes the F corresponding to the optimal parameter Φ* obtained after training.) We can feed the training data of one training task into FΦ, which learns a classifier fθ1* whose job is to classify apples and oranges. If this classifier performs well, we can conclude that FΦ performs well. As shown below. θ1* denotes the parameters of the classifier fθ1*, which FΦ learns from the training data of that training task.

If this classifier performs poorly, then we can know that FΦ is performing poorly. As shown below. 

So how do we measure the classifier's performance? We use the test data inside the training task to test it.

We feed the test data of the training task into the classifier to get predictions, compare them with the true labels, and obtain the error l^1. The process is shown in the figure below. The error l^1 indicates how well the classifier fθ1* performs on the test data: the smaller l^1 is, the better the classifier performs on the test data, and the better FΦ is; and vice versa.
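Here is a hedged sketch of this computation for one training task. Φ is taken to be a shared set of initial weights and biases for a linear classifier (one common, MAML-style instantiation; the article keeps Φ more abstract), and a task is assumed to be a tuple of (training inputs, training labels, test inputs, test labels).

```python
import torch
import torch.nn.functional as F

phi_w = torch.zeros(2, 3 * 32 * 32, requires_grad=True)   # part of Phi: initial weights
phi_b = torch.zeros(2, requires_grad=True)                 # part of Phi: initial biases

def task_error(task, inner_lr=0.01):
    """Learn f_{theta_n*} from the task's training data, return l^n from its test data."""
    x_train, y_train, x_test, y_test = task
    # learning step: adapt Phi on the task's TRAINING data -> theta_n*
    train_loss = F.cross_entropy(x_train.flatten(1) @ phi_w.t() + phi_b, y_train)
    g_w, g_b = torch.autograd.grad(train_loss, (phi_w, phi_b), create_graph=True)
    w, b = phi_w - inner_lr * g_w, phi_b - inner_lr * g_b
    # evaluation step: the error l^n is measured on the task's TEST data
    return F.cross_entropy(x_test.flatten(1) @ w.t() + b, y_test)
```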

So far we have only considered Task1. In meta learning, multiple training tasks are generally used, so for Task2 a classifier fθ2* is learned in the same way; its test data is fed into this classifier to get predictions, which are compared with the true labels to obtain the error l^2.

Adding the errors of Task1 and Task2 together gives the total loss.

We only considered two training tasks in this example. In meta learning there are generally many training tasks, so the total loss can be written as L(Φ) = l^1 + l^2 + ... + l^N, where N is the number of training tasks.

In meta learning, the loss of each training task is computed with the test data inside that training task, whereas in ordinary machine learning the loss is computed with the training samples (training data). As shown below.

This completes the definition of the loss function in meta learning.

Step 3: Find the Φ that minimizes L(Φ), that is, determine the optimal parameter Φ*. As shown below.

Once the optimal parameter Φ* is determined, F is determined as well, and we obtain FΦ*.
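A sketch of Step 3, reusing task_error and the Φ tensors from the sketch above: gradient descent on L(Φ). Here training_tasks is an assumed list of (training inputs, training labels, test inputs, test labels) tuples, one per training task (see the data-organization sketch further below); the learning rate and number of epochs are illustrative.

```python
import torch

meta_optimizer = torch.optim.SGD([phi_w, phi_b], lr=0.001)

for epoch in range(100):
    meta_optimizer.zero_grad()
    total_loss = sum(task_error(task) for task in training_tasks)   # L(Phi) = l^1 + ... + l^N
    total_loss.backward()                                           # gradient of L(Phi) w.r.t. Phi
    meta_optimizer.step()                                           # Phi <- Phi - lr * gradient

# phi_w and phi_b now play the role of the optimal Phi*
```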

Framework for Meta Learning

Given training tasks (each containing training data and test data), we learn FΦ* from them. We then feed the training data of the test task into FΦ* to obtain a classifier fθ*, and finally feed the test data of the test task into fθ* to get the predicted labels. The whole process is shown in the figure below.
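Continuing the Φ* sketch above, here is how this pipeline might look at test time. The test task's training data (the placeholder tensors x_support, y_support) adapts Φ* into the classifier fθ*, and the test task's test data (x_query) is what we finally predict on.

```python
import torch
import torch.nn.functional as F

# adapt: learn f_{theta*} from the TEST task's training data, starting from Phi*
support_loss = F.cross_entropy(x_support.flatten(1) @ phi_w.t() + phi_b, y_support)
g_w, g_b = torch.autograd.grad(support_loss, (phi_w, phi_b))
w_star = phi_w - 0.01 * g_w
b_star = phi_b - 0.01 * g_b

# predict: send the TEST task's test data into f_{theta*}
predicted_labels = (x_query.flatten(1) @ w_star.t() + b_star).argmax(dim=1)
```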

Difference Between Machine Learning and Meta Learning

Target 

Both machine learning and meta learning aim to find a function. Taking the classification problem as an example, in machine learning we look for the classifier f itself, while in meta learning we look for a function F whose goal is to find f. As shown below.

 Training data and test data

In machine learning, we only consider one task: its training data is used for training and its test data is used for testing. In meta learning, we consider more than one task, often very many. These tasks are divided into training tasks and test tasks; the training tasks contain training data and test data, and the test tasks likewise contain training data and test data. Generally, the training data inside a training task is called the support set, and its test data is called the query set.
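A small sketch of how this data might be organized, following the article's naming; all tensors are random placeholders and the sizes are assumptions. A list built this way could serve as the training_tasks used in the meta-training sketch above.

```python
import torch

def make_task(num_classes=2, shots=5, queries=10):
    # support set: the training data inside a task
    x_support = torch.randn(num_classes * shots, 3, 32, 32)
    y_support = torch.randint(0, num_classes, (num_classes * shots,))
    # query set: the test data inside a task
    x_query = torch.randn(num_classes * queries, 3, 32, 32)
    y_query = torch.randint(0, num_classes, (num_classes * queries,))
    return x_support, y_support, x_query, y_query

training_tasks = [make_task() for _ in range(100)]   # e.g. apples-vs-oranges, cars-vs-bicycles, ...
test_task = make_task()                              # the new task we actually care about
```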

In machine learning, F is set by hand: the learning rate, the network structure, and so on are all chosen by us; feeding the training data into F yields the trained result fθ*, the optimal classifier. In meta learning, F itself is learned from many training tasks. As shown below.

Loss function

In machine learning, the loss is obtained by accumulating the loss of every training sample in the training set; in meta learning, the loss is obtained by accumulating the losses measured on the test data of all training tasks.

Summary

In machine learning, we only consider one task: what we train on is called the training set (training data) and what we test on is called the test set (test data). In meta learning, what we train on are training tasks, each further divided into training data and test data; what we test on are test tasks, which are likewise divided into training data and test data.

In machine learning, we first set the hyperparameters by hand (for example, the learning rate), then feed the training set into the hand-designed model and train it to obtain the optimal parameters θ* (the optimal weights Weights* and the optimal biases biases*), which gives the optimal classifier fθ*; finally we feed the test set into this classifier to get the predictions.

In meta learning, we do not need to set the hyperparameters by hand; the machine learns them automatically. We train on multiple training tasks to find the optimal F, namely FΦ*. The training data of the test task is fed into FΦ* to obtain the optimal classifier fθ* for the new, specific task (here, the test task), and the test data of the test task is then fed into fθ* to get the predictions.

Meta learning hopes to give the model the ability to learn how to learn and how to tune parameters, so that it can quickly learn new tasks on top of knowledge it has already acquired. Machine learning tunes the hyperparameters by hand first and then trains the deep model directly on the specific task; meta learning first learns better hyperparameters from other tasks and then trains on the specific task.

In machine learning, the unit of training is the sample: the model is optimized directly on the data, which can be divided into a training set, a test set, and a validation set.

In meta learning, the unit of training is a task. Generally there are two kinds: training tasks (Train Tasks), also called across-task learning (Across Tasks), and the test task (Test Task), also called within-task learning (Within Task). The training tasks consist of many subtasks prepared for learning, whose purpose is to learn better hyperparameters; the test task then uses the hyperparameters learned from the training tasks to train on a specific task. The data of each training task is divided into a support set and a query set; the data of the test task is divided into a training set and a test set.

When training a neural network, the usual steps are: preprocess the data set D, choose the network structure N, set the hyperparameters γ, initialize the parameters θ0, choose the optimizer O, define the loss function L, and update the parameters θ by gradient descent. The specific steps are shown on the left side of the figure below.
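A compact sketch mapping those steps onto code; the data set, architecture, and hyperparameter values are illustrative placeholders, not values from the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x, y = torch.randn(256, 3 * 32 * 32), torch.randint(0, 2, (256,))               # data set D (preprocessed)
net = nn.Sequential(nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 2))  # network structure N
gamma = 0.01                                                                     # hyperparameter γ (learning rate)
for p in net.parameters():                                                       # initial parameters θ0
    nn.init.normal_(p, std=0.01)
optimizer = torch.optim.SGD(net.parameters(), lr=gamma)                          # optimizer O
for step in range(100):                                                          # update θ by gradient descent
    optimizer.zero_grad()
    loss = F.cross_entropy(net(x), y)                                            # loss function L
    loss.backward()
    optimizer.step()
```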

 
