[Notes] Paper Reading | One-shot Learning with Memory-Augmented Neural Networks

  • Paper: Santoro A, Bartunov S, Botvinick M, et al. One-shot learning with memory-augmented neural networks[J]. arXiv preprint arXiv:1605.06065, 2016.
  • Author: Veagau
  • Edit Time: January 7, 2020

This is an ICML 2016 conference paper, with authors from Google DeepMind. The authors propose memory-augmented neural networks (MANN), which can rapidly assimilate the information inherent in new samples and use it to make accurate predictions after seeing only a few examples, i.e., few-shot learning. Because an external memory module is used, the authors also propose an effective method for accessing the contents of that memory.

The meta-learning process consists of two stages: in the first stage, the meta-learning model learns quickly within a specific task, e.g., achieving accurate classification on a particular dataset; in the second stage, the model extracts knowledge that transfers across tasks and uses it to guide the rapid learning of the first stage. The paper argues that neural networks with memory are well suited to this meta-learning scenario, but an LSTM can only hold learned representations temporarily in its internal memory. Borrowing ideas from the Neural Turing Machine, the paper instead adopts an external memory architecture for accessing knowledge across tasks.
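As a concrete illustration of the first stage, below is a minimal sketch (not the authors' code; the `dataset` structure and function name are assumptions) of how meta-learning episodes are commonly built for this setting: within each episode the class labels are reshuffled, so the network must bind samples to labels on the fly instead of memorising a fixed class-to-label mapping across tasks.

```python
import random

def make_episode(dataset, num_classes=5, samples_per_class=10):
    """dataset: dict mapping class_id -> list of feature vectors (assumed)."""
    classes = random.sample(list(dataset.keys()), num_classes)
    # Assign fresh, episode-specific labels 0..num_classes-1.
    label_of = {c: i for i, c in enumerate(random.sample(classes, num_classes))}
    episode = []
    for c in classes:
        for x in random.sample(dataset[c], samples_per_class):
            episode.append((x, label_of[c]))
    random.shuffle(episode)   # interleave classes over time steps
    return episode            # sequence of (sample, label) pairs fed to the MANN
```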

The overall network architecture is shown in the figure below.

Data pairs are fed to the network with the labels offset in time: during the forward pass, an input sample is bound to its target label and, after encoding, stored in the external memory module; when the next sample arrives, the network retrieves the memory contents and extracts the information relevant to prediction. The memory stores this information as a matrix, where each row holds the encoded information of one sample, so accessing the memory amounts to read and write operations on this matrix. Reading follows the strategy of the Neural Turing Machine: compute a new encoded representation of the input sample, compute its similarity to each row of the matrix, and take a similarity-weighted combination to obtain the final prediction-relevant information. Writing uses LRUA (Least Recently Used Access), a least-recently-used strategy that overwrites the memory slots used least recently, saving storage space and lookup cost.
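A minimal NumPy sketch of the two memory operations described above, a similarity-based read and an LRUA-style write. This is an illustration under assumptions, not the paper's exact implementation: in the paper the interpolation gate `alpha` is a learned scalar, whereas here it is fixed for simplicity.

```python
import numpy as np

def read(memory, key):
    """memory: (N, D) matrix, one encoded sample per row; key: (D,) query vector."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w_read = np.exp(sims) / np.exp(sims).sum()        # softmax over memory rows
    return w_read @ memory, w_read                    # similarity-weighted retrieval

def lrua_write(memory, usage, key, w_read_prev, gamma=0.9, alpha=0.5):
    """LRUA-style write: blend the previous read weights with a one-hot weight
    on the least recently used slot, then add the new encoding there."""
    w_lu = np.zeros(len(memory))
    w_lu[np.argmin(usage)] = 1.0                      # least recently used slot
    w_write = alpha * w_read_prev + (1 - alpha) * w_lu
    memory = memory + np.outer(w_write, key)          # write the new encoding
    usage = gamma * usage + w_read_prev + w_write     # decayed usage tracking
    return memory, usage
```

In use, each time step would first call `read` with the encoding of the current input, then call `lrua_write` to store that encoding, so the bound sample-label information is available when the same class reappears later in the episode.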

A memory-augmented network architecture can address the problem of sparse training data (few-shot learning), but the memory addressing scheme presented in this paper is still not flexible enough. Enabling the network to design its own addressing scheme so that it adapts to a wider range of tasks, and combining this approach with active learning, are both worth further study.

Origin www.cnblogs.com/veagau/p/12164330.html