CVPR 2018 Transfer-Learning-Related Papers

1.Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks

Abstract
We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training. SP-AEN aims to tackle the inherent problem — semantic loss — in the prevailing family of embedding-based ZSL, where some semantics would be discarded during training if they are non-discriminative for training classes, but could become critical for recognizing test classes. Specifically, SP-AEN prevents the semantic loss by introducing an independent visual-to-semantic space embedder which disentangles the semantic space into two subspaces for the two arguably conflicting objectives: classification and reconstruction. Through adversarial learning of the two subspaces, SP-AEN can transfer the semantics from the reconstructive subspace to the discriminative one, accomplishing improved zero-shot recognition of unseen classes. Compared with prior works, SP-AEN can not only improve classification but also generate photo-realistic images, demonstrating the effectiveness of semantic preservation. On four popular benchmarks: CUB, AWA, SUN and aPY, SP-AEN considerably outperforms other state-of-the-art methods by an absolute performance difference of 12.2%, 9.3%, 4.0%, and 3.6% in terms of harmonic mean values [62].


  1. How is the semantic loss measured?
  2. How does the independent visual-to-semantic space embedder prevent semantic loss?
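The harmonic mean reported in the abstract is the standard generalized-ZSL metric: it combines per-class accuracy on seen classes with per-class accuracy on unseen classes, and punishes methods that trade one for the other. A minimal sketch (the function name is mine, not from the paper):

```python
def harmonic_mean(acc_seen, acc_unseen):
    """Generalized ZSL metric: harmonic mean of per-class accuracies
    on seen and unseen test classes. Returns 0 if either accuracy is 0,
    so a method that ignores unseen classes scores 0 overall."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)
```

For example, 60% seen / 30% unseen accuracy yields H = 0.4, lower than the arithmetic mean of 0.45, reflecting the penalty on imbalance.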

2.Multi-Label Zero-Shot Learning with Structured Knowledge Graphs

Abstract
In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance. Inspired by the way humans utilize semantic knowledge between objects of interest, we propose a framework that incorporates knowledge graphs for describing the relationships between multiple labels. Our model learns an information propagation mechanism from the semantic label space, which can be applied to model the inter-dependencies between seen and unseen class labels. With such investigation of structured knowledge graphs for visual reasoning, we show that our model can be applied for solving multi-label classification and ML-ZSL tasks. Compared to state-of-the-art approaches, comparable or improved performances can be achieved by our method.

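The propagation mechanism in ML-ZSL is learned; as a rough illustration of the underlying idea only, the toy sketch below propagates label confidences over a hand-built label graph, so an unseen label related to a confidently predicted one picks up score. The mixing rule and the example graph are illustrative assumptions, not the paper's model:

```python
import numpy as np

def propagate_labels(scores, adj, alpha=0.5, steps=3):
    """Toy stand-in for learned propagation: repeatedly mix each label's
    score with the average score of its neighbors in the label graph.
    scores: (L,) initial confidences; adj: (L, L) 0/1 label relations."""
    # Row-normalize the adjacency so each label averages over its neighbors.
    deg = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    s = scores.astype(float)
    for _ in range(steps):
        s = (1 - alpha) * scores + alpha * (P @ s)
    return s

# "dog" is strongly predicted; the related "animal" label gains score
# through the graph, and "cat" gains a little via "animal".
adj = np.array([[0, 1, 0],   # dog    -- animal
                [1, 0, 1],   # animal -- dog, cat
                [0, 1, 0]])  # cat    -- animal
out = propagate_labels(np.array([0.9, 0.0, 0.0]), adj)
```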

3.“Zero-Shot” Super-Resolution using Deep Internal Learning

Abstract
Deep Learning has led to a dramatic leap in Super-Resolution (SR) performance in the past few years. However, being supervised, these SR methods are restricted to specific training data, where the acquisition of the low-resolution (LR) images from their high-resolution (HR) counterparts is predetermined (e.g., bicubic downscaling), without any distracting artifacts (e.g., sensor noise, image compression, non-ideal PSF, etc.). Real LR images, however, rarely obey these restrictions, resulting in poor SR results by SotA (state-of-the-art) methods. In this paper we introduce “Zero-Shot” SR, which exploits the power of Deep Learning, but does not rely on prior training. We exploit the internal recurrence of information inside a single image, and train a small image-specific CNN at test time, on examples extracted solely from the input image itself. As such, it can adapt itself to different settings per image. This allows us to perform SR of real old photos, noisy images, biological data, and other images where the acquisition process is unknown or non-ideal. On such images, our method outperforms SotA CNN-based SR methods, as well as previous unsupervised SR methods. To the best of our knowledge, this is the first unsupervised CNN-based SR method.

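The core trick of “Zero-Shot” SR is that the training set is manufactured from the test image itself: the input image plays the HR role, and a downscaled copy of it supplies the LR inputs. A minimal sketch of that pair extraction (average pooling stands in for the downscaling kernel; the patch size and stride are arbitrary choices, not the paper's):

```python
import numpy as np

def downscale2x(img):
    """2x downscale by average pooling (a simple stand-in for bicubic)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def make_training_pairs(img, patch=8, stride=4):
    """Create (LR patch, HR patch) pairs from the single input image:
    the image itself is the HR target, its downscaled copy the LR input.
    A small CNN trained on these pairs then upscales the original image."""
    lr_img = downscale2x(img)
    pairs = []
    for y in range(0, lr_img.shape[0] - patch + 1, stride):
        for x in range(0, lr_img.shape[1] - patch + 1, stride):
            lr = lr_img[y:y + patch, x:x + patch]
            hr = img[2 * y:2 * (y + patch), 2 * x:2 * (x + patch)]
            pairs.append((lr, hr))
    return pairs

pairs = make_training_pairs(np.random.rand(64, 64))
```

Because the pairs come solely from the input image, the learned mapping adapts to that image's own degradation, which is the point of the method.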

4.Zero-Shot Sketch-Image Hashing

Abstract

Recent studies show that large-scale sketch-based image retrieval (SBIR) can be efficiently tackled by cross-modal binary representation learning methods, where Hamming distance matching significantly speeds up the process of similarity search. Given training and test data subject to a fixed set of pre-defined categories, the cutting-edge SBIR and cross-modal hashing works obtain acceptable retrieval performance. However, most of the existing methods fail when the categories of query sketches have never been seen during training.

In this paper, the above problem is briefed as a novel but realistic zero-shot SBIR hashing task. We elaborate the challenges of this special task and accordingly propose a zero-shot sketch-image hashing (ZSIH) model. An end-to-end three-network architecture is built, two of which are treated as the binary encoders. The third network mitigates the sketch-image heterogeneity and enhances the semantic relations among data by utilizing the Kronecker fusion layer and graph convolution, respectively. As an important part of ZSIH, we formulate a generative hashing scheme in reconstructing semantic knowledge representations for zero-shot retrieval. To the best of our knowledge, ZSIH is the first zero-shot hashing work suitable for SBIR and cross-modal search. Comprehensive experiments are conducted on two extended datasets, i.e., Sketchy and TU-Berlin with a novel zero-shot train-test split. The proposed model remarkably outperforms related works.

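Once sketches and images are encoded into a shared binary space, retrieval reduces to Hamming-distance ranking, which is what makes the hashing formulation fast. A minimal sketch of that ranking step (the codes below are made up; real codes come from the two binary encoders):

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database binary codes by Hamming distance to a query code.
    Codes are 0/1 vectors; XOR-and-sum counts the differing bits."""
    dists = np.logical_xor(query_code, db_codes).sum(axis=1)
    return np.argsort(dists, kind="stable"), dists

# Query sketch code against three image codes.
db = np.array([[0, 1, 1, 0],
               [1, 1, 1, 0],
               [0, 0, 0, 1]])
order, dists = hamming_rank(np.array([0, 1, 1, 0]), db)
```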

5.Generalized Zero-Shot Learning via Synthesized Examples

Abstract

We present a generative framework for generalized zero-shot learning where the training and test classes are not necessarily disjoint. Built upon a variational autoencoder based architecture, consisting of a probabilistic encoder and a probabilistic conditional decoder, our model can generate novel exemplars from seen/unseen classes, given their respective class attributes. These exemplars can subsequently be used to train any off-the-shelf classification model. One of the key aspects of our encoder-decoder architecture is a feedback-driven mechanism in which a discriminator (a multivariate regressor) learns to map the generated exemplars to the corresponding class attribute vectors, leading to an improved generator. Our model’s ability to generate and leverage examples from unseen classes to train the classification model naturally helps to mitigate the bias towards predicting seen classes in generalized zero-shot learning settings. Through a comprehensive set of experiments, we show that our model outperforms several state-of-the-art methods, on several benchmark datasets, for both standard as well as generalized zero-shot learning.

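The pipeline the abstract describes (synthesize exemplars from class attributes, then train any off-the-shelf classifier) can be illustrated with a toy stand-in for the conditional decoder: sample pseudo-exemplars around each class's attribute vector and fit a nearest-centroid classifier. The attribute vectors, class names and noise model below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_exemplars(class_attrs, n_per_class=50, noise=0.1):
    """Toy stand-in for the probabilistic conditional decoder: sample
    pseudo-exemplars around each class's attribute vector.
    class_attrs: dict of class name -> attribute vector."""
    X, y = [], []
    for name, attr in class_attrs.items():
        X.append(attr + noise * rng.standard_normal((n_per_class, len(attr))))
        y += [name] * n_per_class
    return np.vstack(X), y

def nearest_centroid_predict(X_train, y_train, x):
    """Off-the-shelf classifier trained on the synthetic exemplars."""
    labels = sorted(set(y_train))
    cents = {l: X_train[[i for i, t in enumerate(y_train) if t == l]].mean(0)
             for l in labels}
    return min(labels, key=lambda l: np.linalg.norm(x - cents[l]))

# Seen and unseen classes both get synthetic training data, which is what
# mitigates the bias toward seen classes in the generalized ZSL setting.
attrs = {"zebra": np.array([1.0, 0.0]), "horse": np.array([0.8, 0.3]),
         "tiger": np.array([0.0, 1.0])}
X, y = synthesize_exemplars(attrs)
pred = nearest_centroid_predict(X, y, np.array([0.05, 0.95]))
```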

6.Feature Generating Networks for Zero-Shot Learning
Abstract
Suffering from the extreme training data imbalance between seen and unseen classes, most of the existing state-of-the-art approaches fail to achieve satisfactory results for the challenging generalized zero-shot learning task. To circumvent the need for labeled examples of unseen classes, we propose a novel generative adversarial network (GAN) that synthesizes CNN features conditioned on class-level semantic information, offering a shortcut directly from a semantic descriptor of a class to a class-conditional feature distribution. Our proposed approach, pairing a Wasserstein GAN with a classification loss, is able to generate sufficiently discriminative CNN features to train softmax classifiers or any multimodal embedding method. Our experimental results demonstrate a significant boost in accuracy over the state of the art on five challenging datasets – CUB, FLO, SUN, AWA and ImageNet – in both the zero-shot learning and generalized zero-shot learning settings.


7.Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
Abstract
We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, just using the word embedding of the category and its relationship to other categories, for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this paper, we build upon the recently introduced Graph Convolutional Network (GCN) and propose an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers. Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing a visual category). After a series of graph convolutions, we predict the visual classifier for each category. During training, the visual classifiers for a few categories are given to learn the GCN parameters. At test time, these filters are used to predict the visual classifiers of unseen categories. We show that our approach is robust to noise in the KG. More importantly, our approach provides significant improvement in performance compared to the current state-of-the-art results (from 2~3% on some metrics to a whopping 20% on a few).

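The building block here is the graph convolution: each category node's representation is averaged with its neighbors' (after symmetric normalization of the adjacency) and then linearly transformed. A minimal sketch of one such layer, with made-up embedding sizes and a random weight matrix in place of learned parameters:

```python
import numpy as np

def gcn_layer(X, adj, W):
    """One graph-convolution layer: add self-loops, symmetrically
    normalize the adjacency, propagate node features, apply weights
    and a ReLU. X: (n, d) node features; adj: (n, n); W: (d, d_out)."""
    A = adj + np.eye(adj.shape[0])          # self-loops
    d = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = (A * d[:, None]) * d[None, :]   # D^-1/2 (A + I) D^-1/2
    return np.maximum(A_hat @ X @ W, 0.0)

# 4 category nodes with 8-d word embeddings mapped toward 5-d outputs;
# stacking such layers and regressing onto known classifier weights is
# the rough idea, with the weights here random for illustration.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 8))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], float)
H = gcn_layer(X, adj, rng.standard_normal((8, 5)))
```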

8.Webly Supervised Learning Meets Zero-shot Learning: A Hybrid Approach for Fine-grained Classification

Abstract

Fine-grained image classification, which targets distinguishing subtle distinctions among various subordinate categories, remains a very difficult task due to the high annotation cost of enormous fine-grained categories. To cope with the scarcity of well-labeled training images, existing works mainly follow two research directions: 1) utilize freely available web images without human annotation; 2) only annotate some fine-grained categories and transfer the knowledge to other fine-grained categories, which falls into the scope of zero-shot learning (ZSL). However, the above two directions have their own drawbacks. For the first direction, the labels of web images are very noisy and the data distributions of web images and test images are considerably different. For the second direction, the performance gap between ZSL and traditional supervised learning is still very large. The drawbacks of the above two directions motivate us to design a new framework which can jointly leverage both web data and auxiliary labeled categories to predict the test categories that are not associated with any well-labeled training images. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness of our proposed framework.


  1. Do the test categories overlap with the categories of the web data?

9.Discriminative Learning of Latent Features for Zero-Shot Recognition

Abstract: Zero-shot learning (ZSL) aims to recognize unseen image categories by learning an embedding space between image and semantic representations. For years, among existing works, it has been the center task to learn the proper mapping matrices aligning the visual and semantic space, whilst the importance to learn discriminative representations for ZSL is ignored. In this work, we retrospect existing methods and demonstrate the necessity to learn discriminative representations for both visual and semantic instances of ZSL. We propose an end-to-end network that is capable of 1) automatically discovering discriminative regions by a zoom network; and 2) learning discriminative semantic representations in an augmented space introduced for both user-defined and latent attributes. Our proposed method is tested extensively on two challenging ZSL datasets, and the experiment results show that the proposed method significantly outperforms state-of-the-art methods.


Discriminative representations, discriminative visual representations, discriminative semantic representations, latent attributes.

10.Preserving Semantic Relations for Zero-Shot Learning

Abstract: Zero-shot learning has gained popularity due to its potential to scale recognition models without requiring additional training data. This is usually achieved by associating categories with their semantic information like attributes. However, we believe that the potential offered by this paradigm is not yet fully exploited. In this work, we propose to utilize the structure of the space spanned by the attributes using a set of relations. We devise objective functions to preserve these relations in the embedding space, thereby inducing semanticity to the embedding space. Through extensive experimental evaluation on five benchmark datasets, we demonstrate that inducing semanticity to the embedding space is beneficial for zero-shot learning. The proposed approach outperforms the state-of-the-art on the standard zero-shot setting as well as the more realistic generalized zero-shot setting. We also demonstrate how the proposed approach can be useful for making approximate semantic inferences about an image belonging to a category for which attribute information is not available.


What does inducing semanticity in the embedding space mean?
How does approximate semantic inference work for classes whose attribute information is unavailable?
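One way to read "a set of relations" over the attribute space is as a partition of class pairs into similar and dissimilar pairs by attribute similarity, which the embedding objectives then preserve. The construction below, including the cosine threshold, is my guess at such a partition, not the paper's exact definition:

```python
import numpy as np

def relation_sets(class_attrs, thresh=0.8):
    """Partition class pairs into 'similar' / 'dissimilar' relations by
    cosine similarity of their attribute vectors (threshold is a guess).
    An embedding objective would then pull similar pairs together and
    push dissimilar pairs apart."""
    A = class_attrs / np.linalg.norm(class_attrs, axis=1, keepdims=True)
    S = A @ A.T
    similar, dissimilar = [], []
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            (similar if S[i, j] >= thresh else dissimilar).append((i, j))
    return similar, dissimilar

# Three toy classes: the first two share most attributes, the third does not.
attrs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
sim, dis = relation_sets(attrs)
```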

11.Zero-Shot Kernel Learning

Abstract: In this paper, we address an open problem of zero-shot learning. Its principle is based on learning a mapping that associates feature vectors extracted from, e.g., images with attribute vectors that describe objects and/or scenes of interest. In turn, this allows classifying unseen object classes and/or scenes by matching feature vectors, via the mapping, to a newly defined attribute vector describing a new class. Due to the importance of such a learning task, there exist many methods that learn semantic, probabilistic, linear or piece-wise linear mappings. In contrast, we apply well-established kernel methods to learn a non-linear mapping between the feature and attribute spaces. We propose an easy learning objective inspired by the Linear Discriminant Analysis, Kernel-Target Alignment and Kernel Polarization methods [12, 8, 4] that promotes incoherence. We evaluate the performance of our algorithm on the Polynomial as well as shift-invariant Gaussian and Cauchy kernels. Despite the simplicity of our approach, we obtain state-of-the-art results on several zero-shot learning datasets and benchmarks including a recent AWA2 dataset [45].

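The feature-to-attribute mapping the abstract describes can be approximated with kernel ridge regression using a Gaussian kernel: map a test image's features into attribute space and pick the nearest class attribute vector, which also works for classes that contribute no training images. This is a sketch of the general recipe under those assumptions, not the paper's incoherence objective; the data, kernel width and ridge strength are illustrative:

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    """Shift-invariant Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_map(X, A, lam=1e-3, gamma=1.0):
    """Kernel ridge regression from image features X (n, d) to the
    attribute vectors A (n, a) of their classes."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), A)

def predict_class(x, X_train, coef, class_attrs, gamma=1.0):
    """Map a feature vector into attribute space; return the index of the
    nearest class attribute vector (seen or unseen)."""
    a_hat = gaussian_kernel(x[None, :], X_train, gamma) @ coef
    return int(np.argmin(np.linalg.norm(class_attrs - a_hat, axis=1)))

# Two seen classes provide training features; a third, unseen class
# enters the label set only through its attribute vector.
rng = np.random.default_rng(2)
attrs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # class 2 is unseen
X = np.vstack([rng.normal(a, 0.1, (20, 2)) for a in attrs[:2]])
A = np.repeat(attrs[:2], 20, axis=0)
coef = fit_kernel_map(X, A)
```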

12.Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

13.A Generative Adversarial Approach for Zero-Shot Learning from Noisy Texts

14.Transductive Unbiased Embedding for Zero-Shot Learning

15.One-Shot Action Localization by Learning Sequence Matching Network

16.CLEAR: Cumulative LEARning for One-Shot One-Class Image Recognition

17.Structured Set Matching Networks for One-Shot Part Labeling
18.Memory Matching Networks for One-Shot Image Recognition
19.Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
20.Learning to Compare: Relation Network for Few-Shot Learning
21.Dynamic Few-Shot Visual Learning Without Forgetting
22.Few-Shot Image Recognition by Predicting Parameters From Activations
23.Multi-Content GAN for Few-Shot Font Style Transfer
24.Low-Shot Learning With Large-Scale Diffusion
25.Low-Shot Learning With Imprinted Weights
26.Low-Shot Learning From Imaginary Data


Reposted from blog.csdn.net/cp_oldy/article/details/82183813