State of the Art in Domain Adaptation (CVPR in Review IV)

We have already had three installments about the CVPR 2018 (Computer Vision and Pattern Recognition) conference: the first part was devoted to GANs for computer vision, the second part dealt with papers about recognizing human beings (pose estimation and tracking), and the third part tackled synthetic data. Today we dive deeper into the details of one field of deep learning that has been on the rise lately: domain adaptation. For this NeuroNugget, I’m happy to present to you my co-author Anastasia Gaydashenko, who has since left Neuromation and gone on to join Cisco… but her texts live on, and this is one of them.

What is Domain Adaptation?

There are a couple of specific research directions that have been trending lately (including at CVPR 2018), and one of them is domain adaptation. Since this field is closely related to synthetic data, it is of great interest to us here at Neuromation, but the topic is also increasingly popular and important in and of itself.

Let’s start at the beginning. We have already discussed the most common tasks that constitute the basis for modern computer vision: image classification, object and pose detection, instance and semantic segmentation, object tracking, and so on. These problems are solved quite successfully due to deep convolutional neural architectures and large amounts of labeled data.

But, as we discussed in the last installment, a big challenge always remains: for supervised learning, you always need to find or create labeled datasets. Almost any paper you read about some fancy state-of-the-art model will mention some problems with the dataset, unless it uses one of the few standard “vanilla” datasets that everybody usually compares on. Thus, collecting labeled data has become as important as designing the networks themselves. These datasets should be reliable and diverse enough that researchers can use them to develop and evaluate novel architectures.


We have already talked many times about how manual data collection is both expensive and time-consuming, often exceedingly so. Sometimes it is even flat-out impossible to label the data manually (for example, how do you label for depth estimation, the problem of evaluating the distances from points on the image to the camera?). Of course, many standard problems already have large labeled datasets that are freely or easily available. But first, this readily available labeled data can (and does) bias research towards the specific fields where it exists, and second, your own problem will never be exactly the same, and standard datasets will often simply not fit your demands: they will contain different classes, will be biased in different ways, and so on.

The main problem with using existing datasets, or even synthetic data generators that were not done specifically for your particular problem, is that when the data is generated and already labeled we are still facing the problem of domain transfer: how do we use one kind of data to prepare the networks to cope with different kinds? This problem also looms large for the entire field of synthetic data: however realistic you make your data, it still cannot be completely indistinguishable from real world photographs. The major underlying challenge here is known as domain shift: basically, the distribution of data in the target domain (say, real images) is different than in the source domain (say, synthetic images). Devising models that can cope with this shift is exactly the problem called domain adaptation.

Let us see how people are handling this problem now, considering a few papers from CVPR 2018 in slightly more detail than we did in previous “CVPR in Review” installments.

Unsupervised Domain Adaptation with Similarity Learning

This work by Pedro Pinheiro (see pdf here) comes from ElementAI, a Montreal company co-founded in 2016 by none other than Yoshua Bengio. It deals with an approach to domain adaptation based on adversarial networks, the kind we touched upon a little bit before (see also this post, the second part of which is coming really soon… it is, it is, I promise!).

The simplest adversarial approach to unsupervised domain adaptation is a network that tries to extract features that remain the same across the domains. To achieve this, the network tries to make them indistinguishable to a separate part of the network, a discriminator (“disc” in the figure below). But at the same time, these features should be representative of the source domain, so that the network is able to classify objects:




In this way, the network has to extract features that achieve two objectives at once: (1) be informative enough that the “class” network (usually very simple) can classify, and (2) be independent of the domain so that the “disc” network (usually as complex as the feature extractor itself, or more) cannot really distinguish the domains. Note that we do not need any labels for the target domain, only for the source domain, where labeling is usually much easier (again, think synthetic data for the source domain).
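To make this setup more concrete, here is a minimal PyTorch-style sketch of such an adversarial training step; the module names, shapes, and hyperparameters below are our own illustrative assumptions, not the exact architecture from the paper.

```python
# A minimal sketch of adversarial feature alignment, assuming toy module
# names and shapes; not the exact architecture from the paper.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
label_classifier = nn.Linear(256, 10)         # the "class" head, trained on labeled source data
domain_discriminator = nn.Sequential(         # the "disc" head: source vs. target features
    nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

cls_loss, dom_loss = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()
opt_f = torch.optim.Adam(list(feature_extractor.parameters()) +
                         list(label_classifier.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(domain_discriminator.parameters(), lr=1e-4)

def training_step(x_src, y_src, x_tgt):
    # 1) The discriminator learns to tell source features from target features.
    with torch.no_grad():
        f_src, f_tgt = feature_extractor(x_src), feature_extractor(x_tgt)
    d_loss = dom_loss(domain_discriminator(f_src), torch.ones(len(x_src), 1)) + \
             dom_loss(domain_discriminator(f_tgt), torch.zeros(len(x_tgt), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) The feature extractor learns to classify source images correctly *and*
    #    to make target features look like source features, fooling the discriminator.
    f_src, f_tgt = feature_extractor(x_src), feature_extractor(x_tgt)
    g_loss = cls_loss(label_classifier(f_src), y_src) + \
             dom_loss(domain_discriminator(f_tgt), torch.ones(len(x_tgt), 1))
    opt_f.zero_grad(); g_loss.backward(); opt_f.step()
    return d_loss.item(), g_loss.item()

# Usage with random stand-in batches:
x_s, y_s = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_t = torch.randn(8, 3, 32, 32)
print(training_step(x_s, y_s, x_t))
```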

In Pinheiro’s paper, this approach is improved by replacing the classifier part with a similarity-based one. The discriminative part remains the same, and the classification part now compares the embedding of an image with a set of prototypes; all these representations are learned jointly and in an end-to-end fashion:



Basically, we are asking one network, g, to extract features from a labeled source domain and another network, f, to extract features from an unlabeled target domain, with a similar but different data distribution. The difference is that now f and g are different (we had the same f in the picture above), and the classification is now different: instead of training a classifier, we train the model to discriminate the target prototype from all other prototypes. And to label the image from the target domain, we compare the embedding of an image with embeddings of prototype images from the source domain, assigning the label of its nearest neighbors:



The paper shows that the proposed similarity-based classification approach is more robust to the domain shift between the two datasets.
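As a rough illustration of the similarity-based classifier itself, here is a sketch of how nearest-prototype labeling could look, assuming hypothetical embedding networks f and g and plain dot-product similarity; in the actual paper all of this is learned end to end together with the adversarial part.

```python
# A rough sketch of prototype-based (similarity) classification, assuming
# hypothetical embedding networks f and g and dot-product similarity.
import torch
import torch.nn as nn

emb_dim, n_classes = 128, 10
f = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, emb_dim))  # embeds target-domain images
g = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, emb_dim))  # embeds source-domain prototypes

def classify_by_similarity(x_target, prototype_images, prototype_labels):
    """Label a target image with the label of its most similar source prototype."""
    z = f(x_target)                # (B, emb_dim) embeddings of target images
    p = g(prototype_images)        # (K, emb_dim) embeddings of labeled prototypes
    sims = z @ p.t()               # (B, K) similarity of every image to every prototype
    return prototype_labels[sims.argmax(dim=1)]

# Toy usage: one prototype image per class, random stand-in data.
protos = torch.randn(n_classes, 3, 32, 32)
proto_labels = torch.arange(n_classes)
x = torch.randn(4, 3, 32, 32)
print(classify_by_similarity(x, protos, proto_labels))
```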


Image to Image Translation for Domain Adaptation

In this work by Murez et al. (full pdf), coming from UCSD and HRL Laboratories, the main idea is actually rather simple, but the implementation is novel and interesting. The work deals with a more complex task than classification, namely image segmentation (see, e.g., our previous post), which is widely used in autonomous driving, medical imaging, and many other domains. So what is this “image translation” thing they are talking about?

Let us begin with regular translation. Imagine that we have two large text corpora in different languages, say English and French, and we don’t know which phrases correspond to which. They may even be slightly different and may lack the corresponding translations in the other language’s corpus. Just like the pictures from synthetic and real domains. Now, to get a machine translation model, we translate a phrase from English to French and train the model so that the embedding of the resulting phrase cannot be distinguished from embeddings of phrases from the original French corpus. And then the way to check that we haven’t lost much is to try to translate this phrase back to English; now, even if the original corpora were completely unaligned, we know what we’re looking for: the answer is just the original sentence!

Now let us look at the image to image translation which is, actually, pretty similar. Basically, domain adaptation techniques aim to address the domain shift problem by finding a mapping from the source data distribution to the target distribution. Alternatively, both domains X and Y could be mapped into a shared domain Z where the distributions are aligned; this is the approach used in this paper. This embedding must be domain-agnostic (independent of the domain), hence we want to maximize the similarity between the distributions of embedded source and target images.



For example, suppose that X is the domain of driving scenes on a sunny day and Y is the domain of driving scenes on a rainy day. While “sunny” and “rainy” are characteristics of the source and target domains, they are in fact variations that mean next to nothing for the annotation task (e.g., semantic segmentation of the road), and they should not affect the annotations. Treating such characteristics as structured noise, we would like to find a latent space Z that would be invariant to such variations. In other words, domain Z should not contain domain-specific characteristics, that is, be domain-agnostic.

In this case, we also want to restore annotations for an image from the target domain. Therefore, we also need to add a mapping from the shared embedding space to the labels. It may be image-level labels such as classes in a classification problem or pixel-level labels such as semantic segmentation:




Basically, that’s the whole idea! Now, to obtain the annotation for an image from the target domain we just need to get its embedding in the shared space Z and restore its annotation from C. This is the basic idea of the approach, but it can be further improved by the ideas proposed in this paper.

Specifically, there are three main tools needed to achieve successful unsupervised domain adaptation:

  • domain-agnostic feature extraction, which means that distributions of features extracted from both domains should be indistinguishable as judged by an adversarial discriminator network,

  • domain-specific reconstruction, which means that we should be able to decode embeddings back to the source and target domains, that is, we should be able to learn functions gX and gY as shown here:



  • cycle consistency to ensure that the mappings are learned correctly, that is, we should be able to get back where we started in cycles like this:



The whole point of the framework proposed in this work is to ensure these properties with loss functions and adversarial constructions. We will not go into the gritty details of the architectures since they may change for other domains and problems.
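Still, to show how the three ingredients above fit together, here is a toy sketch of the corresponding loss terms, with simple fully connected stand-ins for the encoders fX and fY, the decoders gX and gY, the feature discriminator, and the annotation head; the shapes, names, and the absence of loss weights are all illustrative assumptions rather than the paper’s actual design.

```python
# A toy sketch of the three ingredients plus the supervised annotation loss,
# with simple stand-in modules; shapes and names are illustrative assumptions.
import torch
import torch.nn as nn

d_in, d_z, n_classes = 3 * 32 * 32, 256, 19

fX = nn.Sequential(nn.Flatten(), nn.Linear(d_in, d_z))   # encoder for source domain X
fY = nn.Sequential(nn.Flatten(), nn.Linear(d_in, d_z))   # encoder for target domain Y
gX = nn.Linear(d_z, d_in)                                 # decoder back into domain X
gY = nn.Linear(d_z, d_in)                                 # decoder back into domain Y
disc = nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(), nn.Linear(64, 1))   # feature discriminator
clf = nn.Linear(d_z, n_classes)                           # annotation head C on top of Z

bce, mse, ce = nn.BCEWithLogitsLoss(), nn.MSELoss(), nn.CrossEntropyLoss()

def encoder_losses(x, y_imgs, x_labels):
    zx, zy = fX(x), fY(y_imgs)
    flat_x, flat_y = x.flatten(1), y_imgs.flatten(1)

    # (1) Domain-agnostic features: the encoders are trained with flipped domain
    # labels so that the discriminator cannot tell zx from zy (the discriminator
    # itself would be trained separately, with the true labels).
    adv = bce(disc(zx), torch.ones(len(x), 1)) + bce(disc(zy), torch.zeros(len(y_imgs), 1))

    # (2) Domain-specific reconstruction: decode each embedding back into its own domain.
    rec = mse(gX(zx), flat_x) + mse(gY(zy), flat_y)

    # (3) Cycle consistency: X -> Z -> Y -> Z -> X should bring us back where we started.
    cyc = mse(gX(fY(gY(zx).view_as(x))), flat_x) + mse(gY(fX(gX(zy).view_as(y_imgs))), flat_y)

    # Plus the supervised annotation loss, available only for the labeled source domain.
    sup = ce(clf(zx), x_labels)
    return adv + rec + cyc + sup

x, y_imgs = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
x_labels = torch.randint(0, n_classes, (4,))
print(encoder_losses(x, y_imgs, x_labels).item())
```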

But let’s have a look at the results! At the end of the post, we will make a detailed comparison between three papers on domain adaptation, but for now let’s just consider a single example. The paper used two datasets: a synthetic dataset from Grand Theft Auto 5 and the real-world Cityscapes dataset with pictures of cities. Here are two sample pictures:



And here are the segmentation results for the real-world image (B above):



In this picture, E is the ground truth segmentation, C is the result produced without domain adaptation, simply by training on the synthetic GTA5 dataset, and D is the result with domain adaptation. It does look better, and the numbers (the intersection-over-union metric) do bear this out.
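For reference, intersection-over-union for a single class is just the overlap between the predicted and ground-truth masks divided by their union, and the per-class mean (mIoU) is what we will compare below; here is a small sketch, assuming the predictions and ground truth are integer label maps.

```python
# A small sketch of per-class intersection-over-union (IoU) and mean IoU,
# assuming predictions and ground truth are integer label maps of equal shape.
import numpy as np

def per_class_iou(pred, gt, n_classes):
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Toy usage with random 19-class "segmentations" (Cityscapes evaluates 19 classes).
rng = np.random.default_rng(0)
pred = rng.integers(0, 19, size=(256, 512))
gt = rng.integers(0, 19, size=(256, 512))
print("mIoU:", np.nanmean(per_class_iou(pred, gt, n_classes=19)))
```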


Conditional Generative Adversarial Network for Structured Domain Adaptation

This paper by Hong et al. (full pdf) proposes another modification of a standard discriminator-segmentator architecture. At first glance, we may not even notice any difference in the architecture:



But actually this architecture does something very interesting: it integrates a GAN into a fully convolutional network (FCN). We have discussed FCNs in a previous NeuroNugget post; it is the network architecture used for the segmentation problem that returns labels for each pixel in the picture by feeding the features through deconvolution layers.

In this model, a GAN is used to mitigate the gap between the source and target domains. For example, the previous paper aligned the two domains via an intermediate feature space and thereby implicitly assumed the same decision function for both domains. This approach relaxes that assumption: here we learn the residual between feature maps from both domains, because the generator learns to produce features like the ones from a real image in order to fool the discriminator; afterwards, the FCN parameters are updated to accommodate the changes the GAN has made.
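Schematically, the residual idea can be sketched as follows; the module shapes and the way the generator is conditioned are assumptions made for illustration, not the exact design from the paper.

```python
# A schematic sketch of adding a GAN-generated residual to FCN feature maps,
# with toy module shapes; not the exact design from the paper.
import torch
import torch.nn as nn

fcn_features = nn.Conv2d(3, 64, 3, padding=1)        # stand-in for the FCN feature extractor
pixel_classifier = nn.Conv2d(64, 19, 1)              # stand-in for the pixel-wise label head
residual_generator = nn.Sequential(                  # G: produces a residual, conditioned here
    nn.Conv2d(64 + 3, 64, 3, padding=1), nn.ReLU(),  # on the synthetic image and its features
    nn.Conv2d(64, 64, 3, padding=1))
feature_discriminator = nn.Sequential(               # D: real-image features vs. adapted ones
    nn.Conv2d(64, 1, 1), nn.AdaptiveAvgPool2d(1), nn.Flatten())

def adapt_synthetic_features(synthetic_img):
    feats = fcn_features(synthetic_img)
    residual = residual_generator(torch.cat([feats, synthetic_img], dim=1))
    adapted = feats + residual                 # features nudged towards the real-image domain
    return adapted, pixel_classifier(adapted)  # per-pixel class scores, (B, 19, H, W)

x_syn, x_real = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
adapted, logits = adapt_synthetic_features(x_syn)
print(logits.shape)  # torch.Size([2, 19, 64, 64])
# In training, D would be asked to tell adapted synthetic features from real-image features:
print(feature_discriminator(adapted).shape, feature_discriminator(fcn_features(x_real)).shape)
```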

Again, we will show a numerical comparison of the results below, but here are some examples from the dataset:



Remarkably, in this work the authors have also provided something very similar to what we are doing in our own studies of the efficiency of synthetic data: they have measured the accuracy of the results (again with intersection-over-union) depending on the proportion of synthetic images in the dataset:




Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation

This work by Sankaranarayanan et al. (full pdf) presents another modification of the basic approach based on GANs that brings the embeddings closer in the learned feature space. This time, let us begin with the picture and then explain it:



The base network, whose architecture is similar to a pre-trained model such as VGG-16, is split into two parts: the embedding denoted by F and the pixel-wise classifier denoted by C. The output of C is a map of labels upsampled to the same size as the input of F. The generator network G takes as input the learned embedding and reconstructs the RGB image. The discriminator network D performs two different tasks given an input: it classifies the input as real or fake in a domain-consistent manner and also performs a pixel-wise labeling task similar to the network C (this is applied only to source data since target data does not come with any labels during training).

So the main contribution of this work is a technique that employs generative models to align the source and target distributions in the feature space. For this purpose, the authors first project intermediate feature representations obtained using a CNN to the image space by training a reconstruction part of the network and then impose the domain alignment constraint by forcing the network to learn features such that source features produce target-like images when passed to the reconstruction module and vice versa.
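A compressed sketch of how an image flows through the four components might look like this; the module definitions and shapes are purely illustrative, and the actual losses (including D’s auxiliary pixel-labeling task) are only hinted at in the comments.

```python
# A compressed sketch of the F / C / G / D components described above,
# with purely illustrative stand-in modules and shapes.
import torch
import torch.nn as nn

F = nn.Conv2d(3, 64, 3, padding=1)                  # base embedding network
C = nn.Conv2d(64, 19, 1)                            # pixel-wise classifier on top of F
G = nn.Conv2d(64, 3, 3, padding=1)                  # generator: reconstruct an RGB image from features
D = nn.Sequential(nn.Conv2d(3, 1, 1),               # discriminator scoring reconstructed images
                  nn.AdaptiveAvgPool2d(1), nn.Flatten())

src = torch.randn(2, 3, 64, 64)                     # synthetic (labeled) images
tgt = torch.randn(2, 3, 64, 64)                     # real (unlabeled) images

feat_src, feat_tgt = F(src), F(tgt)
labels_src = C(feat_src)                            # the supervised loss uses source labels only
recon_src, recon_tgt = G(feat_src), G(feat_tgt)     # reconstructions back in image space

# D scores the reconstructions; the alignment constraint pushes F towards features
# whose source reconstructions look target-like and vice versa.
print(labels_src.shape, D(recon_src).shape, D(recon_tgt).shape)
```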

Sounds complicated, doesn’t it? Well, let’s see how all of these methods actually compare.

A Numerical Comparison of the Results

We have chosen these three papers for an in-depth look because their results are actually comparable! All three papers performed domain adaptation with GTA5 as the source (synthetic) dataset and Cityscapes as the target dataset, so we can literally just compare the numbers.

The Cityscapes dataset contains 19 classes characteristic of outdoor city scenes, such as “road”, “wall”, “person”, “car”, and so on. And all three papers actually contain tables with results broken down by class.

Murez et al., image-to-image translation:



Hong et al., conditional GAN:




Sankaranarayanan et al., GAN in an FCN:



The mean results are 31.8, 44.5, and 37.1 respectively, so it appears that the image-to-image approach is the least successful and the conditional GAN is the winner. For clarity, let us also compare the top-3 most and least distinguishable classes (i.e., the classes with the best and worst results) for every approach.

Most distinguishable, in the same order of models:

  • road (85.3), car (76.7), veg (72.0)

  • road (89.2), veg (77.9), car (77.8)

  • road (88.0), car (80.4), veg (78.7)

This is not too interesting: obviously, roads and cars always come out on top. But with the worst classes the situation is different:

  • train (0.3), bike (0.6), rider (3.3)

  • train (0.0), fence (10.9), wall (13.5)

  • train (0.9), t sign (11.6), pole (16.7)

Again, the “train” class seems to pose some kind of an insurmountable challenge (probably there are just not that many trains in the training set, pardon the pun), but the others are all different. So let us compare all models on the “bike”, “rider”, “fence”, “wall”, “t sign”, and “pole” classes. Now their scores will be very distinct:



You can draw different conclusions from these results. But the main thing that we personally find truly exciting is that, with so many different approaches that could be proposed for such a complex task, the results in different papers at the same conference (so the authors could not follow one another; these results appeared independently) are perfectly comparable with each other, and researchers do not hesitate to publish these comparable numbers instead of some convenient self-developed metrics that would prove their unquestionable supremacy. Way to go, modern machine learning!

And finally, let us finish on a lighter note, with one more fun paper about synthetic data.


Free supervision from video games

In this work, Philipp Krähenbühl (full pdf) created a wrapper for the ever-popular Microsoft DirectX rendering API that injects specialized code into the game as it runs. This enables the DirectX engine to produce ground truth labels for instance segmentation, semantic labeling, depth estimation, optical flow, intrinsic image decomposition, and instance tracking in real time! Which sounds super cool because now, instead of labeling data manually or creating special-purpose synthetic data engines, a researcher can just play video games all day long! All you need to do is find a suitable 3D game:



And with that, we finish the fourth installment on CVPR 2018. Thank you for your attention — and stay tuned!

Sergey Nikolenko
Chief Research Officer, Neuromation

Anastasia Gaydashenko
former Research Intern at Neuromation, currently Machine Learning Intern at Cisco


