Abstract

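Humans can easily discover the relation (or the commonality) between two things without supervision, whereas a machine has to be trained on examples that humans have paired up as ground truth.
DiscoGAN is proposed to avoid the burden of this pairing.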

Introduction

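The paper recasts "finding the relation between two kinds of images" as "generating images in one style from images in another style" using GANs. This is the core idea behind its approach to discovering relations.
No manually paired images are needed as a supervised training set, so the method is unsupervised.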

Model

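GAB denotes the generator from domain A to domain B, and GBA denotes the generator from domain B to domain A. To find a meaningful correspondence, the mapping has to be restricted to a one-to-one mapping, meaning that GAB and GBA should be exact inverses of each other. For every real sample xA in A, GAB(xA) must lie in B, and likewise GBA(xB) must lie in A.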

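Two constraints

As mentioned above, the mapping we want is a bijection, i.e. GAB is the inverse mapping of GBA.
The output of GAB must lie in domain B, and vice versa.

These two constraints are implemented by the following two losses: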

[Figure: the two loss functions]

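Standard GAN model

In the figure, xA and xB denote real samples from A and B, and xAB denotes the sample that generator GAB produces from the real sample xA.
Drawbacks:

It only maps from A to B, not the other way round.
A bijection is not guaranteed, i.e. mode collapse can occur.
The generated images are not an image-based representation (I do not understand this point).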

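GAN with reconstruction loss

Under the one-to-one mapping proposed above, GAB and GBA must be inverses of each other, which amounts to requiring GBA(GAB(xA)) = xA for every real sample xA in A. This hard constraint is difficult to optimize, so it is relaxed to minimizing the distance d(GBA(GAB(xA)), xA), where d can be any metric such as L1 or L2.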
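To make the relaxed constraint concrete, here is a minimal sketch of d(GBA(GAB(xA)), xA) for a single 2-D sample. The affine "generators" are invented purely for illustration (real generators are neural networks), and the function names are hypothetical.

```python
# Toy 2-D "generators"; the affine maps below are made up for illustration.
def g_ab(x):
    """Hypothetical A -> B generator: scale by 2 and shift."""
    return [2.0 * x[0] + 1.0, 2.0 * x[1] - 1.0]


def g_ba(y):
    """Hypothetical B -> A generator: the exact inverse of g_ab."""
    return [(y[0] - 1.0) / 2.0, (y[1] + 1.0) / 2.0]


def g_ba_imperfect(y):
    """A B -> A generator that is not quite the inverse of g_ab."""
    return [(y[0] - 1.0) / 2.0 + 0.5, (y[1] + 1.0) / 2.0]


def l1_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_distance(u, v):
    """Squared Euclidean distance (unnormalised)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def reconstruction_loss(x_a, gen_ab, gen_ba, d):
    """d(G_BA(G_AB(x_A)), x_A) for one sample."""
    return d(gen_ba(gen_ab(x_a)), x_a)


x_a = [0.25, -0.75]
loss_exact = reconstruction_loss(x_a, g_ab, g_ba, l1_distance)
loss_off_l1 = reconstruction_loss(x_a, g_ab, g_ba_imperfect, l1_distance)
loss_off_l2 = reconstruction_loss(x_a, g_ab, g_ba_imperfect, l2_distance)
```

When the second map really inverts the first, the loss is zero; any mismatch shows up as a positive distance, which is what the optimizer pushes down.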

The standard GAN model is therefore modified as follows:

Advantages:

Drawbacks:

During training, the generator GAB learns the mapping from domain A to domain B under two relaxed constraints:

domain A maps to domain B. (LGANB)
mapping on domain B is reconstructed to domain A. (LCONSTA)

However, this model lacks a constraint on mapping from B to A, and these two conditions alone do not guarantee a cross-domain relation (as defined in section 2.1) because the mapping satisfying these constraints is one-directional. In other words, the mapping is an injection, not a bijection, and one-to-one correspondence is not guaranteed.

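Judging from this passage, LCONSTA appears to be used only to update the parameters of GAB and not those of GBA, which would explain why the model "lacks a constraint on mapping from B to A". That is not the key point, though; what matters is the second drawback below.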

1. In some sense, the addition of a reconstruction loss to a standard GAN is an attempt to remedy the mode collapse problem.
2. In Figure 3c, two domain A modes are matched with the same domain B mode, but the domain B mode can only direct to one of the two domain A modes.
3. Although the additional reconstruction loss LCONSTA forces the reconstructed sample to match the original (Figure 3c), this change only leads to a similar symmetric problem. The reconstruction loss leads to an oscillation between the two states and does not resolve mode-collapsing.

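The problem is illustrated in panel (c). Note the last sentence: suppose two modes of A, A1 and A2, are mapped to the same mode of B. GAB and GBA are both functions, so each input has exactly one output, and GBA can therefore map that B mode to only one point in A (call it Areconst). LCONSTA then pulls Areconst toward A1 for samples from A1 and toward A2 for samples from A2, which is the "oscillation between the two states".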
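The oscillation can be reproduced in a deliberately tiny caricature, not the paper's experiment. Assume two 1-D modes A1 = -1 and A2 = +1 have collapsed onto one B mode, so GBA can only return a single scalar r for both. Alternating gradient steps on the squared reconstruction error then drag r back and forth; all names and the learning rate are assumptions.

```python
def simulate_collapsed_reconstruction(a1=-1.0, a2=1.0, lr=0.25, steps=20):
    """Alternate SGD steps on (r - target)^2 for two targets sharing one output r."""
    r = 0.0  # the single value G_BA produces for the shared B mode
    history = []
    for step in range(steps):
        # Samples from the two A modes arrive alternately.
        target = a1 if step % 2 == 0 else a2
        grad = 2.0 * (r - target)  # derivative of (r - target)^2 w.r.t. r
        r -= lr * grad
        history.append(r)
    return history


history = simulate_collapsed_reconstruction()
# Sign of r after each step: negative means "closer to A1", positive "closer to A2".
signs = [1 if r > 0 else -1 for r in history]
# Distance from r to whichever mode is nearer, after each step.
nearest_gap = [min(abs(r + 1.0), abs(r - 1.0)) for r in history]
```

In this toy setting r flips sign on every step and settles into a cycle between roughly -1/3 and +1/3, so it never reaches either mode and the reconstruction error for both modes stays positive.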

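The generator loss is as follows:

Generator GAB receives two losses. One is the reconstruction loss, which measures how far the sample reconstructed by passing through both generators is from the original real sample. The other is the standard GAN generator loss, which measures how realistic the samples produced by GAB are as members of B.
The discriminator loss is as follows:

D denotes the discriminator of the corresponding domain.
Compared with the original GAN model, the reconstruction constraint does force the reconstructed sample to match the original, but it still leads to a similar mode-collapse problem.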
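The loss formulas did not survive in these notes, so here is a reconstruction from the verbal description above, written in the usual non-saturating GAN form. The symbols P_A and P_B for the data distributions of the two domains are assumed notation; check the paper for the authoritative statement.

```latex
\begin{align}
L_{CONST_A} &= d\big(G_{BA}(G_{AB}(x_A)),\, x_A\big) \\
L_{GAN_B}   &= -\mathbb{E}_{x_A \sim P_A}\big[\log D_B\big(G_{AB}(x_A)\big)\big] \\
L_{G_{AB}}  &= L_{GAN_B} + L_{CONST_A} \\
L_{D_B}     &= -\mathbb{E}_{x_B \sim P_B}\big[\log D_B(x_B)\big]
               - \mathbb{E}_{x_A \sim P_A}\big[\log\big(1 - D_B(G_{AB}(x_A))\big)\big]
\end{align}
```

The first two lines are the reconstruction and GAN terms, the third is what the generator GAB minimizes, and the fourth is what the discriminator DB minimizes.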

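(a) is the ideal one-to-one mapping. (b) is the result of the standard GAN: multiple modes in A are mapped to a single mode in B, which is mode collapse. (c) is the GAN with reconstruction loss: data from two modes in A are both mapped to one mode in B, while that B mode can be mapped back to only one of the two A modes. The reconstruction loss makes the model oscillate between the two states in (c) and does not solve mode collapse.

My own understanding: when the A data are trained together, GAB produces the same B mode regardless of which of the two A modes the input comes from. When GBA maps back and produces the first A mode, the result does not resemble the second mode, so the reconstruction loss pushes the output of GBA toward the second mode. It then no longer resembles the first mode, so the model swings back and forth between the two and fails to converge.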

DiscoGAN

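To solve mode collapse, both GAB and GBA must map different modes to different outputs. A symmetric structure is the natural choice: add a second generator network running in the opposite direction, which forces the data in A and B into one-to-one correspondence.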

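The model contains two copies of generator GAB, which are identical (they share parameters), and two copies of generator GBA, which are likewise identical. This is how the one-to-one mapping is enforced.
Loss function: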

Advantages:

This model is constrained by two LGAN losses and two LCONST losses.

Therefore a bijective mapping is achieved, and a one-to-one correspondence, which we defined as cross-domain relation, can be discovered.
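As a rough sketch of how the two LGAN terms and two LCONST terms combine, the snippet below adds them up from precomputed quantities. It assumes the non-saturating form -log D(G(x)) for the GAN terms and a plain sum of all four terms; the function names and argument layout are hypothetical, and discriminator outputs are assumed to lie strictly in (0, 1).

```python
import math


def gan_generator_loss(d_on_fake):
    """Mean of -log D(G(x)) over a batch of discriminator outputs in (0, 1)."""
    return -sum(math.log(p) for p in d_on_fake) / len(d_on_fake)


def discriminator_loss(d_on_real, d_on_fake):
    """Mean of -log D(x_real) plus mean of -log(1 - D(x_fake))."""
    real_term = -sum(math.log(p) for p in d_on_real) / len(d_on_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_on_fake) / len(d_on_fake)
    return real_term + fake_term


def mean(values):
    return sum(values) / len(values)


def discogan_generator_loss(d_b_on_fake_b, d_a_on_fake_a, recon_a, recon_b):
    """Sum of two GAN terms and two reconstruction terms (assumed equal weights).

    recon_a holds per-sample d(G_BA(G_AB(x_A)), x_A);
    recon_b holds per-sample d(G_AB(G_BA(x_B)), x_B).
    """
    l_gan_b = gan_generator_loss(d_b_on_fake_b)
    l_gan_a = gan_generator_loss(d_a_on_fake_a)
    return l_gan_b + mean(recon_a) + l_gan_a + mean(recon_b)


# Both discriminators undecided (0.5) and perfect reconstruction in both directions.
total = discogan_generator_loss([0.5, 0.5], [0.5, 0.5], [0.0, 0.0], [0.0, 0.0])
d_loss = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

The point of the symmetric form is that the reconstruction terms in both directions appear in the same objective, so neither generator can collapse modes without being penalized.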

Experiment

Toy Experiment

The toy experiment not only shows how the G and D of a GAN behave while generating data, but also explains the principle behind DiscoGAN.

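Setup:

Source and target samples are drawn from a GMM.
All plots in Fig. 4 show domain B; the color represents the discriminator output DB, and the black "x" marks denote the target samples.

A demonstration experiment is run first to show how well the proposed symmetric model handles mode collapse. The data in A and B are both two-dimensional, and the real samples are drawn from Gaussian mixture models. The generator consists of three linear layers and a ReLU activation; the discriminator consists of five linear layers, each followed by a ReLU, with a final sigmoid layer that restricts the output to [0, 1].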
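For readers who want to recreate similar data, here is a minimal sketch of drawing 2-D samples from a Gaussian mixture. The ring layout, the radii, the standard deviation, and the choice of 5 source modes are all assumptions made for illustration; only the 10 target modes come from the notes.

```python
import math
import random


def make_ring_modes(n_modes, radius):
    """Place n_modes mixture centres evenly on a circle (assumed layout)."""
    return [
        (radius * math.cos(2.0 * math.pi * k / n_modes),
         radius * math.sin(2.0 * math.pi * k / n_modes))
        for k in range(n_modes)
    ]


def sample_gmm(modes, n_samples, std, rng):
    """Draw points from an equal-weight isotropic Gaussian mixture.

    Returns the points and the index of the component each point came from.
    """
    samples, labels = [], []
    for _ in range(n_samples):
        k = rng.randrange(len(modes))  # pick a component uniformly
        cx, cy = modes[k]
        samples.append((rng.gauss(cx, std), rng.gauss(cy, std)))
        labels.append(k)
    return samples, labels


rng = random.Random(0)
modes_a = make_ring_modes(5, 1.0)   # number of source modes is an assumption
modes_b = make_ring_modes(10, 2.0)  # 10 target modes, as in the toy experiment
samples_a, labels_a = sample_gmm(modes_a, 200, 0.1, rng)
samples_b, labels_b = sample_gmm(modes_b, 200, 0.1, rng)
```

Keeping the component index alongside each point makes it possible to color translated points by their source mode, which is how the figure's scatter plots are read.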

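Initial state

All source points are mapped by GAB to the same point in domain B. (Unless all parameters of GAB were initialized to 0, how could this happen?)

I do not quite understand why the whole plot is white (DB(xB) = 0.5). Shouldn't DB output 1 for real data?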

Standard GAN model

Many translated points of different colors are located around the same B domain mode.

This result illustrates the mode-collapse problem of GANs since points of multiple colors (multiple A domain modes) are mapped to the same B domain mode.
Regions around all B modes are leveled in a green colored plateau in the baseline, allowing translated samples to freely move between modes.
I do not quite understand why the whole plot is green (DB(xAB) = 1). How can DB output 1 for fake data?

GAN with reconstruction loss

The collapsing problem is less prevalent, but navy, green and light-blue points still overlap at a few modes.

The regions between B modes are clearly separated.

DiscoGAN

DiscoGAN not only prevents mode collapse by translating into distinct, well-bounded regions that do not overlap, but also generates B samples in all ten modes, as the mappings in the model are bijective.

The discriminator for B domain is perfectly fooled.

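The colored background shows the discriminator output, and each "x" marks one of the modes in B. (a) shows the 10 target modes and the initial translation result; (b) is the standard GAN after 40,000 iterations; (c) is the network with reconstruction loss after 40,000 iterations; (d) is the proposed DiscoGAN after 40,000 iterations.

With the standard GAN, many translated points of different colors sit at the same B mode: the navy and light-blue points are very close together, and so are the orange and green points, so several colors (several modes of A) are mapped to the same mode of B. With the reconstruction loss, mode collapse is less severe, but navy, green and light-blue points still overlap at a few modes. Neither the standard GAN nor the GAN with reconstruction loss covers all the modes in B. DiscoGAN translates the samples from A into bounded, non-overlapping regions in B, which avoids mode collapse, and the generated B samples cover all 10 modes, so the mapping is a bijection. The samples translated from A also fool the discriminator for B.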
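The visual comparison above (how many B modes are covered, and which B modes receive points from several A modes) can also be checked numerically. The sketch below is one possible way to do that and is not taken from the paper: it assigns each translated point to its nearest target mode and reports coverage and collisions. All names and the example coordinates are hypothetical.

```python
def nearest_mode(point, modes):
    """Index of the target mode closest to point (squared Euclidean distance)."""
    return min(
        range(len(modes)),
        key=lambda k: (point[0] - modes[k][0]) ** 2 + (point[1] - modes[k][1]) ** 2,
    )


def coverage_report(translated, source_labels, target_modes):
    """Return (number of target modes hit, target modes shared by >1 source mode)."""
    hits = {}  # target mode index -> set of source mode labels landing there
    for point, src in zip(translated, source_labels):
        hits.setdefault(nearest_mode(point, target_modes), set()).add(src)
    covered = len(hits)
    collapsed = sorted(k for k, sources in hits.items() if len(sources) > 1)
    return covered, collapsed


target_modes = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
source_labels = [0, 1, 2]

# Two different source modes land on target mode 0: a collapsed mapping.
collapsed_case = coverage_report(
    [(0.1, 0.0), (0.2, -0.1), (3.9, 0.1)], source_labels, target_modes
)
# Each source mode lands on its own target mode: a one-to-one mapping.
one_to_one_case = coverage_report(
    [(0.1, 0.0), (3.9, 0.1), (0.1, 4.2)], source_labels, target_modes
)
```

A mapping that behaves like panel (d) would cover every target mode with an empty collision list, while panels (b) and (c) would show fewer covered modes and at least one collision.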