Dropout Strategies in Deep Learning


While writing a system I ran into a question: the version I was imitating multiplies all the weights by the keep probability at test time (Approach 1), whereas GJH said the Keras code he had looked at does nothing at test time (Approach 2). So I was curious: why the difference?


A quick search of a few Chinese blog posts found that they all follow Approach 1: at training time, randomly generate a keep mask according to the keep probability (each element is either 0 = drop or 1 = keep); at test time, multiply all weights by the keep probability.
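To make Approach 1 concrete, here is a minimal NumPy sketch (the function names and the default keep probability are my own choices, not from any particular framework):

```python
import numpy as np

def vanilla_dropout_train(x, keep_prob=0.5):
    """Training: drop each activation with probability 1 - keep_prob."""
    mask = (np.random.rand(*x.shape) < keep_prob).astype(x.dtype)  # 1 = keep, 0 = drop
    return x * mask

def vanilla_dropout_test(x, keep_prob=0.5):
    """Test: no dropping, but scale by keep_prob so the next layer sees
    the same expected activation as during training. Scaling the
    activations here is equivalent to multiplying the next layer's
    weights by keep_prob, which is how the blogs phrase it."""
    return x * keep_prob
```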


Then I went back and looked at TensorFlow's dropout function, https://www.tensorflow.org/api_docs/python/tf/nn/dropout, and was puzzled to see that it involves a scaling operation:

With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0. The scaling is so that the expected sum is unchanged.
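That description is the "inverted dropout" formulation. A rough NumPy sketch of the behavior the doc describes (my own approximation, not TensorFlow's actual implementation):

```python
import numpy as np

def inverted_dropout_train(x, keep_prob=0.5):
    """Training: drop with probability 1 - keep_prob and scale the
    survivors by 1 / keep_prob so the expected sum is unchanged."""
    mask = (np.random.rand(*x.shape) < keep_prob).astype(x.dtype)
    return x * mask / keep_prob

def inverted_dropout_test(x):
    """Test: identity -- nothing to do."""
    return x
```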


A quick Google search turned up a similar question:

Why input is scaled in tf.nn.dropout in tensorflow?

http://techqa.info/programming/question/34597316/why-input-is-scaled-in-tf.nn.dropout-in-tensorflow


Following that thread, I skimmed the original Dropout paper, http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf, and realized these are simply two equivalent Dropout strategies:

In this paper, we described dropout as a method where we retain units with probability p at training time and scale down the weights by multiplying them by a factor of p at test time. Another way to achieve the same effect is to scale up the retained activations by multiplying by 1/p at training time and not modifying the weights at test time. These methods are equivalent with appropriate scaling of the learning rate and weight initializations at each layer.


In short:

If you scale by 1/p during training, you do nothing at test time.

If you don't scale during training, you multiply by p at test time (a quick numerical check of this equivalence is sketched below).
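A self-contained Monte Carlo check that the two strategies agree in expectation (the input, keep probability, and sample count are arbitrary illustrative values):

```python
import numpy as np

np.random.seed(0)
keep_prob = 0.8
x = np.ones(5)
n = 100_000  # number of random masks to average over

masks = (np.random.rand(n, x.size) < keep_prob).astype(x.dtype)

# Approach 1: plain masking at training time, multiply by p at test time.
avg_vanilla = (x * masks).mean(axis=0)   # ~keep_prob * x
test_vanilla = x * keep_prob             #  keep_prob * x -> matches the training average

# Approach 2: mask and scale by 1/p at training time, do nothing at test time.
avg_inverted = (x * masks / keep_prob).mean(axis=0)  # ~x
test_inverted = x                                    #  x -> matches the training average

print(avg_vanilla, test_vanilla)
print(avg_inverted, test_inverted)
```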


That said, even just the original paper's abstract reads far better than those blog posts... I really should read more papers.




