Copyright notice: this is an original article by the blogger; reproduction without permission is prohibited. https://blog.csdn.net/bea_tree/article/details/72861507
Everyone is an adaptive machine.
Life experiences are the training data.
We should learn from them under several loss functions.
A good optimization method emphasizes the direction of each update rather than simply a higher learning rate.
“In practice networks that use Batch Normalization are significantly more robust to bad initialization. Additionally, batch normalization can be interpreted as doing preprocessing at every layer of the network, but integrated into the network itself in a differentiable manner. Neat!”
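The quoted idea, that Batch Normalization acts as differentiable preprocessing inside each layer, can be sketched with a minimal NumPy forward pass. This is an illustrative sketch, not a full implementation (it omits the backward pass and the running statistics used at test time); the function name `batchnorm_forward` and the data are made up for the example.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension (preprocessing
    # per layer), then apply a learnable scale (gamma) and shift (beta).
    # Every step is differentiable, so gradients can flow through it.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Badly scaled activations, as from a poor initialization.
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
# After normalization each feature has mean ~0 and std ~1,
# regardless of how the inputs were scaled.
print(out.mean(axis=0), out.std(axis=0))
```

Because the output statistics no longer depend on the input scale, layers downstream see well-conditioned activations even when the initialization was bad, which is exactly the robustness the quote describes.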