Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
