[Paper Notes] Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
