Chen Danqi's team proposes MeZO, a memory-efficient zeroth-order optimizer that lets a single A100 GPU train a 30-billion-parameter model

Source: blog.csdn.net/qq_27590277/article/details/130960015