Chen Danqi's team proposes MeZO, a memory-efficient zeroth-order optimizer that can train a 30-billion-parameter model on a single A100 GPU
Origin blog.csdn.net/qq_27590277/article/details/130960015