[Large-scale training] Tensor model parallelism in transformers

Origin my.oschina.net/u/5682856/blog/5555783