How to Implement a Permute/Transpose Operator That Is 6x Faster Than PyTorch?
