[mPLUG]: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections多模态特征融合方法泛读

NoSuchKey