Paper address: https://arxiv.org/abs/2006.07733
Open source code: https://github.com/deepmind/deepmind-research/tree/master/byol
MoCo, SimCLR, and CPC are all contrastive learning methods.
BYOL achieves 74.3% top-1 classification accuracy on ImageNet without using negative samples. BYOL uses two neural networks: the online network and the target network. The online network, with parameters θ, consists of three parts: an encoder fθ, a projector gθ (Linear + BN + ReLU + Linear), and a predictor qθ. The target network has the same architecture as the online network but different parameters ξ. The target network provides the regression target used to train the online network, and its parameters ξ are updated with an exponential moving average (EMA) of θ (similar to MoCo's momentum update): ξ ← τξ + (1 − τ)θ, where τ is the momentum coefficient.
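The EMA update above can be sketched as follows. This is a minimal illustration with numpy, not the official implementation; the function name `ema_update` and the default `tau` are assumptions for the example.

```python
import numpy as np

def ema_update(xi, theta, tau=0.99):
    """Momentum update of the target parameters xi toward the
    online parameters theta:  xi <- tau * xi + (1 - tau) * theta.
    Both arguments are lists of arrays (one per layer)."""
    return [tau * x + (1.0 - tau) * t for x, t in zip(xi, theta)]

# Toy example: target starts at zeros, online is all ones.
xi = [np.zeros(3)]
theta = [np.ones(3)]
xi = ema_update(xi, theta, tau=0.99)
# Each target entry moves 1% of the way toward theta.
```

With a large τ (e.g. 0.99), the target network changes slowly, which stabilizes the regression target for the online network.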
Algorithm comparison
- The MLP in SimCLR uses BN after each linear layer
- The MLP in MoCo v2 uses no BN
- The MLP in BYOL uses BN only after the first linear layer
References:
- Contrastive Learning Series (4): BYOL (Tao Jiang's blog, CSDN)
- Self-supervised model: BYOL (Mu Yangziyu's blog, CSDN)