1. Goal
Use unsupervised pretraining to improve the performance of semi-supervised learning.
2. Method
1)unsupervised / self-supervised pretraining
-> task-agnostic
-> a big (deep and wide) neural network effectively improves accuracy
-> improvements upon SimCLR
larger ResNet models: deeper but less wide
deeper (3-layer) non-linear projection head (sketched after this list)
incorporate the memory mechanism from MoCo
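A minimal PyTorch sketch of the deeper projection head (SimCLR's 2-layer MLP extended to 3 layers). The dimensions and the BN/ReLU placement are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

def make_projection_head(in_dim: int = 2048, out_dim: int = 128) -> nn.Sequential:
    """3-layer non-linear projection head (vs. SimCLR's 2 layers).
    Dimensions are assumed for illustration."""
    return nn.Sequential(
        nn.Linear(in_dim, in_dim), nn.BatchNorm1d(in_dim), nn.ReLU(inplace=True),  # layer 1
        nn.Linear(in_dim, in_dim), nn.BatchNorm1d(in_dim), nn.ReLU(inplace=True),  # layer 2
        nn.Linear(in_dim, out_dim),  # layer 3: output fed to the contrastive loss
    )

# usage: z = head(h), where h is the pooled ResNet feature
head = make_projection_head()
z = head(torch.randn(4, 2048))  # batch of 4 illustrative feature vectors
```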
2)supervised fine-tuning
fine-tune from the first layer of the MLP head
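A sketch of fine-tuning from the first head layer, assuming the 3-layer head above where each layer is a (Linear, BN, ReLU) triple: the encoder and the first head layer are kept, and the remaining head layers are replaced by a new task classifier. The names `encoder` and `proj_head` are illustrative:

```python
import torch.nn as nn

class FineTuneModel(nn.Module):
    """Encoder + first projection-head layer + fresh classifier (sketch)."""
    def __init__(self, encoder: nn.Module, proj_head: nn.Sequential,
                 feat_dim: int = 2048, num_classes: int = 1000):
        super().__init__()
        self.encoder = encoder
        # keep only layer 1 of the head: the first (Linear, BN, ReLU) triple
        self.head_layer1 = nn.Sequential(*list(proj_head.children())[:3])
        self.classifier = nn.Linear(feat_dim, num_classes)  # new task head

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(self.head_layer1(h))
```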
3)self-training / knowledge distillation using unlabeled data
-> no real labels are used
-> when the labeled set is large, the ground-truth labels can also be folded into the loss (see the sketch after this list)
-> encourage the student network to mimic the teacher network's label predictions
-> fix teacher network, train (smaller) student network
-> the big model first performs self-distillation, then distills into a smaller model via knowledge distillation
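A sketch of the distillation loss described above: the student matches the frozen teacher's temperature-scaled class distribution on unlabeled data. The optional cross-entropy term and its weight `alpha` are assumed names for the case where many labeled examples are available:

```python
from typing import Optional
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      tau: float = 1.0,
                      labels: Optional[torch.Tensor] = None,
                      alpha: float = 0.5) -> torch.Tensor:
    # teacher network is fixed: no gradient flows through its predictions
    teacher_probs = F.softmax(teacher_logits.detach() / tau, dim=-1)
    student_logp = F.log_softmax(student_logits / tau, dim=-1)
    # student mimics the teacher's label distribution (cross-entropy form)
    loss = -(teacher_probs * student_logp).sum(dim=-1).mean()
    if labels is not None:
        # with a large labeled set, mix in the ground-truth loss
        loss = alpha * F.cross_entropy(student_logits, labels) + (1 - alpha) * loss
    return loss
```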
3. Conclusions
1)the fewer labels semi-supervised learning has available, the more it benefits from a big model
2)a big model is not necessary for a specific downstream task, so it can be transferred (distilled) into a smaller model
3)using a deeper projection head improves semi-supervised results