Knowledge distillation is effective for training small, generalisable network models that meet low-memory and fast-running requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets (CIFAR10, CIFAR100, SVHN, and ImageNet), whilst retaining computational efficiency advantages.
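The idea of a multi-branch network whose own ensemble serves as the teacher can be illustrated with a minimal sketch. The abstract does not spell out the loss, so everything below is an assumption made for illustration: the function names, the softmax gate that weights the branches, the temperature value, and the way the terms are summed are all hypothetical, and the sketch operates on plain Python lists of logits for a single example rather than on a real network.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_logits(branch_logits, gate_scores):
    # Assumed teacher: a weighted sum of branch logits, with weights
    # given by a softmax over (hypothetical) learnable gate scores.
    weights = softmax(gate_scores)
    n_classes = len(branch_logits[0])
    return [
        sum(w * b[c] for w, b in zip(weights, branch_logits))
        for c in range(n_classes)
    ]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def one_style_loss(branch_logits, gate_scores, label, temperature=3.0):
    # Assumed objective: cross-entropy for the ensemble teacher and for
    # every branch, plus a temperature-softened KL term that pulls each
    # branch towards the teacher built on the fly from all branches.
    teacher = ensemble_logits(branch_logits, gate_scores)
    teacher_soft = softmax(teacher, temperature)
    loss = cross_entropy(teacher, label)
    for b in branch_logits:
        loss += cross_entropy(b, label)
        loss += temperature ** 2 * kl_divergence(
            teacher_soft, softmax(b, temperature)
        )
    return loss
```

In this sketch, calling `one_style_loss([[2.0, 0.5, 0.1], [1.0, 1.5, 0.2]], [0.0, 0.0], label=0)` gives one scalar for one example. When all branches produce identical logits, the distillation terms vanish and only the cross-entropy terms remain, which is a convenient sanity check.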
Xu et al. (Tue,) studied this question.
www.synapsesocial.com/papers/6a08ebf71b91a3b1ea5b72ee — DOI: https://doi.org/10.48550/arxiv.1806.04606
Lan Xu
Xiatian Zhu
Shaogang Gong