
Conversation

mrT23 (Contributor) commented Oct 15, 2025

Knowledge distillation is one of the most effective techniques for achieving state-of-the-art results in model pre-training and fine-tuning. It has become a standard component of nearly every competitive paper and is effectively required for top-tier performance.

Additionally, when training with knowledge distillation, the sensitivity to training parameters and the reliance on training tricks is significantly reduced.

This pull request adds simple support for training with knowledge distillation on ImageNet.
The proposed implementation is straightforward, clean, and robust; it builds on the methodology from https://arxiv.org/abs/2204.03475 and the reference implementation at https://github.com/Alibaba-MIIL/Solving_ImageNet
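For context, here is a minimal sketch of a classic soft-target distillation loss in PyTorch. The class name `KDLoss` and the `alpha`/`temperature` parameters are illustrative only and are not taken from this PR; the actual implementation follows the Solving ImageNet methodology and may weight the hard-label and teacher terms differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KDLoss(nn.Module):
    """Illustrative knowledge-distillation loss: hard cross-entropy on the labels
    blended with a temperature-softened KL term against a frozen teacher's logits."""

    def __init__(self, alpha: float = 0.5, temperature: float = 1.0):
        super().__init__()
        self.alpha = alpha   # weight of the distillation term (assumed parameter)
        self.T = temperature # softening temperature for both distributions

    def forward(self, student_logits, teacher_logits, targets):
        # Standard supervised loss on the ground-truth labels.
        ce = F.cross_entropy(student_logits, targets)
        # KL divergence between softened teacher and student distributions;
        # the T**2 factor keeps gradient magnitudes comparable across temperatures.
        kd = F.kl_div(
            F.log_softmax(student_logits / self.T, dim=-1),
            F.softmax(teacher_logits / self.T, dim=-1),
            reduction="batchmean",
        ) * (self.T ** 2)
        return (1.0 - self.alpha) * ce + self.alpha * kd
```

In a training loop, the teacher would run in eval mode under `torch.no_grad()`, and its logits would be passed alongside the student's logits and the targets to a loss of this form.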

@mrT23 changed the title from "Add knowledge distillation model and loss function support" to "Feature: Add knowledge distillation support" on Oct 15, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

mrT23 (Contributor, Author) commented Oct 21, 2025

Merged #2598 into this PR.

