- Distillation is a technique used in machine learning to transfer knowledge from a larger, more complex model (known as the "teacher" model) to a smaller, simpler model (known as the "student" model).
- The goal of distillation is to create a condensed model that retains much of the predictive accuracy of the original model, but with significantly reduced computational cost.
- The process of distillation involves first training the teacher model on a large labelled dataset, and then using the teacher to generate predictions (typically softened probability distributions, or "soft targets") on a transfer set.
- The student model is then trained on that transfer set to match the teacher's soft targets, usually in combination with the ordinary loss on the true labels (see the sketch below).
- Because the soft targets encode how the teacher spreads probability across classes, this process allows the student model to absorb much of the teacher's knowledge without needing the teacher's size or training compute.
- The result is a smaller, simpler model that can make accurate predictions with much less computational power than the original model.
- Distillation is particularly useful in situations where computational resources are limited, such as on mobile devices or in embedded systems.
- It is also useful for reducing the memory footprint of a model, which can be important in applications where memory is a limiting factor.
Reference: Hinton, Vinyals & Dean, "Distilling the Knowledge in a Neural Network" (2015), https://arxiv.org/abs/1503.02531
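
The sketch below illustrates the distillation loss described above, written in PyTorch: the teacher's and student's logits are softened with a temperature, compared with a KL-divergence term, and blended with the usual cross-entropy on the true labels. The function name `distillation_loss`, the temperature of 4.0, and the mixing weight `alpha` are illustrative assumptions, not values from the note or the paper.

```python
# Minimal sketch of a distillation training loss (assumes PyTorch).
# Model definitions, data loading, and hyperparameter values are
# illustrative placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Blend the soft-target loss (student mimics the teacher's softened
    output distribution) with the standard hard-label cross-entropy."""
    # Soften both distributions with the temperature, then compare them
    # with KL divergence; the T^2 factor keeps gradient magnitudes
    # comparable across temperatures, as recommended in the paper.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_targets,
                         reduction="batchmean") * (temperature ** 2)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example training step: the teacher is frozen, only the student is updated.
# with torch.no_grad():
#     teacher_logits = teacher(x)
# student_logits = student(x)
# loss = distillation_loss(student_logits, teacher_logits, y)
# loss.backward()
# optimizer.step()
```

In practice the mixing weight and temperature are tuned per task; higher temperatures expose more of the teacher's "dark knowledge" about relative class similarities, while `alpha` controls how much the student leans on the teacher versus the ground-truth labels.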