Distillation

Distillation is an optimization technique for Large Language Models (LLMs) in which a smaller Student Model is trained to reproduce the behavior of a larger Teacher Model. The goal is to keep most of the larger model's quality at a fraction of its size. DistilBERT, distilled from BERT, demonstrated this: it retains about 97% of BERT's language-understanding performance while being 40% smaller (roughly 66M parameters versus 110M for BERT-base). 1

One way to do this is to use prompts and the Teacher Model's responses to them as training data for the Student Model, so that the student learns to imitate the teacher.
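The data-collection step can be sketched as follows. This is a minimal illustration, not a definitive pipeline: `build_distillation_dataset`, `to_jsonl`, and `toy_teacher` are hypothetical names introduced here, and `toy_teacher` is a stand-in for whatever call actually queries the teacher model.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_dataset(
    prompts: Iterable[str],
    teacher_generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each prompt with the teacher's response to form student training data."""
    records = []
    for prompt in prompts:
        # Hypothetical call into the teacher model (assumed to return plain text).
        response = teacher_generate(prompt)
        # Skip prompts for which the teacher returned nothing usable.
        if response.strip():
            records.append({"prompt": prompt, "response": response})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one prompt/response pair per line."""
    return "\n".join(json.dumps(record) for record in records)


def toy_teacher(prompt: str) -> str:
    """Stand-in teacher with canned answers; a real teacher would be an LLM."""
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "")


if __name__ == "__main__":
    data = build_distillation_dataset(
        ["What is 2 + 2?", "Capital of France?", "Unknown question"],
        toy_teacher,
    )
    print(to_jsonl(data))
```

The resulting prompt/response pairs would then be used to fine-tune the student with an ordinary supervised objective. The fine-tuning step itself depends on the training framework and is left out of this sketch.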

Drawbacks

  • The student is limited by the teacher - The student model imitates the teacher model, so it inherits the teacher's weaknesses. A general-purpose teacher that performs poorly on a specialized task will produce a student that is not production-ready for that task.
  • Limited by LLM options - The terms of service of many commercial LLM providers prohibit using their models' outputs to train other models.
  • Data Size - Training a student model with good accuracy requires a large amount of data, both labelled and unlabelled.

Footnotes

  1. LLM distillation demystified: a complete guide