Distillation
Distillation is an optimization technique for Large Language Models (LLMs) in which a smaller Student Model is trained to reproduce the behaviour of a larger Teacher Model. The goal is to keep most of the teacher's quality at a fraction of its size. DistilBERT, distilled from BERT, is the standard example: it retains about 97% of BERT's language-understanding performance while being 40% smaller, i.e. it has roughly 60% of BERT's parameters. [1]
One way to do this is to use prompts together with the Teacher Model's responses as the training data, so that the student learns to imitate the teacher's outputs.
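A minimal sketch of how such a training set could be collected is given below. It assumes the teacher is available as a plain function from prompt to response; `build_distillation_dataset`, `to_jsonl` and `toy_teacher` are hypothetical names used only for illustration, and the toy teacher simply uppercases its input so the example runs without any model. In practice the teacher call would be a real model or API, and the resulting file would be fed to an ordinary supervised fine-tuning run of the student.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_dataset(
    prompts: Iterable[str],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Query the teacher once per prompt and keep (prompt, response) pairs."""
    records = []
    for prompt in prompts:
        response = teacher(prompt)
        # Skip empty answers: they give the student nothing to imitate.
        if not response.strip():
            continue
        records.append({"prompt": prompt, "response": response})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize the pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)


def toy_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a real Teacher Model (assumption)."""
    return prompt.upper()


dataset = build_distillation_dataset(["hello", "distill me"], toy_teacher)
print(to_jsonl(dataset))
```

The student never sees the teacher's weights here, only its responses, which is why the quality of the teacher's answers sets the ceiling for what the student can learn.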
Drawbacks
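- The student is limited by the teacher - The student model imitates the teacher model, so it generally cannot do better than the teacher and inherits its weaknesses. If the teacher is a general-purpose model that is only mediocre at a specialized task, the distilled student is unlikely to be good enough for production use on that task.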
- Limited by LLM options - Many commercial LLM providers prohibit, in their terms of service, using their models' outputs to train other models, which restricts the choice of teacher.
- Data Size - A large amount of data, both labelled and unlabelled, is required to train a student model with good accuracy.