Distillation

Distillation is a Large Language Model (LLM) optimization technique in which a smaller Student Model is trained to imitate a larger Teacher Model. The goal is to keep much of the larger model's quality at a fraction of its size. DistilBERT, distilled from BERT, demonstrated this: it retained about 97% of BERT's language-understanding performance while having 40% fewer parameters. ¹

This can be done by using prompts paired with the Teacher Model's responses as the Student Model's training data. By learning from these pairs, the student learns to imitate the teacher.
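A minimal sketch of that data-collection step. `toy_teacher` is a hypothetical stand-in for a call to a large teacher model, and `build_distillation_set` / `to_jsonl` are illustrative helper names, not part of any library. The resulting prompt/response records would then be used to fine-tune the student with whatever training framework you use.

```python
import json
from typing import Callable, Iterable


def build_distillation_set(
    prompts: Iterable[str], teacher: Callable[[str], str]
) -> list[dict[str, str]]:
    """Pair each prompt with the teacher's response to form student training data."""
    records = []
    for prompt in prompts:
        response = teacher(prompt)
        # Skip empty generations so the student is not trained to output nothing.
        if response.strip():
            records.append({"prompt": prompt, "response": response})
    return records


def to_jsonl(records: list[dict[str, str]]) -> str:
    """Serialize records as JSON Lines, a common format for fine-tuning datasets."""
    return "\n".join(json.dumps(record) for record in records)


def toy_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a large teacher model (a real one would be a model or API call)."""
    canned = {
        "What is the capital of France?": "Paris.",
        "2 + 2 = ?": "4",
    }
    return canned.get(prompt, "")


prompts = ["What is the capital of France?", "2 + 2 = ?", "Unanswerable prompt"]
dataset = build_distillation_set(prompts, toy_teacher)
print(to_jsonl(dataset))
```

The third prompt gets an empty response from the toy teacher and is dropped, so only two training pairs remain. Filtering like this matters because the student can only imitate what ends up in the dataset.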

Drawbacks

The student is limited by the teacher - Because the student imitates the teacher, it inherits the teacher's weaknesses. A general-purpose teacher that performs poorly on a specialized task will produce a student that is not production-ready for that task either.
Limited by LLM options - Many LLM providers' terms of service prohibit using their models' outputs to train other models, which narrows the choice of teacher.
Data size - Training an accurate student requires a large amount of labelled and unlabelled data.
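The first drawback shows up even in a deliberately tiny toy example (not an LLM). Here a hypothetical `flawed_teacher` labels numbers as even or odd but has a blind spot on multiples of 10, and a small student that only looks at the last digit is trained purely on the teacher's labels. Because the student never sees the ground truth, it reproduces the teacher's mistake and ends up no more accurate than its teacher.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable


def ground_truth(n: int) -> str:
    return "even" if n % 2 == 0 else "odd"


def flawed_teacher(n: int) -> str:
    # Hypothetical teacher with a systematic blind spot: multiples of 10 are mislabelled.
    if n % 10 == 0:
        return "odd"
    return ground_truth(n)


def train_student(inputs: Iterable[int], teacher: Callable[[int], str]) -> dict[int, str]:
    # Tiny student: it only looks at the last digit and learns the teacher's
    # majority label for each digit. It never sees the ground truth.
    votes: dict[int, Counter] = defaultdict(Counter)
    for n in inputs:
        votes[n % 10][teacher(n)] += 1
    return {digit: counts.most_common(1)[0][0] for digit, counts in votes.items()}


def accuracy(predict: Callable[[int], str], inputs: list[int]) -> float:
    # Fraction of inputs where the prediction matches the true label.
    return sum(predict(n) == ground_truth(n) for n in inputs) / len(inputs)


data = list(range(1000))
table = train_student(data, flawed_teacher)


def student(n: int) -> str:
    return table[n % 10]


print(f"teacher accuracy: {accuracy(flawed_teacher, data):.2f}")  # 0.90
print(f"student accuracy: {accuracy(student, data):.2f}")  # 0.90
```

The student is far simpler than the teacher yet matches it exactly, including the 100 wrong answers on multiples of 10: it copied the teacher's behaviour, not the underlying truth.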

¹ LLM distillation demystified: a complete guide

tags	ai/largelanguagemodel
source	https://snorkel.ai/blog/llm-distillation-demystified-a-complete-guide
