Encoder

Content From HuggingFace Encoders Video

A Very great example of Encoder only Architecture is BERT.

An encoder converts the given input text into a Numerical Representation which can be called as a Vector or a Feature Tensor.

The output would contain one vector per word and the dimension of the vector is defined by the architecture of the model for the base BERT model the vector length is 768, the final vector is a contextual representation of the word that has been encoded and the words around it, it is possible due to Attention > SelfAttention

Use Cases for Encoders:

  • Bi-Directional: context from the left and context from the right
  • good at extracting meaningful information
  • sequence classification, question answering, Masked Language Learning
  • NLU: Natural Language Understanding
  • Example Models: BERT RoBERTa ALBERT