Decoder
A popular Decoder-only architecture is GPT-2. Like an Encoder, a Decoder generates a numerical representation of each word, called a vector, and the dimension of that vector depends on the Decoder architecture. Where the two differ is that the vector a Decoder generates for a word is independent of the words on one side of it (in GPT-2, the words to its right). This mechanism is called masked self-attention, in contrast to the plain self-attention used in an Encoder. A Decoder therefore does not benefit from the bi-directionality present in the Encoder: it has access to only one direction (left or right), and the other direction is masked.
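To make the masking idea concrete, here is a minimal sketch in plain Python. It is not how GPT-2 is implemented: the names `softmax` and `masked_self_attention` are hypothetical, the toy vectors are made up, and the learned query/key/value projections are left out for brevity. Real implementations usually apply a mask to the full score matrix; slicing the visible positions, as done here, has the same effect.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_self_attention(vectors):
    """Toy single-head attention (assumption: queries, keys and values are
    the input vectors themselves). Position i only sees positions 0..i."""
    dim = len(vectors[0])
    outputs = []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1]  # everything to the right is masked out
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return outputs

# Two toy "sentences" that differ only in their last word vector.
sentence_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence_b = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]

out_a = masked_self_attention(sentence_a)
out_b = masked_self_attention(sentence_b)

print(out_a[:2] == out_b[:2])  # earlier positions ignore the change on the right
print(out_a[2] == out_b[2])    # the changed position itself does differ
```

Changing the last word leaves the vectors of the first two words untouched, which is exactly the independence from the right-hand context described above. With bi-directional attention, all three output vectors would change.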
Use cases for Decoders:
- Uni-Directional (access to left or right context)
- Great for causal tasks, i.e. generating sequences
- NLG: Natural Language Generation
- Example Decoders: GPT-2, GPT Neo
Just like Encoders, a Decoder can be used as a standalone model. Its uni-directional context awareness makes it inherently good at generating content: the ability to generate a word or a sequence of words is called Natural Language Generation, or Causal Language Modelling.
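The generation loop behind causal language modelling can be sketched as follows. This is an illustration only: `next_token` is a hypothetical stand-in for a Decoder (a hard-coded lookup table rather than a trained model), and `generate`, the `<eos>` marker and the vocabulary are assumptions made for the example. A real Decoder would return a probability distribution over its vocabulary at each step.

```python
def next_token(context):
    """Hypothetical stand-in for a Decoder: predicts the next word using
    only the left context (here, just the most recent word)."""
    bigrams = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}
    return bigrams.get(context[-1], "<eos>")

def generate(prompt, max_new_tokens=10):
    """Greedy autoregressive generation: append one word at a time,
    feeding the growing sequence back in as the new left context."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == "<eos>":  # assumed end-of-sequence marker
            break
        tokens.append(token)
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

Each new word depends only on what has already been produced to its left, which is why a uni-directional model fits this task naturally.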