Decoder

A popular decoder-only architecture is GPT-2. Like an encoder, a decoder turns each input word into a numerical representation called a vector (or feature vector), whose dimension depends on the decoder architecture (768 for the smallest GPT-2 model). The two differ in how that vector is computed. A decoder uses a variant of self-attention called masked self-attention, so the vector for a word cannot depend on the words to its right. It therefore does not benefit from the bidirectional context that an encoder has. A decoder only has access to context in one direction (for GPT-2, the words to the left), and the other direction is masked.
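To make the masking concrete, here is a minimal sketch of single-head masked (causal) self-attention in plain Python. It assumes, for simplicity, that queries, keys and values are the input vectors themselves. A real decoder such as GPT-2 applies learned projection matrices and uses several heads. The key step is that every score for a position to the right (`j > i`) is set to negative infinity before the softmax, so it receives zero weight. The function name `causal_self_attention` is illustrative, not a library API.

```python
import math

def causal_self_attention(x):
    """Single-head masked self-attention over a list of equal-length vectors.

    Simplifying assumption: Q = K = V = x (no learned projections).
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Scaled dot-product scores from position i to every position j.
        scores = []
        for j in range(n):
            if j > i:
                scores.append(float("-inf"))  # mask: no access to the right
            else:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
        # Numerically stable softmax; masked positions get weight exp(-inf) = 0.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output vector for position i is a weighted sum of the value vectors.
        out.append([sum(weights[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out

seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]]  # only the last word differs
a = causal_self_attention(seq_a)
b = causal_self_attention(seq_b)
print(a[:2] == b[:2])  # True: earlier vectors ignore the later word
print(a[0])            # [1.0, 0.0]: the first word can only attend to itself
```

Changing the last word leaves the vectors for the first two words untouched, which is exactly the "independent of the word on the right" property. An encoder would drop the `j > i` mask and let every position see the whole sequence.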

Use cases for Decoders:

Just like encoders, decoders can be used as standalone models. Their unidirectional context awareness makes them inherently good at generating content: predicting the next word, or a sequence of words, given the words that come before it. This task is called natural language generation (NLG) or causal language modelling.
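Generation with a decoder is autoregressive: predict one word, append it to the input, and repeat. The sketch below shows that loop with greedy decoding (always pick the most probable next word). The "model" is a hypothetical toy lookup table (`TOY_NEXT`, `next_token_probs`) that only looks at the last word. A trained decoder like GPT-2 would instead compute the distribution from the whole left context using masked self-attention, but the generation loop has the same shape.

```python
# Hypothetical toy "decoder": next-word probabilities keyed on the last word.
# A real decoder would condition on the entire left context.
TOY_NEXT = {
    "<bos>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<eos>": 0.2},
    "a": {"dog": 0.7, "<eos>": 0.3},
    "cat": {"sat": 0.8, "<eos>": 0.2},
    "dog": {"ran": 0.6, "<eos>": 0.4},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def next_token_probs(context):
    """Return a next-word distribution given the left context (toy model)."""
    return TOY_NEXT[context[-1]]

def greedy_generate(prompt, max_new_tokens=10):
    """Causal generation loop: predict, append, repeat until <eos> or limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # greedy: most probable next word
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens

print(greedy_generate(["<bos>"]))       # ['<bos>', 'the', 'cat', 'sat']
print(greedy_generate(["<bos>", "a"]))  # ['<bos>', 'a', 'dog', 'ran']
```

Greedy decoding is only one choice; sampling or beam search can replace the `max(...)` line without changing the rest of the loop.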