Encoder-Decoder
A popular example of an encoder-decoder architecture is T5. To follow this section, first review how the encoder and the decoder work on their own.
In this architecture, the output sequence of the encoder is passed directly to the decoder. Along with the encoder outputs, the decoder is given an initial sequence from which it starts decoding; if no initial sequence is supplied, a start-of-sequence token is used instead. Once the encoder outputs have been computed, the encoder does not need to run again: the decoder reuses those outputs and proceeds autoregressively, feeding each token it generates back in as input for the next step. Generation continues until the decoder emits an output that is treated as a stopping sequence, such as an end-of-sequence token or a period (.) that marks the end of the sequence.
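The decoding loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how T5 or any particular library implements generation: `greedy_decode`, `toy_encode`, `toy_decode_step`, and the `<s>` / `</s>` token strings are all hypothetical names chosen for the example, and the toy "model" simply uppercases the input so the control flow is easy to follow.

```python
from typing import Callable, List, Optional

# Hypothetical special tokens; real models define their own.
BOS, EOS = "<s>", "</s>"


def greedy_decode(
    encode: Callable[[List[str]], List[str]],
    decode_step: Callable[[List[str], List[str]], str],
    source: List[str],
    prefix: Optional[List[str]] = None,
    max_new_tokens: int = 20,
) -> List[str]:
    """Run the encoder once, then let the decoder generate token by token."""
    # The encoder runs a single time; its outputs are reused at every step.
    encoder_outputs = encode(source)

    # With no initial sequence, start from the start-of-sequence token.
    generated = list(prefix) if prefix else [BOS]

    for _ in range(max_new_tokens):
        # Autoregressive step: the decoder sees the encoder outputs
        # plus everything it has generated so far.
        next_token = decode_step(encoder_outputs, generated)
        generated.append(next_token)
        if next_token == EOS:  # stopping token reached
            break
    return generated


def toy_encode(source: List[str]) -> List[str]:
    """Stand-in encoder: one 'representation' per input token."""
    return [token.upper() for token in source]


def toy_decode_step(encoder_outputs: List[str], generated: List[str]) -> str:
    """Stand-in decoder: copy the next encoder output, then stop."""
    position = len(generated) - 1  # tokens produced after BOS
    if position < len(encoder_outputs):
        return encoder_outputs[position]
    return EOS


result = greedy_decode(toy_encode, toy_decode_step, ["hello", "world"])
print(result)  # ['<s>', 'HELLO', 'WORLD', '</s>']
```

Note that `encode` is called exactly once, while `decode_step` is called once per generated token; `max_new_tokens` is a safety limit for the case where the stopping token is never produced.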
In the encoder-decoder architecture, the encoder and the decoder do not have to share any weights. The encoder's job is to understand the input and produce a dense representation of it (typically one vector per input token), which is sent to the decoder to generate the output. Because the two halves are separate, the decoder can be trained on a completely different domain or even a completely different modality, such as voice or images.
Use Cases for Encoder-Decoder:
- Sequence-to-sequence (many-to-many) tasks such as translation and summarization
- Tasks where the encoder and the decoder should not share weights
- Tasks where the input distribution is different from the output distribution
We can also load a separately chosen encoder and decoder into a single encoder-decoder model, which lets us use the model best suited to each side of the task.
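The idea of pairing an independently chosen encoder and decoder can be illustrated with a toy sketch. Everything here is hypothetical and deliberately simplified: `ToyEncoder`, `ToyDecoder`, and `ToyEncoderDecoder` are made-up classes, the "weights" are plain lookup tables, and the decoder maps all encoder outputs in one pass rather than autoregressively. The point is only that the two components keep separate parameters and can be swapped independently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyEncoder:
    """Maps each source token to an id using its own table (its 'weights')."""
    weights: Dict[str, int]

    def encode(self, source: List[str]) -> List[int]:
        return [self.weights[token] for token in source]


@dataclass
class ToyDecoder:
    """Maps each encoder output to a target token using a separate table."""
    weights: Dict[int, str]

    def decode(self, encoder_outputs: List[int]) -> List[str]:
        return [self.weights[value] for value in encoder_outputs]


@dataclass
class ToyEncoderDecoder:
    """Wraps any encoder and any decoder that agree on the intermediate ids."""
    encoder: ToyEncoder
    decoder: ToyDecoder

    def generate(self, source: List[str]) -> List[str]:
        return self.decoder.decode(self.encoder.encode(source))


# One encoder, two independently defined decoders.
english_encoder = ToyEncoder({"good": 0, "morning": 1})
french_decoder = ToyDecoder({0: "bon", 1: "matin"})
german_decoder = ToyDecoder({0: "guten", 1: "morgen"})

en_fr = ToyEncoderDecoder(english_encoder, french_decoder)
en_de = ToyEncoderDecoder(english_encoder, german_decoder)

print(en_fr.generate(["good", "morning"]))  # ['bon', 'matin']
print(en_de.generate(["good", "morning"]))  # ['guten', 'morgen']
```

Swapping the decoder changes the output domain without touching the encoder, which is the property that lets each side be picked for what it does best.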