Causal Language Modelling
We pass a sequence of words (more precisely, tokens) to the decoder, and the decoder outputs a vector containing a numeric representation of the next token. We then apply a transformation called the Language Modelling Head to that vector, which scores every token known to the decoder (its vocabulary) so that the next token can be chosen. The chosen token is appended to the input and passed back to the decoder to determine the token after it. This behavior is called autoregressive: the model uses its past outputs as its next input. The decoder repeats this process to generate an output, and the maximum output length is bounded by the context window of the model. For GPT-2 the context window is 1024 tokens, so the input and the generated output together must fit within 1024 tokens; as long as they do, the model keeps the entire input in memory while it generates.
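The loop described above can be sketched in a few lines of plain Python. This is only a toy illustration, not how GPT-2 is implemented: the vocabulary, the `toy_decoder` stand-in, the `LM_HEAD_WEIGHTS` table and the `generate` helper are all hypothetical names made up for this example, the context window is shrunk from 1024 to 8, and the next token is picked greedily (highest score), which is just one possible selection strategy.

```python
# Toy sketch of autoregressive generation. Everything here is hypothetical
# and hand-built for illustration; a real decoder is a trained transformer.

VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]
EOS_ID = 0
CONTEXT_WINDOW = 8  # GPT-2 uses 1024; kept tiny so the limit is easy to see

# Hand-written "knowledge": which token id should follow which.
NEXT = {1: 2, 2: 3, 3: 4, 4: 5, 5: 0, 0: 0}

# LM head weights: row i holds the scores for each next token given hidden dim i.
LM_HEAD_WEIGHTS = [
    [1.0 if NEXT[i] == j else 0.0 for j in range(len(VOCAB))]
    for i in range(len(VOCAB))
]


def toy_decoder(token_ids):
    """Stand-in for the decoder: returns a vector for the last position.

    Assumption: a one-hot vector of the last token, so this toy only looks
    one token back, unlike a real decoder that attends to the whole input.
    """
    hidden = [0.0] * len(VOCAB)
    hidden[token_ids[-1]] = 1.0
    return hidden


def lm_head(hidden):
    """Map the hidden vector to one score (logit) per vocabulary token."""
    return [
        sum(h * LM_HEAD_WEIGHTS[i][j] for i, h in enumerate(hidden))
        for j in range(len(VOCAB))
    ]


def generate(prompt_ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Greedy autoregressive loop: each output is fed back in as input."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        if len(ids) >= context_window:  # input + output must fit the window
            break
        logits = lm_head(toy_decoder(ids))
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)  # the output becomes part of the next input
        if next_id == EOS_ID:
            break
    return ids


print(" ".join(VOCAB[i] for i in generate([1], 10)))  # the cat sat on mat <eos>
```

Calling `generate([1], 10, context_window=3)` stops after three tokens in total, which mirrors how the prompt and the generated text share the same window.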
Guide
https://huggingface.co/docs/transformers/tasks/language_modeling#causal-language-modeling