Transformers Flashcards
1
Q
Transformers - How is the model training on a translation from English to French task when receiving the full translated text while preserving causality?
A
In the decoder, there is a mask to zero out the contribution of any tokens that surpass the current token to be predicted.