Transformers Flashcards

1
Q

Transformers - When training on an English-to-French translation task, how can the model receive the full translated text as input while still preserving causality?

A

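In the decoder, a causal (look-ahead) mask is applied in self-attention so that each position attends only to itself and to earlier target tokens. The attention scores for later tokens are set to negative infinity before the softmax, so their attention weights are exactly zero. This lets the model train on the whole translated sentence in parallel while each token is still predicted only from the tokens that precede it.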
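
The masking idea in this answer can be illustrated with a minimal pure-Python sketch. The helper names `causal_mask` and `masked_softmax` and the all-zero scores are assumptions made for illustration, not part of any library; real implementations work on tensors and include the query/key projections.

```python
import math

def causal_mask(n):
    """n x n matrix: True where query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax; disallowed entries are set to -inf first, so they get weight 0."""
    weights = []
    for row, allowed in zip(scores, mask):
        masked = [s if ok else float("-inf") for s, ok in zip(row, allowed)]
        peak = max(masked)  # finite, because the diagonal is always allowed
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw attention scores for a 3-token target sequence.
scores = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
weights = masked_softmax(scores, causal_mask(3))
for row in weights:
    print(row)
```

Each row of `weights` sums to 1, and every entry above the diagonal is zero, so position i never draws on tokens after it even though the full target sequence is present in the input.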
