Machine Translation and Encoder-Decoder Models Flashcards
What is machine translation?
It is the use of computers to translate text from one language to another
What machine translation models exist?
Statistical phrase alignment models, Encoder-Decoder models and Transformer models
What type of task is machine translation?
It is a sequence to sequence task (seq2seq)
What is the input, output and their lengths for a seq2seq task?
The input X and the output Y are both sequences of words, but the length of X need not equal the length of Y
Besides machine translation, what are some other seq2seq tasks?
Question → Answer
Sentence → Clause
Document → Abstract
What are universal aspects of human language?
These are aspects that are true, or statistically mostly true, for all languages
What are some examples of universal aspects in the human language?
Nouns/verbs, greetings, and markers of politeness/rudeness
What are translation divergences?
These are areas where languages differ
What are some examples of translation divergences?
- Idiosyncrasies and lexical differences
- Systematic differences
What is the study of translation divergences called?
Linguistic Typology
What is Word Order Typology?
It is the classification of languages by the typical order of subject, verb, and object in a clause. Common orders include:
- Subject-Verb-Object (SVO), e.g. English
- Subject-Object-Verb (SOV), e.g. Japanese
- Verb-Subject-Object (VSO), e.g. Irish
What is the Encoder-Decoder model?
For an input sequence, an encoder encodes the input into a context vector, which is then passed to a decoder that generates the output sequence.
What can an encoder be?
LSTM, GRU, CNN, Transformers
What is a context vector?
It is the final hidden state of the encoder, which is used as the initial context (input) to the decoder
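A minimal sketch of this idea, assuming PyTorch; the EncoderDecoder class name, layer sizes, and vocabulary arguments are illustrative, not taken from the cards above:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Minimal seq2seq sketch: LSTM encoder -> context vector -> LSTM decoder."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source sentence; (h_n, c_n) is the encoder's final hidden state.
        _, (h_n, c_n) = self.encoder(self.src_emb(src))
        # That final hidden state acts as the context vector and initialises the decoder.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), (h_n, c_n))
        # Project decoder states onto a distribution over the target vocabulary.
        return self.out(dec_out)
```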
What does a language model try to do?
Predict the next word in a sequence Y based on the previous words
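In standard notation (not from the card above), a language model factorises the probability of a sequence Y = y_1, ..., y_n as:

```latex
P(Y) = \prod_{t=1}^{n} P(y_t \mid y_1, \dots, y_{t-1})
```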
How is a translation model different to a language model?
It predicts the next word in the sequence Y based on the previous target words AND the full source sequence X
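In the same notation, the translation model additionally conditions every prediction on the source sequence X:

```latex
P(Y \mid X) = \prod_{t=1}^{n} P(y_t \mid y_1, \dots, y_{t-1}, X)
```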
Explain how the encoder-decoder model shown in the image works
The model shown uses a single recurrent hidden layer. It reads the embeddings of the source text one word at a time, then a separator token, and then begins generating target words; each predicted word is fed back in as the input for predicting the next word until an end-of-sequence marker is produced. The key point is that the final hidden state after the last source word is passed to the decoder as the context from which it predicts the target words.
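A sketch of that decoding loop, reusing the hypothetical EncoderDecoder module from the earlier example (greedy decoding; the bos_id/eos_id start and end token ids and max_len are assumptions):

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=50):
    """Greedy decoding sketch: feed each predicted word back in as the
    next decoder input until the end-of-sequence token is produced."""
    # Encode the source once; the final hidden state is the context vector.
    _, state = model.encoder(model.src_emb(src))
    tokens = [bos_id]
    for _ in range(max_len):
        prev = torch.tensor([[tokens[-1]]])           # last predicted word
        dec_out, state = model.decoder(model.tgt_emb(prev), state)
        next_id = model.out(dec_out[:, -1]).argmax(dim=-1).item()
        tokens.append(next_id)
        if next_id == eos_id:                         # stop at the end marker
            break
    return tokens
```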