Lecture 9 Translating II: Neural methods Flashcards
Attention
Attention gives a weight to each context word before combining them into a vector
* A fixed context without attention squeezes all the words of the context into one vector
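A minimal sketch of this idea in NumPy, assuming a single decoder state and a matrix of encoder states (names and numbers are illustrative, not from the lecture):

```python
import numpy as np

def softmax(x):
    x = x - x.max()                       # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum()

def attention_context(decoder_state, encoder_states):
    """Weight each context word instead of squeezing everything into one fixed vector."""
    scores = encoder_states @ decoder_state   # one similarity score per source word
    weights = softmax(scores)                 # normalize scores into attention weights
    return weights @ encoder_states           # weighted sum = context vector

# toy example: 4 source words, hidden size 3
enc = np.random.rand(4, 3)
dec = np.random.rand(3)
context = attention_context(dec, enc)         # shape (3,)
```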
RNN encoder
- Often bidirectional, which can contextualize better
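A minimal sketch of a bidirectional encoder using PyTorch's GRU (the sizes are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn

# bidirectional GRU: one pass left-to-right, one pass right-to-left
encoder = nn.GRU(input_size=32, hidden_size=64,
                 bidirectional=True, batch_first=True)

embeddings = torch.randn(1, 5, 32)   # (batch, sentence length, embedding size)
outputs, _ = encoder(embeddings)     # (1, 5, 128): forward and backward states concatenated
```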
Transformers:
* More efficient: computes T(x1, ..., xn) in one step via attention
Translating with transformers
The encoder uses transformer blocks; the decoder uses more powerful blocks with an extra encoder-decoder attention layer, and its self-attention is unidirectional via a MASK
* Each row of the self-attention matrix is normalized by a softmax
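A sketch of that masked, row-wise-normalized self-attention in scaled dot-product form (function and variable names are assumptions, not the lecture's notation):

```python
import numpy as np

def masked_self_attention(Q, K, V):
    """Decoder-style self-attention: each position attends only to itself and earlier positions."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                              # (n, n) similarity matrix
    mask = np.triu(np.ones((n, n)), k=1).astype(bool)
    scores[mask] = -np.inf                                     # MASK: hide future positions
    scores = scores - scores.max(axis=1, keepdims=True)        # stability before exponentiating
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)     # softmax per row
    return weights @ V

X = np.random.rand(5, 8)                                       # 5 tokens, dimension 8 (toy numbers)
out = masked_self_attention(X, X, X)                           # (5, 8)
```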
Neural generation
The output vector D needs as many entries as there are vocabulary words; the word with the highest score gets looked up
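A sketch of that lookup step: one score per vocabulary word, and the highest one is looked up (the vocabulary and weights are made up):

```python
import numpy as np

vocab = ["<unk>", "the", "cat", "sat", "</s>"]      # toy vocabulary
W = np.random.rand(len(vocab), 16)                  # one output row per vocabulary word

decoder_output = np.random.rand(16)                 # hidden state for the current position
scores = W @ decoder_output                         # one score per word
next_word = vocab[int(np.argmax(scores))]           # highest score gets looked up
```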
Out of vocabulary
Approaches:
* Map it to <unk>
* Copy it, because it is probably a name
* Subword segmentation, such as BPE
BPE (Byte pair encoding)
We try to find subwords using BPE and translate them
BPE finds the pair of symbols that most often co-occur in a corpus, treats that pair as one item,
and repeats
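A minimal sketch of the BPE merge loop on a toy corpus (word frequencies and the number of merges are made up):

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """corpus: dict mapping a word (as a tuple of symbols) to its frequency."""
    merges = []
    for _ in range(num_merges):
        # count how often each pair of adjacent symbols co-occurs
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)             # most frequent pair
        merges.append(best)
        # treat the pair as one item everywhere and go on
        new_corpus = {}
        for word, freq in corpus.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_corpus[tuple(out)] = freq
        corpus = new_corpus
    return merges, corpus

corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w"): 6}
merges, segmented = bpe_merges(corpus, num_merges=3)
```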
OPUS: a massive collection of open-source parallel texts (the same text in multiple languages)
Sentence alignment
Finding the correspondence between source sentences and their equivalent translations in the target text
Backtranslation
If there is only a small parallel corpus for language x to y, train a model on it, then
expand the corpus with y-to-x (back)translations and train it again
- You can also align the word vector spaces of the two languages, translate word by word, and
use those translations to train the model as well
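A sketch of the backtranslation loop; `train` and `translate` are hypothetical placeholders for an MT toolkit, not a real API:

```python
def backtranslate(parallel_xy, monolingual_y, train, translate):
    """Expand a small x->y corpus with synthetic pairs made by translating y back to x."""
    model_yx = train([(y, x) for x, y in parallel_xy])       # reverse model: y -> x
    synthetic = [(translate(model_yx, y), y) for y in monolingual_y]
    return train(parallel_xy + synthetic)                    # retrain x -> y on real + synthetic pairs
```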
MT evaluation
- precision: % of n-grams of the candidate translation that appear in the reference
- recall: % of n-grams of the reference that appear in the candidate translation
chrFB = (1 + B^2) * ((chrP * chrR) / (B^2 * chrP + chrR))
calculates an F-score over character n-grams (chrP = character n-gram precision, chrR = character n-gram recall)
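A sketch of chrF for a single n-gram order (real chrF averages over several orders; beta = 2 is a common default, used here as an assumption):

```python
from collections import Counter

def char_ngrams(text, n):
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrF(candidate, reference, n=3, beta=2.0):
    """Character n-gram F-score for one n (chrF normally averages over n = 1..6)."""
    cand, ref = char_ngrams(candidate, n), char_ngrams(reference, n)
    overlap = sum((cand & ref).values())              # n-grams present in both
    chrP = overlap / max(sum(cand.values()), 1)       # % of candidate n-grams found in reference
    chrR = overlap / max(sum(ref.values()), 1)        # % of reference n-grams found in candidate
    if chrP + chrR == 0:
        return 0.0
    return (1 + beta**2) * (chrP * chrR) / (beta**2 * chrP + chrR)

print(chrF("the cat sat", "the cat sits"))
```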