Machine Translation Flashcards
Machine Translation
Automatically translating from one language to another
Why is machine translation difficult?
- Word order differences
- Vocabulary gaps
- Metaphors, idioms, collocations
- Cultural differences
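Vauquois triangle
Trade-off between the depth of linguistic analysis and the distance the transfer step must cover: the deeper the analysis, the shorter the transfer.
From the largest transfer distance (shallowest analysis) to the smallest (deepest analysis):
- Word to word (direct translation)
- Syntax to syntax (syntactic transfer)
- Semantics to semantics (semantic transfer)
- Interlingua (no transfer step)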
How to build a translation model using a parallel corpus with word alignments
Count how often each source word is aligned to each target word (co-occurrence counts).
Compute each conditional probability as a relative frequency:
P(witch | bruja) = count(witch aligned to bruja) / count(bruja)
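A minimal Python sketch of the relative-frequency estimate, assuming a hypothetical toy corpus in which each sentence pair carries its word alignment as a list of (Spanish index, English index) links. Here count(bruja) is taken as the number of alignment links involving bruja, which equals its token count when every occurrence of bruja is aligned to exactly one English word.

```python
from collections import Counter

# Hypothetical toy corpus: (spanish_tokens, english_tokens, alignment),
# where alignment is a list of (spanish_index, english_index) links.
corpus = [
    (["la", "bruja", "verde"], ["the", "green", "witch"], [(0, 0), (1, 2), (2, 1)]),
    (["la", "bruja"], ["the", "witch"], [(0, 0), (1, 1)]),
    (["la", "bruja", "vieja"], ["the", "old", "hag"], [(0, 0), (1, 2), (2, 1)]),
]

def relative_frequency(corpus):
    """Estimate P(english | spanish) = count(english aligned to spanish) / count(spanish)."""
    pair_counts = Counter()    # how often each (english, spanish) pair is linked
    source_counts = Counter()  # how many links each Spanish word takes part in
    for spanish, english, alignment in corpus:
        for i, j in alignment:
            pair_counts[(english[j], spanish[i])] += 1
            source_counts[spanish[i]] += 1
    return {(e, f): c / source_counts[f] for (e, f), c in pair_counts.items()}

probs = relative_frequency(corpus)
print(probs[("witch", "bruja")])  # 2/3: bruja is aligned to witch twice and to hag once
```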
How to word-align a parallel corpus given translation probabilities
For each sentence pair, choose the alignment that maximizes the overall probability (the product of the translation probabilities of the aligned word pairs).
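Given translation probabilities, the highest-probability alignment can be found as in this sketch. It assumes an IBM Model 1-style alignment (each English word links to exactly one Spanish word, chosen independently, with no NULL word), under which the product is maximized by picking the best Spanish word for each English word separately. The translation table `t` and its numbers are hypothetical.

```python
# Hypothetical translation table: t[(english, spanish)] = P(english | spanish).
t = {
    ("the", "la"): 0.7, ("the", "bruja"): 0.1, ("the", "verde"): 0.1,
    ("witch", "la"): 0.1, ("witch", "bruja"): 0.8, ("witch", "verde"): 0.1,
    ("green", "la"): 0.1, ("green", "bruja"): 0.1, ("green", "verde"): 0.8,
}

def best_alignment(spanish, english, t):
    """Return (alignment, probability) maximizing the product of t(e | f).

    Assumes each English word links to exactly one Spanish word, chosen
    independently, so each factor of the product can be maximized on its own."""
    alignment = []
    prob = 1.0
    for j, e in enumerate(english):
        # Pick the Spanish position whose word best explains e.
        i = max(range(len(spanish)), key=lambda k: t.get((e, spanish[k]), 0.0))
        alignment.append((i, j))
        prob *= t.get((e, spanish[i]), 0.0)
    return alignment, prob

alignment, prob = best_alignment(["la", "bruja", "verde"], ["the", "green", "witch"], t)
print(alignment)  # [(0, 0), (2, 1), (1, 2)]
```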
How to build a translation model from a parallel text using expectation maximization (EM)
Start by setting all translation probabilities to be uniform.
Compute word alignment probabilities from them.
Recompute the translation probabilities from the expected counts under these alignment probabilities.
Then recompute the word alignment probabilities from the new translation probabilities.
Repeat until convergence.
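A minimal sketch of this EM loop, assuming IBM Model 1 (the card does not name a specific model) with no NULL word, and a hypothetical three-sentence Spanish-English corpus. The `max_iterations` and `tol` parameters are assumed stand-ins for "until convergence".

```python
from collections import defaultdict

def ibm_model1(corpus, max_iterations=100, tol=1e-6):
    """Estimate t[(e, f)] = P(e | f) with EM in the style of IBM Model 1 (no NULL word).

    corpus: list of (spanish_tokens, english_tokens) sentence pairs."""
    english_vocab = {e for _, english in corpus for e in english}
    uniform = 1.0 / len(english_vocab)
    # Start: every co-occurring word pair gets the same translation probability.
    t = {(e, f): uniform
         for spanish, english in corpus for e in english for f in spanish}
    for _ in range(max_iterations):
        counts = defaultdict(float)  # expected count of e aligned to f
        totals = defaultdict(float)  # expected count of f aligned to anything
        for spanish, english in corpus:
            for e in english:
                # Alignment probabilities: how likely e is to align to each Spanish word.
                norm = sum(t[(e, f)] for f in spanish)
                for f in spanish:
                    p = t[(e, f)] / norm
                    counts[(e, f)] += p
                    totals[f] += p
        # Recompute translation probabilities from the expected counts.
        new_t = {(e, f): c / totals[f] for (e, f), c in counts.items()}
        change = max(abs(new_t[k] - t[k]) for k in new_t)
        t = new_t
        if change < tol:  # stop once the probabilities stop changing
            break
    return t

corpus = [
    (["la", "casa"], ["the", "house"]),
    (["la", "bruja"], ["the", "witch"]),
    (["una", "bruja"], ["a", "witch"]),
]
t = ibm_model1(corpus)
print(round(t[("the", "la")], 3), round(t[("witch", "bruja")], 3))
```

Because "la" appears with "the" in two sentences, EM gradually shifts probability mass toward the pair (the, la), which in turn frees "casa" to explain "house".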
Statistical approach to machine translation: two questions
Which words and phrases in the source language translate to which words and phrases in the target language?
How do I best phrase things (in a natural way) in the target language?
Noisy channel model of machine translation
Decoder: given a foreign sentence F, find the most likely English translation E
Language model: P(E)
Translation model: P(F | E)
Ê = argmax_E P(E | F) = argmax_E P(F | E) * P(E)
(P(F) in the denominator of Bayes' rule is dropped because it is the same for every candidate E, so it does not change the argmax)
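A minimal sketch of the decoding rule, assuming the candidate translations and their P(F | E) and P(E) scores are already given. The numbers are hypothetical, and a real decoder searches a very large space of candidates rather than a fixed list. Scores are added in log space, which avoids underflow on long sentences.

```python
import math

# Hypothetical scores for one input F = "la bruja verde".
translation_model = {  # P(F | E)
    "the green witch": 0.20,
    "the witch green": 0.25,
    "the green hag": 0.10,
}
language_model = {  # P(E)
    "the green witch": 0.010,
    "the witch green": 0.0001,
    "the green hag": 0.002,
}

def decode(candidates, translation_model, language_model):
    """Return the candidate E maximizing P(F | E) * P(E), computed in log space.

    P(F) is omitted: it is the same for every candidate, so it cannot change the argmax."""
    return max(candidates,
               key=lambda e: math.log(translation_model[e]) + math.log(language_model[e]))

best = decode(translation_model.keys(), translation_model, language_model)
tm_only = max(translation_model, key=translation_model.get)
print(best)     # the green witch
print(tm_only)  # the witch green
```

The translation model alone prefers the word-for-word order "the witch green"; multiplying in the language model is what favors the natural phrasing, which addresses the second of the two questions for the statistical approach.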