lecture 10 Flashcards
predicate
usually the verb or verb phrase that expresses the action or state
(in dictionary form)
thematic role: agent
volitional causer of an event
thematic role: experiencer
experiencer of an event
thematic role: force
non-volitional causer of the event
thematic role: theme
participant most directly affected by the event
thematic role: result
the end product of an event
thematic role: content
the proposition or content of a propositional event
thematic role: instrument
an instrument used in an event
thematic role: beneficiary
the beneficiary of an event
thematic role: source
the origin of the object of a transfer event
thematic role: goal
the destination of the object of a transfer event
idiom
expressions whose meanings are not predictable from the meanings of their individual words
- noncompositional
- means they usually cannot be translated word-for-word into another language
- literal translations often fail to capture the intended meaning, so accurate translation of idioms requires understanding cultural and contextual nuances
IBM models 1-5
- series of word-based statistical models that are induced from parallel data (alignment probability distributions)
- data-driven
- laid groundwork for modern statistical machine translation
phrase-based statistical machine translation (SMT)
- unlike word based models that translate words in isolation, phrase-based SMT considers contiguous sequences of words/phrases
- improved translation significantly over earlier word-based models
- handle phrases and idioms better, capture linguistic context better
neural machine translation
- quickly became the state of the art
- relies on deep learning models, specifically neural networks, to perform translations
- encoder-decoder architecture
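a minimal sketch of the encoder-decoder idea, not the lecture's actual model: the PyTorch framework, GRU layers, and all vocabulary/dimension sizes below are illustrative assumptions. the encoder compresses the source sentence into a hidden state, and the decoder predicts target words conditioned on it.

```python
# minimal encoder-decoder sketch in PyTorch; sizes and GRU choice are illustrative
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src_ids):
        # the final hidden state summarises the whole source sentence
        _, hidden = self.rnn(self.embed(src_ids))
        return hidden

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, tgt_ids, hidden):
        # conditioned on the encoder state, score the next target word at each step
        output, hidden = self.rnn(self.embed(tgt_ids), hidden)
        return self.out(output), hidden

encoder, decoder = Encoder(src_vocab=1000), Decoder(tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))          # a batch of 2 source sentences of length 7
logits, _ = decoder(torch.randint(0, 1200, (2, 5)), encoder(src))
print(logits.shape)                            # torch.Size([2, 5, 1200]): one distribution over target words per step
```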
central problem of machine translation
language divergence: structural differences in word order between languages
why is machine translation difficult
- ambiguity
–> same word can have multiple meanings
–> same meaning can be described by multiple words/word forms
- word order
–> underlying deeper syntactic structure
–> computationally intensive
- morphological richness
–> identifying the basic units of words (morphemes)
correspondences
- one-to-one: simple sentence translation maintaining word order and meaning
- one-to-many (and reordering): single words in one language may require multiple words in another, and may need reordering
- many-to-one (and elision): multiple words in one language combine to form a single word in another
- many-to-many: entire phrases or idiomatic expressions may need to be translated into completely different phrases in another language
lexical divergences: lexical specificity
a word in one language has multiple specific translations in another language
–> brother = gege (older) or didi (younger)
lexical divergences: homonyms and polysemous words
the different senses of homonymous words generally have different translations
–> (river) bank = ufer
–> (money) bank = bank
the different senses of polysemous word may also have different translations
–> i know that he bought the book, i know peter, i know math
–> sais que, connais, m’y connais en
lexical divergences: morphological differences
different languages exhibit varied inflections and morpheme structures
–> new = nouveau/nouvelle
lexical divergences
- homonymous words
- polysemous words
- lexical specificity
- morphological divergences
syntactic divergences
- word order
- head-marking vs dependent-marking
- pro-drop languages
- negation
syntactic divergences: word order
- word order can be fixed or free
- languages with a fixed word order have sentences that follow a specific structure (e.g., SVO)
syntactic divergences: head-marking vs dependent-marking
- head-marking languages: grammatical relationships are indicated on the head of a phrase
–> the man house-his
- dependent-marking languages: grammatical relationships are indicated on the dependents of a phrase
–> the man’s house
syntactic divergences: pro-drop languages
- these languages can omit pronouns
–> e.g., spanish: i eat = como
syntactic divergences: negation
negation operates differently across languages
semantic differences
- aspect
- motion events
semantic differences: aspect
conveying current actions
- progressive aspect: swimming
- expression with an adverb: schwimmt gerade
semantic differences: motion events
have two properties
1. manner of motion (swimming)
2. direction of motion (across the lake)
languages either express the manner with a verb and the direction with a ‘satellite’ or vice versa
why model translation with a probabilistic model
- we would like to have a measure of confidence for the translations we learn
- we would like to model uncertainty in translation
model
a simplified and idealized understanding of a physical process
translation explained with the Noisy Channel Model
- general framework for many NLP problems
- generate target sentence
- a channel corrupts the target
- source sentence is a corruption of the target sentence
–> translation is then the process of recovering the original signal (e) given the corrupted signal (f)
–> by Bayes' rule, P(e|f) ∝ P(e) * P(f|e), so the best translation is ê = argmax_e P(e) * P(f|e) (see the sketch below)
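a toy sketch of noisy-channel decoding; the candidate sentences and probabilities are invented for illustration. the language model P(e) scores fluency, the translation model P(f|e) scores fidelity, and decoding picks the candidate that maximizes their product.

```python
# toy noisy-channel decoding; candidates and probabilities are invented
candidates = {
    "the house is small": {"p_e": 0.20, "p_f_given_e": 0.30},   # fluent and faithful
    "small the house is": {"p_e": 0.01, "p_f_given_e": 0.35},   # faithful but not fluent
}

# pick e maximising P(e) * P(f|e): fluency and fidelity are modelled separately
best = max(candidates, key=lambda e: candidates[e]["p_e"] * candidates[e]["p_f_given_e"])
print(best)   # "the house is small": the language model penalises the unnatural word order
```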
why use the noisy channel model
- makes it easier to mathematically represent translation and learn probabilities
- fidelity (accuracy of content) and fluency (naturalness of language) can be modeled separately
word alignment
to learn sentence translation probabilities, we first need to learn word-level translation probabilities
- start with parallel sentence pair
–> a sentence in one language paired with its translation in another language
- since there are multiple possible alignments, we try to find multiple sentence pairs
–> multiple possible word alignments
- key idea: look at the co-occurrence of translated words. words that occur together in the parallel sentences are likely to be translations
- calculate P(f|e)
–> probability of a word in language 1 (f) given another word (e)
problem with word alignment
we can only find the best alignment if we know the word translation probabilities
–> this is a chicken and egg problem
solution to word alignment problem
iterative process: Expectation-Maximization (EM) algorithm
- estimate alignment probabilities using the current word translation probabilities
- re-estimate word translation probabilities from those alignments
- since we don't know the best alignment initially, we consider all possible alignments when estimating the word translation probabilities and weight each of them by its alignment probability
- the new P(f|e) is computed as the ratio of the expected number of times the pair (f, e) occurs to the expected number of times any word pairs with e (see the sketch below)
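a minimal sketch of this EM loop in the style of IBM Model 1; the tiny German-English corpus and the fixed 10 iterations are illustrative assumptions.

```python
# minimal IBM Model 1-style EM sketch; corpus and iteration count are illustrative
from collections import defaultdict

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# uniform initialisation of the word translation probabilities t(f|e)
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):
    count = defaultdict(float)   # expected counts of the pair (f, e)
    total = defaultdict(float)   # expected counts of e pairing with any word
    for fs, es in corpus:
        for f in fs:
            # E-step: weigh every possible alignment of f by its probability
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                delta = t[(f, e)] / norm
                count[(f, e)] += delta
                total[e] += delta
    # M-step: t(f|e) = expected count(f, e) / expected count(e)
    for (f, e) in count:
        t[(f, e)] = count[(f, e)] / total[e]

print(round(t[("haus", "house")], 2))   # increases towards 1.0 as EM converges
```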
phrase-based SMT
use phrases (sequence of words) as the basic translation unit
–> Instead of aligning single words between the source and target languages, we align entire phrases.
benefits of phrase-based SMT
- local reordering: PB-SMT allows intra-phrase re-ordering, meaning that within a single phrase or sequence of words, the order can be adjusted and memorized to better match the target language’s structure
–> the ordering of words is adapted to fit the syntactic rules of the other language
- sense disambiguation: PB-SMT uses the context provided by neighboring words within a phrase to disambiguate meaning
- handling institutionalized expressions: idioms can be learned as a single unit
- improved fluency: incorporating entire phrases, which can be of any length, enhances the natural flow of translations.
learning the phrase translation model
- learn the phrase table (central data structure in PB-SMT)
- learn the phrase translation probabilities
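a sketch of estimating phrase translation probabilities by relative frequency, assuming phrase pairs have already been extracted from word-aligned data; the phrase pairs below are invented.

```python
# relative-frequency estimation of phrase translation probabilities; pairs are invented
from collections import Counter

extracted = [
    ("das haus", "the house"), ("das haus", "the house"),
    ("dieses haus", "the house"), ("das haus", "the home"),
    ("ein buch", "a book"),
]

pair_counts = Counter(extracted)                  # count(f, e)
e_counts = Counter(e for _, e in extracted)       # count(e)

# one phrase-table entry per extracted pair: P(f|e) = count(f, e) / count(e)
phrase_table = {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}
print(phrase_table[("das haus", "the house")])    # 2/3, since "the house" was also aligned to "dieses haus"
```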
SMT pipeline
- word alignment
- phrase extraction + distortion modelling + feature extraction + language modelling
- tuning
- decoder
getting word order right in PB-SMT
preprocessing the input by changing the order of words in the input sentence to match the order of the words in the target language
- parse the sentence to understand its syntactic structure
- apply rules to transform the tree
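a toy pre-ordering sketch; the hard-coded parse stands in for the output of a real syntactic parser, and the single SVO-to-SOV rule is an illustrative assumption.

```python
# toy pre-ordering: the parse is hard-coded; a real system would run a syntactic parser
clause = {"subject": ["the", "man"], "verb": ["bought"], "object": ["a", "book"]}

# transformation rule: move the verb after the object (SVO -> SOV)
reordered = clause["subject"] + clause["object"] + clause["verb"]
print(" ".join(reordered))   # "the man a book bought", matching an SOV target language
```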
addressing rich morphology
- break words into their component morphemes
- learn translations for the morphemes
transliteration
handling names and OOVs (out of vocabulary words)
evaluation of MT output
with respect to
1. adequacy: how well the output preserves the content of the source text
2. fluency: how well-formed the output is in the target language
types:
1. human evaluation
2. automatic evaluation
3. BLEU: compares n-grams of the candidate translation with those of one or more reference translations
4. TER: measures the number of edits required to turn the candidate into a reference translation
5. METEOR: matches candidate and reference words using exact forms, stems, and synonyms
precision
#words in candidate that are in ref / #words in candidate
(with repetition)
modified precision
#words in candidate that are in ref / #words in candidate
clip the number of matching words to their maximum count in the reference sentence
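a quick sketch of plain vs modified (clipped) unigram precision on the classic over-generated-"the" example; candidate and reference are standard illustrative data, not from the lecture.

```python
# plain vs clipped unigram precision; candidate/reference are illustrative data
from collections import Counter

candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()

cand_counts = Counter(candidate)
ref_counts = Counter(reference)

# plain precision: every "the" counts as a match -> 7/7 = 1.0
precision = sum(c for w, c in cand_counts.items() if w in ref_counts) / len(candidate)

# modified precision: clip each word's count to its maximum count in the reference -> 2/7
clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items() if w in ref_counts)
modified_precision = clipped / len(candidate)

print(precision, round(modified_precision, 2))   # 1.0 0.29
```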
recall
cannot be used for PB-SMT
greedy decoding
selects the word with the highest probability
–> risks running into local optima
sampling decoding
randomly selecting the next word based on the probability distribution
–> introduces randomness, potentially capturing more diverse translations but at the risk of inconsistencies
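a small sketch contrasting the two decoding strategies for a single step; the probability distribution over next words is made up.

```python
# greedy vs sampling for one decoding step; the distribution is made up
import random

next_word_probs = {"house": 0.6, "home": 0.3, "building": 0.1}

# greedy: always pick the most probable word (deterministic, can hit local optima)
greedy_choice = max(next_word_probs, key=next_word_probs.get)

# sampling: draw the next word according to the distribution (more diverse, less consistent)
words, probs = zip(*next_word_probs.items())
sampled_choice = random.choices(words, weights=probs, k=1)[0]

print(greedy_choice, sampled_choice)   # e.g. "house house" or "house home"
```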