Exam Preparation Deck Flashcards
Define the eight features used for pronoun resolution. State the extraction method if the feature is hard to get.
- Cataphoric: If the pronoun occurs before the candidate antecedent.
- Number agreement: If the pronoun and candidate antecedent agree in number. Number can be determined by a morphological processor.
- Gender agreement: If genders are compatible. This may require a named entity classifier.
- Same verb: If the pair shares the same verb. Can be determined by a syntactic parser.
- Sentence distance: the number of sentences between the pronoun and the candidate antecedent.
- Grammatical role of the antecedent. Subject/object/other. Can be found by a syntactic parser.
- Parallel: If the pair shares the same grammatical role.
- Form: Form of the antecedent (proper name, definite, indefinite, or pronoun).
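A minimal sketch of how these eight features might be assembled for one pronoun/candidate pair, assuming the mentions already carry parser, morphology, and named-entity output; the Mention fields and function below are illustrative, not from the notes:

```python
from dataclasses import dataclass

@dataclass
class Mention:
    # Hypothetical pre-computed attributes (from a parser, morphological
    # processor and NE classifier); the field names are illustrative only.
    index: int       # token position in the document
    sentence: int    # sentence number
    number: str      # "sg" or "pl"
    gender: str      # "m", "f", "n" or "unknown"
    verb: str        # lemma of the governing verb
    role: str        # "subject", "object" or "other"
    form: str        # "proper", "definite", "indefinite" or "pronoun"

def features(pronoun: Mention, antecedent: Mention) -> dict:
    """Assemble the eight features for one pronoun/candidate pair."""
    return {
        "cataphoric": pronoun.index < antecedent.index,
        "number_agreement": pronoun.number == antecedent.number,
        "gender_agreement": antecedent.gender in (pronoun.gender, "unknown"),
        "same_verb": pronoun.verb == antecedent.verb,
        "sentence_distance": pronoun.sentence - antecedent.sentence,
        "role": antecedent.role,
        "parallel": pronoun.role == antecedent.role,
        "form": antecedent.form,
    }
```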
What is a baseline?
What is a ceiling?
A baseline is a score given by a relatively simple approach which is used as a standard against which the approach under investigation is compared.
A ceiling is the maximum performance that could be expected, generally the agreement achieved between two or more humans performing the task.
Why might a discourse model be used over a Naive Bayes model in resolving pronouns?
A Naive Bayes classifier may not produce a globally consistent answer.
In the example given, it is quite likely that the classifier would propose that ‘he’ and ‘it’ both refer to Burns.
In a discourse model, fixing a binding provides information about the bound pronoun, which gives global consistency. The model also supports a ‘repeated mention’ heuristic, something that is impossible for a single-pass classifier.
Define morphological ambiguity. Give an example.
Words that can be decomposed into different sets of morphemes.
For example, unionised can be seen as un-ion-ise-ed, or union-ise-ed.
Define lexical ambiguity. Give an example.
Arises when a word has multiple senses.
For example, the word ‘duck’ could be a verb (an action) or a noun (an animal).
Define syntactic/structural ambiguity. Give an example.
Multiple ways of bracketing an expression.
He ate the pizza with a fork.
The prepositional phrase ‘with a fork’ can attach to the verb ‘ate’ (the fork is the instrument) or to ‘the pizza’ (the fork is part of the pizza).
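A small sketch of the two bracketings as nested structures; the labels and tuple encoding are illustrative, not course notation:

```python
# Two bracketings of "He ate the pizza with a fork" as nested tuples.
# VP attachment: the fork is the instrument of eating.
vp_attach = ("S", "He",
             ("VP", ("VP", "ate", ("NP", "the pizza")),
                    ("PP", "with a fork")))
# NP attachment: the pizza has a fork on it.
np_attach = ("S", "He",
             ("VP", "ate",
                    ("NP", ("NP", "the pizza"),
                           ("PP", "with a fork"))))
```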
Define discourse relation ambiguity. Give an example.
Implicit relationship between sentences.
Max fell. John pushed him.
- Narration: Max fell and then John pushed him.
- Explanation: Max fell because John pushed him.
Describe the packing algorithm. What is it good for?
Packing is an optimization of chart parsing: multiple derivations of the same possible phrase are recorded in the same edge.
This works because rule application is not sensitive to the internal structure of an edge.
It can be proven that the algorithm runs in cubic time, and it stops the number of entries in the chart from growing exponentially. However, unpacking all the derivations can still take exponential time.
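A minimal packed CKY-style sketch, using a toy binary grammar and lexicon invented for illustration: each chart cell keeps one edge per category, and every derivation of that category over the span is packed into it.

```python
from collections import defaultdict

# Toy binary grammar: (left daughter, right daughter) -> set of mother categories.
grammar = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
lexicon = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def parse(words):
    n = len(words)
    # chart[i][j] maps a category to ONE packed edge: a list of derivations,
    # each derivation recording how the span (i, j) was split into daughters.
    chart = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in lexicon[w]:
            chart[i][i + 1][cat].append(w)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        for a in grammar.get((b, c), ()):
                            # Packing: every (b, c, k) analysis is appended to
                            # the SAME edge for category a over the span (i, j),
                            # so the number of edges stays polynomial.
                            chart[i][j][a].append(((b, i, k), (c, k, j)))
    return chart

chart = parse("the dog saw the cat".split())
print(chart[0][5]["S"])  # the packed derivations of S over the whole string
```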
Given a string of words, how do we compute the most likely tags?
Treat tagging as a hidden Markov model: choose the tag sequence t_1 ... t_n that maximises the product over i of P(t_i | t_{i-1}) P(w_i | t_i). The Viterbi (dynamic programming) algorithm finds this sequence efficiently rather than enumerating every possible tag sequence.
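A minimal Viterbi sketch for bigram HMM tagging; the tagset, transition and emission probabilities are toy values invented for illustration:

```python
import math

# Toy tagset, transition and emission probabilities (invented values).
tags = ["N", "V"]
trans = {("<s>", "N"): 0.7, ("<s>", "V"): 0.3,
         ("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "they"): 0.4, ("V", "they"): 0.01,
        ("N", "fish"): 0.3, ("V", "fish"): 0.2}

def viterbi(words):
    # best[t] = (log probability, tag sequence) of the best path ending in t.
    best = {t: (math.log(trans[("<s>", t)] * emit.get((t, words[0]), 1e-6)), [t])
            for t in tags}
    for w in words[1:]:
        new_best = {}
        for t in tags:
            # Choose the best previous tag for extending the path with (t, w).
            score, seq = max(
                (best[p][0] + math.log(trans[(p, t)] * emit.get((t, w), 1e-6)),
                 best[p][1])
                for p in tags)
            new_best[t] = (score, seq + [t])
        best = new_best
    return max(best.values())[1]

print(viterbi("they fish".split()))  # -> ['N', 'V']
```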
Define:
- Hyponymy
- Meronymy
- Synonymy
- Antonymy
- Hyponymy: More specific meaning of a general term. Dog is a hyponym of animal.
- Meronymy: Part-of relation. Arm is a meronym of body.
- Synonymy: Same meaning. Policeman and cop are synonyms.
- Antonymy: Opposite meaning. Big and little are antonyms.
Describe Yarowsky’s minimally-supervised learning approach to word sense disambiguation.
1. Find all examples of the target word in the corpus.
2. Manually identify seeds that disambiguate some of the uses.
3. Train a decision list classifier on the Sense A/B examples, ranking features by log-likelihood ratio.
4. Apply the classifier to the remaining untagged examples and add the reliably classified ones to the training data.
5. Iterate steps 3 and 4 until convergence.
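A minimal decision-list sketch in the spirit of this approach, with context-word features and an invented toy data set; the smoothing constant and rule format are assumptions:

```python
import math
from collections import Counter

# Toy labelled seed examples: each is (set of context-word features, sense).
labelled = [({"river", "water"}, "A"), ({"bank", "money"}, "B"),
            ({"water", "fishing"}, "A"), ({"money", "loan"}, "B")]

def train_decision_list(examples, alpha=0.1):
    counts = {"A": Counter(), "B": Counter()}
    for feats, sense in examples:
        counts[sense].update(feats)
    rules = []
    for f in set(counts["A"]) | set(counts["B"]):
        # Rank each feature by the (smoothed) log-likelihood ratio of the senses.
        llr = math.log((counts["A"][f] + alpha) / (counts["B"][f] + alpha))
        rules.append((abs(llr), f, "A" if llr > 0 else "B"))
    return sorted(rules, reverse=True)    # strongest evidence first

def classify(rules, feats, default="A"):
    for _, f, sense in rules:
        if f in feats:
            return sense                  # the first matching rule decides
    return default

rules = train_decision_list(labelled)
print(classify(rules, {"loan", "fishing"}))
```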
What can we do to avoid P(w_n | w_{n-1}) bigrams being zero?
- Smoothing: distribute ‘extra’ probability between rare and unseen events.
- Backoff: approximate unseen probabilities by a more general probability, e.g. unigram probabilities.
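A minimal sketch of both ideas, using add-one (Laplace) smoothing and a simple back-off to unigrams; the particular schemes and the toy corpus are illustrative, not necessarily those in the notes:

```python
from collections import Counter

# Toy corpus; real estimates would come from a much larger corpus.
tokens = "the cat sat on the mat the cat ate".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)        # vocabulary size
N = len(tokens)

def p_add_one(w, prev):
    # Add-one (Laplace) smoothing: every possible bigram gets one extra count.
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

def p_backoff(w, prev, alpha=0.4):
    # Simple back-off: use the bigram estimate if the bigram was seen,
    # otherwise fall back to a discounted unigram estimate.
    if bigrams[(prev, w)] > 0:
        return bigrams[(prev, w)] / unigrams[prev]
    return alpha * unigrams[w] / N

print(p_add_one("mat", "cat"), p_backoff("mat", "cat"))
```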
Define four notions of context.
- Word windows (not filtered): n words on either side of the lexical item.
- Word windows (filtered): n words on either side, removing functional words and very frequent content words.
- Lexeme windows: Use stems instead of words.
- Dependencies: Directed links between heads and dependents. The context of an item is the dependency structure it belongs to.
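A minimal sketch of unfiltered versus filtered word windows; the stop-word list below is an illustrative stand-in for "functional words and very frequent content words":

```python
# Stop-word list standing in for functional words and very frequent content words.
STOP = {"the", "a", "of", "on", "and"}

def window(tokens, i, n=2, filtered=False):
    # n words on either side of the item at position i.
    ctx = tokens[max(0, i - n):i] + tokens[i + 1:i + 1 + n]
    return [w for w in ctx if w not in STOP] if filtered else ctx

tokens = "the dog chased the cat across the garden".split()
print(window(tokens, 2))                  # ['the', 'dog', 'the', 'cat']
print(window(tokens, 2, filtered=True))   # ['dog', 'cat']
```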
Define three ways to weigh context.
- Binary model: Set value of dimension c to 1 if context c co-occurs with word w.
- Basic frequency model: Count number of times c co-occurs with word w instead.
- Point-wise mutual information (PMI): weight context c by how much more often it co-occurs with word w than would be expected by chance, PMI(w, c) = log( P(w, c) / (P(w) P(c)) ).
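A minimal PMI sketch over invented co-occurrence counts, showing how an informative context scores higher than a very frequent one:

```python
import math
from collections import Counter

# Toy (word, context) co-occurrence counts, invented for illustration.
cooc = Counter({("dog", "bark"): 10, ("dog", "the"): 50,
                ("cat", "purr"): 8, ("cat", "the"): 40})
total = sum(cooc.values())
w_count, c_count = Counter(), Counter()
for (w, c), n in cooc.items():
    w_count[w] += n
    c_count[c] += n

def pmi(w, c):
    p_wc = cooc[(w, c)] / total
    return math.log2(p_wc / ((w_count[w] / total) * (c_count[c] / total)))

# Informative contexts get high PMI; very frequent ones like "the" get low PMI.
print(pmi("dog", "bark"), pmi("dog", "the"))
```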
How do we combine visual and text words?
- Feature level fusion: Concatenate text and visual vectors. Reduce dimension by SVD or NMF.
- Scoring level fusion: Estimate similarity separately for text and visual vectors, then take the mean of the two scores.
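A minimal sketch of both fusion strategies for a single word pair, using random vectors as stand-ins for real text and visual representations:

```python
import numpy as np

rng = np.random.default_rng(0)
text = {w: rng.random(50) for w in ("dog", "cat")}    # stand-in text vectors
visual = {w: rng.random(20) for w in ("dog", "cat")}  # stand-in visual vectors

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Feature-level fusion: concatenate the two vectors (optionally followed by
# SVD/NMF over the whole matrix to reduce dimensionality).
fused = {w: np.concatenate([text[w], visual[w]]) for w in text}
feature_level = cos(fused["dog"], fused["cat"])

# Scoring-level fusion: compute similarity in each space and take the mean.
scoring_level = (cos(text["dog"], text["cat"]) +
                 cos(visual["dog"], visual["cat"])) / 2

print(feature_level, scoring_level)
```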