Exam Preparation Deck Flashcards

1
Q

Define the eight features used for pronoun resolution. State the extraction method where a feature is hard to obtain.

A
  1. Cataphoric: whether the pronoun occurs before the candidate antecedent.
  2. Number agreement: whether the pronoun and candidate antecedent agree in number. Number can be found by a morphological processor.
  3. Gender agreement: whether the genders are compatible. This may require a named entity classifier.
  4. Same verb: whether the pair shares the same verb. Can be determined by a syntactic parser.
  5. Sentence distance: the number of sentences between the pronoun and the candidate antecedent.
  6. Grammatical role of the antecedent: subject/object/other. Can be found by a syntactic parser.
  7. Parallel: whether the pair shares the same grammatical role.
  8. Form of the antecedent: proper name/indefinite/definite/pronoun.
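A minimal sketch of how these eight features might be assembled for a classifier; the Mention record and its fields are illustrative, standing in for the output of the morphological processor, named entity classifier and syntactic parser mentioned above:

```python
from dataclasses import dataclass

@dataclass
class Mention:
    # Hypothetical mention record; in practice these fields would be
    # filled in by a morphological processor, NE classifier and parser.
    sentence_index: int
    token_index: int
    number: str   # 'sg' or 'pl'
    gender: str   # 'masc', 'fem', 'neut' or 'unknown'
    verb: str     # lemma of the governing verb
    role: str     # 'subject', 'object' or 'other'
    form: str     # 'proper', 'indefinite', 'definite' or 'pronoun'

def extract_features(pronoun: Mention, antecedent: Mention) -> dict:
    """Build the eight-feature vector for one pronoun/candidate pair."""
    return {
        "cataphoric": (pronoun.sentence_index, pronoun.token_index)
                      < (antecedent.sentence_index, antecedent.token_index),
        "number_agreement": pronoun.number == antecedent.number,
        "gender_agreement": antecedent.gender in (pronoun.gender, "unknown"),
        "same_verb": pronoun.verb == antecedent.verb,
        "sentence_distance": pronoun.sentence_index - antecedent.sentence_index,
        "antecedent_role": antecedent.role,
        "parallel": pronoun.role == antecedent.role,
        "antecedent_form": antecedent.form,
    }
```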
2
Q

What is a baseline?

What is a ceiling?

A

A baseline is a score given by a relatively simple approach, used as a standard against which the approach under investigation is compared.

The ceiling is the maximum performance that could be expected on the task, generally the agreement achieved between two or more humans performing it.

3
Q

Why might a discourse model be used over a Naive Bayes model in resolving pronouns?

A

A Naive Bayes classifier resolves each pronoun independently, so it may not produce a globally consistent answer: it is quite likely to propose that both ‘he’ and ‘it’ refer to Burns in the example given.

In a discourse model, once a pronoun is bound, that binding is fixed and provides information for resolving later pronouns, giving global consistency. The model also allows a ‘repeated mention’ heuristic (entities mentioned repeatedly are preferred antecedents), which is impossible for a single-pass classifier.

4
Q

Define morphological ambiguity. Give an example.

A

Arises when a word can be decomposed into morphemes in more than one way.

For example, ‘unionised’ can be analysed as union-ise-ed or as un-ion-ise-ed.

5
Q

Define lexical ambiguity. Give an example.

A

Arises when a word has multiple senses.

For example, ‘duck’ can denote an action (the verb) or an animal (the noun).

6
Q

Define syntactic/structural ambiguity. Give an example.

A

Arises when there are multiple ways of bracketing an expression.

‘He ate the pizza with a fork.’

The prepositional phrase ‘with a fork’ can attach to the verb phrase (the fork is the instrument of eating: [ate [the pizza] [with a fork]]) or to ‘the pizza’ (the pizza comes with a fork: [ate [the pizza with a fork]]).

7
Q

Define discourse relation ambiguity. Give an example.

A

Arises when the relationship between sentences is left implicit.

‘Max fell. John pushed him.’

  • Narration: Max fell and then John pushed him.
  • Explanation: Max fell because John pushed him.
8
Q

Describe the packing algorithm. What is it good for?

A

Packing is an optimization of chart parsing: multiple derivations of the same category over the same span are recorded in a single edge rather than as separate edges.

This works because rule application is not sensitive to the internal structure of an edge, so later rules can treat all the packed derivations identically.

It can be proven that the packed algorithm runs in cubic time: packing stops the number of chart entries from growing exponentially. However, unpacking all the derivations still takes exponential time in the worst case.
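A minimal sketch of the packing idea, assuming a chart keyed by (span, category); all names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class PackedEdge:
    # One edge per (category, span); alternative derivations are packed
    # into the same edge as extra daughter tuples.
    id: int
    left: int        # left vertex of the span
    right: int       # right vertex of the span
    category: str    # e.g. 'NP', 'VP'
    daughters: list = field(default_factory=list)  # tuples of daughter edge ids

def add_derivation(chart: dict, edge_id: int, left: int, right: int,
                   category: str, daughters: tuple) -> None:
    """Record a derivation, packing it into an existing edge if one
    already covers the same category over the same span."""
    key = (left, right, category)
    if key in chart:
        chart[key].daughters.append(daughters)   # pack: no new edge
    else:
        chart[key] = PackedEdge(edge_id, left, right, category, [daughters])
```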

9
Q

Given a string of words, how do we compute the most likely tags?

A

Treat the tags as the hidden states of a bigram HMM: choose the tag sequence maximizing P(tags | words) ∝ Π_i P(w_i | t_i) P(t_i | t_{i-1}). This can be computed efficiently with the Viterbi dynamic-programming algorithm.
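A minimal Viterbi sketch, assuming precomputed, smoothed log-probability tables; log_emit, log_trans and the tag set are illustrative inputs:

```python
import math

def viterbi(words, tags, log_emit, log_trans, start="<s>"):
    """Most likely tag sequence under a bigram HMM.
    log_emit[(tag, word)]      = log P(word | tag)
    log_trans[(prev_tag, tag)] = log P(tag | prev_tag)"""
    # best[i][t] = (log prob of best path ending in tag t at word i, back-pointer)
    best = [{t: (log_trans.get((start, t), -math.inf)
                 + log_emit.get((t, words[0]), -math.inf), None)
             for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i-1][p][0]
                       + log_trans.get((p, t), -math.inf))
            best[i][t] = (best[i-1][prev][0]
                          + log_trans.get((prev, t), -math.inf)
                          + log_emit.get((t, words[i]), -math.inf), prev)
    # Trace back from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))
```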
10
Q

Define:

  1. Hyponymy
  2. Meronymy
  3. Synonymy
  4. Antonymy
A
  1. Hyponymy: a more specific meaning of a more general term. ‘Dog’ is a hyponym of ‘animal’.
  2. Meronymy: the part-of relation. ‘Arm’ is a meronym of ‘body’.
  3. Synonymy: same meaning. ‘Policeman’ and ‘cop’ are synonyms.
  4. Antonymy: opposite meaning. ‘Big’ and ‘little’ are antonyms.
11
Q

Describe Yarowsky’s minimally-supervised learning approach to word sense disambiguation.

A
  1. Find all examples of the target word in the corpus.
  2. Manually identify seed collocations that disambiguate some of the uses, giving an initial set of Sense A/Sense B examples.
  3. Train a decision-list classifier on the Sense A/B examples, ranking features by log-likelihood ratio.
  4. Apply the classifier to the remaining examples in the training set and add the reliably classified ones to the labeled data.
  5. Iterate steps 3 and 4 until convergence.
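A minimal sketch of the bootstrapping loop, assuming each example is represented as a set of context features and the seeds are hand-picked features; the threshold and smoothing values are illustrative:

```python
import math
from collections import Counter

def yarowsky(examples, seeds, threshold=2.0, alpha=0.1, max_iters=20):
    """examples: list of feature sets, one per occurrence of the word.
    seeds: dict feature -> sense ('A' or 'B'), identified by hand.
    Returns the decision list and the sense labels found."""
    labels = {i: seeds[f] for i, feats in enumerate(examples)
              for f in feats if f in seeds}
    for _ in range(max_iters):
        # Train: count features per sense, rank by log-likelihood ratio.
        counts = {"A": Counter(), "B": Counter()}
        for i, sense in labels.items():
            counts[sense].update(examples[i])
        def llr(f):
            return math.log((counts["A"][f] + alpha) / (counts["B"][f] + alpha))
        features = set(counts["A"]) | set(counts["B"])
        decision_list = sorted(features, key=lambda f: abs(llr(f)), reverse=True)
        # Apply: label examples whose strongest matching feature is reliable.
        new_labels = dict(labels)
        for i, feats in enumerate(examples):
            if i in labels:
                continue
            for f in decision_list:          # first match = strongest evidence
                if f in feats and abs(llr(f)) >= threshold:
                    new_labels[i] = "A" if llr(f) > 0 else "B"
                    break
        if new_labels == labels:             # convergence: no new examples
            break
        labels = new_labels
    return decision_list, labels
```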
12
Q

What can we do to avoid bigram probabilities P(w_n | w_{n-1}) being zero?

A
  1. Smoothing: redistribute some probability mass so that rare and unseen events get a non-zero share.
  2. Backoff: approximate unseen bigram probabilities by a more general distribution, e.g. unigram probabilities.
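A minimal sketch of both ideas, assuming precomputed unigram and bigram counts; the simple ‘stupid backoff’ here stands in for properly normalized backoff schemes:

```python
from collections import Counter

def smoothed_bigram_prob(w_prev, w, bigrams: Counter, unigrams: Counter,
                         vocab_size: int, alpha: float = 1.0):
    """Add-alpha smoothing: never zero, because every bigram
    receives alpha pseudo-counts."""
    return (bigrams[(w_prev, w)] + alpha) / (unigrams[w_prev] + alpha * vocab_size)

def backoff_prob(w_prev, w, bigrams: Counter, unigrams: Counter,
                 total_tokens: int, beta: float = 0.4):
    """Use the bigram estimate when the bigram was seen, otherwise
    back off to a discounted unigram estimate."""
    if bigrams[(w_prev, w)] > 0:
        return bigrams[(w_prev, w)] / unigrams[w_prev]
    return beta * unigrams[w] / total_tokens
```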
13
Q

Define four notions of context.

A
  1. Word windows (not filtered): n words on either side of the lexical item.
  2. Word windows (filtered): n words on either side, removing functional words and very frequent content words.
  3. Lexeme windows: Use stems instead of words.
  4. Dependencies: directed links between heads and dependents; the context of an item is the dependency structure it belongs to.
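A minimal sketch of the two word-window notions; the stop-word list is illustrative:

```python
STOPWORDS = {"the", "a", "of", "to", "and", "in", "it"}  # illustrative list

def word_window(tokens, i, n, filtered=False):
    """n words on either side of tokens[i]; optionally drop function
    words and other very frequent items (the 'filtered' notion)."""
    window = tokens[max(0, i - n):i] + tokens[i + 1:i + n + 1]
    if filtered:
        window = [w for w in window if w.lower() not in STOPWORDS]
    return window

# Usage: context of 'pizza' in a 2-word window, filtered.
print(word_window("he ate the pizza with a fork".split(), 3, 2, filtered=True))
```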
14
Q

Define three ways to weigh context.

A
  1. Binary model: set the value of dimension c to 1 if context c co-occurs with word w.
  2. Basic frequency model: instead count the number of times c co-occurs with w.
  3. Point-wise mutual information (PMI), borrowed from information theory: weight each context by PMI(w, c) = log( P(w, c) / (P(w) P(c)) ), i.e. how much more often w and c co-occur than chance would predict.
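A minimal sketch of the PMI weight, assuming raw co-occurrence counts have already been collected:

```python
import math

def pmi(w, c, cooc, w_counts, c_counts, total):
    """Point-wise mutual information between word w and context c.
    cooc[(w, c)], w_counts[w], c_counts[c] are raw counts; total is
    the number of co-occurrence events."""
    p_wc = cooc[(w, c)] / total
    p_w = w_counts[w] / total
    p_c = c_counts[c] / total
    return math.log2(p_wc / (p_w * p_c)) if p_wc > 0 else float("-inf")
```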
15
Q

How do we combine visual and text words?

A
  1. Feature-level fusion: concatenate the text and visual vectors, reducing the dimensionality of the fused space by SVD or NMF.
  2. Scoring-level fusion: estimate similarity separately for the text and visual vectors, then take the mean of the two scores.
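A minimal sketch of the two fusion strategies, using cosine similarity; the SVD/NMF reduction step is noted in a comment but omitted:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def feature_level_similarity(t1, v1, t2, v2):
    # Concatenate text and visual vectors, then compare the fused vectors.
    # (In practice the fused space would first be reduced with SVD or NMF.)
    return cosine(np.concatenate([t1, v1]), np.concatenate([t2, v2]))

def scoring_level_similarity(t1, v1, t2, v2):
    # Compare within each modality separately, then average the two scores.
    return 0.5 * (cosine(t1, t2) + cosine(v1, v2))
```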
16
Q

How do you learn adjective matrices from corpus data?

A

One standard approach (the lexical function model): represent each adjective as a matrix that maps noun vectors to the vectors of the corresponding adjective-noun phrases. For each adjective, collect corpus-derived vectors for nouns and for the observed adjective-noun phrases, then learn the matrix by least-squares regression from the noun vectors to the phrase vectors.
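A minimal sketch of the regression step, assuming the noun and phrase vectors have already been extracted and row-aligned:

```python
import numpy as np

def learn_adjective_matrix(noun_vectors, phrase_vectors):
    """Least-squares estimate of an adjective's matrix A such that
    A @ noun_vector approximates the observed phrase vector.
    noun_vectors, phrase_vectors: arrays of shape (n_pairs, dim)."""
    # Rows satisfy phrase = noun @ A.T, so solve N @ X = P for X = A.T.
    A_T, *_ = np.linalg.lstsq(noun_vectors, phrase_vectors, rcond=None)
    return A_T.T

# Usage: collect vectors for e.g. 'car', 'house', ... and for the phrases
# 'red car', 'red house', ... , then learn the matrix for 'red'.
```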
17
Q

Define the seven major tasks in content generation.

A
  1. Content determination: Decide what content to convey.
  2. Discourse structuring: structuring of the text, e.g. into abstract, introduction, conclusion…
  3. Aggregation: how information is split into sentence-sized chunks.
  4. Referring expression generation: Deciding when to use pronouns, how many modifiers…
  5. Lexical choice: What lexical items to use to convey a concept.
  6. Surface realization: Map semantic representation to a string.
  7. Fluency ranking: Rank the strings overgenerated by a big grammar. May use n-grams.
18
Q

What is a chart? What is it useful for?

A
  • A data structure for parsing natural language.
  • Allows dynamic programming by storing partial results of parsing, hence avoiding recomputation.
  • A chart consists of a list of edges, each storing one result of a rule application.
  • Each edge has five fields: id, left vertex, right vertex, category and daughters.
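A minimal sketch of the edge structure; the grammar and words are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Edge:
    id: int
    left: int         # left vertex (position before the first word covered)
    right: int        # right vertex (position after the last word covered)
    category: str     # e.g. 'NP', 'VP'
    daughters: tuple  # ids of the edges this edge was built from

# A chart for 'they fish' under the rule S -> NP VP:
chart = [
    Edge(1, 0, 1, "NP", ()),      # 'they'
    Edge(2, 1, 2, "VP", ()),      # 'fish'
    Edge(3, 0, 2, "S", (1, 2)),   # built once, reusable by any later rule
]
```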
19
Q

Distinguish inflectional and derivational morphology.

A
  • Inflectional: concerns grammatical properties such as tense, aspect, number, person, gender.
  • Derivational: creates new words with affixes such as un-, re-, anti-; covers a broad range of semantic possibilities…