Constituency Parsing Flashcards

1
Q

Constituency Parsing

Constituency Parsing

A
  • Parsing based on the syntax (structure) of a sentence
  • Tree-structured syntactic analysis of sentences (focus on structure and POS)
  • Common constituents: noun phrases, verb phrases, prepositional phrases
  • Doesn’t work well for all languages
2
Q

Constituency Parsing

Constituency Parsing: Challenges

A
  1. PP (Prepositional Phrase) attachment
  2. Modifier Scope
  3. Complement Structure
  4. Coordination Scope
3
Q

Constituency Parsing: Challenges

PP (Prepositional Phrase) Attachment

A

A prepositional phrase can attach to different parts of the parse tree, and different attachments produce different (possibly wrong) meanings.
Ex: “She saw the man with the telescope” (did she use the telescope, or does the man have it?)

4
Q

Constituency Parsing: Challenges

Modifier Scope

A

A modifier can describe more than one possible head.
Ex: “plastic cup holder” (does “plastic” describe the cup or the holder?)

5
Q

Constituency Parsing: Challenges

Complement Structure

A

A complement clause can attach to different parts of the sentence, referring to different objects.
Ex: “The students complained to the professor that they didn’t understand.” (the clause “that they didn’t understand” can describe the complaint itself or the professor)

6
Q

Constituency Parsing: Challenges

Coordination Scope

A

The scope of a conjunction can be ambiguous, so some tokens admit multiple labels.
Ex: “The man picked up the hammer and saw.” (“saw” can be a verb coordinated with “picked up” or a noun coordinated with “hammer”)

7
Q

Context-Free Grammars

Context-Free Grammars (CFG)

A
  • Rules that rewrite a symbol as one or more symbols
  • Represented as a tuple
  • Can be ambiguous (multiple parses possible for one sentence)
8
Q

Context-Free Grammars

CFG Lexicon

A

Consists of “preterminals” (POS tags) rewriting as terminals (words)

9
Q

Context-Free Grammars

CFG Representation

A

Written as a tuple: (N, T, S, R)
N = nonterminals
T = terminals
S = start symbol
R = rules
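
The tuple above can be sketched directly in code. This toy grammar (the particular symbols and rules are invented for illustration, not from the card) shows the (N, T, S, R) layout:

```python
# A toy CFG written as the tuple (N, T, S, R).
N = {"S", "NP", "VP", "Det", "N", "V"}            # nonterminals
T = {"the", "dog", "cat", "saw"}                  # terminals (words)
S = "S"                                           # start symbol
R = [                                             # rules: LHS rewrites as RHS
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)),                            # lexicon: preterminal -> word
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("saw",)),
]
grammar = (N, T, S, R)
```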

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

PCFGs

Probabilistic Context-Free Grammars

A
  • Probabilities associated with rewrites
  • Probabilities are normalized per source (left-hand-side) symbol
11
Q

PCFGs

PCFG Equation

A
P(T) = ∏_{r ∈ T} P(r | parent(r))

Tree T is the series of rule applications
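
The product above can be sketched in a few lines. The rule probabilities here are invented for illustration (each left-hand side's probabilities sum to 1):

```python
from math import prod

# Hypothetical rule probabilities, normalized per left-hand side
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V", "NP")): 1.0,
}

def tree_prob(rules_used):
    """P(T) = product over rules r in T of P(r | parent(r))."""
    return prod(rule_prob[r] for r in rules_used)

# A derivation using three of the rules above: 1.0 * 0.6 * 1.0
p = tree_prob([("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))])
```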

12
Q

PCFGs

Learning a PCFG

A
  • Given a set of example trees, the underlying CFG can simply be all production rules seen in the corpus
  • Learning goal: estimate the probability of each production rule
  • Maximum Likelihood Estimation
13
Q

PCFGs

PCFG Maximum Likelihood Estimation

A
P(a->b) = count(a->b) / count(a)

(similar to HMMs)
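
A minimal sketch of this estimator, over an invented list of observed rule applications (standing in for rules read off a treebank):

```python
from collections import Counter

# Toy "treebank": rule applications read off example trees (assumed data)
observed_rules = [
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
]

rule_counts = Counter(observed_rules)                 # count(a -> b)
lhs_counts = Counter(lhs for lhs, _ in observed_rules)  # count(a)

def mle(rule):
    """P(a -> b) = count(a -> b) / count(a)"""
    lhs, _ = rule
    return rule_counts[rule] / lhs_counts[lhs]

# mle(("NP", ("Det", "N"))) == 2/3, since NP appears 3 times as an LHS
```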

14
Q

PCFGs

Chomsky Normal Form (CNF)

A
  • To parse efficiently, we binarize PCFGs by converting them to Chomsky Normal Form
  • Means that every node has either two nonterminal children or one terminal (leaf) child
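
One common way to binarize a rule with more than two children is to introduce intermediate symbols; the naming scheme below (e.g. `VP|<NP-PP>`) is one conventional choice, used here for illustration:

```python
def binarize(lhs, rhs):
    """Binarize one rule with >2 RHS symbols by introducing intermediate
    symbols: VP -> V NP PP becomes VP -> V VP|<NP-PP> and
    VP|<NP-PP> -> NP PP."""
    rules = []
    while len(rhs) > 2:
        new_sym = f"{lhs}|<{'-'.join(rhs[1:])}>"   # fresh intermediate symbol
        rules.append((lhs, (rhs[0], new_sym)))
        lhs, rhs = new_sym, rhs[1:]
    rules.append((lhs, tuple(rhs)))
    return rules
```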
15
Q

Inference

CKY

A
  • Finds argmax_T P(T)
  • Dynamic programming: a chart maintains the best way of building symbol X over span (i, j)
  • Like Viterbi
  • Intuition: continuously split the text and assign labels until terminals are reached
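
The chart recurrence can be sketched as follows. The CNF grammar and its probabilities are invented for illustration; the chart stores, for each span (i, j), the best probability of building each symbol over it:

```python
# Toy CNF grammar (binary rules and lexicon with assumed probabilities)
binary = {                       # (Y, Z) -> list of (X, P(X -> Y Z))
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
lexicon = {                      # word -> list of (preterminal, P)
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "saw": [("V", 1.0)],
}

def cky(words):
    """Return the probability of the best S parse over the whole sentence."""
    n = len(words)
    # chart[i][j] maps symbol X -> best probability of building X over span (i, j)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # width-1 spans from the lexicon
        for sym, p in lexicon.get(w, []):
            chart[i][i + 1][sym] = p
    for width in range(2, n + 1):                  # build wider spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):              # split point, like Viterbi
                for y, py in chart[i][k].items():
                    for z, pz in chart[k][j].items():
                        for x, pr in binary.get((y, z), []):
                            p = pr * py * pz
                            if p > chart[i][j].get(x, 0.0):
                                chart[i][j][x] = p
    return chart[0][n].get("S", 0.0)
```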
16
Q

Parser Evaluation

A

Precision: number of correct brackets / number of predicted brackets
Recall: number of correct brackets / number of gold brackets
F1: harmonic mean of precision and recall
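
These three metrics can be computed in a few lines. Representing each labeled bracket as a (label, start, end) triple is an assumption for this sketch:

```python
def bracket_f1(predicted, gold):
    """Labeled bracket precision, recall, and F1.
    Brackets are (label, start, end) triples."""
    correct = len(set(predicted) & set(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical gold and predicted brackets: 3 of 4 predictions are correct
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
```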

17
Q

Dependency VS Constituency

A

Dependency: more universal, models PP attachment better, easier to build, faster
Constituency: doesn’t work for all languages, models coordination better, harder to build (needs to learn a grammar and then apply it), slower