Phrase-Structure Parsing Flashcards

1
Q

Syntactic analysis: constituents and how to identify them

A

A constituent, also called a phrase (NP, VP, PP, …), is a group of words that functions as a single unit within a hierarchical structure.

The constituent structure of a sentence is identified using constituency tests:

  • General substitution: replace the candidate phrase with some other phrase of the same type (e.g. a pronoun); if the sentence stays grammatical, the sequence is likely a constituent
  • Coordination: constituents of the same type can be composed using a coordinating conjunction such as "and"
  • Topicalization: if a sequence can be moved to a different position in the sentence without affecting grammaticality, it is likely a constituent
2
Q

Types of constituents

A

There are several types of constituents, each characterised by the context in which they can occur and their internal structure.

TYPES:

  • Sentence, abbreviated S, is a constituent representing a complete proposition or clause.
  • Noun phrase, abbreviated NP
  • Verb phrase, abbreviated VP
  • Prepositional phrase, abbreviated PP
  • Adjective phrase, abbreviated AP
3
Q

Phrase structure, on what it depends

A

Phrase structures are tree-like representations used to describe a given language’s syntax, with:

  • leaf nodes representing sentence words
  • internal nodes representing word groupings called phrases

Phrase structures use PoS tags (N, V, P, A, Det, …) and phrase tags (S, NP, VP, PP, AP, …).

Phrase structures depend on:

  • the linear order of words in the sentence
  • the groupings of words into constituents (phrases)
  • the hierarchical relation between constituents

Example: “Alice eats strawberries with chocolate”
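The example above can be sketched as a tree in plain Python (nested tuples standing in for tree nodes — an assumption of this sketch; here the PP is attached to the VP, one of the possible readings):

```python
# A phrase-structure tree as nested tuples: (label, child, child, ...);
# leaves are plain strings (the sentence words).
tree = ("S",
        ("NP", ("N", "Alice")),
        ("VP",
         ("V", "eats"),
         ("NP", ("N", "strawberries")),
         ("PP", ("P", "with"), ("NP", ("N", "chocolate")))))

def leaves(t):
    """Collect the leaf words in left-to-right order."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in leaves(child)]

print(" ".join(leaves(tree)))  # Alice eats strawberries with chocolate
```

Reading the leaves left to right recovers the linear order of words; the nesting encodes the groupings and their hierarchy.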

4
Q

Notions of head, argument and modifier

A

The head is the word in the phrase that is grammatically the most important.

The head identifies the phrase type: N is the head of an NP, V is the head of a VP, and so forth.

The head selects the arguments and modifiers appearing in the phrase:

  • Arguments are inherent to the meaning of the phrase; they appear in a fixed number determined by the head’s semantics
  • Modifiers are optional phrases that merely supplement the head with additional information; they can appear in any number
5
Q

PP-attachment

A

PP attachment in NLP (Natural Language Processing) refers to the problem of determining the correct attachment of prepositional phrases in a sentence.

write examples at slide 20-23 pdf 8…
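As a stand-in illustration (nested Python tuples as tree nodes — an assumption of this sketch, not the slides' notation), the example sentence from the earlier card has two analyses covering the same words:

```python
# 1) PP attached to the VP: the eating is done with chocolate.
vp_attach = ("S", ("NP", "Alice"),
             ("VP", ("V", "eats"),
              ("NP", "strawberries"),
              ("PP", ("P", "with"), ("NP", "chocolate"))))

# 2) PP attached inside the object NP: chocolate-covered strawberries.
np_attach = ("S", ("NP", "Alice"),
             ("VP", ("V", "eats"),
              ("NP", ("NP", "strawberries"),
               ("PP", ("P", "with"), ("NP", "chocolate")))))

def leaves(t):
    """Left-to-right leaf words of a tree."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in leaves(child)]

# Same string, different structures: that is the ambiguity.
assert leaves(vp_attach) == leaves(np_attach)
```

The parser must choose between these structures, since they imply different meanings.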

6
Q

Wh-movement

A

In natural language processing (NLP), wh-movement is an important concept for understanding how questions are formed and how information is extracted from text.

Long-distance syntactic movement, also called wh-movement, can be represented in phrase structure by means of so-called traces.

write example at slide 24 pdf 8…

7
Q

Treebanks

A

A phrase structure treebank is a parsed text corpus that annotates the syntactic structure of each sentence, resolving ambiguity.

Treebanks are used to train phrase structure grammars, that is, grammars that can model phrase structures.

The Penn Treebank was the first large-scale treebank. Published in the early 1990s, it revolutionised NLP.
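Treebank sentences are typically stored in bracketed notation, e.g. Penn Treebank style "(S (NP Alice) (VP eats))". A minimal reader sketch (plain Python, simplified — the real format also carries function tags, traces, and empty elements):

```python
import re

def parse_brackets(s):
    """Parse a Penn-Treebank-style bracketed string into nested tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                     # a leaf word
                children.append(tokens[pos]); pos += 1
        pos += 1                      # consume ")"
        return (label, *children)

    return node()

print(parse_brackets("(S (NP Alice) (VP eats))"))
```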

8
Q

Probabilistic CFG

A

A probabilistic context-free grammar (PCFG) defines a probability distribution over the set of generated parse trees.

Suppose a sentence has several parse trees. If the rule probabilities are estimated appropriately, we can resolve the ambiguity by selecting the parse tree with the highest probability.

9
Q

Proper probabilistic CFG

A

For each rule A -> α, we specify the probability P(A -> α) that the rule is applied to rewrite A as α.

A PCFG is proper if, for each nonterminal A, sum over α of P(A -> α) = 1.
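Properness can be checked mechanically; a minimal sketch (the grammar and its probabilities are toy assumptions):

```python
from collections import defaultdict

# A toy PCFG as {(lhs, rhs): probability}; rhs is a tuple of symbols.
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("N",)):       0.7,
    ("NP", ("NP", "PP")): 0.3,
    ("VP", ("V", "NP")):  0.6,
    ("VP", ("VP", "PP")): 0.4,
}

def is_proper(rules, tol=1e-9):
    """Proper: for every left-hand side A, the rule probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _), p in rules.items():
        totals[lhs] += p
    return all(abs(t - 1.0) < tol for t in totals.values())

print(is_proper(rules))  # True
```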

10
Q

Define the probability of a tree t

A

P(t) = prod over (A -> α) of P(A -> α)^f(t, A -> α)

where f(t, A->α) is the number of times the rule is used in t.
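The formula above can be computed directly from a tree; a minimal sketch (trees as nested tuples, toy probabilities as assumptions):

```python
from collections import Counter

def rule_counts(t, counts=None):
    """f(t, A -> α): how many times each rule is used in tree t
    (trees are nested tuples, leaves are plain strings)."""
    if counts is None:
        counts = Counter()
    if isinstance(t, str):                       # leaf word: no rule
        return counts
    rhs = tuple(c if isinstance(c, str) else c[0] for c in t[1:])
    counts[(t[0], rhs)] += 1
    for child in t[1:]:
        rule_counts(child, counts)
    return counts

def tree_probability(t, rules):
    """P(t) as the product over rules of P(A -> α)^f(t, A -> α)."""
    p = 1.0
    for rule, f in rule_counts(t).items():
        p *= rules[rule] ** f
    return p

# Toy PCFG (probabilities are assumptions for illustration).
rules = {
    ("S",  ("NP", "VP")):      1.0,
    ("NP", ("N",)):            1.0,
    ("VP", ("V", "NP")):       1.0,
    ("N",  ("Alice",)):        0.5,
    ("N",  ("strawberries",)): 0.5,
    ("V",  ("eats",)):         1.0,
}
tree = ("S", ("NP", ("N", "Alice")),
        ("VP", ("V", "eats"), ("NP", ("N", "strawberries"))))
print(tree_probability(tree, rules))  # 0.5 * 0.5 = 0.25
```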

11
Q

Probability of a string w generated by a CFG

A

Formally, let T(w) be the set of all parse trees of w. Then:

P(w) = sum (t in T(w)) P(t)

where P(t) is the probability of the tree t.
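A worked sketch of the sum, using the ambiguous PP-attachment sentence from the earlier cards (all probabilities are toy assumptions; lexical rules get probability 1 for brevity):

```python
import math

# Rule probabilities of a toy PCFG.
P = {
    "S -> NP VP":  1.0,
    "NP -> N":     0.7,
    "NP -> NP PP": 0.3,
    "VP -> V NP":  0.6,
    "VP -> VP PP": 0.4,
    "PP -> P NP":  1.0,
}

# The two parses of "Alice eats strawberries with chocolate" differ
# only in where the PP attaches; list the rules each tree uses.
vp_attach = ["S -> NP VP", "NP -> N", "VP -> VP PP", "VP -> V NP",
             "NP -> N", "PP -> P NP", "NP -> N"]
np_attach = ["S -> NP VP", "NP -> N", "VP -> V NP", "NP -> NP PP",
             "NP -> N", "PP -> P NP", "NP -> N"]

def tree_prob(rule_list):
    return math.prod(P[r] for r in rule_list)

# P(w) sums over both trees in T(w).
p_w = tree_prob(vp_attach) + tree_prob(np_attach)
print(round(p_w, 5))  # 0.08232 + 0.06174 = 0.14406
```

Under these toy values the VP attachment (0.08232) beats the NP attachment (0.06174), so a probabilistic parser would prefer it.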

12
Q

Consistent probabilistic CFG

A

A PCFG is consistent if sum over w P(w) = 1.

Consistency also means sum over t P(t) = 1 for parse trees t generated by the grammar.

  • Surprisingly enough, a proper PCFG is not always consistent: probability mass can be lost in infinite parse trees that never yield a finite string.
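A classic illustration: the proper PCFG with rules S -> S S (probability p) and S -> a (probability 1 - p). The probability that S ever derives a finite string is the least fixed point of x = (1 - p) + p·x^2, which drops below 1 as soon as p > 1/2. A quick numeric check (a sketch, iterating the fixed-point equation):

```python
def finite_string_mass(p, iters=10000):
    """Probability that S -> S S (prob p) | S -> a (prob 1 - p)
    derives a finite string: least fixed point of x = (1-p) + p*x**2,
    reached by iterating from x = 0."""
    x = 0.0
    for _ in range(iters):
        x = (1 - p) + p * x * x
    return x

print(round(finite_string_mass(0.4), 4))  # 1.0    -> consistent
print(round(finite_string_mass(0.6), 4))  # 0.6667 -> 1/3 of the mass is lost
```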
13
Q

Lexicalised CFGs vs normal CFGs

A

WHY:

In our previous CFGs, each phrase records the type of its head but not its lexical content. This makes the model insensitive to lexical selection, resulting in a loss of accuracy.

SO:

A lexicalised CFG includes information about the lexical content of the words in a sentence.
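A minimal sketch of what lexicalisation adds: annotate every phrase label with the word of its head, propagated bottom-up via a head-child table (the table below is a toy assumption; real systems use richer head-finding rules, e.g. Collins-style):

```python
# Which child label supplies the head of each phrase type (toy table;
# assumes at most one child per label in a rule).
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}

def lexicalise(t):
    """Annotate each phrase label with its head word:
    ("VP", ...) becomes ("VP[eats]", ...). Returns (new_tree, head_word)."""
    if isinstance(t[1], str):               # preterminal: (PoS, word)
        return (f"{t[0]}[{t[1]}]", t[1]), t[1]
    children, heads = [], {}
    for c in t[1:]:
        node, head = lexicalise(c)
        children.append(node)
        heads[c[0]] = head
    head = heads[HEAD_CHILD[t[0]]]          # percolate the head word up
    return (f"{t[0]}[{head}]", *children), head

tree = ("S", ("NP", ("N", "Alice")),
        ("VP", ("V", "eats"), ("NP", ("N", "strawberries"))))
lex, _ = lexicalise(tree)
print(lex[0])  # S[eats]
```

With these annotations, rule probabilities can condition on actual words (e.g. which verb takes which object), restoring sensitivity to lexical selection.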
