Phrase-Structure Parsing Flashcards

1
Q

Syntactic analysis: constituents and how to identify them

A

A constituent, also called a phrase (NP, VP, PP, …), is a group of words that functions as a single unit within a hierarchical structure.

The constituent structure of a sentence is identified using constituency tests:

  • General substitution: replace the candidate phrase with some other phrase of the same type (e.g. a pronoun); if the sentence stays grammatical, the sequence is likely a constituent
  • Coordination: constituents of the same type can be composed using a coordinating conjunction such as "and"
  • Topicalization: if a sequence can be moved to a different position in the sentence without affecting grammaticality, it is likely a constituent
2
Q

Types of constituents

A

There are several types of constituents, each characterised by the context in which they can occur and their internal structure.

TYPES:

  • Sentence, abbreviated S, is a constituent representing a complete proposition or clause.
  • Noun phrase, abbreviated NP
  • Verb phrase, abbreviated VP
  • Prepositional phrase, abbreviated PP
  • Adjective phrase, abbreviated AP
3
Q

Phrase structure, on what it depends

A

Phrase structures are tree-like representations used to describe a given language’s syntax, with:

  • leaf nodes representing sentence words
  • internal nodes representing word groupings called phrases

Phrase structures use PoS tags (N, V, P, A, Det, …) and phrase tags (S, NP, VP, PP, AP, …).

Phrase structures depend on:

  • the linear order of words in the sentence
  • the groupings of words into constituents (phrases)
  • the hierarchical relation between constituents

Example: “Alice eats strawberries with chocolate”
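The example above can be sketched as a tree in plain Python (nested tuples standing in for tree nodes — an assumption of this sketch; here the PP is attached to the VP, one of the possible readings):

```python
# A phrase-structure tree as nested tuples: (label, child, child, ...);
# leaves are plain strings (the sentence words).
tree = ("S",
        ("NP", ("N", "Alice")),
        ("VP",
         ("V", "eats"),
         ("NP", ("N", "strawberries")),
         ("PP", ("P", "with"), ("NP", ("N", "chocolate")))))

def leaves(t):
    """Collect the leaf words in left-to-right order."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in leaves(child)]

print(" ".join(leaves(tree)))  # Alice eats strawberries with chocolate
```

Reading the leaves left to right recovers the linear order of words; the nesting encodes the groupings and their hierarchy.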

4
Q

Notions of head, argument and modifier

A

The head is the word in the phrase that is grammatically the most important.

The head identifies the phrase type: N is the head of an NP, V is the head of a VP, and so forth.

The head selects the arguments and modifiers appearing in the phrase:

  • Arguments are inherent to the meaning of the phrase; they appear in a fixed number determined by the head’s semantics
  • Modifiers are optional phrases that merely supplement the head with additional information; they can appear in any number
5
Q

PP-attachment

A

PP attachment in NLP (Natural Language Processing) refers to the problem of determining the correct attachment of prepositional phrases in a sentence.

write examples at slide 20-23 pdf 8…
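As a stand-in illustration (nested Python tuples as tree nodes — an assumption of this sketch, not the slides' notation), the example sentence from the earlier card has two analyses covering the same words:

```python
# 1) PP attached to the VP: the eating is done with chocolate.
vp_attach = ("S", ("NP", "Alice"),
             ("VP", ("V", "eats"),
              ("NP", "strawberries"),
              ("PP", ("P", "with"), ("NP", "chocolate"))))

# 2) PP attached inside the object NP: chocolate-covered strawberries.
np_attach = ("S", ("NP", "Alice"),
             ("VP", ("V", "eats"),
              ("NP", ("NP", "strawberries"),
               ("PP", ("P", "with"), ("NP", "chocolate")))))

def leaves(t):
    """Left-to-right leaf words of a tree."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in leaves(child)]

# Same string, different structures: that is the ambiguity.
assert leaves(vp_attach) == leaves(np_attach)
```

The parser must choose between these structures, since they imply different meanings.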

6
Q

Wh-movement

A

In natural language processing (NLP), wh-movement is an important concept for understanding how questions are formed and how information is extracted from text.

Long-distance syntactic movement, also called wh-movement, can be represented in phrase structure by means of so-called traces.

write example at slide 24 pdf 8…

7
Q

Treebanks

A

A phrase structure treebank is a parsed text corpus that annotates the syntactic structure of each sentence, resolving ambiguity.

Treebanks are used to train phrase structure grammars, that is, grammars that can model phrase structures.

The Penn Treebank was the first large-scale treebank. Published in the early 1990s, it revolutionised NLP.
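Treebank sentences are typically stored in bracketed notation, e.g. Penn Treebank style "(S (NP Alice) (VP eats))". A minimal reader sketch (plain Python, simplified — the real format also carries function tags, traces, and empty elements):

```python
import re

def parse_brackets(s):
    """Parse a Penn-Treebank-style bracketed string into nested tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                     # a leaf word
                children.append(tokens[pos]); pos += 1
        pos += 1                      # consume ")"
        return (label, *children)

    return node()

print(parse_brackets("(S (NP Alice) (VP eats))"))
```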

8
Q

Probabilistic CFG

A

A probabilistic context-free grammar (PCFG) defines a probability distribution over the set of generated parse trees.

Suppose a sentence has several parse trees. If the rule probabilities are estimated appropriately, we can resolve the ambiguity by selecting the parse tree with the highest probability.

9
Q

Proper probabilistic CFG

A

For each rule A -> α, we specify the probability P(A -> α) that the rule is applied to rewrite A as α.

A PCFG is proper if, for each nonterminal A, sum over α of P(A -> α) = 1.
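Properness can be checked mechanically; a minimal sketch (the grammar and its probabilities are toy assumptions):

```python
from collections import defaultdict

# A toy PCFG as {(lhs, rhs): probability}; rhs is a tuple of symbols.
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("N",)):       0.7,
    ("NP", ("NP", "PP")): 0.3,
    ("VP", ("V", "NP")):  0.6,
    ("VP", ("VP", "PP")): 0.4,
}

def is_proper(rules, tol=1e-9):
    """Proper: for every left-hand side A, the rule probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _), p in rules.items():
        totals[lhs] += p
    return all(abs(t - 1.0) < tol for t in totals.values())

print(is_proper(rules))  # True
```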

10
Q

Define the probability of a tree t

A

P(t) = prod over (A -> α) of P(A -> α)^f(t, A -> α)

where f(t, A->α) is the number of times the rule is used in t.
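The formula above can be computed directly from a tree; a minimal sketch (trees as nested tuples, toy probabilities as assumptions):

```python
from collections import Counter

def rule_counts(t, counts=None):
    """f(t, A -> α): how many times each rule is used in tree t
    (trees are nested tuples, leaves are plain strings)."""
    if counts is None:
        counts = Counter()
    if isinstance(t, str):                       # leaf word: no rule
        return counts
    rhs = tuple(c if isinstance(c, str) else c[0] for c in t[1:])
    counts[(t[0], rhs)] += 1
    for child in t[1:]:
        rule_counts(child, counts)
    return counts

def tree_probability(t, rules):
    """P(t) as the product over rules of P(A -> α)^f(t, A -> α)."""
    p = 1.0
    for rule, f in rule_counts(t).items():
        p *= rules[rule] ** f
    return p

# Toy PCFG (probabilities are assumptions for illustration).
rules = {
    ("S",  ("NP", "VP")):      1.0,
    ("NP", ("N",)):            1.0,
    ("VP", ("V", "NP")):       1.0,
    ("N",  ("Alice",)):        0.5,
    ("N",  ("strawberries",)): 0.5,
    ("V",  ("eats",)):         1.0,
}
tree = ("S", ("NP", ("N", "Alice")),
        ("VP", ("V", "eats"), ("NP", ("N", "strawberries"))))
print(tree_probability(tree, rules))  # 0.5 * 0.5 = 0.25
```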

11
Q

Probability of a string w generated by a CFG

A

Formally, let T(w) be the set of all parse trees of w. Then:

P(w) = sum (t in T(w)) P(t)

where P(t) is the probability of the tree t.
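A worked sketch of the sum, using the ambiguous PP-attachment sentence from the earlier cards (all probabilities are toy assumptions; lexical rules get probability 1 for brevity):

```python
import math

# Rule probabilities of a toy PCFG.
P = {
    "S -> NP VP":  1.0,
    "NP -> N":     0.7,
    "NP -> NP PP": 0.3,
    "VP -> V NP":  0.6,
    "VP -> VP PP": 0.4,
    "PP -> P NP":  1.0,
}

# The two parses of "Alice eats strawberries with chocolate" differ
# only in where the PP attaches; list the rules each tree uses.
vp_attach = ["S -> NP VP", "NP -> N", "VP -> VP PP", "VP -> V NP",
             "NP -> N", "PP -> P NP", "NP -> N"]
np_attach = ["S -> NP VP", "NP -> N", "VP -> V NP", "NP -> NP PP",
             "NP -> N", "PP -> P NP", "NP -> N"]

def tree_prob(rule_list):
    return math.prod(P[r] for r in rule_list)

# P(w) sums over both trees in T(w).
p_w = tree_prob(vp_attach) + tree_prob(np_attach)
print(round(p_w, 5))  # 0.08232 + 0.06174 = 0.14406
```

Under these toy values the VP attachment (0.08232) beats the NP attachment (0.06174), so a probabilistic parser would prefer it.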

12
Q

Consistent probabilistic CFG

A

A PCFG is consistent if sum over w P(w) = 1.

Consistency also means sum over t P(t) = 1 for parse trees t generated by the grammar.

  • Surprisingly enough, a proper PCFG is not always consistent: probability mass can be lost in infinite parse trees that never yield a finite string.
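A classic illustration: the proper PCFG with rules S -> S S (probability p) and S -> a (probability 1 - p). The probability that S ever derives a finite string is the least fixed point of x = (1 - p) + p·x^2, which drops below 1 as soon as p > 1/2. A quick numeric check (a sketch, iterating the fixed-point equation):

```python
def finite_string_mass(p, iters=10000):
    """Probability that S -> S S (prob p) | S -> a (prob 1 - p)
    derives a finite string: least fixed point of x = (1-p) + p*x**2,
    reached by iterating from x = 0."""
    x = 0.0
    for _ in range(iters):
        x = (1 - p) + p * x * x
    return x

print(round(finite_string_mass(0.4), 4))  # 1.0    -> consistent
print(round(finite_string_mass(0.6), 4))  # 0.6667 -> 1/3 of the mass is lost
```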
13
Q

Lexicalised CFGs vs normal CFGs

A

WHY:

In our previous CFGs, each phrase records the type of its head but not its lexical content. This makes the model insensitive to lexical selection, resulting in a loss of accuracy.

SO:

A lexicalised CFG includes information about the lexical content of the words in a sentence.
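A minimal sketch of what lexicalisation adds: annotate every phrase label with the word of its head, propagated bottom-up via a head-child table (the table below is a toy assumption; real systems use richer head-finding rules, e.g. Collins-style):

```python
# Which child label supplies the head of each phrase type (toy table;
# assumes at most one child per label in a rule).
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}

def lexicalise(t):
    """Annotate each phrase label with its head word:
    ("VP", ...) becomes ("VP[eats]", ...). Returns (new_tree, head_word)."""
    if isinstance(t[1], str):               # preterminal: (PoS, word)
        return (f"{t[0]}[{t[1]}]", t[1]), t[1]
    children, heads = [], {}
    for c in t[1:]:
        node, head = lexicalise(c)
        children.append(node)
        heads[c[0]] = head
    head = heads[HEAD_CHILD[t[0]]]          # percolate the head word up
    return (f"{t[0]}[{head}]", *children), head

tree = ("S", ("NP", ("N", "Alice")),
        ("VP", ("V", "eats"), ("NP", ("N", "strawberries"))))
lex, _ = lexicalise(tree)
print(lex[0])  # S[eats]
```

With these annotations, rule probabilities can condition on actual words (e.g. which verb takes which object), restoring sensitivity to lexical selection.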
