Week 4 - Chunking Flashcards
Chunking
Identifying groups of words (phrases) that follow specific grammatical patterns, rather than analysing single words in isolation.
e.g. in “Dave gave Mary a book”, “gave Mary a book” is a verb phrase
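A minimal sketch of rule-based chunking, assuming the words have already been given part-of-speech tags. The tags below are hand-written for illustration, and the pattern (optional determiner, any adjectives, one or more nouns) is a simplifying assumption, not a full noun phrase grammar. The function name `chunk_noun_phrases` is hypothetical.

```python
# Hand-tagged tokens (word, POS tag); the tags are assumed, not produced by a tagger.
tagged = [("Dave", "NNP"), ("gave", "VBD"), ("Mary", "NNP"), ("a", "DT"), ("book", "NN")]

NOUN_TAGS = {"NN", "NNS", "NNP"}

def chunk_noun_phrases(tagged_tokens):
    """Group tokens matching the pattern DT? JJ* NOUN+ into noun phrase chunks."""
    chunks = []
    i, n = 0, len(tagged_tokens)
    while i < n:
        start = i
        if tagged_tokens[i][1] == "DT":                       # optional determiner
            i += 1
        while i < n and tagged_tokens[i][1] == "JJ":          # any number of adjectives
            i += 1
        noun_start = i
        while i < n and tagged_tokens[i][1] in NOUN_TAGS:     # one or more nouns
            i += 1
        if i > noun_start:
            chunks.append([word for word, _ in tagged_tokens[start:i]])
        else:
            i = start + 1                                     # no chunk here; move on one token
    return chunks

print(chunk_noun_phrases(tagged))
# [['Dave'], ['Mary'], ['a', 'book']]
```

The verb “gave” is left outside every chunk; a verb phrase chunker would need its own pattern.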
Noun Phrase (NP) or nominal (phrase)
A phrase (sequence of words) that has a noun as its head and performs the same grammatical function as that noun
Verb Phrase (VP)
A syntactic unit composed of at least one verb and its dependents
Context Free Grammars
Terminals
- The words of the language (a CFG takes its terminals to be words)
Non-Terminals
- The constituents of a language, such as NP, VP or S (sentence)
Rules
Productions (rewrite rules) that consist of a single non-terminal on the left and any number of terminals and non-terminals on the right. The | separates alternative right-hand sides.
A -> a B c | d c
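As an illustration, a CFG can be written as a mapping from each non-terminal to its alternative right-hand sides. The toy grammar below is an assumption made for this sketch, built around the example sentence; any symbol without an entry in `GRAMMAR` is treated as a terminal (a word).

```python
from itertools import product

# Toy grammar (hypothetical): non-terminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Dave"], ["Mary"], ["Det", "N"]],
    "VP": [["V", "NP", "NP"]],
    "Det": [["a"]],
    "N": [["book"]],
    "V": [["gave"]],
}

def expand(symbol):
    """Return every word sequence derivable from symbol.

    Only safe for non-recursive grammars; a recursive rule would never terminate.
    """
    if symbol not in GRAMMAR:          # terminal: a word
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        parts = [expand(s) for s in rhs]
        for combo in product(*parts):  # one choice per right-hand-side symbol
            results.append([word for part in combo for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))                            # 27
print("Dave gave Mary a book" in sentences)      # True
```

The grammar also generates odd sentences such as “a book gave Dave Mary”: a CFG describes structure only, not meaning.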
Noun Phrases (NP) determiners
NPs can start with determiners
They can be:
- Simple lexical items
A car
- Simple possessives
John’s car
- Complex recursive versions of possessives
John’s sister’s husband’s son’s car
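The recursion comes from letting a determiner be a whole noun phrase followed by ’s. A small sketch of that idea, where the function name `possessive_np` and its arguments are hypothetical:

```python
def possessive_np(owners, head):
    """Build an NP whose determiner is itself a possessive NP, recursively.

    Mirrors the rules  NP -> Det Nominal  and  Det -> NP 's.
    """
    if not owners:
        return head
    # The determiner is the NP built from the earlier owners, with the last owner as its head.
    inner = possessive_np(owners[:-1], owners[-1])
    return f"{inner}'s {head}"

print(possessive_np(["John"], "car"))
# John's car
print(possessive_np(["John", "sister", "husband", "son"], "car"))
# John's sister's husband's son's car
```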
Noun Phrases (NP) nominals
Contains the head and any pre- or post-modifiers of the head
Pre-modifiers:
Quantifiers, cardinals, ordinals…
Three cars
Adjectives
large cars
Ordering pre-modifiers?
three large cars (acceptable)
*large three cars (not acceptable: quantifiers come before adjectives)
Post-modifiers:
Prepositional phrases (e.g. from Seattle)
Non-finite clauses (e.g. arriving before noon)
Relative clauses (e.g. that serve breakfast)
There are many rules for this, and they are very complicated
Treebanks
Corpora in which each sentence has been paired with its (presumably correct) syntax tree.
Instead of paying linguists to write a grammar, pay them to annotate real sentences with parse trees. Use that annotated data to learn the rules.
The Penn Treebank is a widely used treebank
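Treebank trees are commonly stored as bracketed strings. The sketch below reads one such string and lists the grammar rules it uses. The annotated sentence is hand-written for illustration and is not taken from the Penn Treebank; `parse_bracketed` and `productions` are hypothetical helper names.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested lists of the form [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                  # skip "("
            node = [tokens[pos]]      # constituent label
            pos += 1
            while tokens[pos] != ")":
                node.append(read())
            pos += 1                  # skip ")"
            return node
        word = tokens[pos]            # a leaf: the word itself
        pos += 1
        return word

    return read()

def productions(tree):
    """List the rules used in the tree as (left-hand side, right-hand side) pairs."""
    if isinstance(tree, str):
        return []
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        rules.extend(productions(child))
    return rules

tree = parse_bracketed(
    "(S (NP (NNP Dave)) (VP (VBD gave) (NP (NNP Mary)) (NP (DT a) (NN book))))"
)
rules = productions(tree)
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

Collecting these rules over every tree in a treebank gives a grammar without anyone writing it by hand.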
Probabilistic CFGs
Probabilistic context free grammar
Each production (rule) has a probability
From a treebank, estimate the probability of every rule (and its lexicalised variants) by counting how often it occurs
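One common way to get these probabilities is relative frequency: P(A -> β) = count(A -> β) / count(A), so the probabilities of all rules with the same left-hand side sum to 1. The sketch below applies that to two hand-made toy trees (not real treebank data) and covers only plain rules, not lexicalised variants. The names `TREEBANK`, `count_rules` and `rule_probs` are hypothetical.

```python
from collections import Counter

# Two toy trees as nested tuples (label, child, ...); strings are words.
TREEBANK = [
    ("S", ("NP", "Dave"),
          ("VP", ("V", "gave"), ("NP", "Mary"), ("NP", ("Det", "a"), ("N", "book")))),
    ("S", ("NP", "Mary"), ("VP", ("V", "slept"))),
]

def count_rules(tree, counts):
    """Add one count for the rule at this node, then recurse into sub-trees."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

rule_counts = Counter()
for tree in TREEBANK:
    count_rules(tree, rule_counts)

# Total number of times each non-terminal was expanded.
lhs_counts = Counter()
for (lhs, _), count in rule_counts.items():
    lhs_counts[lhs] += count

# Relative-frequency estimate for each rule.
rule_probs = {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}

for (lhs, rhs), prob in sorted(rule_probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

Here NP is expanded four times, twice as “Mary”, so P(NP -> Mary) comes out as 0.5.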