CHUNKING Flashcards
What is the purpose of Chunking
Moving from individual words to meaningful groups of words (“chunks”), such as noun phrases and verb phrases
What is a noun phrase (NP)
(or nominal phrase)
A phrase that performs the same grammatical function as a noun
“my favourite book”
“a big blue whale”
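A minimal sketch of NP chunking, assuming the input is already POS-tagged with Penn Treebank tags. The tag-to-letter map `TAG_CODES`, the pattern `D?C*A*N+` and the function `chunk_nps` are illustrative assumptions, not a standard chunker. The pattern reads: optional determiner or possessive pronoun, then cardinals, then adjectives, then one or more nouns.

```python
import re

# Map Penn Treebank tags to one-letter codes (an illustrative subset) so the
# chunk pattern can be an ordinary regular expression over a string.
TAG_CODES = {"DT": "D", "PRP$": "D", "CD": "C", "JJ": "A",
             "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N"}

# Optional determiner, then cardinals, then adjectives, then one or more nouns.
NP_PATTERN = re.compile(r"D?C*A*N+")


def chunk_nps(tagged):
    """Return the NP chunks found in a list of (word, tag) pairs."""
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged)  # one code per token
    return [" ".join(word for word, _ in tagged[m.start():m.end()])
            for m in NP_PATTERN.finditer(codes)]


sentence = [("we", "PRP"), ("saw", "VBD"), ("a", "DT"), ("big", "JJ"),
            ("blue", "JJ"), ("whale", "NN"), ("and", "CC"),
            ("three", "CD"), ("large", "JJ"), ("cars", "NNS")]
print(chunk_nps(sentence))  # ['a big blue whale', 'three large cars']
```

Because the pattern puts cardinals before adjectives, the wrongly ordered “large three cars” yields only the chunk “three cars”. Plain pronouns such as “we” are deliberately left out of this simplified pattern.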
What is a verb phrase (VP)
A syntactic unit composed of at least one verb and its dependents
“read a good novel”
“we will visit the park tomorrow”
“she is writing a letter”
What is a Determiner
A noun phrase often starts with a determiner. Determiners can be:
- Simple lexical items: the, this, a, an…
- Simple possessives: John’s
- Complex recursive versions: John’s sister’s husband’s
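The recursion comes from a determiner that itself contains an NP. A minimal sketch, assuming the rules Det → NP 's and NP → Det Noun (the Nominal level is skipped for brevity); `possessive_det` is a hypothetical helper that builds the bracketing bottom-up:

```python
def possessive_det(nouns):
    """Build a bracketed possessive determiner from a list of nouns.

    The first word forms a bare NP; every NP is wrapped by Det -> NP 's,
    and each later noun applies NP -> Det Noun to the previous determiner.
    """
    np = f"[NP {nouns[0]}]"
    det = f"[Det {np} 's]"
    for noun in nouns[1:]:
        np = f"[NP {det} {noun}]"
        det = f"[Det {np} 's]"
    return det


print(possessive_det(["John", "sister", "husband"]))
```

Each extra possessive adds one more Det → NP 's layer, which is why a finite list of determiners cannot cover these cases.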
What is a Nominal
The part of an NP that follows the determiner: the head plus any pre- and post-modifiers of the head
VP/NP in context-free grammars
terminals: words
non-terminals: constituents such as VP, NP or S (sentence)
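A toy CFG sketched as a Python dictionary (the rules and words are illustrative assumptions). Non-terminals are exactly the symbols that appear on a left-hand side; every other symbol that appears on a right-hand side is a terminal:

```python
# Toy grammar: each non-terminal maps to its list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Adj", "Nominal"]],
    "VP": [["Verb", "NP"]],
    "Det": [["a"], ["the"]],
    "Adj": [["big"], ["blue"]],
    "Noun": [["whale"], ["novel"]],
    "Verb": [["read"], ["saw"]],
}

non_terminals = set(GRAMMAR)
terminals = {symbol
             for rhss in GRAMMAR.values()
             for rhs in rhss
             for symbol in rhs
             if symbol not in non_terminals}

print(sorted(non_terminals))
print(sorted(terminals))
```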
What are pre-modifiers
- Quantifiers, cardinals, ordinals: eg three cars
- Adjectives: eg large cars
- There is an ordering: three large cars (not large three cars)
‘Cars’ here is the head
What is the head in a NP
Main component that carries the primary meaning of the phrase
Usually a noun, pronoun, or word functioning as a noun
What are post-modifiers
- Prepositional phrases: eg from Seattle
- Non-finite clauses: eg arriving before noon
- Relative clauses: eg that serve breakfast
- Corresponding CFG rules:
  Nominal → Nominal PP
  Nominal → Nominal GerundVP
  Nominal → Nominal RelClause
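Because each rule has Nominal on both sides, post-modifiers can be stacked one after another. A minimal sketch, assuming trees are nested `(label, child, ...)` tuples and that each post-modifier is kept as an unanalysed string; the head “flights” and the helper names are illustrative:

```python
def bracket(tree):
    """Render a (label, child, ...) tuple as a bracketed string."""
    label, *children = tree
    parts = [c if isinstance(c, str) else bracket(c) for c in children]
    return "[" + label + " " + " ".join(parts) + "]"


def nominal_with_postmodifiers(head, modifiers):
    """Start from Nominal -> Noun, then apply one left-recursive rule
    (Nominal -> Nominal PP | Nominal GerundVP | Nominal RelClause)
    per (label, phrase) modifier, in order."""
    tree = ("Nominal", ("Noun", head))
    for label, phrase in modifiers:
        tree = ("Nominal", tree, (label, phrase))
    return tree


tree = nominal_with_postmodifiers("flights", [
    ("PP", "from Seattle"),
    ("GerundVP", "arriving before noon"),
    ("RelClause", "that serve breakfast"),
])
print(bracket(tree))
```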
What is Agreement
Constraints that hold among the various constituents that take part in a rule or set of rules
eg determiners and the head nouns in NPs have to agree in number (“this flight”, not “this flights”)
One way to deal with this in a CFG is to add separate NP rules for each number value:
- SingularNP → SingularDet SingularNom
- PluralNP → PluralDet PluralNom
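A minimal sketch of the number-split rules, assuming a tiny hypothetical lexicon in which each word is filed directly under a number-specific category (the Nominal → Noun step is collapsed for brevity):

```python
# Hypothetical lexicon: every word is filed under a number-specific category.
LEXICON = {
    "this": "SingularDet", "a": "SingularDet",
    "these": "PluralDet", "those": "PluralDet",
    "flight": "SingularNom", "flights": "PluralNom",
}

# The number-split NP rules: LHS -> (determiner category, nominal category).
NP_RULES = {
    "SingularNP": ("SingularDet", "SingularNom"),
    "PluralNP": ("PluralDet", "PluralNom"),
}


def parse_np(det, noun):
    """Return the NP category licensed for "det noun", or None if no rule applies."""
    categories = (LEXICON.get(det), LEXICON.get(noun))
    for lhs, rhs in NP_RULES.items():
        if categories == rhs:
            return lhs
    return None


print(parse_np("this", "flight"))    # SingularNP
print(parse_np("these", "flights"))  # PluralNP
print(parse_np("this", "flights"))   # None: number mismatch
```

A number-neutral determiner such as “the” would need an entry in both categories, which a single dictionary key cannot hold: a small example of how splitting categories multiplies grammar and lexicon entries.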
What are the constituents of VPs
English VPs consist of a head verb along with zero or more following constituents, which we call arguments
What is subcategorisation
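We can subcategorise verbs according to the sets of VP rules they can take part in (eg some verbs take no object, some take an NP, some take an NP and a PP)
We have to express these constraints formally
However, in a plain CFG this does not scale well: each verb subclass needs its own VP rules, so the number of rules explodes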
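A minimal sketch of subcategorisation frames, assuming a hypothetical table of the complement sequences each verb allows. `vp_rules` shows the CFG cost: every distinct frame needs its own verb subcategory and its own VP rule.

```python
# Hypothetical subcategorisation frames: the complements each verb allows.
SUBCAT = {
    "sleep": [[]],                         # VP -> Verb
    "find": [["NP"]],                      # VP -> Verb NP
    "give": [["NP", "NP"], ["NP", "PP"]],  # give me a flight / give a flight to me
    "want": [["NP"], ["VPto"]],            # want a flight / want to fly
}


def licensed(verb, complements):
    """True if the verb's frames include exactly this complement sequence."""
    return list(complements) in SUBCAT.get(verb, [])


def vp_rules(subcat):
    """One CFG rule per distinct frame, each with its own verb subcategory."""
    frames = {tuple(frame) for verb_frames in subcat.values() for frame in verb_frames}
    return sorted(" ".join(["VP ->", "Verb_" + ("_".join(f) or "none"), *f])
                  for f in frames)


print(licensed("give", ["NP", "PP"]))  # True
print(licensed("sleep", ["NP"]))       # False
for rule in vp_rules(SUBCAT):
    print(rule)
```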
What are Treebanks
Corpora in which each sentence has been paired with a parse tree
Instead of paying linguists to write a grammar, pay them to annotate real sentences with parse trees
Then extract the grammar rules (and how often each one is used) from the annotated trees
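A minimal sketch of reading rules off one annotated tree. It assumes Penn-Treebank-style brackets in which every opening bracket is followed by a label; the tree for “she is writing a letter” is a hand-made illustration, not taken from a real treebank.

```python
def parse_tree(text):
    """Parse a bracketed string such as "(S (NP ...) ...)" into (label, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:                     # a word (terminal)
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def productions(tree):
    """List every rule LHS -> RHS used in the tree, top-down, left to right."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            rules.extend(productions(child))
    return rules


tree = parse_tree("(S (NP (PRP she)) (VP (VBZ is) "
                  "(VP (VBG writing) (NP (DT a) (NN letter)))))")
for lhs, rhs in productions(tree):
    print(lhs, "->", " ".join(rhs))
```

Running this over every tree in a treebank and pooling the results gives both the rule set and the rule counts.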
What are probabilistic CFGs
A CFG in which each production rule is assigned a probability
Using maximum likelihood estimation (MLE):
count in the treebank how many times a rule A → β is used,
then divide by the number of times any rule with left-hand side A is used,
to get P(A → β | A) = Count(A → β) / Count(A)
Each rule A → β is assigned a probability P(A → β | A)
The probabilities of all expansions of A must sum to 1
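A minimal sketch of the MLE step, assuming rule occurrences have already been read off the treebank trees as (LHS, RHS) pairs; the occurrences below are invented for illustration:

```python
from collections import Counter

# Hypothetical rule occurrences, as if read off the trees of a tiny treebank.
observed = [
    ("NP", ("Det", "Nominal")), ("NP", ("Det", "Nominal")),
    ("NP", ("Pronoun",)), ("NP", ("Det", "Nominal")),
    ("VP", ("Verb", "NP")), ("VP", ("Verb",)),
]


def mle_rule_probs(rules):
    """P(A -> beta | A) = Count(A -> beta) / Count(A)."""
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


probs = mle_rule_probs(observed)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

Here NP occurs 4 times, 3 of them as Det Nominal, so P(NP → Det Nominal | NP) = 3/4 = 0.75, and the NP and VP expansions each sum to 1.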
Probabilistic CFGs for Chunking
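Chunking can now be done probabilistically using these rewriting rules: a candidate analysis is scored by multiplying the probabilities of the rules it uses, and the most probable analysis (and so its chunks) is chosen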
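A minimal sketch of scoring candidate analyses with a PCFG: the probability of a tree is the product of the probabilities of the rules it uses. The rule probabilities are invented for illustration, and leaves are POS tags, on the assumption that the lexical rules (eg Noun → flight) are the same in both candidates and so cannot change the comparison.

```python
# Hypothetical rule probabilities P(A -> beta | A). Only the rules used below are
# listed; the remaining probability mass for each LHS belongs to other rules.
RULE_PROBS = {
    ("VP", ("Verb", "NP")): 0.4,
    ("VP", ("Verb", "NP", "PP")): 0.1,
    ("NP", ("Det", "Nominal")): 0.6,
    ("NP", ("ProperNoun",)): 0.3,
    ("Nominal", ("Nominal", "PP")): 0.2,
    ("Nominal", ("Noun",)): 0.5,
    ("PP", ("Prep", "NP")): 1.0,
}


def tree_prob(tree):
    """Probability of a (label, children) tree: product of its rule probabilities."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROBS[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p


# "book the flight to Seattle": two candidate analyses of the VP.
pp = ("PP", ["Prep", ("NP", ["ProperNoun"])])
candidates = {
    "PP inside the NP chunk": (
        "VP", ["Verb", ("NP", ["Det", ("Nominal", [("Nominal", ["Noun"]), pp])])]),
    "PP attached to the verb": (
        "VP", ["Verb", ("NP", ["Det", ("Nominal", ["Noun"])]), pp]),
}

for name, tree in candidates.items():
    print(f"{name}: {tree_prob(tree):.4f}")
best = max(candidates, key=lambda name: tree_prob(candidates[name]))
print("chosen:", best)
```

With these made-up numbers the verb attachment wins (0.1 × 0.6 × 0.5 × 1.0 × 0.3 = 0.009 versus 0.0072), so “to Seattle” is kept outside the NP chunk “the flight”; different treebank counts could reverse the choice.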