Parsing and Generation Flashcards
Generative Grammar
A formally specified grammar that can generate all and only the acceptable sentences of a natural language.
Constituent
A sequence of words that functions as a unit in the structure of a sentence; in a bracketed analysis, a constituent is anything enclosed in a pair of brackets.
Weakly-equivalent grammars
Generate the same strings.
Strongly-equivalent grammars
Assign the same bracketings to all strings they generate.
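As an illustrative sketch (the grammars here are hypothetical toy examples, not from the source), weak equivalence can be demonstrated by enumerating the strings each grammar generates up to a bounded derivation depth:

```python
from itertools import product

def strings(rules, sym, depth):
    """Yield terminal strings derivable from sym within `depth` expansions."""
    if sym.islower():          # terminal symbols yield themselves
        yield sym
        return
    if depth == 0:
        return
    for rhs in rules[sym]:
        # Expand every right-hand-side symbol, then concatenate the results.
        parts = [list(strings(rules, s, depth - 1)) for s in rhs]
        for combo in product(*parts):
            yield "".join(combo)

# Left-branching grammar: S -> S a | a, bracketing ((a a) a)
g_left = {"S": [["S", "a"], ["a"]]}
# Right-branching grammar: S -> a S | a, bracketing (a (a a))
g_right = {"S": [["a", "S"], ["a"]]}

# Same strings, different bracketings: weakly but not strongly equivalent.
print(set(strings(g_left, "S", 4)) == set(strings(g_right, "S", 4)))  # True
```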
Context-Free Grammars
Four components:
- A set of non-terminal symbols conventionally written in uppercase.
- A set of terminal symbols, conventionally written in lowercase.
- A set of rules, where the left-hand side is a single non-terminal and the right-hand side is a sequence of one or more non-terminals or terminals.
- A start symbol, conventionally S, which is a member of the set of non-terminal symbols.
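A minimal sketch of these four components as plain Python data, together with a check of the defining conditions (the grammar itself is a hypothetical toy example):

```python
# The four components of a toy CFG.
nonterminals = {"S", "NP", "VP", "V"}
terminals = {"they", "can", "fish"}
rules = {
    "S":  [["NP", "VP"]],
    "NP": [["they"], ["fish"]],
    "VP": [["V", "NP"], ["V"]],
    "V":  [["can"], ["fish"]],
}
start = "S"

def is_valid_cfg(nonterminals, terminals, rules, start):
    """Check the defining conditions: every LHS is a single non-terminal,
    every RHS is a non-empty sequence over the two symbol sets, and the
    start symbol is a non-terminal."""
    if start not in nonterminals:
        return False
    for lhs, rhss in rules.items():
        if lhs not in nonterminals:
            return False
        for rhs in rhss:
            if not rhs:                                   # no empty productions
                return False
            if any(s not in nonterminals | terminals for s in rhs):
                return False
    return True

print(is_valid_cfg(nonterminals, terminals, rules, start))  # True
```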
Empty Productions
Productions with an empty right-hand side. It is convenient to exclude these because they complicate parsing algorithms, and a weakly-equivalent grammar can always be constructed that disallows such empty productions.
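One way to see why excluding them is harmless: a standard construction, sketched below under the assumption that the empty string itself is not in the language, removes empty productions while generating the same strings (the example grammar is hypothetical):

```python
from itertools import product

def remove_empty(rules):
    """Build a weakly-equivalent grammar with no empty productions."""
    # 1. Find nullable non-terminals (those that can derive the empty string).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in rules.items():
            for rhs in rhss:
                if all(s in nullable for s in rhs) and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
    # 2. For each rule, add a variant for every way of dropping nullable
    #    symbols, then discard the empty productions themselves.
    new_rules = {}
    for lhs, rhss in rules.items():
        out = set()
        for rhs in rhss:
            options = [([s], []) if s in nullable else ([s],) for s in rhs]
            for choice in product(*options):
                flat = tuple(s for part in choice for s in part)
                if flat:                       # skip the empty production
                    out.add(flat)
        new_rules[lhs] = sorted(out)
    return new_rules

g = {"S": [["A", "b"]], "A": [["a"], []]}
# S gains the variant S -> b; the empty production A -> [] is dropped.
print(remove_empty(g))
```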
Left-associative
A grammar in which all nonterminal daughters are the leftmost daughter in a rule.
Right-associative
A grammar in which all nonterminal daughters are the rightmost daughter in a rule.
Lexical Ambiguity
Ambiguity arising from dual lexical entries of words, e.g. words that can be treated as different types of verbs, or as either a verb or a noun. "They can fish" is a classic example.
Structural Ambiguity
Ambiguity arising from different possible attachments of phrases.
Parse tree
Structure of sentence in the form of a tree. Equivalent to bracketed structure but easier to read for complex cases.
Chart Parsing
Keeping a record of rules that we've applied so we don't have to backtrack and redo work we've done before. This works for parsing with CFGs because the rules are independent of context. The data structure used for recording partial results is known as a chart. Such strategies are designed to be complete. See pages 31-35 for more details.
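A minimal chart-parsing sketch using the CKY algorithm, where each chart cell records the categories found for a span exactly once and is then reused rather than recomputed. The grammar is a hypothetical toy example in Chomsky normal form, not taken from the source:

```python
from collections import defaultdict

def cky(words, lexical, binary):
    """CKY recognition: chart[(i, j)] holds categories spanning words[i:j].
    Each cell is filled once and reused, so no work is repeated."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= lexical.get(w, set())
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point within the span
                for (b, c), parents in binary.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)] |= parents
    return chart

# Hypothetical toy grammar in Chomsky normal form.
lexical = {"they": {"NP"}, "can": {"V", "Aux"}, "fish": {"V", "NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Aux", "VP"): {"VP"}}

chart = cky(["they", "can", "fish"], lexical, binary)
print("S" in chart[(0, 3)])  # True: the sentence is recognised
```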
Packing
Changing the daughters value on an edge to be a set of lists of daughters, and making an equality check before adding an edge so that we don't add one equivalent to an existing edge. A single edge can then represent several analyses of the same span.
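A minimal sketch of this idea, with hypothetical edge names: edges are keyed by category and span, and the equality check routes a new analysis into an existing edge's daughter set instead of creating a duplicate edge:

```python
# Chart keyed by (category, start, end); values are sets of daughter lists.
chart = {}

def add_edge(cat, start, end, daughters):
    key = (cat, start, end)
    if key in chart:
        chart[key].add(daughters)   # pack: an equivalent edge already exists
    else:
        chart[key] = {daughters}    # first analysis for this span and category

add_edge("VP", 1, 3, ("V", "NP"))
add_edge("VP", 1, 3, ("Aux", "VP"))  # packed into the same edge

print(len(chart))                    # 1 edge
print(len(chart[("VP", 1, 3)]))      # 2 packed analyses
```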
Why can’t FSAs be used to model natural language syntax?
FSAs cannot model centre embedding, which is present in natural language. However, humans can only process a finite depth of embedding, so FSAs might suffice in practice. Even so, grammars written using finite-state techniques alone are highly redundant, and without internal structure we can't build up good semantic representations.
Deficiencies in Atomic Category CFGs.
Simple atomic-category CFGs don't account for subject-verb agreement. We could allow for agreement by increasing the number of atomic symbols, but this approach doesn't deal with subcategorisation, the lexical property telling us how many arguments a verb can have.
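A sketch of the multiplied-symbol approach with a hypothetical toy grammar: agreement is enforced by splitting NP and VP into singular and plural variants, at the cost of duplicating every rule that mentions them:

```python
from itertools import product

def strings(rules, sym, depth):
    """Enumerate sentences derivable from sym within `depth` expansions."""
    if sym not in rules:               # terminal word
        yield (sym,)
        return
    if depth == 0:
        return
    for rhs in rules[sym]:
        for combo in product(*[list(strings(rules, s, depth - 1)) for s in rhs]):
            yield sum(combo, ())       # concatenate the word tuples

# Agreement by multiplying atomic categories (note the duplicated rules).
rules = {
    "S":     [["NP_sg", "VP_sg"], ["NP_pl", "VP_pl"]],
    "NP_sg": [["she"]],
    "NP_pl": [["they"]],
    "VP_sg": [["fishes"]],
    "VP_pl": [["fish"]],
}

sents = set(strings(rules, "S", 3))
print(("they", "fish") in sents)     # True
print(("they", "fishes") in sents)   # False: agreement is enforced
```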