Grammars Flashcards
Use of CFGs in NLP
- CFGs are well suited to capturing the hierarchical structure of language
- Probabilistic extensions (PCFGs) capture the likelihood of structures
- CFGs usually define the basis of syntactic parsing
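As a minimal sketch (assuming the NLTK toolkit; the grammar and sentence are invented for the example), a PCFG attaches a probability to each rule, and a Viterbi parser returns the most likely structure:

```python
import nltk

# Toy PCFG: the probabilities of all rules with the same left-hand side sum to 1.
grammar = nltk.PCFG.fromstring("""
    S  -> NP VP    [1.0]
    NP -> DT N     [0.6]
    NP -> N        [0.4]
    VP -> V NP     [0.7]
    VP -> V        [0.3]
    DT -> 'the'    [1.0]
    N  -> 'dog'    [0.5]
    N  -> 'cat'    [0.5]
    V  -> 'chased' [1.0]
""")

# The Viterbi parser returns the most probable parse tree under the PCFG.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    tree.pretty_print()
    print(tree.prob())
```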
Syntactic parsing
- The text analysis that determines the syntactic structure of a sentence
- Used in NLP as preprocessing for many tasks, e.g., relation extraction
Constituency vs. dependency parsing
Constituency parsing: Infers the structure of the phrases in a sentence
Dependency parsing: Infers the structure of the dependencies between the words of a sentence
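A hand-built example of the two output styles for the same sentence (the analyses are illustrative; a real parser would produce them automatically):

```python
# Constituency parse: nested phrases, shown as a bracketed structure.
constituency = "(S (NP (PRP she)) (VP (VBZ eats) (NP (NN fish))))"

# Dependency parse: one head per token (0 = artificial root) plus a relation label.
tokens = ["she", "eats", "fish"]
heads  = [2, 0, 2]                  # "she" and "fish" both depend on "eats"
labels = ["nsubj", "root", "obj"]

for tok, head, lab in zip(tokens, heads, labels):
    head_word = "ROOT" if head == 0 else tokens[head - 1]
    print(f"{head_word} -> {tok} ({lab})")
```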
Transformation into normal form
- Cleaning: Empty productions and unary rules are removed recursively
- Binarization: n-ary rules with n > 2 are split up using new non-terminals
- Any CFG can be transformed into Chomsky normal form (CNF) without changing the language
- This may, however, result in different parse trees for the sentences of the language
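The transformation can be tried out on a hand-written tree (a sketch assuming NLTK; its Tree methods collapse unary chains and binarize n-ary rules rather than following any particular textbook procedure):

```python
from nltk import Tree

# Hand-written parse tree with a unary chain (NP -> N) and a ternary VP rule.
t = Tree.fromstring(
    "(S (NP (N dogs)) (VP (V give) (NP (DT a) (N bone)) (PP (P to) (NP (N cats)))))"
)

t.collapse_unary(collapsePOS=True)   # collapse unary chains into single nodes
t.chomsky_normal_form()              # binarize the ternary VP with new non-terminals
t.pretty_print()
```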
Constituency parsing
- The text analysis that determines the phrase structure of a sentence with respect to a given grammar
- Often used in NLP as preprocessing where syntax is important
- Parsing works robustly across domains for well-formatted texts
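For a grammar in CNF, the core of a constituency parser can be a CKY-style chart over all spans of the sentence; the following recognizer is a minimal sketch with an invented toy grammar:

```python
from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognizer for a grammar in Chomsky normal form.

    lexical: maps a word to the set of non-terminals that yield it
    binary:  maps a pair (B, C) to the set of non-terminals A with A -> B C
    """
    n = len(words)
    chart = defaultdict(set)                 # chart[(i, j)] = non-terminals deriving words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):        # try every split point
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= binary.get((B, C), set())
    return start in chart[(0, n)]

# Toy CNF grammar, hand-written for the example.
lexical = {"she": {"NP"}, "fish": {"NP"}, "eats": {"V"}}
binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
print(cky_recognize(["she", "eats", "fish"], lexical, binary))  # True
```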
Downstream tasks based on parsing
Named entity recognition in complex domains
Relation extraction: both for semantic and temporal relations
Coreference resolution: to identify candidate matching references
Opinion mining regarding aspects of products or similar
Machine translation, to analyze the source sentence
Question answering, particularly in high-precision scenarios
Attachment ambiguity
Key parsing problem
Correct attachment of the various constituents in a sentence, such as prepositional phrases, adverbial phrases, infinitives, …
How to find the correct attachment?
The number of potential attachments grows exponentially with the number n of constituents
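The growth is easy to see by counting: assuming each attachment ambiguity corresponds to a binary bracketing of the constituents (a standard simplification), the number of possible structures follows the Catalan numbers:

```python
from math import comb

def catalan(n):
    # Number of binary bracketings of n + 1 constituents
    return comb(2 * n, n) // (n + 1)

for n in range(1, 9):
    print(n, catalan(n))   # 1, 2, 5, 14, 42, 132, 429, 1430
```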
Limitations of standard PCFGs
- PCFGs assume that the plausibility of structures is independent of the words used, i.e., each rule has a fixed probability
- However, specific words may make certain rules particularly (un)likely
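A sketch of why this matters (rule probabilities invented for illustration): the probability of a tree is just the product of its rule probabilities, so swapping a word never changes the structural score, even when one attachment is lexically far more plausible:

```python
from math import prod

# Invented rule probabilities; note that they do not depend on the words.
rule_prob = {
    ("VP", ("V", "NP", "PP")): 0.2,   # PP attaches to the verb
    ("VP", ("V", "NP")): 0.5,
    ("NP", ("NP", "PP")): 0.3,        # PP attaches to the noun
    ("NP", ("DT", "N")): 0.7,
}

def tree_prob(rules_used):
    """Probability of a derivation = product of the probabilities of its rules."""
    return prod(rule_prob[r] for r in rules_used)

# Verb vs. noun attachment of the PP: the two scores stay the same whether the
# sentence is "ate pizza with a fork" or "ate pizza with anchovies".
verb_attach = [("VP", ("V", "NP", "PP")), ("NP", ("DT", "N"))]
noun_attach = [("VP", ("V", "NP")), ("NP", ("NP", "PP")), ("NP", ("DT", "N"))]
print(tree_prob(verb_attach), tree_prob(noun_attach))
```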
Dependency Grammar
- A grammar that models the syntactic structure of a sentence by linking its tokens with binary asymmetric relations
- Relations, called dependencies, define grammatical connections
Graph representation of dependency grammar
- Each node is a token
- An edge connects a head with a dependent node
- The nodes and edges form a connected, acyclic tree with a single root (if available, the main verb of the first main clause is the root)
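A minimal sketch of this representation (head indices hand-built for a toy sentence): each token stores the index of its head, 0 marks the root, and a quick check verifies the single-root, acyclic tree property:

```python
def is_dependency_tree(heads):
    """heads[i] is the 1-based head index of token i + 1; 0 marks the root."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:      # exactly one root
        return False
    for i in range(n):
        seen, node = set(), i + 1
        while node != 0:                          # every token must reach the root
            if node in seen or not 1 <= node <= n:
                return False                      # cycle or invalid head index
            seen.add(node)
            node = heads[node - 1]
    return True

# "economic news had little effect" with "had" as the root (hand-built example)
heads = [2, 3, 0, 5, 3]
print(is_dependency_tree(heads))  # True
```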
Identification of dependencies
Selected features of dependencies
- Breaks: Dependencies rarely span intervening verbs or punctuation
- Valency: Heads tend to have a typical number of dependents on each side
- Affinities: Some dependencies are more plausible than others (for example, “issues -> the” rather than “the -> issues”)
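As a sketch (feature names and tag set invented), these properties can be turned into simple features for scoring a candidate head-dependent pair:

```python
def dependency_features(tokens, tags, head, dep):
    """Toy features for a candidate arc head -> dep (0-based indices into the sentence)."""
    lo, hi = sorted((head, dep))
    between = tags[lo + 1:hi]                      # material spanned by the arc
    return {
        "distance": abs(head - dep),                                # long arcs are rarer
        "crosses_verb": any(t.startswith("VB") for t in between),   # breaks
        "crosses_punct": any(t in {",", ".", ";"} for t in between),
        "direction": "left" if dep < head else "right",             # valency is side-specific
        "pair": (tokens[head].lower(), tokens[dep].lower()),        # lexical affinity
    }

tokens = ["the", "issues", "remain", "unresolved"]
tags   = ["DT", "NNS", "VBP", "JJ"]
print(dependency_features(tokens, tags, head=1, dep=0))   # "issues" -> "the"
```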
Parsing Methods
Dynamic programming
Graph algorithms
Transition-based parsing
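As a minimal sketch of the transition-based approach (arc-standard transitions, with a hand-scripted action sequence instead of a trained classifier): a stack and a buffer are manipulated by SHIFT, LEFT-ARC, and RIGHT-ARC actions until all arcs of the tree are built:

```python
def arc_standard(tokens, actions):
    """Apply a sequence of arc-standard transitions; return the arcs (head, dependent)."""
    buffer = list(range(1, len(tokens) + 1))   # 1-based token indices
    stack, arcs = [0], []                      # 0 is the artificial root
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":             # second-topmost token depends on the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":            # topmost token depends on the second-topmost
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

tokens = ["she", "eats", "fish"]
# Hand-scripted action sequence; in practice a classifier predicts each action
# from features of the current stack and buffer.
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(arc_standard(tokens, actions))  # [(2, 1), (2, 3), (0, 2)]
```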