Syntactic Parsing Flashcards
What is syntactic parsing?
It is the process of assigning a syntactic structure to a sentence
What are two parsing structures?
Constituency structures and Dependency structures
What is the biggest challenge to syntactic parsing?
Ambiguity - specifically structural ambiguity
What is structural ambiguity?
It is where multiple parse trees are possible in a grammar for the same sentence
What is attachment ambiguity?
It is where a constituent could be attached to the parse tree in multiple places (e.g. a prepositional phrase that could attach either to the verb phrase or to a noun phrase)
What is coordination ambiguity?
It is where phrases can be conjoined in multiple ways (e.g. "old men and women" can mean [old men] and [women], or old [men and women])
What is syntactic disambiguation?
It is choosing the correct parse
What is useful to address ambiguity?
Dynamic Programming
What algorithm is a classic dynamic programming approach to parsing?
The CKY algorithm
What is dynamic programming parsing also known as?
Chart Parsing
What does CKY require grammars to be?
In Chomsky Normal Form (CNF)
What are the rules of Chomsky Normal Form?
The left side of every rule must be a single non-terminal, and the right side must be (i) two non-terminals, or (ii) a single terminal
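As a rough illustration, a CNF grammar can be stored as two lookup tables in Python; the grammar fragment below is made up for this sketch, loosely following the textbook "book the flight" example:

```python
# Toy CNF grammar fragment (illustrative only).
# Binary rules A -> B C are keyed by the pair (B, C); lexical rules A -> w by the word w.
binary_rules = {
    ("Verb", "NP"):     {"S", "VP", "X2"},   # several parents share the same right-hand side
    ("Det", "Nominal"): {"NP"},
}
lexical_rules = {
    "book":   {"S", "VP", "Verb", "Nominal", "Noun"},
    "the":    {"Det"},
    "flight": {"Nominal", "Noun"},
}
```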
How do we encode a parse tree in CNF?
We use a 2D matrix called a parse table. Indices before and after tokens are called fenceposts
What does each cell represent in a parse table?
An entry for a span (i, j), where:
i is the start fencepost index for the span
j is the end fencepost index for the span
n is the length of the sentence
span (i, j) is a constituent phrase containing j - i tokens
span (0, n) is the whole sentence (see the indexing sketch below)
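A tiny sketch of fencepost indexing (the example sentence is just for illustration):

```python
tokens = ["book", "the", "flight"]   # fenceposts sit between tokens: 0 book 1 the 2 flight 3
n = len(tokens)

def span_tokens(i, j):
    """Return the tokens covered by span (i, j); it always contains j - i tokens."""
    return tokens[i:j]

print(span_tokens(0, 1))   # ['book']
print(span_tokens(1, 3))   # ['the', 'flight']
print(span_tokens(0, n))   # the whole sentence
```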
In a parse table, in what order do we move?
You move left to right, bottom to top
With the aid of the image, explain how you perform CKY parsing manually.
You start at the first position and work from left to right, filling each column from bottom to top. For each cell [i, j], you consider every split point k between i and j and check whether the grammar has a production whose right-hand side is a value from cell [i, k] followed by a value from cell [k, j]; if so, you add its left-hand side to cell [i, j]. For example, in cell [0,3] we can combine a Verb (seen in cell [0,1]) followed by an NP (seen in cell [1,3]) using a production from S, VP or X2.
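A minimal CKY recognizer sketch in Python over a toy CNF grammar; the grammar fragment is an assumption for illustration, loosely based on the "book the flight" example, and cell (i, j) collects every non-terminal that can cover that span:

```python
from collections import defaultdict

lexical = {                                      # A -> w rules, keyed by word
    "book":   {"S", "VP", "Verb", "Nominal", "Noun"},
    "the":    {"Det"},
    "flight": {"Nominal", "Noun"},
}
binary = {                                       # A -> B C rules, keyed by (B, C)
    ("Verb", "NP"):     {"S", "VP", "X2"},
    ("Det", "Nominal"): {"NP"},
}

def cky(tokens):
    n = len(tokens)
    table = defaultdict(set)                     # table[(i, j)] = non-terminals covering span (i, j)
    for j in range(1, n + 1):                    # columns left to right
        table[(j - 1, j)] = set(lexical.get(tokens[j - 1], set()))
        for i in range(j - 2, -1, -1):           # each column bottom to top (longer spans last)
            for k in range(i + 1, j):            # every possible split point
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary.get((B, C), set())
    return table

table = cky(["book", "the", "flight"])
print(table[(0, 3)])   # contains S (along with VP and X2) for this toy grammar
```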
Does the CKY algorithm help to disambiguate possible parse trees?
No - the table is populated with all possible parse trees, it does not choose the best
How does neural CKY work?
It learns to classify constituency labels for text spans: given a particular span, the model can be trained to predict the correct constituency label for that span
What is the input for a Neural CKY model?
We take the sentence, tokenize it into words, pass these into a WordPiece tokenizer to get subwords, and pass the subwords to a BERT embedding layer to encode them. We then use either the first or the last subword embedding of each word to get back to one embedding per word
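A sketch of that pipeline, assuming the Hugging Face transformers library and bert-base-uncased (both are assumptions; the cards do not name a specific model), keeping the first subword embedding of each word:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # WordPiece tokenizer
model = AutoModel.from_pretrained("bert-base-uncased")

words = ["Book", "the", "flight", "through", "Houston"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    subword_states = model(**enc).last_hidden_state[0]    # (num_subwords, hidden_size)

# Map subwords back to words by keeping the first subword position of each word.
first_pos = {}
for pos, wid in enumerate(enc.word_ids()):                 # word_ids marks which word each subword came from
    if wid is not None and wid not in first_pos:
        first_pos[wid] = pos
word_embeddings = torch.stack([subword_states[first_pos[i]] for i in range(len(words))])
print(word_embeddings.shape)                               # (5, hidden_size)
```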
How are the word embeddings used in a Neural CKY model?
The word embeddings are passed into a post-processing layer: a deep learning stack (often a transformer) followed by an MLP, which together act as a classifier mapping each span to a label
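One possible shape for that post-processing stack, sketched in PyTorch; the layer sizes are placeholders, not values from the cards, and the MLP that maps span vectors to labels is sketched separately further down:

```python
import torch.nn as nn

HIDDEN = 768   # placeholder: matches the BERT hidden size

# A small transformer encoder run over the word embeddings to refine them.
encoder_layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=8, batch_first=True)
post_processor = nn.TransformerEncoder(encoder_layer, num_layers=2)
# refined = post_processor(word_embeddings.unsqueeze(0))   # (1, num_words, HIDDEN)
```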
How can embeddings per word be used to represent a span?
Directional vectors: the forward vector is the embedding at the end fencepost minus the embedding at the start fencepost, and the backward vector is the same difference taken the other way round; the two are concatenated. Because each fencepost sits between two words, the backward vector uses indices shifted by +1
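A minimal sketch of the directional fencepost trick, assuming the refined word representations have already been split into forward and backward halves and gathered into per-fencepost tensors (the exact indexing convention varies between implementations):

```python
import torch

def span_vector(forward, backward, i, j):
    """Represent span (i, j) by directional fencepost differences.

    forward, backward: per-fencepost tensors built from the forward and backward
    halves of the word encodings (assumed padded so the shifted indices are valid).
    """
    fwd = forward[j] - forward[i]              # forward vector: end fencepost minus start
    bwd = backward[i + 1] - backward[j + 1]    # backward vector: same difference, indices shifted by +1
    return torch.cat([fwd, bwd], dim=-1)       # concatenate the two directions
```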
When we have the span vectors for Neural CKY, what do we then do?
We pass the span vector through an MLP with a non-linear activation function (e.g. ReLU); the MLP output layer has dimensions equal to the number of non-terminal labels, giving a score for each label
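A sketch of that scoring MLP in PyTorch (the dimensions and label count are placeholders):

```python
import torch.nn as nn

SPAN_DIM = 1024    # placeholder: size of the concatenated span vector
NUM_LABELS = 27    # placeholder: number of non-terminal labels

span_scorer = nn.Sequential(
    nn.Linear(SPAN_DIM, 256),
    nn.ReLU(),                      # non-linear activation inside the MLP
    nn.Linear(256, NUM_LABELS),     # output layer: one score per non-terminal label
)
# span_scorer(span_vec) gives a vector of label scores for that span.
```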
What does the activation function of the MLP produce in Neural CKY?
A distribution of scores across all the possible labels, indicating which label is the most likely one for that span
How do we compute a score for the entire parse tree?
The score of a parse tree is the sum of the scores of the labelled spans it contains; summing these span scores gives a score for each possible parse tree
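A sketch of the tree-level score under that definition (representing a tree as a set of labelled spans is an assumed convention for this example):

```python
def tree_score(tree_spans, span_label_scores):
    """Score a candidate tree as the sum of the scores of its labelled spans.

    tree_spans:        iterable of (i, j, label) triples describing the tree
    span_label_scores: dict mapping (i, j, label) -> float, from the span scorer
    """
    return sum(span_label_scores[(i, j, label)] for (i, j, label) in tree_spans)

# The best tree is the argmax of tree_score over all candidate trees, found
# efficiently with a CKY-style dynamic program rather than by enumeration.
```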
How is the best parse tree chosen?
By taking the argmax over the parse tree scores
How frequently does argmax find a complete tree in practice?
95 percent of the time a complete tree is found