Dependency Parsing Flashcards

1
Q

What is a dependency grammar?

A

A dependency grammar describes sentence structure directly in terms of grammatical relations between words, relations that have been defined by linguists.

2
Q

What type of word order do dependency grammars have?

A

They can handle free word order, since structure is encoded by relations between words rather than by word position

3
Q

What do the edges and nodes represent in a dependency grammar?

A

Nodes represent words, and edge labels represent the grammatical relations between them

4
Q

What do grammatical relations connect?

A

They connect head and dependent words

5
Q

What is the type of a grammatical relation called?

A

Its grammatical function

6
Q

What type of graph is a dependency tree?

A

It is a directed graph, since each arc points from a head word to a dependent word; a node can have at most one arc coming into it

7
Q

What is a dependency tree made up of?

A

Vertices = nodes = words or sometimes stems/affixes

Arcs = Grammatical function relationships

8
Q

How many incoming arcs does the root node have?

A

None

9
Q

How many arcs does each vertex have coming into it?

A

Exactly one (the root is the exception, with none)

10
Q

How many vertices can the root node reach through some path?

A

The root node has a path to every vertex

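The tree constraints on the last few cards (the root has no incoming arc, every other vertex has exactly one, and the root has a path to every vertex) can be sketched as a quick check. The dict-of-heads encoding here is an illustrative assumption, not a standard format:

```python
def is_valid_dependency_tree(heads):
    # heads maps each word index (1..n) to its head; 0 is the ROOT node.
    # The mapping itself encodes "exactly one incoming arc per vertex",
    # and ROOT has no entry, so it has no incoming arc.
    children = {}
    for dep, head in heads.items():
        children.setdefault(head, []).append(dep)
    # Check that ROOT has a path to every vertex.
    reached, stack = set(), [0]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in reached:
                reached.add(child)
                stack.append(child)
    return reached == set(heads)

# "Book me a flight": book <- ROOT; me, flight <- book; a <- flight
print(is_valid_dependency_tree({1: 0, 2: 1, 4: 1, 3: 4}))  # True
print(is_valid_dependency_tree({1: 0, 2: 3, 3: 2}))        # False (cycle)
```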
11
Q

What does it mean if an arc is projective?

A

We say an arc from a head to a dependent is projective if there is a path from the head to every word that lies between the head and the dependent in the sentence.

12
Q

In the image, is the arc connecting flight and was projective? Why?

A

The arc is not projective: although we can go from ‘flight’ to ‘was’, and from ‘was’ to ‘which’, there is no path from ‘flight’ to the intervening words ‘this’ or ‘morning’

13
Q

In the image below, is the arc from ‘was’ to ‘late’ projective?

A

It is projective: we can go from ‘was’ to ‘late’, and the only word in between, ‘already’, is reachable directly from ‘late’

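The projectivity test from the last three cards can be sketched as code: an arc is projective when the head dominates every word strictly between itself and the dependent. The head-index encoding and the example sentence indices are assumptions for illustration:

```python
def is_projective_arc(heads, head, dep):
    # heads maps each word index (1..n) to its head index; 0 is ROOT.
    def dominates(ancestor, node):
        # Follow head pointers upward until we hit `ancestor` or ROOT.
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    lo, hi = sorted((head, dep))
    # Projective: the head reaches every word strictly between head and dep.
    return all(dominates(head, k) for k in range(lo + 1, hi))

# 1 JetBlue 2 canceled 3 our 4 flight 5 this 6 morning 7 which 8 was
# 9 already 10 late   (indices and heads are an illustrative assumption)
heads = {1: 2, 2: 0, 3: 4, 4: 2, 5: 6, 6: 2, 7: 8, 8: 4, 9: 10, 10: 8}
print(is_projective_arc(heads, 4, 8))   # flight -> was: False
print(is_projective_arc(heads, 8, 10))  # was -> late: True
```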
14
Q

What do older parsing algorithms assume?

A

They assume that trees are always projective. Dependency trees converted from English phrase-structure treebanks are guaranteed to be projective, but in other languages this is not the case, and treebanks can often include non-projective trees

15
Q

What Treebanks are used that include dependency graphs?

A

The Penn Treebank and the OntoNotes dataset

16
Q

How can we move from Parse-Structures (Constituency Grammars) to Dependency Structures?

A

We can identify head-dependent structures using head rules, and then we can connect children to heads using a dependency relation

17
Q

What are some problems that arise from translating parse-structures to dependency structures?

A

We cannot represent non-projective structures (as there are no examples in phrase-structured treebanks)

There is a lack of structure in flat noun phrases

18
Q

When would you prefer a dependency graph over a constituent based structure?

A

For training relation-extraction models (clause argument detection), since clauses need subject and object grammatical relations

19
Q

What is transition based dependency parsing?

A

These are based on classic shift-reduce parsing. The basic idea: we have a CFG, a stack, and a list of words. We put the words in an input buffer, shift words onto the stack, match the words on the stack against the grammar, and replace sequences that match a rule's right-hand side with the rule's left-hand side.

E.g. if a determiner and a nominal are on the stack and the grammar has the rule NP → Det Nom, we replace them with NP

20
Q

What is the Arc Standard approach?

A

It is a simple and effective transition based dependency parsing approach

21
Q

What is an Oracle?

A

It is a model that provides the correct transition operator (add left arc, add right arc, shift, or finished) for a given configuration state when doing the Arc Standard approach to dependency parsing

22
Q

What happens when we perform a left arc?

A

We connect the 1st word of the stack (head) with the 2nd word (dep)

23
Q

What happens when we perform a right arc?

A

We connect the 2nd word of the stack (head) with the 1st word (dep)

24
Q

What happens when we perform a shift?

A

We move the next word in the input buffer into the stack

25
Q

Explain what is happening in the image

A

It shows the Arc Standard approach to dependency parsing. We start with the root node on the stack, consult the oracle and shift the first word onto the stack, then consult it again and shift the second word. The oracle then says to perform a right arc from ‘book’ to ‘me’. We consult the oracle again, and it says to shift three times. We then perform a left arc, which connects ‘flight’ to ‘morning’. This continues until the oracle states that we are done.

26
Q

When we perform a left or right arc, which word is removed from the stack?

A

The dependent (the word the arc points to) is removed from the stack; the head (the word the arc starts from) remains on the stack.
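The last several cards can be tied together in a minimal arc-standard sketch. The scripted oracle below stands in for a trained model (an illustrative assumption), and relation labels are omitted for brevity:

```python
def arc_standard_parse(words, oracle):
    # `oracle(stack, buffer, arcs)` returns the next transition.
    stack = ["ROOT"]
    buffer = list(words)
    arcs = []  # (head, dependent) pairs
    while len(stack) > 1 or buffer:
        action = oracle(stack, buffer, arcs)
        if action == "SHIFT":
            # Move the next buffer word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFTARC":
            # Top of stack is head, 2nd word is dependent; remove the dep.
            arcs.append((stack[-1], stack[-2]))
            del stack[-2]
        elif action == "RIGHTARC":
            # 2nd word on stack is head, top is dependent; remove the dep.
            arcs.append((stack[-2], stack[-1]))
            stack.pop()
    return arcs

# A scripted "oracle" for "book me" -> book heads me, ROOT heads book.
script = iter(["SHIFT", "SHIFT", "RIGHTARC", "RIGHTARC"])
print(arc_standard_parse(["book", "me"], lambda s, b, a: next(script)))
# [('book', 'me'), ('ROOT', 'book')]
```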

27
Q

What are some problems with the arc standard approach?

A

It is a greedy algorithm, so other viable parse trees may be ignored

It assumes that the Oracle is 100 percent accurate, which is not true in practice

A separate action is needed for each relation type, meaning that a large action space needs to be processed

28
Q

What are some alternatives to the Arc Standard approach?

A

The Arc Eager approach and beam search

29
Q

How is the Oracle created?

A

It is trained on treebanks using a supervised ML approach. We compile training examples of configuration–transition pairs: a configuration is a stack, word list, and the set of relations built so far; a transition is the action type. Feature templates are used to build the model's inputs

30
Q

What features can be used when creating an Oracle?

A

Lemmas, POS, wordforms, word embeddings, dependency relations

31
Q

What is an issue with picking lots of features for Oracle training?

A

It can slow the machine learning training down

32
Q

When we have an appropriate feature template for an Oracle, what do we do next?

A

We apply it to the training input to create the inputs for the ML algorithm. X is the value set for the feature types; Y is the action taken for the same input in the reference parse tree. We then apply an ML algorithm such as LR, SVM, or a deep learning model
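The pipeline on this card can be sketched with a toy feature template. The feature names and the `<empty>` padding token are illustrative assumptions, not the exact template used in practice:

```python
def extract_features(stack, buffer):
    # A toy template: wordforms of the top two stack items and the
    # first buffer word; real templates add POS tags, lemmas, etc.
    return {
        "s1.w": stack[-1] if stack else "<empty>",
        "s2.w": stack[-2] if len(stack) > 1 else "<empty>",
        "b1.w": buffer[0] if buffer else "<empty>",
    }

# One (configuration, transition) training pair becomes one (X, y) row:
X = extract_features(["ROOT", "book"], ["me", "a", "flight"])
y = "SHIFT"  # the action taken at this point in the reference parse
print(X)  # {'s1.w': 'book', 's2.w': 'ROOT', 'b1.w': 'me'}
```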

33
Q

What is Graph-Based dependency parsing?

A

It is a different way of doing dependency parsing, in which we encode the possible trees as a directed graph. Graph-based parsers allow for the production of non-projective trees and can handle long-distance dependencies, because they score entire trees

34
Q

What is the edge-factored approach?

A

It is a graph-based approach to dependency parsing. The score of a tree, which we want to maximise, is the sum of the scores of its edges. Given an input sentence and a number of possible parse trees, we find the best parse tree by taking the argmax over all possible trees for that sentence.

35
Q

What type of tree do we need to perform edge-factored dependency parsing?

A

The parse must be a spanning tree, in which every vertex (node) except the root has exactly one incoming edge; we search for the maximum spanning tree

36
Q

What type of selection do we use with edge-factored DP and what else do we perform?

A

We use greedy selection to choose the most probable edge weight, and we perform cleanup to avoid cycles by adjusting weights using heuristics

37
Q

Explain what the image shows.

A

It shows edge-factored dependency parsing. We start at the root and take the most probable edge, with weight 12; then from the node we have reached, ‘book’, we again choose the most probable edge, with weight 7, which takes us to ‘flight’. If we continued greedily from ‘flight’, we would return to a node we have already visited and create a cycle, therefore a cleanup needs to be performed.

38
Q

How is cleanup performed?

A

It is performed by subtracting the max edge weight (incoming) from all edge weights (incoming)

We then collapse nodes in a cycle into a single node

This process is then repeated recursively
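The cleanup steps above (greedy selection of the best incoming edge, max-weight subtraction, and cycle detection before collapsing) can be sketched roughly as follows. The edge-dict encoding and weights are illustrative assumptions, and the full recursive collapse (Chu-Liu/Edmonds) is omitted:

```python
def greedy_pick_and_normalize(weights):
    # weights[(h, d)] = score of an arc from head h to dependent d (0 = ROOT).
    deps = {d for (_, d) in weights}
    best = {}
    for d in deps:
        incoming = {h: w for (h, dd), w in weights.items() if dd == d}
        best[d] = max(incoming, key=incoming.get)
        # Subtract the max incoming weight, so the chosen edge becomes 0.
        for h, w in incoming.items():
            weights[(h, d)] = w - incoming[best[d]]
    # Detect a cycle among the greedy picks (it would be collapsed next).
    for start in deps:
        seen, node = [], start
        while node in best and node not in seen:
            seen.append(node)
            node = best[node]
        if node in seen:
            return best, sorted(seen[seen.index(node):])
    return best, None

# Two words that prefer each other as heads produce a cycle to collapse.
best, cycle = greedy_pick_and_normalize(
    {(0, 1): 5, (2, 1): 8, (0, 2): 1, (1, 2): 7})
print(cycle)  # [1, 2]
```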

39
Q

Explain the following image

A

This image shows how cleanup is performed in edge-factored dependency parsing. Here, we look at each node and its incoming edges and subtract the maximum incoming weight from all of the incoming edges. We then check whether there are any cycles, which in the top-right graph there are. We therefore collapse the cycle into a single node, called tf, and adjust the edges so that we have a valid graph. This is repeated for as long as cycles remain

40
Q

How do we train a supervised ML approach to parse the edge factored graphs?

A

We apply a feature template to the training examples to produce X. We then produce Y, which is the edge probabilities or predictions. We have the ground truths and use a standard supervised learning algorithm, as we did for the transition-based approach (Arc Standard)

41
Q

What does Dozat’s model do for GB-DP?

A

It uses an LSTM model with features from word, POS, and character embeddings. The character embeddings help handle rare words that are not in the training vocabulary

42
Q

How is Dependency Parsing evaluated?

A

Exact Match (is it perfect? conservative and minor errors cause failure)

Labelled Attachment Score (TP = correct assignment of word → head + dep relation)

Unlabelled Attachment Score (TP = correct assignment of word → head)

Label Accuracy Score (percentage of words with correct edge label)
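The attachment scores on this card can be sketched as code; the (head, label) dict encoding is an illustrative assumption:

```python
def attachment_scores(gold, pred):
    # gold and pred map each word index to a (head, label) pair.
    n = len(gold)
    return {
        # UAS: head correct, label ignored.
        "UAS": sum(gold[w][0] == pred[w][0] for w in gold) / n,
        # LAS: both head and label correct.
        "LAS": sum(gold[w] == pred[w] for w in gold) / n,
        # Label accuracy: label correct, head ignored.
        "LabelAcc": sum(gold[w][1] == pred[w][1] for w in gold) / n,
    }

gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obj")}
pred = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "iobj")}
scores = attachment_scores(gold, pred)
# All heads correct (UAS = 1.0), but one label wrong (LAS = 2/3).
print(scores["UAS"], scores["LAS"])
```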