Week 6 - Sequence Labelling Flashcards

1
Q

What is the task of sequence labelling defined as

A

the task of:
- assigning a label yi to each token xi in an input token sequence X
- the output sequence Y has the same length as X
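A minimal sketch of this definition (the sentence and tags follow the "Janet will back the bill" example that appears in a later card):

```python
# Sequence labelling: each input token x_i receives exactly one label y_i,
# so the output sequence Y has the same length as the input sequence X.
X = ["Janet", "will", "back", "the", "bill"]
Y = ["NOUN", "AUX", "VERB", "DET", "NOUN"]

assert len(X) == len(Y)  # same length, one tag per token

for x, y in zip(X, Y):
    print(f"{x}/{y}")  # e.g. Janet/NOUN
```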

2
Q

What are the 8 main word classes for POS tagging

A

nouns
pronouns
verbs
adjectives
adverbs

determiners
conjunctions
prepositions

3
Q

What are determiners

A

a, the, an
used to specify nouns

4
Q

What are prepositions

A

in, of, from
denote spatial information

5
Q

What are word classes defined based on

A

(1) Their grammatical relationship with neighbouring words
eg I went for a walk / I will walk to work

(2) morphological properties (eg of suffixes)
dance (VB), danced (VBD), dancing (VBG)
ie VBD - past tense

6
Q

What are POS tags broadly categorised into

A

closed class vs open class

7
Q

what is a closed class

A

members are fixed; unlikely that new words are added

8
Q

what is an open class

A

new words likely to be added/coined over time

9
Q

POS: Task, Input, Output

A

Task: assign a POS tag to each word in a sequence
Input: sequence x1,…,xn of words and a tagset
Output: sequence y1,…,yn of tags, where each tag corresponds to an input token
eg Janet/NOUN will/AUX …bill/NOUN

10
Q

Why is POS tagging difficult

A

words are syntactically ambiguous
eg “back” can be a noun, adjective, verb, or adverb

11
Q

How do we measure POS accuracy

A

proportion of POS tags that match gold standard POS tags
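The accuracy measure above can be sketched as a small helper (the function name is illustrative):

```python
def pos_accuracy(predicted, gold):
    """Proportion of predicted tags that match the gold-standard tags."""
    assert len(predicted) == len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# 3 of the 4 tags match the gold standard
acc = pos_accuracy(["NOUN", "AUX", "VERB", "DET"],
                   ["NOUN", "AUX", "NOUN", "DET"])
# acc == 0.75
```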

12
Q

What is Semantic Role Labelling (SRL)

A

identifying predicate-argument structures
Answers the question:
“who did what to whom where and when?”

13
Q

What is predicate (SRL)

A

word(s) expressing the event
ie the what

14
Q

What is argument (SRL)

A

the participants in the events
ie the who, whom, where, when

15
Q

What is the semantic role (SRL)

A

the role that each argument (of a predicate) takes

16
Q

What is the Task of SRL

A
  • automatically find the semantic role of each argument of each predicate (every argument and predicate)
  • in principle: the predicate is pre-identified in the input
  • in practice: most SRL models detect the predicate
17
Q

What is the Proposition Bank (PropBank) scheme (SRL)

A

roles are specific to a verb and are named with numbers
“the waiter spilled the soup”
ARG0: agent (initiator of an action) “the waiter”
ARG1: patient (entity undergoing the effect of an action) “the soup”
ARG2 and so on: depend on the verb/action
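One way to hold the analysis above in code (a plain dict; the structure is illustrative, not a standard PropBank format):

```python
# PropBank-style analysis of "the waiter spilled the soup"
srl = {
    "predicate": "spilled",   # the "what": word expressing the event
    "ARG0": "the waiter",     # agent: initiator of the action
    "ARG1": "the soup",       # patient: entity undergoing the effect
}
```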

18
Q

What is the FrameNet scheme (SRL)

A

Frame: a set of related concepts that together comprise background knowledge about some event

19
Q

What are the main roles in the FrameNet scheme

A

item, attribute, initial_value, final_value, difference
not every sentence will have all of these elements

20
Q

What is Named Entity Recognition (NER)

A

named entity: anything that can be referred to with a proper name (person, organisation, location, geo-political entity)
but can also include expressions like dates, times, prices

21
Q

Why is NER hard

A

names are ambiguous
Washington - person name, organisation, location, and GPE

22
Q

How can NER be framed as sequence labelling

A

Individual tokens are assigned named entity tags
BIO - beginning, inside, outside
IO - inside, outside
BIOES - beginning, inside, outside, end, single
single means the entity consists of only one token
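The BIO scheme can be sketched as a small conversion from entity spans to per-token tags (the helper name, example tokens, and spans are illustrative):

```python
def to_bio(tokens, entities):
    """Convert (start, end, type) entity spans into per-token BIO tags.
    `end` is exclusive; tokens outside any entity get 'O' (outside)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"              # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"              # inside the entity
    return tags

tokens = ["Jane", "Villanueva", "of", "United", "Airlines"]
tags = to_bio(tokens, [(0, 2, "PER"), (3, 5, "ORG")])
# tags == ['B-PER', 'I-PER', 'O', 'B-ORG', 'I-ORG']
```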

23
Q

What are Conditional Random Fields (CRFs)

A

Model that discriminates among all possible tag sequences:
Ŷ = argmax_Y P(Y|X)
It assigns a probability to an entire sequence Y, for every possible sequence Y, given the input sequence X
(selects the highest-probability sequence)

24
Q

What is a global feature (CRFs)

A

Fk
A property of the entire sequences X and Y, computed as a sum of local features fk at each position i in Y

25
Q

What is a local feature (CRFs)

A

fk
makes use of the current output tag yi, the previous output tag yi-1, any part of the input sequence X, and the current position i

26
Q

What is K and wk in CRFs

A

K = number of features
wk = weight of feature k

27
Q

What is Z(X) in CRFs

A

The normalisation factor: ensures the probabilities of all possible tag sequences sum to 1
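The pieces from the last few cards (local features fk summed into global features Fk, weights wk, the normalisation factor Z(X), and the argmax over sequences) can be combined in a brute-force sketch. The tagset, features, and weights here are invented for illustration; real CRFs use dynamic programming rather than enumerating every sequence:

```python
from itertools import product
from math import exp

TAGS = ["NOUN", "VERB"]  # toy tagset

def local_features(y_prev, y, X, i):
    """f_k: may use the current tag, previous tag, any part of X, and i."""
    return [
        1.0 if X[i].endswith("ed") and y == "VERB" else 0.0,  # suffix clue
        1.0 if y_prev == "NOUN" and y == "VERB" else 0.0,     # tag bigram
    ]

W = [2.0, 1.0]  # w_k: one weight per feature (invented values)

def global_score(Y, X):
    """Sum over positions i of w_k * f_k, i.e. the weighted global features F_k."""
    score = 0.0
    for i in range(len(X)):
        y_prev = Y[i - 1] if i > 0 else "<s>"
        score += sum(w * f for w, f in zip(W, local_features(y_prev, Y[i], X, i)))
    return score

def crf_posterior(X):
    """P(Y|X) = exp(score(Y)) / Z(X) for every possible tag sequence Y."""
    seqs = list(product(TAGS, repeat=len(X)))
    Z = sum(exp(global_score(Y, X)) for Y in seqs)  # normalisation factor Z(X)
    return {Y: exp(global_score(Y, X)) / Z for Y in seqs}

probs = crf_posterior(["dogs", "barked"])
best = max(probs, key=probs.get)  # Y^ = argmax_Y P(Y|X)
# best == ("NOUN", "VERB")
```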

28
Q

What is word shape features (CRF)

A

Abstract letter pattern of a given word
all lowercase letters mapped to ‘x’, all uppercase to ‘X’
all digits mapped to ‘d’, punctuation retained as-is

29
Q

What is short word shape features (CRF)

A

like word shape, but runs of consecutive identical character types are collapsed to one
eg token = I.M.F
word shape = X.X.X, short word shape: X.X.X
eg2 token = DC10-30
word shape = XXdd-dd, short word shape: Xd-d
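Both shape features can be sketched directly from the definitions above (function names are illustrative):

```python
def word_shape(token):
    """Lowercase -> 'x', uppercase -> 'X', digit -> 'd'; keep punctuation."""
    out = []
    for ch in token:
        if ch.islower():
            out.append("x")
        elif ch.isupper():
            out.append("X")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # punctuation and anything else retained
    return "".join(out)

def short_word_shape(token):
    """Like word_shape, but collapse runs of identical shape characters."""
    shape = word_shape(token)
    out = []
    for ch in shape:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)

# word_shape("DC10-30") == "XXdd-dd"; short_word_shape("DC10-30") == "Xd-d"
# word_shape("I.M.F")   == "X.X.X";   short_word_shape("I.M.F")   == "X.X.X"
```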

30
Q

What is affixes feature (CRFs)

A

prefixes and/or suffixes of size 1 to n
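A sketch of extracting these affix features (the function and feature names are illustrative):

```python
def affixes(token, n):
    """Prefixes and suffixes of size 1 to n (shorter for short tokens)."""
    feats = {}
    for k in range(1, min(n, len(token)) + 1):
        feats[f"prefix{k}"] = token[:k]
        feats[f"suffix{k}"] = token[-k:]
    return feats

feats = affixes("danced", 2)
# feats == {'prefix1': 'd', 'suffix1': 'd', 'prefix2': 'da', 'suffix2': 'ed'}
```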

31
Q

What is gazetteer feature (CRFs)

A

presence of a word in a dictionary of entities (of interest)

32
Q

How are features binarised

A

every feature value (eg the POS tag NNP) is turned into a binary feature
so it can be 0 or 1 depending on the token
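Binarisation can be sketched as turning each (feature, value) pair into its own 0/1 indicator (the feature names below are illustrative):

```python
def binarise(token_features):
    """Turn categorical features like {'pos': 'NNP'} into binary
    indicator features like {'pos=NNP': 1}."""
    return {f"{name}={value}": 1 for name, value in token_features.items()}

feats = binarise({"pos": "NNP", "short_shape": "Xx", "suffix1": "a"})
# feats["pos=NNP"] == 1; indicators not present are implicitly 0
```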

33
Q

How is BERT finetuned for sequence labelling

A

On top of BERT, add a classifier (eg a single feedforward layer)
- takes as input the output vector for each token
- produces a softmax distribution over all tags
- the label with the highest probability is chosen as output
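A toy sketch of such a classifier head, assuming the per-token BERT output vector is already given; the tiny dimensionality and random, untrained weights are purely illustrative (real BERT-base vectors have 768 dimensions):

```python
import random
from math import exp

TAGS = ["B-PER", "I-PER", "O"]
HIDDEN = 4  # illustrative; BERT-base uses 768

random.seed(0)
# One weight row per tag: a single feedforward layer (no bias, untrained).
W = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in TAGS]

def softmax(scores):
    exps = [exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(token_vector):
    """Map one token's output vector to a distribution over all tags,
    and pick the label with the highest probability."""
    scores = [sum(w * v for w, v in zip(row, token_vector)) for row in W]
    probs = softmax(scores)
    best = TAGS[max(range(len(TAGS)), key=lambda k: probs[k])]
    return probs, best

probs, best = classify([0.1, -0.3, 0.8, 0.05])  # stand-in BERT output vector
```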

34
Q

Why is BERT a local approach

A

Does not take into account dependencies between tags
eg B-PER followed by B-PER
(which should be very uncommon)

35
Q

What is ## in BERT

A

represents a subword token
Crashes -> Crash, ##es

36
Q

How do we solve BERT being a local approach

A

Add a CRF layer on top of the classifier
- take the softmax output from the classifier
- pass it on to the CRF layer, which takes a global approach: it takes into account the label of the previous token
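A sketch of the idea with invented emission (softmax) and transition scores; a real CRF layer learns the transition scores and uses Viterbi decoding instead of brute force:

```python
from itertools import product
from math import log

TAGS = ["B-PER", "I-PER", "O"]

# Illustrative transition scores: 0 by default, with penalties for
# tag sequences that should be very uncommon.
TRANS = {(a, b): 0.0 for a in TAGS + ["<s>"] for b in TAGS}
TRANS[("B-PER", "B-PER")] = -5.0   # B-PER directly after B-PER
TRANS[("O", "I-PER")] = -5.0       # I-PER cannot follow O
TRANS[("<s>", "I-PER")] = -5.0     # I-PER cannot start the sequence

def decode(emissions):
    """Pick the tag sequence maximising emission log-probs plus
    transition scores (brute force over all sequences)."""
    best, best_score = None, float("-inf")
    for Y in product(TAGS, repeat=len(emissions)):
        score, prev = 0.0, "<s>"
        for probs, y in zip(emissions, Y):
            score += log(probs[y]) + TRANS[(prev, y)]
            prev = y
        if score > best_score:
            best, best_score = Y, score
    return list(best)

# Locally, the classifier slightly prefers B-PER for both tokens ...
emissions = [{"B-PER": 0.5, "I-PER": 0.4, "O": 0.1},
             {"B-PER": 0.5, "I-PER": 0.4, "O": 0.1}]
# ... but the global transition score vetoes B-PER -> B-PER.
tags = decode(emissions)  # -> ['B-PER', 'I-PER']
```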