POS tagging Flashcards
What is POS
Part of speech
a linguistic category of words, generally
defined by their syntactic or morphological behaviour
POS explains not what the word is about, but how it is used
What are open word classes
Constantly acquire new members
verb, noun, adverb, adjective, interjection
e.g. nouns: Internet, blog, Covid
e.g. verbs: to google, to tweet, to self-isolate
Two classes for all words in POS tagging
open word classes
closed word classes
What are closed word classes
Generally do not acquire new members
pronouns, prepositions, conjunctions
e.g. prepositions: to, from, in
e.g. pronouns: I, you, he/she/it, we, you, they
What are content (lexical) words
words that carry the content or the meaning of a sentence
open class words
What are function words
have little lexical meaning, but instead serve to
express grammatical relationships
e.g. articles (the) and conjunctions (and), which can be found in almost any utterance, no matter what it is about
Usually not inflected
What is the main issue with POS tagging
same word form can have different POS tags depending on the context: walk as a verb or a noun
The main task of POS tagging is to resolve the lexical ambiguities given the context
Internal Cues for POS tagging
Morphology is used for unknown words
extract-ed -> the -ed suffix is common for verbs
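A minimal sketch of this suffix heuristic for unknown words; the suffix-to-tag table below is a toy assumption for illustration, not from the flashcards.

```python
# Toy suffix-to-tag table (assumed for illustration).
SUFFIX_TAGS = {
    "ed": "VBD",   # extract-ed -> past-tense verb
    "ing": "VBG",  # runn-ing -> gerund/present participle
    "ly": "RB",    # quick-ly -> adverb
    "tion": "NN",  # extrac-tion -> noun
}

def guess_tag(word, default="NN"):
    """Guess a POS tag for an unknown word from its suffix (internal cue)."""
    for suffix, tag in SUFFIX_TAGS.items():
        if word.lower().endswith(suffix):
            return tag
    return default

print(guess_tag("extracted"))  # VBD
print(guess_tag("quickly"))    # RB
```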
External Cues for POS tagging
Using context
I will book a ticket
In this context, book is a verb, as will is usually followed by a verb
What are rule-based taggers
Uses a large set of rules
1. Starts with a dictionary
2. Assign all possible tags to words from dict
3. Apply rules to selectively remove tags leaving the correct tag for each word
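The three steps above can be sketched as follows; the toy dictionary and the two constraint rules are invented for illustration, not a real tagger's rule set.

```python
# Step 1: start with a dictionary of possible tags per word (toy lexicon).
LEXICON = {
    "i": {"PRP"},
    "will": {"MD", "NN"},
    "book": {"VB", "NN"},
    "a": {"DT"},
    "ticket": {"NN"},
}

def tag(words):
    # Step 2: assign all possible tags to each word from the dictionary.
    candidates = [set(LEXICON.get(w.lower(), {"NN"})) for w in words]
    # Step 3: apply rules to selectively remove tags.
    for i in range(1, len(words)):
        # Rule: a modal (MD) is usually followed by a base-form verb (VB).
        if "MD" in candidates[i - 1] and "VB" in candidates[i]:
            candidates[i] = {"VB"}
        # Rule: after a determiner (DT), prefer a noun reading (NN).
        if "DT" in candidates[i - 1] and "NN" in candidates[i]:
            candidates[i] = {"NN"}
    # Any leftover ambiguity is broken arbitrarily (alphabetically) here.
    return [sorted(c)[0] for c in candidates]

print(tag("I will book a ticket".split()))  # ['PRP', 'MD', 'VB', 'DT', 'NN']
```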
What are stochastic taggers
resolve ambiguities by using a training dataset
to estimate the probability of given word having
a given POS in a given context
Considers all possible sequences of tags
Chooses the most probable tag sequence given the word sequence
Requires a POS-tagged training corpus
Want the highest P(t1…tn | w1…wn)
Stochastic tagging formula
argmax f(x) means the x for which f(x) is maximised
argmax P(POS | word) = argmax P(word | POS) × P(POS)
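A worked toy example of this argmax: score each candidate tag t by P(word | t) × P(t) and keep the highest. The probabilities below are invented for illustration.

```python
# Invented toy probabilities for the ambiguous word "book".
emission = {"NN": {"book": 0.002}, "VB": {"book": 0.004}}  # P(word | tag)
prior = {"NN": 0.30, "VB": 0.20}                           # P(tag)

def best_tag(word):
    """Return argmax over tags of P(word | tag) * P(tag)."""
    scores = {t: emission[t].get(word, 0.0) * prior[t] for t in prior}
    return max(scores, key=scores.get)

# VB wins: 0.004 * 0.20 = 0.0008 beats NN: 0.002 * 0.30 = 0.0006
print(best_tag("book"))  # VB
```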
What is the transition probability
what is the probability of tag ti following tag ti-1 (like an n-gram model does for words)
Eg tags: DT, NN
P(DT | NN) = C(NN, DT) / C(NN)
We are calculating the probability of seeing the DT tag after the NN tag
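Estimating this from data is just counting tag bigrams; the tiny tag sequence below is an invented corpus for illustration.

```python
from collections import Counter

# Invented toy tag sequence to estimate P(DT | NN) = C(NN, DT) / C(NN).
tags = ["NN", "DT", "NN", "VBZ", "NN", "DT", "NN"]

unigrams = Counter(tags[:-1])           # counts of each tag as the previous tag
bigrams = Counter(zip(tags, tags[1:]))  # counts of (previous, next) tag pairs

# C(NN, DT) = 2 and C(NN) = 3 here, so P(DT | NN) = 2/3.
p_dt_given_nn = bigrams[("NN", "DT")] / unigrams["NN"]
print(p_dt_given_nn)
```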
What is the emission probability
given a tag ti, how likely is it that the corresponding word is wi
P(is | VBZ) = C(VBZ, is) / C(VBZ)
C(VBZ, is): the count of occurrences where the POS tag is VBZ and the word is 'is'
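The same counting works for emission probabilities; the small word/tag corpus below is invented for illustration.

```python
from collections import Counter

# Invented toy tagged corpus to estimate P(is | VBZ) = C(VBZ, is) / C(VBZ).
corpus = [("this", "DT"), ("is", "VBZ"), ("fine", "JJ"),
          ("she", "PRP"), ("is", "VBZ"), ("here", "RB"),
          ("he", "PRP"), ("runs", "VBZ")]

tag_counts = Counter(t for _, t in corpus)  # C(tag)
pair_counts = Counter(corpus)               # C(tag, word)

# C(VBZ, is) = 2 and C(VBZ) = 3 here, so P(is | VBZ) = 2/3.
p_is_given_vbz = pair_counts[("is", "VBZ")] / tag_counts["VBZ"]
print(p_is_given_vbz)
```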
What are transformation taggers
shares features of both rule-based and stochastic tagging
rules are automatically induced from a
previously tagged training dataset
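A sketch of how such induced rules are applied, Brill-style: start from an initial tagging (e.g. each word's most frequent tag) and apply transformation rules of the form "change tag A to tag B when the previous tag is C". The initial tags and the single rule below are invented for illustration.

```python
# One invented transformation rule: change NN -> VB when the previous tag is MD.
RULES = [("NN", "VB", "MD")]

def apply_rules(tags):
    """Apply (old, new, previous) transformation rules left to right."""
    tags = list(tags)
    for old, new, prev in RULES:
        for i in range(1, len(tags)):
            if tags[i] == old and tags[i - 1] == prev:
                tags[i] = new
    return tags

# "I will book a ticket", initially tagged with each word's most frequent tag.
initial = ["PRP", "MD", "NN", "DT", "NN"]
print(apply_rules(initial))  # ['PRP', 'MD', 'VB', 'DT', 'NN']
```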