Topic 3: Part of Speech Tagging Flashcards

1
Q

What is part of speech?

A

word classes / syntactic categories; knowing a word's class reveals a lot about the word and its neighbours

example:
a noun is likely to be preceded by determiners and adjectives
a verb is likely to be preceded by a noun

syntactic structure: nouns are generally part of noun phrases

2
Q

Applications of POS tagging

A
  1. parsing
  2. labelling named entities in information extraction
  3. co-reference resolution
  4. speech recognition or synthesis
3
Q

What are the 2 categories of POS?

A

closed class and open class

4
Q

Closed class

A

relatively fixed membership, e.g. new prepositions are rarely coined

generally function words
occur frequently, are short, and often have structuring uses in grammar
e.g. of, it, and, you

5
Q

Open Class

A

membership is not fixed: nouns and verbs, for example, readily gain new words
new noun: iPhone
new verb: to fax
new words are continually being created or borrowed

In English there are 4 open classes (nouns, verbs, adjectives, adverbs)
many languages have all 4, but not every language does

6
Q

Open Class: Noun

A
Give examples
Proper noun (Penang, Samsung, IBM, Intel)
Common noun (cat, pencil)
    count noun (cat, cats) - can be enumerated grammatically
    mass noun (salt, snow)
7
Q

Open Class: Verb

A

have inflections (changes in the form of the word)
non-third-person-sg (eat), third-person-sg (eats)
progressive (eating), past participle (eaten)

8
Q

Open Class: Adjective

A

terms for properties and qualities

e.g. concepts of color (blue, yellow), age, value

9
Q

Open Class: Adverb

A
can be viewed as modifying something
directional/locative (here, downhill)
degree (very, extremely)
manner (slowly, delicately)
temporal (yesterday, Monday)
10
Q

Closed class

A
list and examples
prepositions
particles
determiners
pronouns
conjunctions
auxiliary verbs
numerals
11
Q

Closed class: Prepositions

A

occur before noun phrases. semantically, they often indicate spatial or temporal relations

12
Q

Closed class: Particles

A

resembles a preposition or adverb but is used in combination with a verb to give extended meanings
e.g. over: she turned the paper over

13
Q

Closed class: Phrasal verb

A
a verb and a particle that act as a single syntactic and/or semantic unit
example:
turn down
find out
go on
14
Q

Closed class: Determiners

A
the closed class that occurs with nouns, marking the beginning of a noun phrase
e.g. a, an, the, this, that
15
Q

Closed class: Conjunction

A

words that join 2 phrases, clauses or sentences

e.g. and, or, but, that

16
Q

Closed class: Pronoun

A

forms that often act as a kind of shorthand for referring to some noun phrase, entity or event
e.g. you, she, I, it, me (personal pronouns)

my, your, his, her, its, one’s, our, their (possessive pronouns)

17
Q

Penn Treebank Tagset

A

the 45-tag Penn Treebank tagset has been used to label many corpora

in labelled text, the POS is generally represented by placing the tag after each word, delimited by a slash
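
The slash convention can be read back into (word, tag) pairs with a few lines of code. This is a minimal sketch; the function name parse_tagged and the sample sentence are illustrative assumptions, not part of any corpus tooling.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits at the LAST slash, so a word that itself
        # contains a slash (e.g. "1/2/CD") keeps its internal slash.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Illustrative sentence written in the slash format.
tagged = "Book/VB that/DT flight/NN ./."
print(parse_tagged(tagged))
```

Splitting at the last slash rather than the first is a design choice for tokens such as fractions, where the word part contains a slash of its own.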

18
Q

Choosing tagset

A

Coarse tagset:
8 parts of speech: noun, verb, pronoun, preposition, adverb, conjunction, participle, article

Finer-grained tagset:
45-tag Penn Treebank tagset

Even finer-grained tagset:
87-tag tagset used for the Brown corpus

19
Q

Tagged corpora

A

corpora labelled with POS tags. crucial for training statistical tagging algorithms

created by running an automatic POS tagger on the texts, after which human annotators hand-corrected the tags

words are generally tokenized before tagging

Three main tagged corpora are commonly used:
Brown corpus
WSJ corpus
Switchboard corpus

20
Q

Role of tokenization in tagging

A

the Treebank tagset assumes that multipart words are tokenized at whitespace, so each whitespace-separated part is tagged as its own token

21
Q

Recap: what is POS tagging, and what is the model?

A

process of assigning a POS marker to each word in an input text

input: a sequence of tokenized words and a tagset
function: tagging algorithm
output: sequence of tags, one per token

22
Q

Challenges in tagging, with an example

A
  • words are ambiguous
    tagging is a disambiguation task

ambiguity - a word can have more than one possible POS. the goal is to find the correct tag for the context

For example, "book" can be
a verb (book that flight)
a noun (hand me that book)
23
Q

Highly ambiguous words

A

that, back, down, put and set

example
there are 6 different POS tags for "back"
JJ
NN
VBP
VB
RP
RB
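
One way to see which words are ambiguous is to collect the set of tags each word receives in a tagged corpus. A minimal sketch, assuming the corpus is a list of (word, tag) pairs; the function names and the toy data are made up for illustration.

```python
from collections import defaultdict


def tags_per_word(tagged_tokens):
    """Map each lower-cased word to the set of tags it was seen with."""
    seen = defaultdict(set)
    for word, tag in tagged_tokens:
        seen[word.lower()].add(tag)
    return seen


def ambiguous_words(tagged_tokens):
    """Keep only the words that occur with more than one tag."""
    return {
        word: sorted(tags)
        for word, tags in tags_per_word(tagged_tokens).items()
        if len(tags) > 1
    }


# Toy data built from the two "book" examples; not taken from a real corpus.
toy_tokens = [
    ("Book", "VB"), ("that", "DT"), ("flight", "NN"),
    ("hand", "VB"), ("me", "PRP"), ("that", "DT"), ("book", "NN"),
]
print(ambiguous_words(toy_tokens))
```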
24
Q

Method for POS tagging

A

rule-based tagger
tags based on hand-written disambiguation rules

probabilistic/stochastic tagger
resolves tagging ambiguities using a training corpus
computes the probability of a given word having a given tag in a given context
e.g. HMM tagger

25
Q

Simple baseline algorithm

A

idea: some tags are more likely than others for a given word

for example, "a" can be a determiner or the letter "a",
but it is more likely to be a determiner.

simplistic baseline algorithm for POS tagging: given an ambiguous word, choose the tag that is most frequent for it in the training corpus.
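
The most-frequent-tag baseline can be sketched as follows. The toy corpus and its tags are invented for illustration, and falling back to the overall most common tag for unseen words is an assumption of this sketch, not something the card specifies.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Count tags per word and remember each word's most frequent tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Assumption: unseen words get the most common tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    return most_frequent, default_tag


def tag_baseline(words, most_frequent, default_tag):
    """Tag each word independently of its context."""
    return [most_frequent.get(w.lower(), default_tag) for w in words]


# Invented toy corpus: "a" is a determiner twice and a letter name once.
toy_corpus = [
    [("a", "DT"), ("book", "NN")],
    [("the", "DT"), ("letter", "NN"), ("a", "NN")],
    [("a", "DT"), ("flight", "NN")],
]
model, fallback = train_baseline(toy_corpus)
print(tag_baseline(["a", "flight"], model, fallback))
```

Because every word is tagged in isolation, this baseline always gives "a" the same tag, which is exactly why it serves as a floor for context-aware taggers.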

26
Q

POS Accuracy

A

standard performance measure
percentage of tags in a test set that match the human (gold) labels.

always compare a classifier against a baseline at least as good as the most frequent class baseline (MFCB)
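
Accuracy is simply the fraction of predicted tags that agree with the gold tags. A minimal sketch; the function name and the example tag sequences are illustrative assumptions.

```python
def tagging_accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Made-up example: the tagger gets "book" wrong and the other two right.
gold = ["VB", "DT", "NN"]
predicted = ["NN", "DT", "NN"]
print(f"accuracy = {tagging_accuracy(predicted, gold):.2%}")
```

The same function can score both a tagger and the most frequent class baseline on the same test set, which makes the comparison direct.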

27
Q

Rule based POS tagger

A

assign a list of potential POS tags to each word based on a dictionary

manual rules for out-of-vocabulary words

apply hand-written constraints until each word has only one possible POS

example

  1. a DT cannot immediately precede a verb
  2. no verb can immediately precede a tensed verb
  3. eliminate VBN if VBD is an option
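
The dictionary-plus-constraints idea can be sketched in a few lines. This is a simplified single pass that implements only rules 1 and 3 above; the toy lexicon, the NN fallback for unknown words, and the function name are assumptions made for illustration, and a real rule-based tagger would have many more rules.

```python
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

# Hypothetical toy dictionary: each word maps to its candidate tags.
LEXICON = {
    "the": {"DT"},
    "book": {"NN", "VB"},
    "arrived": {"VBD", "VBN"},
}


def rule_based_tag(words, lexicon, unknown_tags=frozenset({"NN"})):
    """Return the candidate tag set left for each word after the rules."""
    # Step 1: look up candidates; unknown words get a default guess.
    candidates = [set(lexicon.get(w.lower(), unknown_tags)) for w in words]
    # Step 2: apply constraints left to right (one pass only).
    for i, tags in enumerate(candidates):
        # Rule 3: eliminate VBN if VBD is an option.
        if "VBD" in tags and "VBN" in tags:
            tags.discard("VBN")
        # Rule 1: a DT cannot immediately precede a verb, so drop verb
        # readings after an unambiguous DT (unless nothing would remain).
        if i > 0 and candidates[i - 1] == {"DT"}:
            non_verbs = tags - VERB_TAGS
            if non_verbs:
                tags.intersection_update(non_verbs)
    return candidates


print(rule_based_tag(["The", "book", "arrived"], LEXICON))
```

A single pass may leave some words with several candidates; the card's description implies repeating the constraints until each word has exactly one tag.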
28
Q

A probabilistic method for POS

A

consider all possible sequences of classes (tags)

choose the tag sequence that is most probable given the observed sequence of n words
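
Taken literally, "consider all possible sequences" means enumerating every tag sequence and keeping the most probable one. The sketch below does exactly that for a three-word sentence. All probabilities are invented toy numbers, and scoring a sequence as a product of transition and word-likelihood terms is the HMM-style assumption used here. Brute force grows exponentially with sentence length, so practical HMM taggers use dynamic programming (the Viterbi algorithm) instead.

```python
from itertools import product

# Hypothetical toy probabilities, invented for illustration only.
TRANSITION = {  # P(tag | previous tag); "<s>" marks the sentence start
    ("<s>", "VB"): 0.4, ("<s>", "NN"): 0.6,
    ("VB", "DT"): 0.7, ("NN", "DT"): 0.2,
    ("DT", "NN"): 0.9, ("DT", "VB"): 0.1,
}
EMISSION = {  # P(word | tag)
    ("book", "VB"): 0.05, ("book", "NN"): 0.02,
    ("that", "DT"): 0.3,
    ("flight", "NN"): 0.03,
}
TAGS = ["DT", "NN", "VB"]


def sequence_probability(words, tags):
    """Product of transition and word-likelihood terms along the sequence."""
    prob = 1.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        prob *= TRANSITION.get((prev, tag), 0.0) * EMISSION.get((word, tag), 0.0)
        prev = tag
    return prob


def best_tag_sequence(words):
    """Enumerate every tag sequence and return the most probable one."""
    return max(
        product(TAGS, repeat=len(words)),
        key=lambda tags: sequence_probability(words, tags),
    )


print(best_tag_sequence(["book", "that", "flight"]))
```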

29
Q

Estimating probability

A

word likelihood probability × tag transition probability, i.e. P(word | tag) × P(tag | previous tag)
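
Both factors are usually estimated by counting in a tagged corpus. The sketch below uses plain relative frequencies with no smoothing and no end-of-sentence marker, which are simplifying assumptions; the function names and the two-sentence corpus are made up for illustration.

```python
from collections import Counter


def estimate_probabilities(tagged_sentences):
    """Return emission(word, tag) and transition(prev, tag) estimators."""
    tag_counts = Counter()         # C(tag), including the start marker "<s>"
    emission_counts = Counter()    # C(tag, word)
    transition_counts = Counter()  # C(previous tag, tag)
    for sentence in tagged_sentences:
        prev = "<s>"
        tag_counts[prev] += 1
        for word, tag in sentence:
            tag_counts[tag] += 1
            emission_counts[(tag, word)] += 1
            transition_counts[(prev, tag)] += 1
            prev = tag

    def emission(word, tag):
        # Word likelihood: P(word | tag) = C(tag, word) / C(tag)
        total = tag_counts[tag]
        return emission_counts[(tag, word)] / total if total else 0.0

    def transition(prev, tag):
        # Tag transition: P(tag | prev) = C(prev, tag) / C(prev)
        total = tag_counts[prev]
        return transition_counts[(prev, tag)] / total if total else 0.0

    return emission, transition


# Invented two-sentence corpus.
toy_corpus = [
    [("book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("that", "DT"), ("book", "NN")],
]
emission, transition = estimate_probabilities(toy_corpus)
# Score for tagging a sentence-initial "book" as VB.
score = emission("book", "VB") * transition("<s>", "VB")
print(score)
```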