POS Tagging Flashcards

1
Q

What is POS tagging?

A

Assigning a part of speech (syntactic category) to each word in a given string (or audio) for further processing.

2
Q

Give some examples of other tagging tasks.

A

Case restoration, Named Entity Recognition, Information Field Segmentation, Prosodic marking

3
Q

What is Case restoration?

A

When text has been converted entirely to lower case or upper case, case restoration tries to recover the original capitalisation. For example, given “this is not a drill. i repeat. this is not a drill”, we would hope to restore “This is not a drill. I repeat. This is not a drill”.
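
As a rough illustration of the example above, here is a minimal rule-based sketch. The function name `naive_restore_case` and its two rules (capitalise sentence-initial letters, capitalise the standalone pronoun "i") are assumptions made for this card. Real case restoration is usually treated as a tagging problem, because proper nouns and acronyms cannot be recovered by rules like these.

```python
import re


def naive_restore_case(text: str) -> str:
    """Hypothetical helper: restore case in lower-cased text with two simple rules."""
    # Rule 1: capitalise the first letter of the text and any letter that
    # follows sentence-final punctuation (., ! or ?) plus whitespace.
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    # Rule 2: capitalise the pronoun "i" when it stands alone as a word.
    text = re.sub(r"\bi\b", "I", text)
    return text


restored = naive_restore_case("this is not a drill. i repeat. this is not a drill")
print(restored)
```

Names such as "edinburgh" would stay lower-cased under these rules, which is why a learned tagger is normally preferred.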

4
Q

What is Information Field Segmentation?

A

Finding the words or spans that belong to a particular category (field) within a body of text.

5
Q

What is prosodic marking?

A

Determining which words carry particular intonation, stress or tone. E.g. “He’s going”, “He’s going!” and “He’s going?” all have different intonations that change their meaning.

6
Q

What are open-class words?

A

Verbs, adjectives, adverbs and nouns. They carry most of the content of a sentence. These classes are constantly changing, with new words being added all the time (for example, “googling” as a verb).

7
Q

What are closed-class words?

A

Pronouns, determiners, prepositions and connectives. These are mostly functional: there is a limited number of them, and they tie the concepts of a sentence together.

8
Q

What type of tags would you expect morphologically rich languages to have?

A

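Compound morphosyntactic tags: a single tag that combines the part of speech with morphological features such as case, number or gender.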

9
Q

What is a homograph?

A

A word that is spelled the same but can carry different part-of-speech tags in different sentences, e.g. “book” is a noun in “read a book” and a verb in “book a flight”.

10
Q

What makes POS tagging difficult?

A

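This is an open question. One answer is ambiguity from homographs: the same word form can take different tags, so the tagger needs knowledge of the words and their context.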

11
Q

How do we define a probabilistic model for tagging?

A

Let t denote a tag, and begin each sentence with t0 = <s> (a start-of-sentence marker).

If we say that each tag depends only on the previous tag, then the probability of a tag is given by P(ti | ti-1).

If we say that each tag can be realised as one of many different words, so that each word is conditional on its tag, then the probability of a word is given by P(wi | ti).

To generate a sentence of length n:

Let t0 = <s>

For i = 1 to n:

Choose a tag conditioned on the previous tag: P(ti | ti-1)

Choose a word conditioned on its tag: P(wi | ti)
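
The generation procedure above can be sketched in Python. This is an illustrative sketch only: the tag set, the vocabulary and every probability in `TRANSITIONS` and `EMISSIONS` are made up, and the start marker `<s>` for t0 is an assumption rather than something taken from a corpus.

```python
import random

START = "<s>"  # assumed start-of-sentence marker used as t0

# Hypothetical toy tables: P(ti | ti-1) and P(wi | ti). Each row sums to 1.
TRANSITIONS = {
    START: {"DET": 0.7, "NOUN": 0.3},
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB": 0.6, "NOUN": 0.4},
    "VERB": {"DET": 0.5, "NOUN": 0.5},
}
EMISSIONS = {
    "DET": {"the": 0.6, "a": 0.4},
    "NOUN": {"dog": 0.5, "book": 0.5},
    "VERB": {"sees": 0.5, "book": 0.5},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate(n, rng):
    """Generate n (word, tag) pairs by following the two-step procedure."""
    prev_tag = START
    sentence = []
    for _ in range(n):
        tag = sample(TRANSITIONS[prev_tag], rng)  # choose ti given ti-1
        word = sample(EMISSIONS[tag], rng)        # choose wi given ti
        sentence.append((word, tag))
        prev_tag = tag
    return sentence


tagged = generate(5, random.Random(0))
print(tagged)
```

Note that “book” can be emitted by both NOUN and VERB in these toy tables, which is the kind of ambiguity a tagger has to resolve when running the model in reverse.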

12
Q

What are the assumptions the probabilistic model for a tagged sentence of length n makes?

A

Each tag depends only on the previous tag: a bigram model over tags.

Each word depends only on its own tag: words are conditionally independent given the tags.
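
Taken together, these two assumptions suggest that the joint probability of a tag sequence and a word sequence factorises as a product of per-position terms. The formula below is a sketch that assumes t_0 is a start-of-sentence marker and that no end-of-sentence term is included:

```latex
P(t_1, \dots, t_n, w_1, \dots, w_n)
  = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i),
  \qquad t_0 = \texttt{<s>}
```

The first factor comes from the bigram assumption over tags, and the second from the conditional independence of words given their tags.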

13
Q

What’s a “balanced corpus”?

A

One that contains data from a range of different genres and topics.
