POS Tagging Flashcards
What is POS tagging?
Identifying the part of speech (syntactic category) of each word in a given string (or audio transcription) for further processing.
Give some examples of other tagging tasks
Case restoration, Named Entity Recognition, Information Field Segmentation, Prosodic marking
What is Case restoration?
If some text has been converted entirely to lower case or upper case, case restoration tries to reverse this. That is, for a string like “this is not a drill. i repeat. this is not a drill”, we would try to restore: “This is not a drill. I repeat. This is not a drill”.
What is Information Field Segmentation?
Trying to find words that fit under a certain category (field) within a body of text
What is prosodic marking?
Determining which words carry particular intonation/stress/tone. E.g. “He’s going”, “He’s going!”, and “He’s going?” would all have different intonations that change their meaning.
What are open-class words?
Verbs, adjectives, adverbs, and nouns. They carry most of the content of a sentence. Open classes are constantly changing, with new additions being made all the time (for example, “googling” as a verb).
What are closed-class words?
Pronouns, determiners, prepositions, and connectives. These are mostly functional: there is a limited number of them, and they act to tie the concepts of a sentence together.
What type of tags would you expect morphologically rich languages to have?
compound morphosyntactic tags
What is a homograph?
A single written form that can take different parts of speech depending on context, so the same word X appears in different sentences with different POS tags (e.g. “record” as a noun vs. as a verb).
What makes POS tagging difficult?
Open question. One answer is homographs: resolving them requires knowledge of the words and their context.
How do we define a probabilistic model for tagging?
Say ti is the tag at position i, and we begin a sentence with the start tag t0 = <s>.
If we say that the probability of the next tag depends only on the previous tag, then this probability is given by P(ti | ti−1).
If we say that each tag can be realised as one of many different words, and each word is conditioned only on its tag, this probability is given by P(wi | ti).
To generate sentence of length n:
Let t0 = <s>
For i = 1 to n:
    Choose a tag conditioned on the previous tag: P(ti | ti−1)
    Choose a word conditioned on its tag: P(wi | ti)
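The generation procedure above can be sketched as a small Python program. The transition and emission tables here are made-up toy probabilities for illustration, not from any real corpus:

```python
import random

# Toy bigram HMM. All probabilities are illustrative assumptions.
# Transition table: P(t_i | t_{i-1}); "<s>" is the start-of-sentence tag.
TRANSITIONS = {
    "<s>":  {"DET": 0.7, "NOUN": 0.3},
    "DET":  {"NOUN": 1.0},
    "NOUN": {"VERB": 0.6, "NOUN": 0.4},
    "VERB": {"DET": 0.5, "NOUN": 0.5},
}

# Emission table: P(w_i | t_i).
EMISSIONS = {
    "DET":  {"the": 0.8, "a": 0.2},
    "NOUN": {"dog": 0.5, "drill": 0.5},
    "VERB": {"runs": 0.6, "repeats": 0.4},
}

def sample(dist):
    """Draw one outcome from a {value: probability} distribution."""
    r = random.random()
    total = 0.0
    for value, p in dist.items():
        total += p
        if r < total:
            return value
    return value  # guard against floating-point rounding

def generate(n):
    """Generate a tagged sentence of length n from the bigram model."""
    prev_tag, sentence = "<s>", []
    for _ in range(n):
        tag = sample(TRANSITIONS[prev_tag])   # choose tag: P(t_i | t_{i-1})
        word = sample(EMISSIONS[tag])         # choose word: P(w_i | t_i)
        sentence.append((word, tag))
        prev_tag = tag
    return sentence
```

Running `generate(5)` produces a random (word, tag) sequence; a real tagger would instead invert this generative story to find the most likely tags for observed words.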
What are the assumptions the probabilistic model for a tagged sentence of length n makes?
Each tag depends only on the previous tag: a bigram model over tags.
Words are conditionally independent given tags
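Under these two assumptions, the joint probability of a tagged sentence factorises as (standard bigram-HMM form, with t0 = <s>):

```latex
P(t_1,\dots,t_n,\, w_1,\dots,w_n) \;=\; \prod_{i=1}^{n} P(t_i \mid t_{i-1})\, P(w_i \mid t_i)
```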
What’s a “balanced corpus”?
One that contains data from different genres and on different topics.