Prediction and Part-Of-Speech Tagging Flashcards

1
Q

Corpus

A

A body of text that has been collected for some purpose.

2
Q

Balanced Corpus

A

A corpus containing texts that represent a range of different genres, so it is not biased toward one text type.

3
Q

Prediction

A

Given a sequence of words, we want to determine what’s most likely to come next.

4
Q

N-gram Model

A

A type of Markov chain where the sequence of the prior n−1 words is used to predict the next word.
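As a quick illustration (not from the notes), n-grams can be extracted from a token sequence with a simple sliding window; the zip-based idiom below is a common Python trick:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    # Offset copies of the sequence by 0..n-1 and zip them together;
    # zip stops at the shortest, giving exactly len(tokens)-n+1 n-grams.
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(ngrams(tokens, 2))  # bigrams
print(ngrams(tokens, 3))  # trigrams
```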

5
Q

Trigram

A

Uses the preceding two words as context.

6
Q

Bigram

A

Uses only the preceding word as context.

7
Q

Unigram

A

Uses no context at all.

8
Q

Bigrams Model

A

Assigns a probability to a word based on the previous word alone.
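A minimal sketch of how such probabilities are estimated from counts (maximum likelihood, no smoothing): P(w2 | w1) = count(w1, w2) / count(w1). The toy corpus is invented for illustration.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
unigram_counts = Counter(corpus)
# Pair each word with its successor to count bigrams.
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p_bigram(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(p_bigram("the", "cat"))  # 2 of the 3 occurrences of "the" precede "cat"
```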

9
Q

Viterbi Algorithm

A

A dynamic programming technique for efficiently applying n-grams in speech recognition and other applications to find the highest probability sequence. It is usually described in terms of an FSA.
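A bare-bones sketch of Viterbi for bigram POS tagging: at each word we keep, for every tag, only the best path ending in that tag. The transition and emission probabilities below are invented toy values, not from the notes.

```python
tags = ["DET", "N", "V"]
trans = {  # P(tag_i | tag_{i-1}); "<s>" is the start state (toy values)
    "<s>": {"DET": 0.8, "N": 0.1, "V": 0.1},
    "DET": {"DET": 0.05, "N": 0.9, "V": 0.05},
    "N":   {"DET": 0.1, "N": 0.2, "V": 0.7},
    "V":   {"DET": 0.6, "N": 0.3, "V": 0.1},
}
emit = {  # P(word | tag) (toy values)
    "DET": {"the": 0.9, "dog": 0.0, "barks": 0.0},
    "N":   {"the": 0.0, "dog": 0.9, "barks": 0.1},
    "V":   {"the": 0.0, "dog": 0.1, "barks": 0.9},
}

def viterbi(words):
    """Return the highest-probability tag sequence for `words`."""
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (trans["<s>"][t] * emit[t][words[0]], [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            # Pick the predecessor tag that maximises path prob * transition.
            prev, (p, path) = max(
                ((s, best[s]) for s in tags),
                key=lambda kv: kv[1][0] * trans[kv[0]][t])
            new[t] = (p * trans[prev][t] * emit[t][w], path + [t])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]

print(viterbi(["the", "dog", "barks"]))  # → ['DET', 'N', 'V']
```

Because only the best path per tag survives at each step, the cost is linear in sentence length rather than exponential in the number of tag sequences.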

10
Q

Smoothing

A

To allow for sparse data, we use smoothing. This means we make some assumption about the probability of unseen or very infrequently seen events and distribute that probability appropriately.

11
Q

Add-one Smoothing

A

Add one to all counts; not theoretically sound, but simple to implement.
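A sketch of add-one (Laplace) smoothing for bigrams: every possible bigram gets its count incremented by one, and the denominator grows by the vocabulary size V so the distribution still sums to one. The toy corpus is made up.

```python
from collections import Counter

corpus = "the cat sat on the mat".split()
vocab = set(corpus)
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p_laplace(w1, w2):
    """Add-one smoothed estimate of P(w2 | w1)."""
    # +1 on the count, +V on the denominator.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + len(vocab))

print(p_laplace("the", "cat"))  # seen bigram
print(p_laplace("cat", "mat"))  # unseen bigram still gets some probability
```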

12
Q

Backoff

A

Backing off to lower-order n-gram probabilities when a higher-order n-gram has not been seen.
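A simple sketch of the idea: use the trigram estimate if the trigram was seen, otherwise back off to the bigram, then the unigram. The fixed 0.4 discount follows the "stupid backoff" scheme; a proper method such as Katz backoff computes the discounts so the probabilities sum to one. The toy corpus is invented.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))

def score(w1, w2, w3, alpha=0.4):
    """Backoff score for w3 given (w1, w2): trigram, else bigram, else unigram."""
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    return alpha * alpha * uni[w3] / len(corpus)

print(score("the", "cat", "sat"))  # trigram seen once
print(score("on", "the", "cat"))   # unseen trigram: backs off to the bigram
```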

13
Q

Part of Speech Tagging

A

Associating words in a corpus with a tag indicating some syntactic information that applies to that particular use of the word. POS tagging makes it easier to extract some types of information.

14
Q

Stochastic POS-tagging

A

Tagging using a probabilistic model of tag sequences, typically a hidden Markov model trained on a labelled corpus. Too complex to summarise on a flashcard; see pages 22-24 in the notes for the full treatment.

15
Q

Evaluation of POS tagging

A

POS tagging algorithms are evaluated in terms of the percentage of correct tags. Success rates of around 95% can be misleading, since the baseline of choosing the most common tag for each word (based on the training set) already gives about 90% accuracy.
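A sketch of per-token accuracy and the "most common tag" baseline mentioned above; the tiny train/test sets are invented for illustration.

```python
from collections import Counter, defaultdict

train = [("the", "DET"), ("dog", "N"), ("barks", "V"),
         ("the", "DET"), ("bark", "N")]
test_data = [("the", "DET"), ("dog", "N"), ("bark", "V")]

# Baseline: tag each word with its most frequent tag in the training data.
tag_counts = defaultdict(Counter)
for word, tag in train:
    tag_counts[word][tag] += 1

def baseline_tag(word):
    # Unknown words get an arbitrary default tag here ("N").
    return tag_counts[word].most_common(1)[0][0] if word in tag_counts else "N"

correct = sum(baseline_tag(w) == t for w, t in test_data)
print(correct / len(test_data))  # "bark" was only seen as N, so 2/3 here
```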

16
Q

Training data and test data

A

The assumption in NLP is that a system should work on novel data, so the test data must be kept unseen during development.

17
Q

Baselines

A

Report evaluation with respect to a baseline, which is normally what could be achieved with a very basic approach, given the same training data.

18
Q

Ceiling

A

The upper bound on performance for an application. This is usually taken to be human performance on the task, measured as the percentage agreement between two annotators.

19
Q

Error Analysis

A

The error rate of a particular program will be distributed very unevenly across inputs. Some errors may also matter more than others, e.g. treating an incoming order as junk is much worse than the converse.

20
Q

Reproducibility

A

Evaluation should be done on a generally available corpus so that other researchers can replicate the experiments.