Module 14: Natural Language Processing Flashcards

1
Q

Which of the following is not a common benefit of using log probabilities instead of probabilities with a Categorical distribution?

A

A log probability representation of a categorical distribution is more efficient to sample.

The common benefits are numerical stability (log probabilities of long sequences do not underflow) and turning products of probabilities into sums; sampling is not one of them, since drawing from the distribution still requires the probabilities themselves (exponentiating the log probabilities, or a trick such as Gumbel-max).
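A minimal sketch in plain Python (made-up numbers) of why the log representation helps with stability and sums, but not with sampling:

import math

# Joint probability of a 1,000-token sequence, each token with probability 0.01.
probs = [0.01] * 1000

product = 1.0
for p in probs:
    product *= p
print(product)       # underflows to 0.0 in double precision

log_prob = sum(math.log(p) for p in probs)
print(log_prob)      # about -4605.2: the product becomes a sum, no underflow

# To actually sample, the log probabilities must be exponentiated and
# normalized (or handled with a trick such as Gumbel-max), so sampling
# is not cheaper in the log representation.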

2
Q

T/F

Markov models are stationary because probabilities assigned to a sequence are invariant with respect to shifts in data [train, validation, test, etc.]

A

False

The statement misuses "stationary": a Markov model is stationary because the probabilities it assigns are invariant with respect to shifts in the time index, not with respect to how the data are split into train, validation, and test sets. Relatedly, the stationary distribution of a Markov chain is the distribution of Xt after sufficiently many steps, once it no longer changes.
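A minimal sketch (NumPy, with a made-up 2-state transition matrix) of what a stationary distribution is:

import numpy as np

# Hypothetical 2-state transition matrix: row i holds P(next state | current state i).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

dist = np.array([1.0, 0.0])   # start entirely in state 0
for _ in range(200):
    dist = dist @ P           # distribution of X_t after one more step

print(dist)                   # converges to about [0.833, 0.167]
print(dist @ P)               # same vector again: further steps no longer change it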

3
Q

Cross Entropy:

A

is an upper bound on the entropy: H(p, q) ≥ H(p), with equality only when the model distribution q matches the true distribution p. The more accurate a model is, the lower the cross entropy.
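A minimal sketch in plain Python (made-up distributions p and q) of the inequality:

import math

p = [0.5, 0.3, 0.2]   # "true" distribution
q = [0.4, 0.4, 0.2]   # model's distribution

entropy = -sum(pi * math.log2(pi) for pi in p)                    # H(p)
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))  # H(p, q)

print(entropy, cross_entropy)   # H(p, q) >= H(p); they are equal only when q == p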

4
Q

T/F
Vector representations of words using term-document matrices and word-word matrices can account for polysemy.

A

False

These representations assign one vector per word type, so they can capture that two words have similar meanings, but they cannot represent that a single word has multiple meanings (polysemy): all senses of the word are collapsed into the same vector.
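A minimal sketch (toy two-sentence corpus, illustrative window size) of the problem: a word-word co-occurrence matrix gives "bank" a single row that mixes both of its senses:

from collections import Counter, defaultdict

corpus = ("i deposited cash at the bank . "
          "we sat on the bank of the river .").split()

window = 3
cooc = defaultdict(Counter)   # cooc[w][c] = how often c appears within the window around w
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooc[w][corpus[j]] += 1

# One row per word *type*: contexts from both senses of "bank"
# ("cash" from the financial sense, "river" from the riverside sense)
# end up merged into the same vector.
print(cooc["bank"])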

5
Q

How many tokens would the following sentence be broken into:

ENIAC’s construction was financed by the United States Army, Ordnance Corps, led by Major General Gladeon M. Barnes.

A

24
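The count depends on the tokenization convention. A minimal sketch, assuming a convention where alphanumeric runs are tokens and every punctuation mark (including the apostrophe in ENIAC’s) is its own token, which yields 24:

import re

sentence = ("ENIAC’s construction was financed by the United States Army, "
            "Ordnance Corps, led by Major General Gladeon M. Barnes.")

# \w+ keeps runs of letters/digits together; [^\w\s] makes each punctuation
# mark its own token.
tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(len(tokens))   # 24 under this convention
print(tokens)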

6
Q

Suppose you are building an n-gram model for a corpus with a vocabulary of size V and a total of T tokens. The running time to build this model depends on

A

n and T

Building the counts takes one pass over the corpus: each of the roughly T positions materializes an n-gram of length n and increments its count, so the running time grows with n and T. The vocabulary size V does not enter this pass.
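A minimal sketch of the counting pass (plain Python; the function name is illustrative):

from collections import Counter

def build_ngram_counts(tokens, n):
    # One pass over the T tokens; each position materializes an n-gram of
    # length n and increments its count, so the work is roughly O(n * T).
    # The vocabulary size V never appears in this loop.
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        counts[tuple(tokens[i:i + n])] += 1
    return counts

print(build_ngram_counts("the cat sat on the mat".split(), 2))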

7
Q

What is the purpose of Laplace smoothing?

A

To assign non-zero probability to unseen words (and n-grams).

Without smoothing, any word that never appears in the training counts would receive probability zero, so the model would assign zero probability to any text that contains it.
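A minimal sketch of add-one (Laplace) smoothing for a unigram model (toy corpus; the assumption that "dog" is in the vocabulary but unseen is made up for illustration):

from collections import Counter

tokens = "the cat sat on the mat".split()
counts = Counter(tokens)
vocab = set(tokens) | {"dog"}   # "dog" is in the vocabulary but never observed
V = len(vocab)
N = len(tokens)

def laplace_prob(word):
    # Every word gets count + 1; the denominator grows by V so the
    # smoothed probabilities still sum to 1 over the vocabulary.
    return (counts[word] + 1) / (N + V)

print(laplace_prob("the"))   # seen word: probability shrinks slightly
print(laplace_prob("dog"))   # unseen word: nonzero instead of zero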

8
Q

T/F
When generating random text, given some context C, an n-gram model cannot generate a next token T if the model never saw the context-token pair (C, T) during training.

A

True

N-gram models used for generation draw the next word at random, in proportion to the conditional probabilities estimated from training counts. If a word never occurs after the context in training, it is not in that context's conditional probability table and therefore can never be generated.
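A minimal sketch (toy bigram model in plain Python) of why: the sampler's candidate set is exactly the set of tokens that followed the context in training:

import random
from collections import Counter, defaultdict

tokens = "the cat sat on the mat".split()

bigram_counts = defaultdict(Counter)   # bigram_counts[context][next] = count
for prev, nxt in zip(tokens, tokens[1:]):
    bigram_counts[prev][nxt] += 1

def sample_next(context):
    # Draw in proportion to the training counts for this context; any token
    # that never followed `context` has zero weight and can never come out.
    nexts = bigram_counts[context]
    words, weights = zip(*nexts.items())
    return random.choices(words, weights=weights)[0]

print(sample_next("the"))   # only ever "cat" or "mat"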
