Module 14: Natural Language Processing Flashcards
Which of the following is not a common benefit of using log probabilities instead of probabilities with a Categorical distribution?
A log probability representation of a categorical distribution is more efficient to sample.
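The usual benefits of log probabilities are numerical: multiplying many small probabilities underflows to zero in floating point, while summing log probabilities stays representable (and sums are cheaper and more stable than long products). Sampling, by contrast, is no easier in log space. A minimal Python sketch with made-up per-token probabilities:

    import math

    # Per-token probabilities for a long sequence (hypothetical values).
    probs = [0.01] * 200

    # A naive product underflows to 0.0 in double precision.
    product = 1.0
    for p in probs:
        product *= p
    print(product)    # 0.0 -- underflow

    # The sum of log probabilities remains representable.
    log_prob = sum(math.log(p) for p in probs)
    print(log_prob)   # about -921.0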
T/F
Markov models are stationary because probabilities assigned to a sequence are invariant with respect to shifts in data [train, validation, test, etc.]
False
Stationarity refers to invariance with respect to shifts in time (position in the sequence), not shifts across data splits: a Markov model assigns the same probabilities regardless of where in the sequence a pattern occurs. Relatedly, the stationary distribution of a Markov chain describes the distribution of X_t after a sufficiently long time, once the distribution of X_t no longer changes.
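To make the "no longer changes" idea concrete, here is a minimal sketch that finds the stationary distribution of a small Markov chain by repeatedly applying the transition matrix (the matrix values are hypothetical):

    import numpy as np

    # Row i of P holds P(X_{t+1} = j | X_t = i) for a 3-state chain.
    P = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.8,  0.1 ],
                  [0.2, 0.2,  0.6 ]])

    # Start from an arbitrary distribution and run the chain forward;
    # after enough steps the distribution of X_t stops changing.
    dist = np.array([1.0, 0.0, 0.0])
    for _ in range(1000):
        dist = dist @ P
    print(dist)   # the stationary distribution pi, satisfying pi = pi @ P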
Cross Entropy:
is an upper bound on the entropy: H(p, q) >= H(p), with equality when the model q matches the true distribution p. The more accurate a model is, the lower the cross entropy.
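A small sketch of both quantities (the distributions are made-up), showing that cross entropy upper-bounds entropy and reaches it when the model matches the true distribution:

    import math

    def entropy(p):
        # H(p) = -sum_x p(x) log2 p(x)
        return -sum(px * math.log2(px) for px in p if px > 0)

    def cross_entropy(p, q):
        # H(p, q) = -sum_x p(x) log2 q(x)
        return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

    p = [0.5, 0.25, 0.25]   # true distribution (hypothetical)
    q = [0.4, 0.4, 0.2]     # model distribution (hypothetical)
    print(entropy(p))           # 1.5 bits
    print(cross_entropy(p, q))  # about 1.57 bits, >= H(p)
    print(cross_entropy(p, p))  # exactly H(p) when the model is perfect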
T/F
Vector representations of words using term-document matrices and word-word matrices can account for polysemy.
False
These vector representations can capture similarity between words with related meanings, but because each word is assigned a single vector, they cannot handle the problem that a single word has multiple meanings (polysemy).
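A minimal sketch of a word-word co-occurrence matrix on a made-up corpus; note that "bank" gets a single row mixing its financial and river senses, which is exactly why this representation cannot account for polysemy:

    import numpy as np

    corpus = [
        "the bank approved the loan",
        "the bank raised interest rates",
        "they sat on the river bank",
    ]

    vocab = sorted({w for sent in corpus for w in sent.split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Count co-occurrences of word pairs within the same sentence.
    M = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        words = sent.split()
        for w in words:
            for c in words:
                if w != c:
                    M[index[w], index[c]] += 1

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # One vector must serve both senses of "bank".
    print(cosine(M[index["bank"]], M[index["loan"]]))
    print(cosine(M[index["bank"]], M[index["river"]]))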
How many tokens would the following sentence be broken into:
ENIAC’s construction was financed by the United States Army, Ordnance Corps, led by Major General Gladeon M. Barnes.
24
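The count depends on the tokenization scheme. Under a simple regex tokenizer that separates words from punctuation (so ENIAC’s becomes ENIAC / ’ / s, and M. becomes M / .), the sentence yields 24 tokens; schemes that keep clitics or abbreviations whole would give fewer:

    import re

    sentence = ("ENIAC’s construction was financed by the United States Army, "
                "Ordnance Corps, led by Major General Gladeon M. Barnes.")

    # \w+ matches runs of word characters; [^\w\s] matches each punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    print(len(tokens))   # 24
    print(tokens)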
Suppose you are building an n-gram model for a corpus with a vocabulary of size V and a total of T tokens. The running time to build this model depends on
n and T
Building the model takes one pass over the T tokens, and each position contributes an n-gram of length n, so the running time is O(nT); the vocabulary size V does not affect the cost of collecting counts.
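A minimal sketch of count collection for an n-gram model on a toy corpus, showing why the cost is O(nT):

    from collections import defaultdict

    def ngram_counts(tokens, n):
        # One pass over all T tokens; each position yields one n-gram
        # of length n, so building the table costs O(n * T).
        counts = defaultdict(int)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
        return counts

    tokens = "the cat sat on the mat the cat ran".split()
    print(ngram_counts(tokens, 2))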
What is the purpose of Laplace smoothing?
to account for unseen words (and, more generally, unseen n-grams).
Without smoothing, the probabilities of these unseen events would be zero.
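A minimal sketch of add-one (Laplace) smoothing for unigram probabilities; here V is taken to be the observed vocabulary size, an assumption for illustration:

    from collections import Counter

    def laplace_prob(word, counts, total, vocab_size):
        # Add-one smoothing: P(w) = (count(w) + 1) / (T + V),
        # so unseen words get a small nonzero probability.
        return (counts[word] + 1) / (total + vocab_size)

    tokens = "the cat sat on the mat".split()
    counts = Counter(tokens)
    T, V = len(tokens), len(counts)
    print(laplace_prob("the", counts, T, V))  # seen word
    print(laplace_prob("dog", counts, T, V))  # unseen word: nonzero, not 0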
T/F
When generating random text, given some context C, an n-gram model cannot generate a next token T if the model never saw the context-token pair (C, T) during training.
True
N-gram models used for generation draw the next word at random, with probability proportional to the entries in the model's conditional probability table for the context. If a word never occurs with the context in training, it has no entry in that table, so (absent smoothing) it receives zero probability and cannot be generated.
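A minimal sketch of bigram generation on a toy corpus, illustrating why a never-seen (context, token) pair can never be produced by an unsmoothed model:

    import random
    from collections import defaultdict

    tokens = "the cat sat on the mat the cat ran".split()

    # Conditional counts: how often each token follows each context word.
    next_counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        next_counts[prev][nxt] += 1

    def sample_next(context):
        # Draw the next token proportional to its count after `context`;
        # a token with count 0 (never seen here) can never be drawn.
        candidates = next_counts[context]
        words, weights = list(candidates), list(candidates.values())
        return random.choices(words, weights=weights)[0]

    print(sample_next("the"))   # "cat" or "mat" -- never, say, "ran"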