Natural Language Processing Flashcards
What is term frequency?
How often the term appears in the document (a measure of the term's prominence in that document).
What is inverse document frequency?
How infrequently the term appears across documents (terms found in few documents are more informative).
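As a rough illustration (a toy Python sketch with made-up documents, not from the course material), tf and idf combine into the familiar tf-idf weight:

import math
from collections import Counter

# Toy corpus: each document is a list of tokens (hypothetical example data).
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]

def tf(term, doc):
    # Term frequency: how often the term appears in this document.
    return Counter(doc)[term]

def idf(term, docs):
    # Inverse document frequency: down-weight terms that appear in many documents.
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tf_idf("cat", docs[0], docs))  # prominent here and rarer across documents
print(tf_idf("the", docs[0], docs))  # appears in every document, so idf = 0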
What is Bayes Theorem?
P(A|B) = P(B|A) P(A) / P(B)
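A quick numeric check of the formula (the probabilities are made up, e.g. A = "spam", B = "contains the word 'offer'"):

# Hypothetical inputs.
p_b_given_a = 0.6   # P(B|A)
p_a = 0.2           # P(A)
p_b = 0.25          # P(B)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.48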
What is laplace smoothing?
Add-one smoothing adds a constant (one) to every count, so events unseen in training get a small non-zero probability instead of zero; this guards against overfitting the training counts.
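A minimal sketch of add-one smoothing for unigram probabilities (toy counts assumed; the point is that an unseen word gets a small non-zero probability):

from collections import Counter

counts = Counter({"the": 5, "cat": 2, "sat": 1})   # hypothetical training counts
vocab = ["the", "cat", "sat", "dog"]               # "dog" never occurred in training

N = sum(counts.values())   # total tokens = 8
V = len(vocab)             # vocabulary size = 4

def p_add_one(word):
    # Add-one (Laplace) estimate: (count + 1) / (N + V)
    return (counts[word] + 1) / (N + V)

print(p_add_one("cat"))  # (2+1)/(8+4) = 0.25
print(p_add_one("dog"))  # (0+1)/(8+4) ≈ 0.083, not zero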
What is information theory?
Study of transmission, storage and retrieval of digital information.
What is entropy?
The average uncertainty of a random variable:
H(p) = H(X) = -Σ_x p(x) log2 p(x)
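A minimal sketch computing H(X) for a couple of assumed toy distributions (a fair and a biased coin):

import math

def entropy(p):
    # H(X) = -Σ_x p(x) log2 p(x); zero-probability outcomes contribute nothing.
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit    (fair coin: maximum uncertainty)
print(entropy([0.9, 0.1]))  # ≈0.47 bits (biased coin: less uncertain)
print(entropy([1.0, 0.0]))  # 0.0 bits   (outcome is certain)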
What is joint entropy?
The information required to specify the values of two random variables jointly:
H(X, Y) = -Σ_{x,y} p(x, y) log2 p(x, y)
What is conditional entropy?
Given one variable, how much additional information is needed to specify the other:
H(Y|X) = -Σ_{x,y} p(x, y) log2 p(y|x)
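A sketch computing joint and conditional entropy from a small assumed joint distribution p(x, y):

import math

# Hypothetical joint distribution over X in {0, 1} and Y in {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H_joint(pxy):
    # H(X, Y) = -Σ_{x,y} p(x, y) log2 p(x, y)
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

def H_Y_given_X(pxy):
    # H(Y|X) = -Σ_{x,y} p(x, y) log2 p(y|x)
    px = {}
    for (x, _), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
    return -sum(p * math.log2(p / px[x]) for (x, _), p in pxy.items() if p > 0)

print(H_joint(p_xy))       # ≈ 1.722 bits
print(H_Y_given_X(p_xy))   # ≈ 0.722 bits, i.e. H(X, Y) - H(X) with H(X) = 1 bit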
What is the chain rule?
H(X, Y) = H(X) + H(Y|X)
What is mutual information?
I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X): the reduction in uncertainty about one variable from observing the other.
This follows from the chain rule, since H(X) + H(Y|X) = H(Y) + H(X|Y) (both equal H(X, Y)).
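Reusing the same toy joint distribution as above, a numeric check that the chain rule holds and that both expressions for mutual information give the same value:

import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # hypothetical p(x, y)

def H(dist):
    # Entropy of a discrete distribution given as {outcome: probability}.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(pxy, axis):
    m = {}
    for xy, p in pxy.items():
        m[xy[axis]] = m.get(xy[axis], 0.0) + p
    return m

H_XY = H(p_xy)
H_X, H_Y = H(marginal(p_xy, 0)), H(marginal(p_xy, 1))
H_Y_given_X = H_XY - H_X   # chain rule: H(X, Y) = H(X) + H(Y|X)
H_X_given_Y = H_XY - H_Y

print(H_X - H_X_given_Y)   # ≈ 0.278 bits
print(H_Y - H_Y_given_X)   # ≈ 0.278 bits: I(X; Y) comes out the same either way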
What is the noisy channel model?
First used in speech recognition; the goal is to reconstruct the original message from the noisy output of a channel, combining a model of the channel with a prior over messages.
What is a statistical language model?
A model that assigns a probability to a sequence of words, typically by factoring it into per-word probabilities conditioned on the preceding words (e.g. an n-gram model).
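A minimal sketch of the simplest such model, a bigram language model estimated by maximum likelihood from a made-up corpus (add-one smoothing from the earlier card would normally be applied to the counts):

from collections import Counter

# Toy training corpus (hypothetical) with sentence-boundary markers.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

bigrams, history = Counter(), Counter()
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigrams[(w1, w2)] += 1
        history[w1] += 1

def p_bigram(w2, w1):
    # MLE estimate: p(w2 | w1) = count(w1, w2) / count(w1)
    return bigrams[(w1, w2)] / history[w1] if history[w1] else 0.0

def p_sentence(sent):
    p = 1.0
    for w1, w2 in zip(sent, sent[1:]):
        p *= p_bigram(w2, w1)
    return p

print(p_sentence(["<s>", "the", "cat", "sat", "</s>"]))  # 0.5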
What are the advantages and disadvantages of neural networks?
Advantages:
- Unlimited input length
- Model size is independent of input size
- History dependent
- Model parameters shared across time steps
Disadvantages:
- Long-distance dependencies (long delays) are hard to learn
- Can't see the future (only left context is available)
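A minimal numpy sketch of one recurrent step, showing where those properties come from: the same weight matrices are reused at every time step (so model size is independent of input length) and the hidden state carries the history (toy sizes and untrained random weights, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 10, 8                      # assumed toy sizes
W_xh = rng.normal(size=(hidden_size, vocab_size))    # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden -> hidden, shared across steps
W_hy = rng.normal(size=(vocab_size, hidden_size))    # hidden -> output

def step(h_prev, x_onehot):
    # One step: the new state depends on the previous state (history) and the current input.
    h = np.tanh(W_xh @ x_onehot + W_hh @ h_prev)
    logits = W_hy @ h
    probs = np.exp(logits) / np.exp(logits).sum()    # softmax over the next word
    return h, probs

h = np.zeros(hidden_size)
for word_id in [3, 1, 4]:            # an input sequence of any length
    x = np.eye(vocab_size)[word_id]
    h, p_next = step(h, x)           # only left context is visible at each step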
How is the noisy channel model applied to statistical machine translation?
mi* = argmax_mi p(zh | mi) p(mi)
Translation models are estimated from aligned (parallel) corpora, and this need for aligned data makes p(zh | mi) hard to estimate.
p(zh| mi) - is the translation model
p(mi) - is the language model
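A toy sketch of noisy-channel decoding for translation: score each candidate target sentence mi by p(zh|mi) · p(mi) and take the argmax (candidates and probabilities are entirely made up):

# Hypothetical candidate translations with made-up model scores.
candidates = {
    "the cat sleeps": {"p_zh_given_mi": 0.30, "p_mi": 0.020},
    "cat the sleeps": {"p_zh_given_mi": 0.35, "p_mi": 0.001},  # fits the source well, but unnatural
}

best = max(candidates,
           key=lambda mi: candidates[mi]["p_zh_given_mi"] * candidates[mi]["p_mi"])
print(best)  # "the cat sleeps": the language model p(mi) penalises the ungrammatical option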
What are some facts about chatGPT?
Training data was roughly 93% English and 7% other languages.
By June 2020: 175 × 10^9 parameters define the model f.