Topic 2: N-gram modeling Flashcards
N-gram
N-gram is a N-token sequence of words. example of bi-gram, tri-gram??
N-gram model
language model is a prediction model. predicting words form previous N-1 words.
statistical model
Language Model
application of n-gram model
give with example
- spelling correction
- speech recognition
- augmentative communication
- machine translation
Simple n-grams
probability of the word,w given history P( w | h )
Relative frequency counts
example…
Corpus based estimation
example…
Easier Estimation
this utilizes chain rule of probability. example P(its water was so transparent) = P(its) * P(water|its)* P(so|its water was)* P(transparent|its water was so)|...
Intuition of n-gram model
approximate the history by just the last few words instead of computing probability of the entire word history.
Markov assumption
N-gram model comes with independent assumption that probability of some future unit can be predicted without looking too far into the past.
Exercise bi-gram model with maximum likelihood estimation
I am Sam
Sam I am
I do not like green eggs and ham
Calculate bigram probabilities from this corpus.
P(I|) = 2/3 = 0.67 P(Sam|) = P(am|I) =
P(|Sam) = P(Sam|am) = P(do|I) =
P(I|)=2/3=0.67 P(Sam|)=1/3=0.33 P(am|I)=2/3=0.67 P(|Sam)=1/2=0.5 P(Sam|am)=1/2=0.5 P(do|I)=1/3=0.33
Relative Frequency
obtained by dividing frequency of a sequence by frequency of a prefix
Bi-gram count exercise
refer to slides. have unigram count..have bigram table
calculated the probability by
value in bigram table divided by unigram word count
what knowledge can be captured by N-gram probabilities.
- world knowledge
- syntax
- discourse
Evaluating language models. what are the 2 types of evaluation
extrinsic evaluation - embed language model in an application and measure how it improves, measure end to end performance, expensive
intrinsic evaluation - training and test set. measure quality of model, independent of application
Training and testing paradigm
- evaluating different architectures
- development, training and test set
split 80:20