Topic 2: N-gram modeling Flashcards

1
Q

N-gram

A

An N-gram is a sequence of N tokens (words). For example, a bigram is a 2-word sequence such as "green eggs", and a trigram is a 3-word sequence such as "green eggs and".
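
A minimal sketch (not from the slides; the helper name ngrams is mine) of extracting N-grams from a tokenized sentence:

def ngrams(tokens, n):
    """Return every contiguous n-token sequence in the list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I do not like green eggs and ham".split()
print(ngrams(tokens, 2))  # bigrams:  ('I', 'do'), ('do', 'not'), ...
print(ngrams(tokens, 3))  # trigrams: ('I', 'do', 'not'), ('do', 'not', 'like'), ...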

2
Q

N-gram model

A

An N-gram model is a statistical language model: it predicts the next word from the previous N-1 words.

3
Q

Applications of the N-gram model

A

Applications, with examples:

  1. spelling correction (prefer "about fifteen minutes" over "about fifteen minuets")
  2. speech recognition (rank acoustically similar hypotheses)
  3. augmentative communication (predict likely next words for users of speech-generating devices)
  4. machine translation (choose the more fluent target-language word order)
4
Q

Simple n-grams

A

The probability of a word w given its history h: P(w | h).

5
Q

Relative frequency counts

A

Estimate P(w | h) by counting: divide the number of times the history h is followed by w by the number of times h occurs, i.e. P(w | h) ≈ C(h w) / C(h).

6
Q

Corpus based estimation

A

The counts used for estimation are taken from a large corpus of text; with a large enough corpus, relative frequencies approximate the probabilities we want.

7
Q

Easier Estimation

A
This uses the chain rule of probability. Example:
P(its water was so transparent)
= P(its)
× P(water | its)
× P(was | its water)
× P(so | its water was)
× P(transparent | its water was so)
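
A minimal sketch of the chain-rule decomposition in code; cond_prob is a hypothetical stand-in for whatever estimator supplies P(word | history):

def sentence_prob(tokens, cond_prob):
    """Chain rule: P(w1..wn) = product of P(wi | w1..w(i-1))."""
    prob = 1.0
    for i, word in enumerate(tokens):
        prob *= cond_prob(word, tokens[:i])  # P(word | entire preceding history)
    return prob

# sentence_prob("its water was so transparent".split(), cond_prob)
# = P(its) * P(water | its) * P(was | its water)
#   * P(so | its water was) * P(transparent | its water was so)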
8
Q

Intuition of n-gram model

A

Approximate the history by just the last few words instead of conditioning on the entire word history.

9
Q

Markov assumption

A

The N-gram model makes the independence (Markov) assumption that the probability of a future unit can be predicted without looking too far into the past: a bigram conditions only on the previous word, a trigram on the previous two.
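
A minimal sketch (names are mine, not from the slides) showing how the Markov assumption truncates the conditioning history to the last N-1 words:

def markov_factors(tokens, n):
    """(context, word) pairs used by an n-gram model: context = last n-1 words only."""
    return [(tuple(tokens[max(0, i - (n - 1)):i]), w) for i, w in enumerate(tokens)]

print(markov_factors("its water was so transparent".split(), n=2))
# [((), 'its'), (('its',), 'water'), (('water',), 'was'),
#  (('was',), 'so'), (('so',), 'transparent')]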

10
Q

Exercise: bigram model with maximum likelihood estimation
I am Sam
Sam I am
I do not like green eggs and ham
Calculate bigram probabilities from this corpus.
P(I | <s>) = 2/3 = 0.67, P(Sam | <s>) = ?, P(am | I) = ?
P(</s> | Sam) = ?, P(Sam | am) = ?, P(do | I) = ?

A
P(I | <s>) = 2/3 = 0.67
P(Sam | <s>) = 1/3 = 0.33
P(am | I) = 2/3 = 0.67
P(</s> | Sam) = 1/2 = 0.5
P(Sam | am) = 1/2 = 0.5
P(do | I) = 1/3 = 0.33
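
A minimal sketch that reproduces these numbers, assuming <s> and </s> are added as sentence-boundary markers:

from collections import Counter

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_mle(w, prev):
    """P_MLE(w | prev) = C(prev w) / C(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_mle("I", "<s>"))     # 2/3 ~ 0.67
print(p_mle("Sam", "<s>"))   # 1/3 ~ 0.33
print(p_mle("am", "I"))      # 2/3 ~ 0.67
print(p_mle("</s>", "Sam"))  # 1/2 = 0.5
print(p_mle("Sam", "am"))    # 1/2 = 0.5
print(p_mle("do", "I"))      # 1/3 ~ 0.33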
11
Q

Relative Frequency

A

Obtained by dividing the frequency of a sequence by the frequency of its prefix, e.g. P(w | h) = C(h w) / C(h).
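
A minimal sketch (names are mine; sentence boundaries are ignored here for simplicity) of the relative-frequency estimate C(h w) / C(h):

def count_seq(tokens, pattern):
    """Number of times the token sequence `pattern` occurs in `tokens`."""
    n = len(pattern)
    return sum(tuple(tokens[i:i + n]) == tuple(pattern)
               for i in range(len(tokens) - n + 1))

def relative_freq(tokens, history, word):
    """P(word | history) ~ C(history + word) / C(history)."""
    return count_seq(tokens, list(history) + [word]) / count_seq(tokens, history)

tokens = "I am Sam Sam I am I do not like green eggs and ham".split()
print(relative_freq(tokens, ["I"], "am"))  # C(I am) / C(I) = 2/3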

12
Q

Bi-gram count exercise

A

Refer to the slides: given a unigram count table and a bigram count table,
compute each bigram probability as
the value in the bigram table divided by the unigram count of the preceding word.

13
Q

What knowledge can be captured by N-gram probabilities?

A
  1. world knowledge
  2. syntax
  3. discourse
14
Q

Evaluating language models: what are the two types of evaluation?

A

Extrinsic evaluation: embed the language model in an application and measure how much it improves end-to-end performance; expensive.
Intrinsic evaluation: use a training set and a held-out test set to measure the quality of the model itself, independent of any application.

15
Q

Training and testing paradigm

A
  • used for evaluating and comparing different model architectures
  • split the data into training, development, and test sets
    (e.g. an 80:20 train/test split; see the sketch below)
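
A minimal sketch (placeholder data; not from the slides) of an 80:20 split:

import random

sentences = [f"sentence {i}" for i in range(10)]  # placeholder corpus of 10 sentences
random.seed(0)
random.shuffle(sentences)                # shuffle before splitting
cut = int(0.8 * len(sentences))          # 80% for training
train, test = sentences[:cut], sentences[cut:]
print(len(train), len(test))             # 8 training sentences, 2 test sentences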
16
Q

Intuition of perplexity

A

When predicting the next word, a better model is the one that assigns higher probability to the word that actually occurs.

17
Q

Perplexity as an evaluation metric

A

The best language model is the one that best predicts an unseen test set.
Perplexity is the inverse probability of the test set, normalized by the number of words:
PP(W) = P(w1 w2 … wN)^(-1/N)
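
A minimal sketch of the perplexity computation for a bigram model; bigram_prob is a hypothetical function returning P(w | previous word), and the sum is done in log space to avoid underflow:

import math

def perplexity(tokens, bigram_prob):
    """PP(W) = P(w1..wN)^(-1/N), computed via log probabilities."""
    log_prob = sum(math.log(bigram_prob(w, prev))
                   for prev, w in zip(tokens, tokens[1:]))
    n = len(tokens) - 1                  # number of predicted words
    return math.exp(-log_prob / n)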

18
Q

Generalization

A

A statistical model is highly dependent on its training corpus;
it performs poorly when the training and test sets come from different genres
(e.g. business-meeting transcripts vs. movie dialogue).

19
Q

Challenges in language modeling

A

The model needs to dynamically adapt to different genres rather than assuming the test data matches the training data.

20
Q

Zeros

A

One kind of generalization problem: zeros.

The test set contains N-grams that never occur in the training set.

21
Q

Incorrect estimation

A

This is the problem caused by zeros: the model underestimates the probability of all sorts of words that can actually occur.
If any word in the test set has probability 0, the probability of the entire test set is 0,
and perplexity cannot be computed.

22
Q

Smoothing (Laplace)

A

Introduced to overcome the zero problem.
How it is done: add 1 to every N-gram count; the denominator is also adjusted by adding V (the vocabulary size) to account for the V extra observations.
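
A minimal sketch of add-one smoothing on the same toy corpus as the earlier bigram exercise (whether <s> and </s> count toward V is a modeling choice; here they do):

from collections import Counter

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

V = len(unigrams)  # vocabulary size

def p_laplace(w, prev):
    """P_Laplace(w | prev) = (C(prev w) + 1) / (C(prev) + V)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

print(p_laplace("am", "I"))     # seen bigram:   (2 + 1) / (3 + V)
print(p_laplace("green", "I"))  # unseen bigram: (0 + 1) / (3 + V), no longer zero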

23
Q

Laplace smoothing exercise

A

Recompute the bigram probabilities with the added counts: add 1 to every cell of the bigram count table, and add the vocabulary size V to every denominator (the unigram counts).