Language Modelling Flashcards
What is a language model?
It is a model that assigns probabilities to sequences of words
What is the most basic of language models?
The n-gram model, which assigns probabilities to sentences and sequences of words
What can n-gram models be used for?
Estimate the probability of the last word of an n-gram given the previous words, and assign probabilities to entire sequences
Where can language models be used?
Speech Recognition
Spelling correction or grammatical error correction
Machine translation
Augmentative and alternative communication systems
What does a unigram assume?
That words appear independently of each other
P(w1, w2, w3, w4) = P(w1) * P(w2) * P(w3) * P(w4)
It assumes that the previous words have no influence on the next word
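A minimal Python sketch of the unigram assumption (the toy corpus here is purely illustrative):

from collections import Counter

# Toy corpus standing in for real training data
corpus = "the cat sat on the mat the dog sat".split()
counts = Counter(corpus)
total = sum(counts.values())

def unigram_prob(word):
    # P(w) = count(w) / total number of tokens
    return counts[word] / total

# Under independence, the sentence probability is the product of the
# individual word probabilities, regardless of word order.
p = 1.0
for w in ["the", "dog", "sat"]:
    p *= unigram_prob(w)
print(p)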
What do n-gram models inform us?
The probability of the next word in the text is dependent on the previous n-1 words in the text.
If we have a word w and some history h, we want to find out, of the times that h occurred, how many times it was followed by w.
In general, how do n-gram probabilities work?
P(w1) = P(w1)
P(w1,w2) = P(w2 | w1) * P(w1)
P(w1, …, wn) = P(wn | w1, …, wn-1) * P(w1, …, wn-1)
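As a sketch, the chain rule can be coded against any function that supplies conditional probabilities (cond_prob below is a hypothetical stand-in, not a specific library API):

def sequence_prob(words, cond_prob):
    # Chain rule: P(w1, ..., wn) = product over i of P(wi | w1, ..., wi-1)
    p = 1.0
    for i in range(len(words)):
        p *= cond_prob(words[i], words[:i])  # P(wi | preceding words)
    return p

# e.g. with a dummy cond_prob that ignores the history:
print(sequence_prob(["the", "cat"], lambda w, h: 0.5))  # 0.25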
How does the bigram model approximate the probability of the next word?
We can use the probability of the next word given the previous word:
P(wn | wn-1)
What assumption is made with the n-gram model?
The Markov Assumption - the probability of a word depends only on the previous word (for a bigram model); this generalises to trigrams and higher-order n-grams, which condition on the previous two or more words
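A minimal sketch of the bigram approximation under the Markov assumption (the <s> padding token and the bigram_prob function are illustrative assumptions):

def bigram_sequence_prob(words, bigram_prob):
    # Markov assumption: P(wi | w1, ..., wi-1) is approximated by P(wi | wi-1)
    padded = ["<s>"] + words  # hypothetical start-of-sentence token
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        p *= bigram_prob(cur, prev)
    return p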
What does MLE stand for?
Maximum Likelihood Estimation
What does MLE do?
It is the process of choosing the set of bigram parameters that maximises the likelihood of the training text, i.e. makes our model best predict the nth word in the text
How do we obtain the MLE for the parameters of an n-gram model?
We observe the n-gram counts in a representative corpus, then normalise them (dividing by a total count) so they lie between 0 and 1
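A sketch of the MLE estimate for bigram parameters, computed from counts on an illustrative toy corpus:

from collections import Counter

tokens = "the cat sat on the mat".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)

def mle_bigram(cur, prev):
    # MLE: P(cur | prev) = count(prev, cur) / count(prev)
    return bigram_counts[(prev, cur)] / unigram_counts[prev]

print(mle_bigram("cat", "the"))  # count("the cat") / count("the") = 1/2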
What format is used when computing language model probabilities?
We use log probabilities, so we can add them instead of multiplying; summing logs also avoids numerical underflow when many small probabilities are combined
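A small sketch of working in log space (the probabilities are illustrative):

import math

word_probs = [0.1, 0.05, 0.2]  # illustrative per-word probabilities
# Summing logs replaces the product and avoids underflow for long sequences.
log_p = sum(math.log(p) for p in word_probs)
print(log_p)
print(math.exp(log_p))  # recovers the product: 0.001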
What are some n-gram problems?
Even with a large corpus, only a tiny minority of the possible n-grams will appear in it. The estimated probability of a word given the previous word is therefore 0 for most sequences, because the bigram count matrix is sparse.
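A tiny illustration of the sparsity problem, using the same toy corpus and MLE estimate as above:

from collections import Counter

tokens = "the cat sat on the mat".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))

# "dog" never follows "the" in this corpus, so the MLE estimate is 0,
# and any sentence containing that bigram gets probability 0 overall.
print(bigram_counts[("the", "dog")] / Counter(tokens)["the"])  # 0.0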