Chapter 2 Flashcards
What is the problem associated with modeling natural languages as opposed to formal languages?
Formal languages, like programming languages, can be precisely defined, while natural languages are inherently ambiguous and subject to constant change.
What is Language Modeling (LM)?
Language modeling involves applying statistical and probabilistic techniques to assess the likelihood of a specific sequence of words occurring in a sentence.
What is Statistical Language Modeling (LM)?
Statistical Language Modeling involves developing probabilistic models capable of predicting the next word in a sequence based on preceding words.
______ and ______ are the two types of language models
- Statistical Language Modeling (LM)
- Neural Language Models (NLM)
What are the tasks associated with Language Modeling?
- assigning probabilities to sentences in a language
- evaluating the probability for each sequence of words
- estimating the likelihood of a given word (or sequence of words) following a specific word sequence
What are examples of Statistical Language Models?
- N-gram Model
- Bidirectional Model
- Exponential Model
- Continuous Space Model
What is a Neural Language Model (NLM)?
Neural Language Models refer to the use of neural networks in language modeling.
________ analyzes text backward (the preceding words) and creates a probability distribution over sequences of n words.
N-gram Model
_______ Utilizes an equation combining n-grams and other parameters for text evaluation, offering higher accuracy than the n-gram model.
Exponential Model
_______ is based on weighting each word (word embeddings), and is particularly useful for large texts or datasets.
Continuous Space Model
_____ address the data sparsity issue of n-grams.
NLMs
Why is Probability necessary?
Essential in tasks with noisy, ambiguous inputs like speech recognition.
■ P(back soonish) ≫ P(bassoon dish)
Crucial for writing tools (spelling and grammar correction) to detect and correct errors.
■ P(There are) ≫ P(Their are)
■ P(has improved) ≫ P(has improve)
_____ is the simplest language model that assigns probabilities to sequences of words.
The n-gram
○ (bigram): “please turn”, “turn your”, “your homework”
○ (trigram): “please turn your”, “turn your homework”
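A minimal sketch of that extraction in Python, assuming simple whitespace tokenization (the names `ngrams` and `tokens` are illustrative, not from the chapter):

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "please turn your homework".split()
print(ngrams(tokens, 2))  # [('please', 'turn'), ('turn', 'your'), ('your', 'homework')]
print(ngrams(tokens, 3))  # [('please', 'turn', 'your'), ('turn', 'your', 'homework')]
```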
How do language models compute probabilities?
- P(w|h), the probability of the next word w given some history h.
- P(the | its water is so transparent that)
How to calculate this?
- P(A|B) = P(A ∩ B) / P(B)
- Joint probability: “Out of all possible sequences of 5 words, how many of them are ‘its water is so transparent’?”
- The Chain Rule:
- P(x1,x2,x3,…,xn) = P(x1)P(x2|x1)P(x3|x1,x2)…P(xn|x1,…,xn-1)
- P(“its water is so transparent”) = P(its) × P(water|its) × P(is|its water) × P(so|its water is) × P(transparent|its water is so)
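A minimal sketch of the chain rule as code; `cond_prob` is a hypothetical stand-in for any model that can score P(word | history), and the toy probability values are made up for illustration:

```python
def sentence_probability(tokens, cond_prob):
    """Multiply P(w_i | w_1..w_{i-1}) over the sentence, per the chain rule."""
    p = 1.0
    for i, word in enumerate(tokens):
        p *= cond_prob(word, tuple(tokens[:i]))
    return p

# Made-up conditional probabilities mirroring the flashcard example.
toy = {
    ("its", ()): 0.05,
    ("water", ("its",)): 0.4,
    ("is", ("its", "water")): 0.5,
    ("so", ("its", "water", "is")): 0.2,
    ("transparent", ("its", "water", "is", "so")): 0.1,
}
sentence = ["its", "water", "is", "so", "transparent"]
print(sentence_probability(sentence, lambda w, h: toy[(w, h)]))
# 0.05 × 0.4 × 0.5 × 0.2 × 0.1 ≈ 0.0002
```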
Explain the Markov models
- assumes we can predict the probability of some future unit without looking too far into the past.
- Bigram approximation: P(the | its water is so transparent that) ≈ P(the | that)
- Trigram approximation: P(the | its water is so transparent that) ≈ P(the | transparent that)
Explain the Markov model for unigrams and bigrams
● Unigram Model: P(wi)
● Bigram Model: P(wi | wi-1)
Extend the Markov model to trigrams, 4-grams, and 5-grams
- Trigram Model: P(wi | wi-1, wi-2), the probability of word wi given the two previous words wi-1 and wi-2.
- 4-gram Model: P(wi | wi-1, wi-2, wi-3), the probability of word wi given the three previous words.
Trigrams, 4-grams, or 5-grams are preferred over bigrams because they capture more context and improve accuracy.
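A minimal sketch of a bigram model under the Markov assumption, estimated by maximum-likelihood counts, P(w | w_prev) = C(w_prev, w) / C(w_prev); the toy corpus and the <s>/</s> boundary tokens are assumptions for illustration:

```python
from collections import Counter

corpus = [
    "<s> its water is so transparent that it is clear </s>",
    "<s> its water is so clean </s>",
]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(w, w_prev):
    """Relative-frequency estimate of P(w | w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(bigram_prob("water", "its"))       # 1.0: "its" is always followed by "water" here
print(bigram_prob("transparent", "so"))  # 0.5: "so" is followed by "transparent" once out of twice
```

Higher-order models replace the pair counts with trigram or 4-gram counts, trading sparser statistics for more context.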