CHAPTER 5 part 1 Flashcards

1
Q

………….is the core component of modern Natural Language Processing (NLP).

A

language model

2
Q

T/F A language model is a probabilistic statistical model that determines the probability of a given sequence of words occurring in a sentence based on the previous words.

A

T

3
Q

More about language models

A

⚫ It helps predict which word is more likely to appear next in a sentence.
⚫ It is a tool that analyzes the patterns of human language to predict words.
⚫ Language models analyze bodies of text data to provide a basis for their word predictions.
⚫ They are widely used in NLP applications such as chatbots and search engines.

4
Q

How does a language model work?

A

⚫ Language models determine the probability of the next word by analyzing text data.
⚫ These models interpret the data by feeding it through algorithms.
⚫ The algorithms are responsible for creating rules for context in natural language.
⚫ The models prepare for word prediction by learning the features and characteristics of a language.
⚫ With this learning, the model can understand phrases and predict the next words in sentences.
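
To make this concrete, here is a minimal sketch (my illustration, not from the chapter): a toy model that learns which word follows which from a tiny made-up corpus and predicts the most frequent follower.

    from collections import Counter, defaultdict

    # Tiny made-up training corpus, just for illustration.
    corpus = "there was heavy rain . there was heavy rain . there was heavy flood ."
    tokens = corpus.split()

    # Count how often each word follows each preceding word.
    next_word_counts = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        next_word_counts[prev][cur] += 1

    def predict_next(word):
        # Return the most frequent follower of `word` seen in training.
        followers = next_word_counts[word]
        return followers.most_common(1)[0][0] if followers else None

    print(predict_next("heavy"))  # -> "rain" (seen twice vs. "flood" once)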

5
Q

Types of Language Models
There are primarily two types of language models:

A
  1. Statistical Language Models
  2. Neural Language Models
6
Q

…………….predict the next word based on previous words.

They use probabilistic techniques to analyze text patterns.

A

Statistical models

7
Q

Popular Statistical Models

A

1- N-Gram Model – Uses a fixed-length sequence of words.
2- Bidirectional Model – Considers both past and future words.
3- Exponential Model – Assigns probabilities based on exponential functions.
4- Continuous Space Model – Represents words in a continuous vector space.
5- Neural Language Model – Uses neural networks.

8
Q

N-Gram Model

A

This is one of the simplest approaches to language modelling.

The value of ‘n’ defines the size of the sequence (e.g., n=4 for a 4-gram, like “can you help me”).

– ‘n’ represents the amount of context the model considers when predicting the next word.

9
Q

T/F An N-Gram model creates a probability distribution for a sequence of ‘n’ tokens (words)

A

T

10
Q

There are different types of N-Gram models, such as ………… and ……….

A

bigrams, trigrams

11
Q

Let’s understand N-gram with an example

A

⚫ Example Sentence:
“I like learning and practice NLP in this lecture”

– Unigrams:
“I”, “like”, “learning”, “and”, “practice”, “NLP”, “in”, “this”, “lecture”

– Bigram Example:
(“I”, “like”), (“like”, “learning”), (“learning”, “and”), (“and”, “practice”),
(“practice”, “NLP”), (“NLP”, “in”), (“in”, “this”), (“this”, “lecture”)
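
A short Python sketch (my own illustration, not from the lecture) that produces the unigrams and bigrams above; the ngrams helper is a hypothetical name, not a library function:

    def ngrams(tokens, n):
        # Slide a window of size n across the token list.
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    tokens = "I like learning and practice NLP in this lecture".split()
    print(ngrams(tokens, 1))  # unigrams: ("I",), ("like",), ...
    print(ngrams(tokens, 2))  # bigrams: ("I", "like"), ("like", "learning"), ...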

12
Q

Calculating N-Gram

A
⚫ The N-Gram model assigns probabilities to sequences of words based on their occurrence in a training corpus.
⚫ For example, given the sentences “There was heavy rain” vs. “There was heavy flood”:

⚫ An N-Gram model will tell us that “heavy rain” occurs much more often than “heavy flood” in the training corpus.

⚫ Hence, the N-Gram model assigns a higher probability to “There was heavy rain”.
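
In count terms, the model estimates P(rain | heavy) = count(“heavy rain”) / count(“heavy”). A minimal sketch, assuming a made-up miniature corpus in place of a real training corpus:

    from collections import Counter

    # Made-up miniature corpus; a real model would train on a large corpus.
    tokens = ("there was heavy rain . there was heavy rain . "
              "there was heavy flood .").split()

    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    def p(w2, w1):
        return bigrams[(w1, w2)] / unigrams[w1]

    print(p("rain", "heavy"))   # 2/3 ≈ 0.67
    print(p("flood", "heavy"))  # 1/3 ≈ 0.33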

13
Q

…………….is a collection of text data consisting of the proceedings of the European Parliament from 1996 to 2012.

A

Europarl Corpus

14
Q

T/F For a bigram model, the prediction of the next word depends only on the previous word.

For an n-gram model, only the preceding (n-1) words are considered.

A

T
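
Written out (my own summary of the statement above), the exact chain rule

    P(w_1, ..., w_m) = P(w_1) · P(w_2 | w_1) · ... · P(w_m | w_1, ..., w_{m-1})

is approximated under the Markov assumption by

    P(w_i | w_1, ..., w_{i-1}) ≈ P(w_i | w_{i-n+1}, ..., w_{i-1})

so a bigram model (n = 2) conditions only on P(w_i | w_{i-1}).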

15
Q

T/F The Markov assumption simplifies language modeling by stating that only the most recent words in a sentence matter when predicting the next word

A

T

16
Q

Problems with N-Gram models

A

⚫ One problem with N-Gram models is data sparsity.

⚫ This occurs when the model encounters word sequences (N-Grams) that were not seen during training. As a result, the model assigns them a zero probability.

⚫ Techniques to solve this problem include smoothing, backoff, and interpolation.
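
A minimal sketch of one such technique, add-one (Laplace) smoothing, reusing the toy bigram counts from the earlier example; this shows the idea only, not the chapter’s exact method:

    from collections import Counter

    tokens = "there was heavy rain . there was heavy flood .".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    V = len(unigrams)  # vocabulary size

    # Add one to every count so unseen bigrams get a small, non-zero probability.
    def p_smoothed(w2, w1):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(p_smoothed("rain", "heavy"))   # seen bigram
    print(p_smoothed("snow", "heavy"))   # unseen bigram: non-zero instead of 0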

17
Q

……….is a sequence of two words coming together to form a meaning.

⚫ Example:
“I like”, “like learning”, “learning and”, “and practice”, “practice NLP”, “NLP in”, “in this”, “this lecture”.

A

Bigram

18
Q

………………..is a sequence of three words coming together to form a meaning.

⚫ Example:
For the previous sentence, the trigrams would simply be:
“I like learning”, “learning and practicing”, “practicing NLP in”, “in this lecture”.

A

Trigram


19
Q

T/F Unlike n-gram models, which analyze text in one direction (backwards), bidirectional models analyze text in both directions (backwards and forwards).

These models can predict any word in a sentence or body of text by using every other word in the text.

A

T

20
Q

Bidirectional (cont.)

T/F Examining text bidirectionally increases result accuracy

This type is often utilized in machine learning and speech generation applications

Example: Google uses a bidirectional model to process search queries.

A

T

21
Q

This type of statistical model evaluates text using an equation that combines n-grams and feature functions.

⚫ Here the features and parameters of the desired results are already specified.

⚫ This model makes fewer statistical assumptions, which means the results are more likely to be accurate.

A

Exponential

22
Q

⚫ In this type of statistical model, words are arranged as a non-linear combination of weights in a neural network.

⚫ The process of assigning a weight to a word is known as word embedding.

⚫ This type of model proves helpful in scenarios where the data set of words continues to grow and includes unique words.

A

Continuous Space
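
A toy illustration of “words as vectors” (the vectors below are invented for the example; real embeddings are learned from data): related words get nearby vectors, which we can check with cosine similarity.

    import math

    # Invented 3-dimensional embeddings; real ones are learned and much larger.
    embedding = {
        "rain":  [0.9, 0.1, 0.0],
        "flood": [0.8, 0.2, 0.1],
        "chair": [0.0, 0.1, 0.9],
    }

    def cosine(u, v):
        # Cosine similarity: near 1.0 for similar directions, near 0 for unrelated.
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    print(cosine(embedding["rain"], embedding["flood"]))  # high: related words
    print(cosine(embedding["rain"], embedding["chair"]))  # low: unrelated words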

23
Q

JUST READ
⚫ These language models are based on neural networks and are often considered an advanced approach to executing NLP tasks.

⚫ Neural language models overcome the shortcomings of classical models such as n-grams and are used for complex tasks such as speech recognition and machine translation.