Natural Language Processing Flashcards
What is Natural Language Processing (NLP)?
NLP is the area of AI focused on enabling computers to understand, interpret, and generate human language.
What are the two main areas of NLP?
Natural Language Understanding (NLU) and Natural Language Generation (NLG).
What is the difference between NLU and NLG?
NLU focuses on interpreting and understanding human input, while NLG involves generating human-like language from data.
Why is natural language difficult for machines to process?
Due to ambiguity, variability, context-dependence, and the complexity of human language.
What is ambiguity in language?
When a sentence or phrase has multiple possible interpretations.
What is an example of syntactic ambiguity?
“I saw the man with the telescope” – it’s unclear whether “with the telescope” attaches to “saw” (the speaker looked through it) or to “the man” (the man is holding it).
What are the basic steps of an NLP pipeline?
Tokenization, POS tagging, parsing, named entity recognition, semantic analysis, etc.
What is tokenization?
Splitting text into individual units such as words or sentences.
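A minimal sketch of word-level tokenization using a simple regex (real tokenizers such as NLTK's or spaCy's handle contractions, abbreviations, and punctuation far more carefully):

    import re

    def tokenize(text):
        # Match runs of word characters, or any single non-space symbol.
        return re.findall(r"\w+|[^\w\s]", text)

    print(tokenize("NLP is fun, isn't it?"))
    # ['NLP', 'is', 'fun', ',', 'isn', "'", 't', 'it', '?'] -- note the naive split of "isn't"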
What is POS tagging?
Part-of-speech tagging assigns word categories like noun, verb, etc., to each token.
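A sketch using NLTK's off-the-shelf tagger (assumes nltk is installed; the exact resource names to download vary by NLTK version):

    import nltk
    # One-time setup, names may differ by version:
    # nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

    tokens = nltk.word_tokenize("The cat sat on the mat")
    print(nltk.pos_tag(tokens))
    # e.g. [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]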
What is a language model?
A model that assigns probabilities to sequences of words.
What is the purpose of a language model in NLP?
To predict the next word in a sentence or evaluate the likelihood of a sentence.
What are common types of language models?
N-gram models, neural language models, transformer-based models.
What is an N-gram in NLP?
A contiguous sequence of N words (or tokens) used to model language.
Give examples of bigrams and trigrams.
Bigram: “I am”, Trigram: “I am happy”
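A minimal sketch of extracting N-grams from a list of tokens by sliding a window of size N:

    def ngrams(tokens, n):
        # One tuple per window position.
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    tokens = "I am happy today".split()
    print(ngrams(tokens, 2))  # [('I', 'am'), ('am', 'happy'), ('happy', 'today')]
    print(ngrams(tokens, 3))  # [('I', 'am', 'happy'), ('am', 'happy', 'today')]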
How is the probability of a sentence estimated in an N-gram model?
By applying the chain rule with a Markov assumption: multiply the conditional probability of each word given the preceding N−1 words.
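A sketch of the bigram (N = 2) case on a toy two-sentence corpus, with hypothetical <s> and </s> boundary markers and no smoothing:

    from collections import Counter

    corpus = [["<s>", "I", "am", "happy", "</s>"],
              ["<s>", "I", "am", "sad", "</s>"]]

    unigram_counts = Counter(w for sent in corpus for w in sent)
    bigram_counts = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))

    def sentence_prob(sentence):
        # P(sentence) ~= product over i of P(w_i | w_{i-1})
        #             = count(w_{i-1}, w_i) / count(w_{i-1})
        p = 1.0
        for i in range(1, len(sentence)):
            p *= bigram_counts[(sentence[i - 1], sentence[i])] / unigram_counts[sentence[i - 1]]
        return p

    print(sentence_prob(["<s>", "I", "am", "happy", "</s>"]))  # 0.5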
What are limitations of N-gram models?
They have limited context and suffer from data sparsity.
What is smoothing in N-gram models?
A technique to handle unseen N-grams by adjusting probabilities.
Name a common smoothing technique.
Add-one (Laplace) smoothing.
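A sketch of add-one smoothing for bigrams on toy counts: every numerator count gets +1 and every denominator gets +V (the vocabulary size), so unseen bigrams receive a small nonzero probability:

    from collections import Counter

    bigram_counts = Counter({("I", "am"): 2, ("am", "happy"): 1})
    unigram_counts = Counter({"I": 2, "am": 2, "happy": 1})
    V = len(unigram_counts)  # vocabulary size (3 here)

    def laplace_prob(prev, word):
        # P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)

    print(laplace_prob("am", "happy"))  # seen bigram:   (1 + 1) / (2 + 3) = 0.4
    print(laplace_prob("am", "I"))      # unseen bigram: (0 + 1) / (2 + 3) = 0.2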
How do neural language models improve over N-gram models?
They learn word embeddings and can model longer dependencies.
What are word embeddings?
Vector representations of words capturing semantic similarity.
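A sketch of the idea with hypothetical 3-dimensional vectors (learned embeddings typically have hundreds of dimensions); semantically similar words end up with high cosine similarity:

    import math

    # Hypothetical toy embeddings, not learned from data.
    emb = {"king":  [0.8, 0.3, 0.1],
           "queen": [0.7, 0.4, 0.1],
           "apple": [0.1, 0.2, 0.9]}

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    print(cosine(emb["king"], emb["queen"]))  # high: related words
    print(cosine(emb["king"], emb["apple"]))  # low: unrelated words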
What is Word2Vec?
A model that learns word embeddings by predicting word contexts (or vice versa).
What are the two main architectures of Word2Vec?
CBOW (Continuous Bag of Words) and Skip-Gram.
What does CBOW do?
Predicts a word from its surrounding context.
What does Skip-Gram do?
Predicts context words from a target word.
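A sketch of training both architectures with the gensim library (assuming gensim 4.x, where sg=0 selects CBOW and sg=1 selects Skip-Gram; this toy corpus is far too small to learn meaningful vectors):

    from gensim.models import Word2Vec

    sentences = [["the", "cat", "sat", "on", "the", "mat"],
                 ["the", "dog", "sat", "on", "the", "rug"]]

    cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
    skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

    print(cbow.wv["cat"][:5])               # first 5 dimensions of the learned vector
    print(skipgram.wv.most_similar("cat"))  # nearest neighbours by cosine similarity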
What metric is used to evaluate language models?
Perplexity.
What does a lower perplexity indicate?
Better language model performance: the model assigns higher probability to the test text (it is less “surprised” by it).
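A sketch of the computation: perplexity is the exponential of the average negative log-probability the model assigns to each word, i.e. the inverse geometric mean of the per-word probabilities:

    import math

    def perplexity(word_probs):
        # exp( -(1/N) * sum of log P(w_i | context) )
        return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

    # Hypothetical per-word probabilities from two models on the same sentence.
    print(perplexity([0.2, 0.1, 0.4, 0.25]))  # ~4.73
    print(perplexity([0.5, 0.5, 0.5, 0.5]))   # 2.0: higher probabilities, lower perplexity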
Name some applications of NLP.
Machine translation, sentiment analysis, chatbots, information retrieval, etc.