Week 4 Flashcards
What is the main difference between NLP and NLU?
A. NLP focuses on speech only, while NLU handles images
B. NLP processes language, NLU interprets meaning and intent
C. NLP uses syntax, while NLU uses translation
D. NLU is for structured data, NLP for unstructured
✅ Correct Answer: B
Which of the following best describes tokenization?
A. Removing noise from text
B. Converting structured data to unstructured format
C. Splitting text into smaller units like words or phrases
D. Translating text into another language
✅ Correct Answer: C
What is the key limitation of stemming?
A. It increases model accuracy
B. It can create non-dictionary words
C. It requires part-of-speech tags
D. It is only useful for speech data
✅ Correct Answer: B
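To see this limitation in practice, here is a minimal sketch using NLTK's PorterStemmer (NLTK is an assumption here, used purely for illustration; any rule-based stemmer shows the same effect):
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
words = ["studies", "university", "easily", "relational"]
print([stemmer.stem(w) for w in words])
# Likely output: ['studi', 'univers', 'easili', 'relat'], none of which are dictionary words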
Which NLP task helps identify names like ‘David’, ‘Apple Store’, or ‘Nigeria’?
A. Tokenization
B. Text Summarization
C. Named Entity Recognition
D. Sentiment Analysis
✅ Correct Answer: C
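For illustration, a minimal spaCy sketch of NER (the example sentence is made up for this card, and the exact labels depend on the model):
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("David visited the Apple Store before flying to Nigeria.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Likely output: 'David' PERSON, 'Apple Store' ORG (label may vary), 'Nigeria' GPE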
Lemmatization is a more refined version of stemming.
✅ True
Corpus refers to a single sentence in a dataset.
❌ False — it’s a collection of text documents.
N-grams are useful for analyzing patterns in text classification.
✅ True
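As a quick illustration of n-grams as classification features, here is a minimal scikit-learn sketch (scikit-learn and the two toy sentences are assumptions for this example):
from sklearn.feature_extraction.text import CountVectorizer
texts = ["the network is down", "the bill is too high"]
# ngram_range=(1, 2) extracts both unigrams and bigrams as features
vectorizer = CountVectorizer(ngram_range=(1, 2))
vectorizer.fit(texts)
print(vectorizer.get_feature_names_out())
# Expected to include bigrams such as 'network is', 'is down', 'bill is', 'too high'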
Text normalization increases the number of unique tokens.
❌ False — it reduces them.
Q1: Write a Python snippet using spaCy to tokenize the sentence: “I love playing football in Port-Harcourt.”
import spacy
nlp = spacy.load("en_core_web_sm")  # small English pipeline
doc = nlp("I love playing football in Port-Harcourt.")
tokens = [token.text for token in doc]
print(tokens)  # Output: ['I', 'love', 'playing', 'football', 'in', 'Port', '-', 'Harcourt', '.']
Q2: Using spaCy, write code that performs lemmatization on the sentence: “The kids are playing outside.”
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("The kids are playing outside.")
lemmas = [token.lemma_ for token in doc]
print(lemmas)  # Output: ['the', 'kid', 'be', 'play', 'outside', '.']
How do stemming and lemmatization contribute to improving text classification models?
✅ Sample Answer:
Both processes reduce words to their root form, helping reduce redundancy and dimensionality in text data. This allows models to generalize better across word variants (e.g., “run”, “runs”, “ran”), improving accuracy and reducing overfitting. However, stemming may create invalid roots, while lemmatization is more accurate.
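A minimal sketch that makes the contrast concrete (pairing NLTK's PorterStemmer with spaCy lemmatization is an assumption for illustration, not a course requirement):
import spacy
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["runs", "ran", "running"]])
# Likely: ['run', 'ran', 'run'], the irregular 'ran' is left unchanged

nlp = spacy.load("en_core_web_sm")
doc = nlp("She runs daily, he ran yesterday, and they are running now.")
print([token.lemma_ for token in doc if token.pos_ == "VERB"])
# Likely: ['run', 'run', 'run'], all variants map to the same lemma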
In what real-world scenarios would tokenization and named entity recognition (NER) be critical?
✅ Sample Answer:
Tokenization is foundational for any NLP task—search engines, spam filters, chatbots. NER is critical in information extraction: legal document analysis, medical diagnosis systems, customer service systems (e.g., identifying names, dates, product names from complaints).
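A short sketch of the customer-service case (the complaint text below is invented, and which entities are found depends on the spaCy model used):
import spacy
nlp = spacy.load("en_core_web_sm")
complaint = "On 3 March, Ngozi was charged $40 twice for a MiFi router in Lagos."
doc = nlp(complaint)
print([(ent.text, ent.label_) for ent in doc.ents])
# Likely entities include '3 March' DATE, 'Ngozi' PERSON, '$40' MONEY, 'Lagos' GPE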
A telecommunications company wants to use NLP to categorize customer complaints automatically.
Q: Design a step-by-step solution using NLP techniques.
✅ Solution (a code sketch follows the steps):
1. Data Collection: Gather customer messages from email/chat logs.
2. Tokenization: Split each message into tokens (words or phrases).
3. Text Normalization: Apply lemmatization to standardize words.
4. Keyword Extraction: Use TF-IDF or spaCy’s noun_chunks to extract important phrases.
5. Text Classification: Train a supervised model (e.g., Logistic Regression or Naive Bayes) with complaint category labels (Billing, Network, Customer Service).
6. Evaluation: Measure model accuracy, precision, and recall.
7. Deployment: Use a pipeline that receives input, processes text, and returns a department classification.
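A minimal end-to-end sketch of steps 2 through 7 (the toy complaints, category labels, and scikit-learn choices below are illustrative assumptions, not the company's actual data or stack):
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labelled complaints
complaints = [
    "I was charged twice on my last bill",
    "No signal in my area since yesterday",
    "The support agent never called me back",
    "My data bundle expired but I was still billed",
]
labels = ["Billing", "Network", "Customer Service", "Billing"]

# TF-IDF handles tokenization and weighting; Logistic Regression does the classification
model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(complaints, labels)

print(model.predict(["I keep losing connection at home"]))
# With such a tiny training set the prediction is illustrative only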