Week 4 Flashcards

1
Q
What is the main difference between NLP and NLU?
A. NLP focuses on speech only, while NLU handles images
B. NLP processes language, NLU interprets meaning and intent
C. NLP uses syntax, while NLU uses translation
D. NLU is for structured data, NLP for unstructured

A

Correct Answer: B

2
Q

Which of the following best describes tokenization?
A. Removing noise from text
B. Converting structured data to unstructured format
C. Splitting text into smaller units like words or phrases
D. Translating text into another language

A

Correct Answer: C

3
Q

What is the key limitation of stemming?
A. It increases model accuracy
B. It can create non-dictionary words
C. It requires part-of-speech tags
D. It is only useful for speech data

A

Correct Answer: B
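
A quick sketch with NLTK's PorterStemmer (assuming NLTK is installed) shows the limitation in action:

from nltk.stem import PorterStemmer

# Porter stemming strips suffixes by rule, with no dictionary check
stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["studies", "flies", "happily"]])
# Output: ['studi', 'fli', 'happili'] (none of these are dictionary words)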

4
Q

Which NLP task helps identify words like ‘David’, ‘Apple Store’, or ‘Nigeria’?
A. Tokenization
B. Text Summarization
C. Named Entity Recognition
D. Sentiment Analysis

A

Correct Answer: C

5
Q

Lemmatization is a more refined version of stemming.

A

✅ True

6
Q

Corpus refers to a single sentence in a dataset.

A

❌ False — it’s a collection of text documents.

7
Q

N-grams are useful for analyzing patterns in text classification.

A

✅ True
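
As a quick sketch (standard library only; the helper below is just for illustration), bigrams can be pulled from a token list like this:

def ngrams(tokens, n):
    # Slide a window of size n across the token sequence
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the network is very slow".split()
print(ngrams(tokens, 2))
# Output: [('the', 'network'), ('network', 'is'), ('is', 'very'), ('very', 'slow')]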

8
Q

Text normalization increases the number of unique tokens.

A

❌ False — it reduces them.
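
A minimal sketch of why: even a single normalization step like lowercasing collapses surface variants into one token:

tokens = ["Apple", "apple", "APPLE", "Store", "store"]
print(len(set(tokens)))                     # 5 unique tokens before normalization
print(len(set(t.lower() for t in tokens)))  # 2 unique tokens after lowercasing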

9
Q

Write a Python snippet using spaCy to tokenize the sentence: “I love playing football in Port-Harcourt.”

A

import spacy

# Load the small English pipeline and run it over the sentence
nlp = spacy.load("en_core_web_sm")
doc = nlp("I love playing football in Port-Harcourt.")

# Each token keeps its surface text; the hyphen and period become separate tokens
tokens = [token.text for token in doc]
print(tokens)  # Output: ['I', 'love', 'playing', 'football', 'in', 'Port', '-', 'Harcourt', '.']

10
Q

Using spaCy, write code that performs lemmatization on the sentence: “The kids are playing outside.”

A

import spacy

# Load the small English pipeline and run it over the sentence
nlp = spacy.load("en_core_web_sm")
doc = nlp("The kids are playing outside.")

# token.lemma_ gives each token's dictionary form (e.g., 'are' becomes 'be')
lemmas = [token.lemma_ for token in doc]
print(lemmas)  # Output: ['the', 'kid', 'be', 'play', 'outside', '.']

11
Q

How do stemming and lemmatization contribute to improving text classification models?

A

✅ Sample Answer:
Both processes reduce words to their root form, helping reduce redundancy and dimensionality in text data. This allows models to generalize better across word variants (e.g., “run”, “runs”, “ran”), improving accuracy and reducing overfitting. However, stemming may create invalid roots, while lemmatization is more accurate.
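
A minimal sketch of the difference (assuming NLTK and spaCy's en_core_web_sm are installed):

import spacy
from nltk.stem import PorterStemmer

# Stemming: fast rule-based suffix stripping, but irregular forms slip through
stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["run", "runs", "running", "ran"]])
# Output: ['run', 'run', 'run', 'ran'] ('ran' is missed)

# Lemmatization: uses vocabulary and POS context, so 'ran' also maps to 'run'
nlp = spacy.load("en_core_web_sm")
print([token.lemma_ for token in nlp("He ran and she runs while they were running")])
# Typical output: ['he', 'run', 'and', 'she', 'run', 'while', 'they', 'be', 'run']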

12
Q

In what real-world scenarios would tokenization and named entity recognition (NER) be critical?

A

✅ Sample Answer:
Tokenization is foundational for any NLP task—search engines, spam filters, chatbots. NER is critical in information extraction: legal document analysis, medical diagnosis systems, customer service systems (e.g., identifying names, dates, product names from complaints).
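
A minimal spaCy sketch (assuming en_core_web_sm; the complaint text is invented for illustration):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("David reported that his order from the Apple Store in Nigeria never arrived.")

# Each entity carries its text span and a predicted label (PERSON, ORG, GPE, ...)
for ent in doc.ents:
    print(ent.text, ent.label_)
# Typical output (exact spans and labels are model-dependent):
# David PERSON
# the Apple Store ORG
# Nigeria GPE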

13
Q

A telecommunications company wants to use NLP to categorize customer complaints automatically. Design a step-by-step solution using NLP techniques.

A

✅ Solution:

1. Data Collection: Gather customer messages from email/chat logs.

2. Tokenization: Split each message into tokens (words or phrases).

3. Text Normalization: Apply lemmatization to standardize words.

4. Keyword Extraction: Use TF-IDF or spaCy's noun_chunks to extract important phrases.

5. Text Classification: Train a supervised model (e.g., Logistic Regression or Naive Bayes) on complaint category labels (Billing, Network, Customer Service).

6. Evaluation: Measure model accuracy, precision, and recall.

7. Deployment: Wrap everything in a pipeline that receives a message, processes the text, and returns a department classification, as in the sketch below.
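
A minimal sketch wiring several of these steps together with scikit-learn (the example messages and labels are made-up placeholders):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical labeled complaints; a real system would load these from email/chat logs
messages = [
    "I was charged twice on my last bill",
    "No signal in my area since yesterday",
    "The agent was rude and never resolved my issue",
]
labels = ["Billing", "Network", "Customer Service"]

# TF-IDF handles tokenization and weighting; Logistic Regression assigns the category
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(messages, labels)

print(pipeline.predict(["My invoice shows an extra charge"]))  # e.g. ['Billing']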
