Week 2 Flashcards

1
Q

What is a Lexicon in NLP?

A

A lexicon is a collection of words, often with definitions, parts of speech, and other linguistic information used to analyze language.

2
Q

What are Stopwords?

A

Stopwords are common words (e.g., “the”, “and”, “is”) that are often removed in NLP tasks because they add little value to the meaning of a text.
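
A minimal sketch of stopword removal. The stopword set here is a tiny hand-picked assumption for illustration; real lists are much longer and vary by library and task.

```python
# Hypothetical, hand-picked stopword set (an assumption for this example).
STOPWORDS = {"the", "and", "is", "a", "of", "on"}

def remove_stopwords(tokens):
    """Return the tokens that are not stopwords (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

tokens = "The cat is on the mat".split()
filtered = remove_stopwords(tokens)
print(filtered)  # ['cat', 'mat']
```

Whether to remove stopwords depends on the task; for some tasks (e.g., phrase matching) they carry useful signal.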

3
Q

Define WordNet in NLP

A

WordNet is a lexical database of English that groups words into synsets (synonym sets), each with a short definition and often usage examples, and links synsets through semantic relations such as hypernymy.

4
Q

What is a Synset in WordNet?

A

A synset is a set of synonyms that together represent one concept in WordNet. Because synsets are linked to one another by semantic relations, they give WordNet a richer structure than a traditional dictionary.

5
Q

Describe the difference between Stemming and Lemmatization.

A

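Stemming strips affixes with heuristic rules to find a word stem, which need not be a real word (e.g., “running” to “run”), while lemmatization reduces a word to its dictionary form (lemma), using vocabulary and context such as part of speech (e.g., “better” to “good”).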
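
A toy sketch of the contrast. Both functions are hypothetical stand-ins: `toy_stem` is a crude suffix stripper (much cruder than a real stemmer such as Porter's), and `toy_lemmatize` is a small lookup table that ignores context, which a real lemmatizer would use.

```python
def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in ("ing", "ies", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lookup table standing in for a lemmatizer's dictionary.
TOY_LEMMAS = {"running": "run", "studies": "study", "better": "good"}

def toy_lemmatize(word):
    """Return the dictionary form if known, else the word unchanged."""
    return TOY_LEMMAS.get(word, word)

for w in ("running", "studies", "better"):
    print(w, toy_stem(w), toy_lemmatize(w))
# running runn run
# studies stud study
# better better good
```

The stems "runn" and "stud" show why stemming is fast but rough, while the lemma of "better" can only be found with dictionary knowledge.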

6
Q

What is Tokenization?

A

Tokenization is the process of breaking text into individual tokens (words or punctuation), which simplifies text analysis.
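
A minimal regex-based sketch, assuming tokens are either runs of word characters or single punctuation marks. Real tokenizers handle contractions, abbreviations, and URLs with more care.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("The cat sat. It purred!")
print(tokens)  # ['The', 'cat', 'sat', '.', 'It', 'purred', '!']
```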

7
Q

Why is Named Entity Recognition (NER) important in NLP?

A

NER identifies and classifies entities in text, like names, locations, and dates, helping to extract specific information.

8
Q

What are N-Grams in language modeling?

A

N-Grams are contiguous sequences of N items (usually words) used to capture local word patterns in text, such as bigrams (2 words) and trigrams (3 words).
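
A short sketch of extracting word n-grams with a sliding window, assuming the text is already tokenized by whitespace.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams[:2])   # [('the', 'cat'), ('cat', 'sat')]
print(len(trigrams)) # 4
```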

9
Q

What is the Bag of Words (BoW) model?

A

BoW is a representation that treats text as a collection of words, ignoring grammar and word order, focusing only on word frequency.
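
A minimal sketch using word counts, assuming lowercase whitespace tokenization. It shows that two sentences with different word order get the same representation.

```python
from collections import Counter

def bag_of_words(text):
    """Map each lowercase word to its count; word order is discarded."""
    return Counter(text.lower().split())

a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")
print(a["the"])  # 2
print(a == b)    # True: same words, different order, same bag
```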

10
Q

Describe TF-IDF and its purpose.

A

TF-IDF (Term Frequency-Inverse Document Frequency) highlights important words in a document by balancing term frequency with rarity across all documents.
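
A sketch of one common TF-IDF variant, assuming tf is the term's relative frequency in the document and idf is the natural log of (number of documents / documents containing the term). Libraries often use smoothed or normalized variants, so exact values differ. The three-document corpus is made up for illustration.

```python
import math
from collections import Counter

# Made-up toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

def tf_idf(term, doc, docs):
    """Relative term frequency times log inverse document frequency."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)  # documents containing the term
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

print(tf_idf("the", docs[0], docs))  # 0.0, since "the" is in every document
print(tf_idf("mat", docs[0], docs) > tf_idf("cat", docs[0], docs))  # True
```

"mat" scores higher than "cat" in the first document because it appears in fewer documents overall.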

11
Q

What is Text Classification?

A

Text classification is the task of categorizing text into predefined labels, like spam detection or sentiment analysis.

12
Q

Why are Decision Trees used in NLP?

A

Decision trees are used for text classification tasks due to their simplicity and interpretability, breaking down decisions based on word features.

13
Q

What is the Naïve Bayes algorithm’s role in NLP?

A

Naïve Bayes is a probabilistic algorithm used for text classification. It applies Bayes’ rule and assumes features (words) are conditionally independent given the class, which simplifies the calculations.

14
Q

Explain Zero Counts and Smoothing in Naïve Bayes.

A

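A zero count occurs when a word never appears with a class in the training data, so its estimated probability is zero and the whole product of word probabilities for that class becomes zero. Smoothing (e.g., add-one or Laplace smoothing) assigns a small probability to such unseen words, preventing zero probabilities in calculations.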
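
A minimal sketch of add-alpha (Laplace) smoothing for the word probabilities of one class. The training words, the vocabulary, and the function name `word_prob` are assumptions made for this example.

```python
from collections import Counter

def word_prob(word, class_counts, vocab_size, alpha=1.0):
    """Estimate P(word | class) with add-alpha smoothing."""
    total = sum(class_counts.values())
    return (class_counts[word] + alpha) / (total + alpha * vocab_size)

# Made-up word counts for a "spam" class and a five-word vocabulary.
spam_counts = Counter("win money now win prize".split())
vocab = {"win", "money", "now", "prize", "meeting"}

print(word_prob("meeting", spam_counts, len(vocab), alpha=0.0))  # 0.0
print(word_prob("meeting", spam_counts, len(vocab)))             # 0.1
print(word_prob("win", spam_counts, len(vocab)))                 # 0.3
```

With alpha set to 0 the unseen word "meeting" gets probability 0; with alpha set to 1 it gets a small nonzero share, and the probabilities over the vocabulary still sum to 1.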

15
Q

What is Model Evaluation in text classification?

A

Model evaluation measures a model’s accuracy and reliability on tasks, often using metrics like precision, recall, and F1-score.

16
Q

Define Accuracy in classification.

A

Accuracy is the percentage of correctly classified instances in a test set, showing overall model correctness.
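
A short sketch of the calculation, using made-up labels for a five-message test set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up true and predicted labels.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(y_true, y_pred))  # 0.8
```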

17
Q

What is Precision in classification?

A

Precision is the fraction of true positive predictions out of all positive predictions, indicating model reliability in predicting a specific class.

18
Q

What is Recall in classification?

A

Recall is the fraction of true positives retrieved from all actual positives, measuring model sensitivity to finding relevant items.
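
A minimal sketch computing precision and recall for one class from made-up labels. The function name and the choice to return 0.0 when a denominator is zero are assumptions for this example.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: 2 true positives, 1 false positive, 2 false negatives.
y_true = ["spam", "spam", "spam", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "spam", "ham"]
precision, recall = precision_recall(y_true, y_pred, positive="spam")
print(round(precision, 3), recall)  # 0.667 0.5
```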

19
Q

Why is the F1-Score used?

A

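The F1-score is the harmonic mean of precision and recall, combining them into a single metric. It is useful with imbalanced datasets, where accuracy alone can look high even if the model misses most of the minority class.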
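
A short sketch of the formula F1 = 2PR / (P + R), with illustrative input values. Returning 0.0 when both inputs are zero is a convention assumed here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.5, 0.5))            # 0.5
print(round(f1_score(1.0, 0.1), 3))  # 0.182
```

The second case shows that one very low value pulls F1 down sharply, where a plain average of 1.0 and 0.1 would give 0.55.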

20
Q

What is a Confusion Matrix in model evaluation?

A

A confusion matrix is a table showing true vs. predicted classifications, helping to visualize model performance across classes.
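
A minimal sketch that counts (true, predicted) label pairs and prints them as a table with true classes as rows and predicted classes as columns. The labels are made up for illustration.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))

# Made-up true and predicted labels.
y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]
cm = confusion_matrix(y_true, y_pred)

labels = sorted(set(y_true) | set(y_pred))  # ['ham', 'spam']
rows = {t: [cm[(t, p)] for p in labels] for t in labels}
for t in labels:
    print(t, rows[t])
# ham [2, 1]
# spam [1, 1]
```

The diagonal entries are correct predictions; off-diagonal entries show which classes are confused with each other.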