Week 2 Flashcards

Question 1

Q

What is a Lexicon in NLP?

Answer

A

A lexicon is a collection of words, often with definitions, parts of speech, and other linguistic information used to analyze language.

Question 2

Q

What are Stopwords?

Answer

A

Stopwords are common words (e.g., “the”, “and”, “is”) that are often removed in NLP tasks because they add little value to the meaning of a text.
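A minimal sketch of stopword removal, assuming a tiny hand-picked stopword list (the `STOPWORDS` set and the `remove_stopwords` helper are hypothetical, not taken from any library):

```python
# Hypothetical, hand-picked stopword list for illustration only;
# real projects usually load a much longer list from a library.
STOPWORDS = {"the", "and", "is", "a", "of", "on"}

def remove_stopwords(tokens):
    """Return the tokens whose lowercase form is not in STOPWORDS."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

filtered = remove_stopwords(["The", "cat", "is", "on", "the", "mat"])
print(filtered)  # ['cat', 'mat']
```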

Question 3

Q

Define WordNet in NLP

Answer

A

WordNet is a lexical database of English that groups words into synsets (synonym sets), each with a definition and usage examples.

Question 4

Q

What is a Synset in WordNet?

Answer

A

A synset is a set of synonyms that represent one concept in WordNet, providing a richer structure than traditional dictionaries.

Question 5

Q

Describe the difference between Stemming and Lemmatization.

Answer

A

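Stemming strips affixes with heuristic rules to find the word stem (e.g., “running” to “run”), which may not be a real word, while lemmatization reduces a word to its dictionary form (lemma), taking part of speech and context into account (e.g., “better” to “good”).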
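The contrast can be sketched with two toy functions. Both are assumptions for illustration: `toy_stem` is a crude suffix stripper (not the Porter algorithm), and `TOY_LEMMAS` is a hypothetical three-entry lookup table standing in for a real dictionary.

```python
def toy_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Collapse a doubled final letter ("runn" -> "run").
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word

# Hypothetical lemma table keyed by (word, part of speech).
TOY_LEMMAS = {
    ("running", "VERB"): "run",
    ("better", "ADJ"): "good",
    ("studies", "NOUN"): "study",
}

def toy_lemmatize(word, pos):
    """Look up the dictionary form; fall back to the word itself."""
    return TOY_LEMMAS.get((word, pos), word)

print(toy_stem("running"), toy_lemmatize("running", "VERB"))  # run run
print(toy_stem("studies"), toy_lemmatize("studies", "NOUN"))  # studi study
print(toy_stem("better"), toy_lemmatize("better", "ADJ"))     # better good
```

The stem "studi" is not a word, while the lemma "study" is; the stemmer also cannot relate "better" to "good" because no suffix rule connects them.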

Question 6

Q

What is Tokenization?

Answer

A

Tokenization is the process of breaking text into individual tokens (words or punctuation), which simplifies text analysis.
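One simple way to tokenize, sketched with a regular expression. The pattern is an assumption chosen for illustration; it treats each run of word characters as one token and each punctuation mark as its own token, and it does not handle contractions or abbreviations well.

```python
import re

def tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("The cat sat, then slept.")
print(tokens)  # ['The', 'cat', 'sat', ',', 'then', 'slept', '.']
```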

Question 7

Q

Why is Named Entity Recognition (NER) important in NLP?

Answer

A

NER identifies and classifies entities in text, like names, locations, and dates, helping to extract specific information.

Question 8

Q

What are N-Grams in language modeling?

Answer

A

N-Grams are contiguous sequences of N words used to capture word patterns in text, such as bigrams (2 words) and trigrams (3 words).
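A short sketch of n-gram extraction over a token list (the `ngrams` helper is a hypothetical name, written here with plain slicing):

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["I", "like", "green", "tea"]
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)   # [('I', 'like'), ('like', 'green'), ('green', 'tea')]
print(trigrams)  # [('I', 'like', 'green'), ('like', 'green', 'tea')]
```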

Question 9

Q

What is the Bag of Words (BoW) model?

Answer

A

BoW is a representation that treats text as a collection of words, ignoring grammar and word order, focusing only on word frequency.
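A minimal sketch of a bag of words using `collections.Counter`, assuming whitespace tokenization and lowercasing. Two sentences with the same words in a different order produce the same bag, which shows that word order is discarded.

```python
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase, whitespace-separated word occurs."""
    return Counter(text.lower().split())

bag_a = bag_of_words("the dog chased the cat")
bag_b = bag_of_words("the cat chased the dog")
print(bag_a["the"])    # 2
print(bag_a == bag_b)  # True
```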

Question 10

Q

Describe TF-IDF and its purpose.

Answer

A

TF-IDF (Term Frequency-Inverse Document Frequency) highlights important words in a document by balancing term frequency with rarity across all documents.
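A worked sketch of one common TF-IDF variant, assuming tf = (term count) / (document length) and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. Other weighting variants exist; the toy corpus and the `tf_idf` helper are hypothetical.

```python
import math
from collections import Counter

# Toy corpus: each document is already tokenized.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]

def tf_idf(term, doc, docs):
    """Score a term in one document against the whole corpus."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

score_the = tf_idf("the", docs[0], docs)  # in every document, so idf = 0
score_cat = tf_idf("cat", docs[0], docs)  # in two of three documents
score_sat = tf_idf("sat", docs[0], docs)  # in one document only
print(score_the, score_cat, score_sat)
```

In this corpus "the" scores zero because it appears everywhere, while "sat" scores highest because it is unique to the first document.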

Question 11

Q

What is Text Classification?

Answer

A

Text classification is the task of categorizing text into predefined labels, like spam detection or sentiment analysis.

Question 12

Q

Why are Decision Trees used in NLP?

Answer

A

Decision trees are used for text classification tasks due to their simplicity and interpretability, breaking down decisions based on word features.

Question 13

Q

What is the Naïve Bayes algorithm’s role in NLP?

Answer

A

Naïve Bayes is a probabilistic algorithm used for text classification, assuming feature independence to simplify calculations.
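A compact sketch of a multinomial Naïve Bayes text classifier under stated assumptions: whitespace tokenization, a four-example toy training set, add-one smoothing so the logarithms stay finite, and words outside the training vocabulary ignored. The names `train_nb` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

train = [
    ("win free prize", "spam"),
    ("free money now", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting today", "ham"),
]

def train_nb(examples):
    """Count documents per class and words per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        class_docs[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def predict(text, class_docs, word_counts, vocab):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for word in text.split():
            if word in vocab:
                # Independence assumption: each word contributes separately.
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(train)
print(predict("free prize", *model))     # spam
print(predict("meeting today", *model))  # ham
```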

Question 14

Q

Explain Zero Counts and Smoothing in Naïve Bayes.

Answer

A

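A zero count occurs when a word never appears with a given class in the training data, so its estimated probability is zero and the whole product for that class becomes zero. Smoothing (e.g., add-one, or Laplace, smoothing) assigns a small probability to such unseen words, preventing zero probabilities in calculations.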
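A small numeric sketch of the zero-count problem and add-one (Laplace) smoothing, assuming hypothetical word counts for a "spam" class and a three-word vocabulary. The smoothed estimate is (count + alpha) / (total + alpha * vocabulary size).

```python
def smoothed_prob(word, class_counts, vocab_size, alpha=1.0):
    """Add-alpha estimate of P(word | class)."""
    total = sum(class_counts.values())
    return (class_counts.get(word, 0) + alpha) / (total + alpha * vocab_size)

# Hypothetical counts: "meeting" never occurs in spam training data.
spam_counts = {"free": 3, "win": 2}
vocab = {"free", "win", "meeting"}

unsmoothed = spam_counts.get("meeting", 0) / sum(spam_counts.values())
smoothed = smoothed_prob("meeting", spam_counts, len(vocab))
print(unsmoothed)  # 0.0
print(smoothed)    # 0.125
```

With smoothing, the three word probabilities (0.5, 0.375, 0.125) still sum to 1, but none of them is zero.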

Question 15

Q

What is Model Evaluation in text classification?

Answer

A

Model evaluation measures how well a model performs on held-out test data, often using metrics like accuracy, precision, recall, and F1-score.

Question 16

Q

Define Accuracy in classification.

Answer

A

Accuracy is the percentage of correctly classified instances in a test set, showing overall model correctness.
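A minimal sketch of the calculation, using made-up labels for five test messages (the `accuracy` helper and the data are hypothetical):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]
acc = accuracy(y_true, y_pred)
print(acc)  # 0.6
```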

Question 17

Q

What is Precision in classification?

Answer

A

Precision is the fraction of true positive predictions out of all positive predictions, indicating model reliability in predicting a specific class.

Question 18

Q

What is Recall in classification?

Answer

A

Recall is the fraction of true positives retrieved from all actual positives, measuring model sensitivity to finding relevant items.
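Precision and recall can be computed side by side from counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below assumes binary labels where 1 is the positive class; the data and the `precision_recall` helper are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision)  # 2 of 3 positive predictions are correct
print(recall)     # 2 of 4 actual positives are found: 0.5
```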

Question 19

Q

Why is the F1-Score used?

Answer

A

The F1-score is the harmonic mean of precision and recall, combining them into a single metric that is useful when dealing with imbalanced datasets.
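A brief sketch of the formula F1 = 2PR / (P + R), with made-up precision and recall values. It also shows why F1 is preferred over a plain average: a model with high precision but very low recall gets a low F1.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

p, r = 0.9, 0.1          # hypothetical values
f1 = f1_score(p, r)
arithmetic_mean = (p + r) / 2
print(round(f1, 2))               # 0.18
print(round(arithmetic_mean, 2))  # 0.5
```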

Question 20

Q

What is a Confusion Matrix in model evaluation?

Answer

A

A confusion matrix is a table showing true vs. predicted classifications, helping to visualize model performance across classes.
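A minimal sketch that builds a confusion matrix as counts of (true label, predicted label) pairs and prints it as a small table. The labels and data are made up for illustration, and `confusion_matrix` here is a hypothetical helper, not a library function.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))

y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]
matrix = confusion_matrix(y_true, y_pred)

labels = ["spam", "ham"]
# Rows are true labels, columns are predicted labels.
print("true\\pred", *labels)
for t in labels:
    print(t, *(matrix[(t, p)] for p in labels))
```

The diagonal cells, (spam, spam) and (ham, ham), hold the correct predictions; the off-diagonal cells show which classes are confused with each other.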

(20 cards)