Lecture 3: Text Classification & Naive Bayes Flashcards

1
Q

Text Classification takes two things as input. What are they?

A

A document d

A fixed set of classes C = {c_1, c_2, c_3, …, c_J}

2
Q

Text Classification produces one output. What is that?

A

A predicted class c in C

3
Q

Name some classification methods

A
1. Hand-coded rules
2. Supervised machine learning

4
Q

In this lecture, we primarily work with one supervised machine learning algorithm. What is that?

A

Naive Bayes

5
Q

What is Naive Bayes based on? And into what representation does it transform a document?

A

A simple ("naive") classification method based on Bayes' rule

Relies on a very simple representation of the document -> Bag of Words

6
Q

Explain the concept of a “Bag of Words”

A

Think of it as a dictionary of words, where each key is a word and each value is the number of occurrences of that word in the given text document.

Example:

(1) John likes to watch movies. Mary likes movies too.
(2) Mary also likes to watch football games.

BoW1 = {"John": 1, "likes": 2, "to": 1, "watch": 1, "movies": 2, "Mary": 1, "too": 1}

BoW2 = {"Mary": 1, "also": 1, "likes": 1, "to": 1, "watch": 1, "football": 1, "games": 1}
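
A minimal sketch of building such a dictionary, assuming simple whitespace tokenization after stripping punctuation (the lecture does not prescribe a particular tokenizer):

```python
from collections import Counter
import string

def bag_of_words(text):
    # Strip punctuation, split on whitespace, count occurrences of each word.
    tokens = text.translate(str.maketrans("", "", string.punctuation)).split()
    return Counter(tokens)

print(bag_of_words("John likes to watch movies. Mary likes movies too."))
# -> Counter({'likes': 2, 'movies': 2, 'John': 1, 'to': 1, 'watch': 1, 'Mary': 1, 'too': 1})
```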

7
Q

What is the formula of the Naive Bayes algorithm?

A

P(c|d) = P(d|c)P(c)/P(d)

You then pick the class with the largest probability given the document: c_MAP = argmax_c P(d|c)P(c). (P(d) is the same for every class, so it can be dropped from the comparison.)
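
A minimal sketch of that decision rule with made-up, already-estimated probabilities (the priors and likelihoods below are hypothetical, not from the lecture):

```python
import math

priors = {"pos": 0.6, "neg": 0.4}          # P(c)
likelihoods = {                            # P(word | c)
    "pos": {"great": 0.05, "boring": 0.01},
    "neg": {"great": 0.01, "boring": 0.06},
}

def predict(words):
    # P(d) is the same for every class, so comparing P(c) * prod P(w|c) is enough.
    scores = {c: priors[c] * math.prod(likelihoods[c][w] for w in words)
              for c in priors}
    return max(scores, key=scores.get)

print(predict(["great", "great", "boring"]))  # -> 'pos'
```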

8
Q

What are some assumptions of Multinomial Naive Bayes?

A
Bag of words assumption: assumes the position of a word does not matter
Conditional independence: assumes the feature probabilities P(x_i|c_j) are independent given the class c
Together, these give P(x_1, …, x_n | c) = P(x_1|c) · P(x_2|c) · … · P(x_n|c)
9
Q

What is Laplace Smoothing?

A

Laplace smoothing, or "add-1 smoothing", is a technique that tackles the zero-probability problem in the Naive Bayes algorithm: every count is incremented by 1, so a word that never occurs with a class still gets a small non-zero probability.
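
A minimal sketch of the add-1 estimate, with made-up counts and vocabulary size:

```python
# Add-1 (Laplace) smoothed estimate of P(w | c):
#   P(w | c) = (count(w, c) + 1) / (total words in c + |V|)

word_counts = {"great": 3, "movie": 2}   # hypothetical counts for one class
total = sum(word_counts.values())
vocab_size = 6                           # hypothetical vocabulary size |V|

def p_word_given_class(word):
    return (word_counts.get(word, 0) + 1) / (total + vocab_size)

print(p_word_given_class("great"))   # (3 + 1) / (5 + 6) ≈ 0.36
print(p_word_given_class("boring"))  # (0 + 1) / (5 + 6) ≈ 0.09, no longer zero
```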

10
Q

What are the benefits of the Naive Bayes Classifier?

A
  • Very fast, low storage requirements
  • Robust to irrelevant features
    • Irrelevant features cancel each other without affecting results
  • Very good in domains with many equally important features
    • Decision trees suffer from fragmentation in such cases, especially with little data
  • Optimal if the independence assumptions hold
  • A good dependable baseline for text classification
    • But we will see other classifiers that give better accuracy
11
Q

How can you measure the performance of text classifiers?

A
  • Recall
  • Precision
  • Accuracy
  • F-Score
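
A minimal sketch of these measures for a single class, computed from made-up true/false positive and negative counts:

```python
tp, fp, fn, tn = 40, 10, 20, 30   # hypothetical confusion-matrix counts

precision = tp / (tp + fp)                   # of the predicted positives, how many were right
recall    = tp / (tp + fn)                   # of the true positives, how many were found
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # fraction of all decisions that were correct
f_score   = 2 * precision * recall / (precision + recall)  # F1: harmonic mean of P and R

print(precision, recall, accuracy, f_score)  # 0.8  0.666...  0.7  0.727...
```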
12
Q

If we have more than one class, how can we combine multiple performance measures into one quantity?

A

Macroaveraging: Compute performance for each class, then average

Microaveraging: Collect decisions for all classes, compute contingency table, evaluate
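
A minimal two-class sketch of the difference, with made-up counts: macroaveraging treats every class equally, while microaveraging pools the counts so frequent classes dominate.

```python
# Hypothetical per-class (true positives, false positives).
classes = {"sports": (90, 10), "chess": (1, 9)}

# Macroaverage: precision per class, then average over classes.
macro = sum(tp / (tp + fp) for tp, fp in classes.values()) / len(classes)

# Microaverage: pool all decisions into one contingency table, then compute precision once.
tp_all = sum(tp for tp, _ in classes.values())
fp_all = sum(fp for _, fp in classes.values())
micro = tp_all / (tp_all + fp_all)

print(macro)  # (0.9 + 0.1) / 2 = 0.5
print(micro)  # 91 / 110 ≈ 0.83, dominated by the large "sports" class
```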

13
Q

What is cross-validation?

A

Split the data into k folds. Train on k-1 folds and test on the remaining fold, repeating so that every fold is used as the test set exactly once, then average the results. This gives a more reliable performance estimate and avoids overfitting to a single train/test split.
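
A minimal sketch of k-fold cross-validation; `train` and `evaluate` are hypothetical placeholders for whatever classifier and metric you use:

```python
def cross_validate(examples, k, train, evaluate):
    # Deal the examples into k folds; each fold is the test set exactly once.
    folds = [examples[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(training)
        scores.append(evaluate(model, test))
    # Average performance over the k splits.
    return sum(scores) / k
```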

14
Q

If you have very little data, you should use… (Which classifier?)

A

Naive Bayes

15
Q

If you have a reasonable amount of data you should use… (Which classifier?)

A
This is where the more expressive ("clever") classifiers shine:
- SVM
- Logistic Regression
- Decision Trees
etc.
16
Q

If you have a huge amount of data you should use… (Which classifier?)

A

Logistic Regression can work
Naive Bayes can work (because it is fast)

At a cost:
SVMs (slow to train) or kNN (slow at test time) can be too slow

17
Q

Classifier may not matter if…

A

You have enough data (almost all of the classifiers end up close in accuracy as the amount of data increases)

18
Q

What is meant by “Underflow”?

A

Underflow is a phenomenon that can occur when multiplying many probabilities.

Essentially, by multiplying many probabilities together you may get a number so small that it can no longer be represented as a floating-point number, so it is rounded down to zero.

19
Q

How can you deal with Underflow?

A

Sum the logs of the probabilities rather than multiplying the probabilities themselves, since log(ab) = log a + log b and the log of a tiny product is just a moderately large negative number.
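
A minimal sketch of why this works: the product of many small probabilities underflows to 0.0, while the sum of their logs stays in a comfortable range (the probabilities are made up).

```python
import math

probs = [1e-5] * 100   # 100 hypothetical word probabilities

product = math.prod(probs)                    # 1e-500 cannot be stored in a float: underflows to 0.0
log_score = sum(math.log(p) for p in probs)   # 100 * log(1e-5) ≈ -1151.3, easily representable

print(product)    # 0.0
print(log_score)  # -1151.29...
```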