Lecture 3: Text Classification & Naive Bayes Flashcards
Text Classification takes two things as input. What are they?
A document d
A fixed set of classes C = {c_1, c_2, …, c_J}
Text Classification produces one output. What is that?
A predicted class c in C
Name some classification methods
- Hand-coded rules
- Supervised machine learning
In this lecture, we primarily work with one supervised machine learning algorithm. What is that?
Naive Bayes
What is Naive Bayes based on? And into what representation does it transform a document?
A simple (“naïve”) classification method based on Bayes' rule
Relies on a very simple representation of the document -> Bag of Words
Explain the concept of a “Bag of Words”
Think of it as a dictionary where each key is a word and each value is the number of occurrences of that word in the given document; word order is discarded.
Example:
(1) John likes to watch movies. Mary likes movies too.
(2) Mary also likes to watch football games.
BoW1 = {"John": 1, "likes": 2, "to": 1, "watch": 1, "movies": 2, "Mary": 1, "too": 1}
BoW2 = {"Mary": 1, "also": 1, "likes": 1, "to": 1, "watch": 1, "football": 1, "games": 1}
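A minimal sketch of building these counts in Python with collections.Counter (the letters-only tokenizer is a simplifying assumption; real tokenizers handle punctuation, case, etc.):

```python
from collections import Counter
import re

def bag_of_words(text):
    # Tokenize on runs of letters (deliberately crude, for illustration only)
    return Counter(re.findall(r"[A-Za-z]+", text))

print(bag_of_words("John likes to watch movies. Mary likes movies too."))
# Counter({'likes': 2, 'movies': 2, 'John': 1, 'to': 1, 'watch': 1, 'Mary': 1, 'too': 1})
```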
What is the formula of Naive Bayes algorithm?
P(c|d) = P(d|c)P(c)/P(d)
You then pick the class with the highest probability given the document: c_MAP = argmax_c P(d|c)P(c). The denominator P(d) can be dropped because it is the same for every class.
What are some assumptions of Multinomial Naive Bayes?
- Bag of words assumption: the position of a word in the document does not matter
- Conditional independence: the feature probabilities P(x_i|c_j) are independent given the class c
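Putting the formula and these assumptions together (conditional independence lets us expand P(x_1, …, x_n|c) into P(x_1|c) · P(x_2|c) · … · P(x_n|c)), here is a minimal sketch of a multinomial Naive Bayes classifier in Python. It works in log space to avoid floating-point underflow and already applies the add-1 smoothing from the next card; the function names and toy data are mine, for illustration:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """docs: list of token lists; labels: parallel list of class names."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}  # P(c)
    word_counts = {c: Counter() for c in classes}                # count(w, c)
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return prior, word_counts, vocab

def predict(tokens, prior, word_counts, vocab):
    """Return argmax_c [log P(c) + sum_i log P(x_i|c)], with add-1 smoothing."""
    best_class, best_logp = None, float("-inf")
    for c in prior:
        total = sum(word_counts[c].values())
        logp = math.log(prior[c])
        for w in tokens:
            if w in vocab:  # words never seen in training are simply skipped
                logp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class

prior, counts, vocab = train_nb(
    [["fun", "fun", "movie"], ["boring", "slow", "movie"]], ["pos", "neg"])
print(predict(["fun", "movie"], prior, counts, vocab))  # -> 'pos'
```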
What is Laplace Smoothing?
Laplace smoothing, or “add-1 smoothing”, is a technique that tackles the zero-probability problem in Naive Bayes: a word that never occurs with a class in the training data would otherwise get P(w|c) = 0 and zero out the entire product.
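Concretely, add-1 smoothing changes the estimate of each word probability to
P(w_i|c) = (count(w_i, c) + 1) / (Σ_w count(w, c) + |V|)
where |V| is the vocabulary size; adding |V| to the denominator keeps the smoothed probabilities summing to 1.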
What are the benefits of the Naive Bayes Classifier?
- Very fast, low storage requirements
- Robust to irrelevant features
  - Irrelevant features cancel each other out without affecting results
- Very good in domains with many equally important features
  - Decision trees suffer from fragmentation here, especially with little data
- Optimal if the independence assumptions hold
- A good, dependable baseline for text classification
  - But we will see other classifiers that give better accuracy
How can you measure the performance of text classifiers?
- Recall
- Precision
- Accuracy
- F-Score
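All four measures fall out of the cells of a 2×2 contingency table. A minimal sketch in Python (the variable names tp, fp, fn, tn and the example counts are mine, for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the four standard measures from a 2x2 contingency table."""
    precision = tp / (tp + fp)                  # of items labeled positive, fraction correct
    recall = tp / (tp + fn)                     # of truly positive items, fraction found
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of all decisions that were correct
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean (balanced F-score)
    return precision, recall, accuracy, f1

# Hypothetical counts: 70 true pos, 10 false pos, 30 false neg, 90 true neg
print(classification_metrics(tp=70, fp=10, fn=30, tn=90))  # (0.875, 0.7, 0.8, 0.777...)
```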
If we have more than one class, how can we combine multiple performance measures into one quantity?
Macroaveraging: Compute performance for each class, then average
Microaveraging: Collect decisions for all classes, compute contingency table, evaluate
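A quick way to see the difference, assuming scikit-learn is available (the toy labels below are made up):

```python
from sklearn.metrics import f1_score

y_true = ["spam", "ham", "spam", "news", "news", "ham"]
y_pred = ["spam", "spam", "spam", "news", "ham", "ham"]

# Macroaveraging: compute F1 per class, then take the unweighted mean
# (every class counts equally, however rare it is)
print(f1_score(y_true, y_pred, average="macro"))
# Microaveraging: pool all decisions into one contingency table, then compute F1
# (frequent classes dominate the result)
print(f1_score(y_true, y_pred, average="micro"))
```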
What is cross-validation?
Cross-validation splits the data into several folds, repeatedly trains on all folds but one and evaluates on the held-out fold, then averages the results. Pooling the error estimates over different splits avoids overfitting to (and being misled by) any single train/test split.
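A minimal sketch, assuming scikit-learn: a bag-of-words + multinomial Naive Bayes pipeline scored with cross-validation (the toy documents and the fold count are mine):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["great fun movie", "boring slow film", "fun watch", "terrible dull plot"]
labels = ["pos", "neg", "pos", "neg"]

# Each of the 2 folds is held out once for testing while the other trains the model
clf = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(clf, docs, labels, cv=2)
print(scores, scores.mean())
```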
If you have very little data, you should use… (Which classifier?)
Naive Bayes
If you have a reasonable amount of data you should use… (Which classifier?)
With a reasonable amount of data, all the “clever” classifiers work well:
- SVM
- Logistic Regression
- Decision Trees
- etc.