Text Classification Flashcards

Question 1

Q

What are the possible uses of text classification? (3)

Answer

A

Spam Detection

Topic Classification

Sentiment Analysis

Question 2

Q

Formalise the text classification objective.

Answer

A

Given a document d and a fixed set of classes C, we want to assign d to the most appropriate class c ∈ C.

Question 3

Q

What is the Naive Bayes assumption?

Answer

A

Features are conditionally independent given the class.
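
Written out as a formula, the assumption lets the class-conditional likelihood factorise into per-feature terms. The feature symbols f_1, …, f_n and the argmax decision rule are notation assumed here for illustration; the card itself only states the assumption in words.

```latex
P(f_1, \dots, f_n \mid c) = \prod_{i=1}^{n} P(f_i \mid c)
\qquad\Longrightarrow\qquad
\hat{c} = \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(f_i \mid c)
```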

Question 4

Q

What are the features of a document for topic classification?

Answer

A

Usually the count of the most frequent words
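
A minimal sketch of such count features, assuming documents are already tokenised into lists of words. The helper names `top_k_vocabulary` and `count_features` are hypothetical and only illustrate the idea.

```python
from collections import Counter


def top_k_vocabulary(corpus, k):
    """Return the k most frequent words across a corpus of tokenised documents."""
    counts = Counter(token for doc in corpus for token in doc)
    return [word for word, _ in counts.most_common(k)]


def count_features(tokens, vocabulary):
    """Return how often each vocabulary word occurs in one tokenised document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]
```

For example, `count_features(["the", "vote", "the"], ["the", "vote", "goal"])` gives one count per vocabulary word, with 0 for words that do not occur.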

Question 5

Q

What are the features of a document for sentiment analysis?

Answer

A

Usually the counts of words from a sentiment lexicon, in which words are associated with particular classes. For example, “good”, “adorable” and “brave” would be associated with the positive class, while “bad”, “ugly” and “cowardly” would be associated with the negative class.
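
A small sketch of lexicon-based features using the six example words from this card. The two-set lexicon and the function name `lexicon_features` are assumptions for illustration; a real sentiment lexicon would be far larger.

```python
# Toy lexicon built from the example words on this card (illustrative only).
POSITIVE = {"good", "adorable", "brave"}
NEGATIVE = {"bad", "ugly", "cowardly"}


def lexicon_features(tokens):
    """Count how many tokens fall in the positive and negative word lists."""
    positive = sum(1 for t in tokens if t.lower() in POSITIVE)
    negative = sum(1 for t in tokens if t.lower() in NEGATIVE)
    return {"positive": positive, "negative": negative}
```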

Question 6

Q

How is the prior estimated for Naive Bayes?

Answer

A

Normally by maximum-likelihood estimation: the prior P(c) for a class c is the number of training documents that belong to c divided by the total number of training documents.
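
The estimate amounts to counting labels. A minimal sketch, assuming the training labels are given as a list with one entry per document (the function name `estimate_priors` is hypothetical):

```python
from collections import Counter


def estimate_priors(labels):
    """Maximum-likelihood prior: documents in each class / total documents."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}
```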

Question 7

Q

How are the conditional probabilities of the features in text classification estimated for Naive Bayes?

Answer

A

Normally by maximum-likelihood estimation with smoothing applied, so that a feature never seen with a class in training does not receive a probability of zero.
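
The card does not say which smoothing method is used. As one common choice, the sketch below assumes add-α (Laplace) smoothing over word counts; the function name `smoothed_likelihoods` and the parameter `alpha` are hypothetical.

```python
from collections import Counter


def smoothed_likelihoods(docs, labels, target_class, alpha=1.0):
    """Estimate P(word | target_class) with add-alpha smoothing.

    Assumption: features are word counts and the vocabulary is every word
    seen anywhere in the training documents.
    """
    vocab = {token for doc in docs for token in doc}
    # Word counts restricted to documents labelled with target_class.
    counts = Counter(
        token
        for doc, label in zip(docs, labels)
        if label == target_class
        for token in doc
    )
    denominator = sum(counts.values()) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / denominator for word in vocab}
```

With `alpha=1.0`, a word that never occurs in the class still gets a small non-zero probability, and the probabilities over the vocabulary sum to 1.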

Question 8

Q

Alternative features for text classification

Answer

A

Use binary features (did word x occur in the document, yes or no).

Use only a subset of the vocabulary.

Use more complex features (morphological features, bigrams, syntactic features).
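
Two of these alternatives are easy to sketch, assuming tokenised documents. The helper names `binary_features` and `bigrams` are hypothetical.

```python
def binary_features(tokens, vocabulary):
    """1 if the vocabulary word occurs in the document at all, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def bigrams(tokens):
    """Pairs of adjacent tokens, usable as features in place of single words."""
    return list(zip(tokens, tokens[1:]))
```

Passing a shorter `vocabulary` list to `binary_features` is one way to restrict the model to a subset of the vocabulary.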

Question 9

Q

Advantages of Naive Bayes

Answer

A

1) Fast and easy to train and test

2) Simple model, so easy to implement

3) Requires relatively little training data

4) Usually works well

Question 10

Q

Disadvantage of Naive Bayes

Answer

A

The naive independence assumption is very strong and rarely holds: words tend to be correlated, so the features are not really independent.

Question 11

Q

What is intrinsic evaluation?

Answer

A

An evaluation measure inherent to the task

Question 12

Q

Give an example of an intrinsic evaluation measure for language modelling

Answer

A

Perplexity

Question 13

Q

Give an example of an intrinsic evaluation measure for POS tagging

Answer

A

Accuracy (% of tags correct)

Question 14

Q

Give an example of an intrinsic evaluation measure for categorization

Question 15

Q

What is extrinsic evaluation? Give an example for language modelling

Answer

A

Measure the effect on a downstream task. For language modelling: does the language model improve my ASR/MT task?

Question 16

Q

How to deal with unbalanced classes?

Answer

A

1) Collect more data

2) Augment some of the data you do have

3) Create copies of training samples
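
Option 3 can be sketched as random oversampling. This assumes the copies are drawn from the smaller classes until every class matches the largest one, which the card does not spell out. The function name `oversample` and the `seed` parameter are hypothetical.

```python
import random


def oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) for classes below the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```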

Question 17

Q

What is the precision?

Answer

A

Items the system detected that were right / items the system detected

In other words: (true positives) / (true positives + false positives)

Question 18

Q

What is the recall?

Answer

A

Items the system detected that were right/items the system should have detected

(true positives)/(true positives + false negatives)
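
Precision and recall can both be computed from the same three counts. A minimal sketch, assuming parallel lists of gold and predicted labels and one class treated as "positive"; the function name `precision_recall` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the cards specify.

```python
def precision_recall(gold, predicted, positive):
    """Return (precision, recall) for one class treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    # Assumed convention: report 0.0 when nothing was detected / nothing to detect.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```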

Question 19

Q

What is the F-measure? (equation)

Answer

A

F_β = ((β²+1)PR)/(β²P + R)
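
The equation translates directly into code. A minimal sketch; the function name `f_measure` is hypothetical, and returning 0.0 when the denominator is zero is an assumed convention.

```python
def f_measure(precision, recall, beta=1.0):
    """F_beta = ((beta^2 + 1) * P * R) / (beta^2 * P + R)."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when both P and R are zero
    return (beta_sq + 1) * precision * recall / denominator
```

Values of `beta` above 1 weight recall more heavily, and values below 1 weight precision more heavily.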

Question 20

Q

What is the harmonic mean of the precision and recall? (F-1)

Answer

A

The F-measure with β set to 1: F₁ = 2PR/(P + R)

Question 21

Q

How is the F-measure decreased/increased?

Answer

A

The F-measure is a harmonic mean, so it is pulled towards the smaller of the two values. For a given average of precision and recall, the F-measure is larger when the two values are close to each other and much smaller when they are far apart.
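
A quick numeric check with F₁, using two illustrative precision/recall pairs that share the same arithmetic mean of 0.5 (the numbers are chosen here only as an example):

```latex
F_1(P{=}0.5,\ R{=}0.5) = \frac{2 \cdot 0.5 \cdot 0.5}{0.5 + 0.5} = 0.5
\qquad
F_1(P{=}0.9,\ R{=}0.1) = \frac{2 \cdot 0.9 \cdot 0.1}{0.9 + 0.1} = 0.18
```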

(21 cards)