13 Text Classification and Naive Bayes Flashcards

1
Q

Standing query

A

Is like any other query except that it is periodically executed on a collection to which new docs are incrementally added over time.

2
Q

Classification

A

Given a set of classes, we seek to determine which class(es) a given object belongs to.

3
Q

Routing/filtering

A

Classification using standing queries.

4
Q

Topics

A

A general class is usually referred to as a topic, for instance "China" or "coffee".

5
Q

Applications of classification in IR

A

Preprocessing steps
Finding a document's encoding, truecasing, and identifying the language of a document.

Spam
Automatic detection of spam pages, which are then not included in the search engine index.

Porn mode
Filtering out explicit content.

Sentiment detection
The automatic classification of a movie or product review as positive or negative.

Personal email sorting
Finding the correct folder for a new email.

Topic-specific or vertical search
Vertical search engines restrict searches to a particular topic. For example, the query "computer science" on a vertical search engine for the topic China will return a list of Chinese computer science departments with higher precision and recall than the query "computer science China" on a general-purpose search engine.

Ranking function
The ranking function in ad hoc IR can also be based on a document classifier. More on this later (Section 15.4).

6
Q

Rules in text classification (TC)

A

A rule captures a certain combination of keywords that indicates a class. Hand-coded rules have good scaling properties, but creating and maintaining them over time is labor intensive. In machine learning, these rules are learned automatically from training data.

7
Q

Statistical text classification

A

The approach in which rules are learned automatically from training data with machine learning.

8
Q

Labeling

A

The process of annotating each doc with its class.

9
Q

Document space

A

In TC we are given a description d ∈ X of a document, where X is the document space. Typically, the document space is some type of high-dimensional space.

10
Q

Class space

A

In TC, we are given a fixed set of classes C = {c1, c2, …, cJ}. Typically, the classes are human defined for the needs of an application.

11
Q

Training set

A

In TC, we are usually given a training set D of labeled docs ⟨d, c⟩, where ⟨d, c⟩ ∈ X × C.

12
Q

Learning method classifier

A

Using a learning method or learning algorithm, we then wish to learn a classifier or classification function γ that maps documents to classes: γ : X → C

13
Q

Supervised learning

A

The above type of learning is called supervised learning because a supervisor serves as a teacher directing the learning process. We denote the supervised learning method by Γ and write Γ(D) = γ. The learning method Γ takes the training set D as input and returns the learned classification function γ.
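
A minimal sketch of the Γ(D) = γ interface in Python (hypothetical names; a trivial majority-class learner stands in for a real method such as Naive Bayes):

from collections import Counter

def learn(D):
    # Gamma: takes a training set D of (document, class) pairs and
    # returns a classification function gamma : X -> C.
    majority_class = Counter(c for _, c in D).most_common(1)[0][0]
    def gamma(d):
        # Stand-in decision rule: always predict the training majority class.
        return majority_class
    return gamma

D = [("first private Chinese airline", "China"),
     ("Beijing joins WTO", "China"),
     ("roasting coffee beans", "coffee")]
gamma = learn(D)
print(gamma("new unseen doc"))  # -> China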

14
Q

Test set data

A

Once we have learned γ, we can apply it to the test set, for example the new document "first private Chinese airline" whose class is unknown. The classification function hopefully assigns the new document to the correct class: γ(d′) = China.

15
Q

Sparseness

A

The training data are never large enough to represent the frequency of rare events adequately. Therefore, estimated probabilities for rare or unseen events will often be zero.
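
A small sketch (hypothetical counts) of how add-one (Laplace) smoothing works around this in multinomial Naive Bayes:

def smoothed_cond_prob(term_count_in_class, total_tokens_in_class, vocab_size):
    # Add-one smoothing: P(t|c) = (T_ct + 1) / (sum_t' T_ct' + |V|).
    # Even a term never observed in class c gets a small nonzero probability.
    return (term_count_in_class + 1) / (total_tokens_in_class + vocab_size)

print(smoothed_cond_prob(0, 1000, 50000))   # unseen term: small but not 0.0
print(smoothed_cond_prob(30, 1000, 50000))  # frequent term: larger estimate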

16
Q

Bernoulli model

A

Equivalent to the binary independence model, which generates an indicator for each term of the vocabulary, either 1 indicating presence of the term in the doc or 0 indicating absence.
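
A sketch of Bernoulli-model scoring, assuming precomputed, smoothed estimates prior[c] and cond_prob[(t, c)]; note that absent terms contribute too, unlike in the multinomial model:

import math

def bernoulli_score(doc_terms, c, prior, cond_prob, vocabulary):
    # Every vocabulary term contributes: log P(t|c) if the term is
    # present in the doc, log (1 - P(t|c)) if it is absent.
    score = math.log(prior[c])
    present = set(doc_terms)
    for t in vocabulary:
        p = cond_prob[(t, c)]
        score += math.log(p if t in present else 1.0 - p)
    return score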

17
Q

Concept drift

A

The gradual change over time of the concept underlying a class, like US president from George W. Bush to Barack Obama. The Bernoulli model is particularly robust with respect to this because the most important indicators of a class are less likely to change.

18
Q

Feature selection

A

Is the process of selecting a subset of the terms occurring in the training set and using only this subset as features in TC. It serves two main purposes:

  1. It makes training and applying a classifier more efficient by decreasing the size of the vocabulary. This is important for classifiers that are expensive to train (unlike NB).
  2. It often increases classification accuracy by eliminating noise features.
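
A sketch of the generic selection step, assuming some per-term utility measure (e.g. MI or χ², defined in the cards below) has already been computed:

def select_features(vocabulary, utility, k):
    # Rank every term in the training-set vocabulary by its utility for
    # the class and keep only the k highest-scoring terms as features.
    return sorted(vocabulary, key=utility, reverse=True)[:k]
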
19
Q

Noise feature

A

Is one that, when added to the doc representation, increases the classification error on new data. Example: suppose a rare term, say arachnocentric, has no info about a class, say China, but all instances of arachnocentric happen to occur in China docs in our training set. Then the learning method might produce a classifier that misassigns test docs containing arachnocentric to China.

20
Q

Overfitting

A

Such an incorrect generalization from an accidental property of the training set, as in the example above, is called overfitting.

21
Q

Mutual information (MI)

A

MI measures how much info the presence/absence of a term contributes to making the correct classification decision.
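
A sketch computing MI from the 2×2 contingency counts of term occurrence vs. class occurrence (argument names are illustrative; e.g. n10 counts docs that contain the term but are not in the class):

import math

def mutual_information(n11, n10, n01, n00):
    # Sum N_xy/N * log2(N * N_xy / (N_x. * N_.y)) over the four cells
    # of the term/class contingency table; zero cells contribute 0.
    n = n11 + n10 + n01 + n00
    mi = 0.0
    for cell, row_total, col_total in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if cell > 0:
            mi += (cell / n) * math.log2(n * cell / (row_total * col_total))
    return mi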

22
Q

χ² feature selection

A

In statistics, the χ² test is applied to test the independence of two events. In feature selection, the two events are occurrence of the term and occurrence of the class. A high χ² value indicates that the hypothesis of independence, which implies that expected and observed counts are similar, is incorrect.
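
A sketch of the closed-form χ² computation from the same 2×2 contingency counts used for MI:

def chi_squared(n11, n10, n01, n00):
    # X^2 from the 2x2 table of term occurrence vs. class occurrence;
    # a high value is evidence against the independence hypothesis.
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return numerator / denominator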

23
Q

Frequency-based feature selection

A

A feature selection method that selects the terms that are most common in the class. Its drawback is that it may select some frequent terms that have no specific info about the class.

24
Q

Greedy feature selection

A

All three feature selection methods described (MI, χ², and frequency-based feature selection) are examples of greedy methods. They may select features that contribute no incremental information over previously selected features.

25
Q

Two-class classifier

A

An approach to an any-of problem. You must learn several two-class classifiers, one for each class, where the two-class classifier for class c is the classifier for the two classes c and its complement c̄.
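
A sketch of the one-vs-rest scheme for an any-of problem, assuming some binary training routine train_binary (hypothetical) is available:

def train_one_vs_rest(D, classes, train_binary):
    # One two-class classifier per class c: documents labeled c are the
    # positive examples, everything else (the complement) is negative.
    return {c: train_binary([(d, label == c) for d, label in D])
            for c in classes}

def classify_any_of(classifiers, d):
    # Any-of: a doc may end up with zero, one, or several class labels.
    return [c for c, clf in classifiers.items() if clf(d)]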

26
Q

Effectiveness

A

Is a generic term for measures that evaluate the quality of classification decisions, including precision, recall, F1, and accuracy.

27
Q

Performance

A

Refers to the computational efficiency of classification and IR systems.

28
Q

Macro/Micro- averaging

A

We often want to compute a single aggregate measure that combines the measures for individual classifiers. There are two methods for doing this.

Macroaveraging
Computes a simple average over classes

Microaveraging
Pools per-doc decisions across classes, and then computes an effectiveness measure on the pooled contingency table.

The differences between the two methods can be large. Macroaveraging gives equal weight to each class, whereas microaveraging gives equal weight to each per-doc classification decision.
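
A small sketch contrasting the two aggregates on hypothetical per-class (TP, FP, FN) counts:

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical per-class contingency counts: (TP, FP, FN).
tables = {"China": (10, 10, 10), "coffee": (90, 10, 10)}

macro_f1 = sum(f1(*t) for t in tables.values()) / len(tables)   # 0.70
micro_f1 = f1(*(sum(col) for col in zip(*tables.values())))     # ~0.83
print(macro_f1, micro_f1)  # the large class dominates the microaverage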

29
Q

Development set

A

A set for testing while you develop your methods.