Information Retrieval Flashcards

1
Q

What is the vector space model?

A

represent each document as a vector, with one dimension per term in the vocabulary
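
A minimal sketch of the idea, assuming whitespace tokenization and raw term counts as weights (real systems often use tf-idf instead). The helper names `build_vocabulary` and `to_vector` are made up for this illustration:

```python
from collections import Counter

def build_vocabulary(docs):
    """Return a sorted list of all distinct terms in the collection."""
    return sorted({term for doc in docs for term in doc.lower().split()})

def to_vector(doc, vocabulary):
    """Map a document to a term-frequency vector, one dimension per term."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]

docs = ["the cat sat", "the dog sat on the mat"]
vocabulary = build_vocabulary(docs)
vectors = [to_vector(d, vocabulary) for d in docs]

print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```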

2
Q

What is dimensionality reduction?

A

reducing the number of dimensions (terms) by removing stop words and very rare words and selecting only the most distinctive terms, a process known as feature selection
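
A rough sketch of the first two steps (dropping stop words and rare words). The stop list, the `min_doc_freq` threshold, and the function name `select_terms` are assumptions for illustration; picking the "most distinctive" terms would need an extra scoring step that is not shown here:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "on", "and"}  # assumed toy stop list

def select_terms(docs, min_doc_freq=2):
    """Keep terms that are not stop words and occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(
        term for term, df in doc_freq.items()
        if term not in STOP_WORDS and df >= min_doc_freq
    )

docs = ["the cat sat on the mat", "a cat and a dog", "the dog sat"]
kept = select_terms(docs)
print(kept)  # ['cat', 'dog', 'sat']
```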

3
Q

What is Latent Semantic Analysis?

A

A method for reducing the dimensionality of the term-by-document matrix
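
The card does not say how the reduction is done. In the usual formulation, assumed here, LSA takes a truncated singular value decomposition of the term-by-document matrix. The symbols A, U, Σ, V, m, n, and k are notation chosen for this sketch:

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad A \in \mathbb{R}^{m \times n},\;
U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k},\;
k \ll \min(m, n)
```

Here m is the number of terms and n the number of documents; each document is then described by k values (a column of \Sigma_k V_k^{\top}) instead of m term weights.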

4
Q

Precision

A

percentage of true positives out of all returned documents

5
Q

Recall

A

percentage of true positives out of all relevant documents in the collection

6
Q

F-measure

A

weighted harmonic mean of Precision and Recall

with equal weights (balanced F-measure): F = 2PR/(P+R)
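
A small worked example of precision, recall, and the balanced F-measure, assuming the returned and relevant documents are given as sets of IDs. The function name and the document IDs are made up for illustration:

```python
def precision_recall_f(returned, relevant):
    """Compute precision, recall, and balanced F-measure from two sets of document IDs."""
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

returned = {"d1", "d2", "d3", "d4"}   # what the system returned
relevant = {"d1", "d2", "d5"}         # what is actually relevant
p, r, f = precision_recall_f(returned, relevant)
print(p, r, f)  # P = 2/4, R = 2/3, F = 4/7
```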

7
Q

What are the types of text classification? Give examples.

A

topic categorization
sentiment classification
authorship attribution and plagiarism detection
spam detection and e-mail classification

8
Q

Evaluation measures for text classification?

A

Macro-averaged precision, recall, and F-measure

Micro-averaged precision, recall, and F-measure
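
To see how the two averages differ, here is a sketch for precision only, with hypothetical per-class counts (the class names and numbers are invented). Macro-averaging averages the per-class scores; micro-averaging pools the counts first:

```python
def precision(tp, fp):
    """Precision from true-positive and false-positive counts."""
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical per-class counts: class -> (true positives, false positives)
counts = {"sports": (90, 10), "politics": (5, 5)}

# Macro: average the per-class precisions, so every class counts equally.
macro_p = sum(precision(tp, fp) for tp, fp in counts.values()) / len(counts)

# Micro: pool the counts over all classes first, so frequent classes dominate.
total_tp = sum(tp for tp, _ in counts.values())
total_fp = sum(fp for _, fp in counts.values())
micro_p = precision(total_tp, total_fp)

print(macro_p, micro_p)  # (0.9 + 0.5) / 2 versus 95 / 110
```

Recall and F-measure can be averaged the same way, using false negatives in place of false positives for recall.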

9
Q

Similarity-based text classification?

A

classify a document by how close its vector is to labeled vectors, measured with cosine similarity or Euclidean distance
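
A minimal sketch of both measures, plus a toy classifier that picks the class whose vector is most similar by cosine. The one-vector-per-class setup, the `classify` helper, and the example vectors are assumptions for illustration:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(doc_vector, class_vectors):
    """Return the label whose vector has the highest cosine similarity."""
    return max(
        class_vectors,
        key=lambda label: cosine_similarity(doc_vector, class_vectors[label]),
    )

class_vectors = {"sports": [3, 0, 1], "politics": [0, 4, 1]}
label = classify([2, 0, 0], class_vectors)
print(label)  # sports
```

Higher cosine similarity means closer, while lower Euclidean distance means closer, so a distance-based version would use `min` instead of `max`.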

10
Q

What is the Common N-gram (CNG) method?

A
- create an n-gram profile for each author
- compare profiles using a Euclidean distance similarity measure
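
The two steps can be sketched roughly as follows, assuming character n-grams, a profile made of the most frequent n-grams with normalized frequencies, and plain Euclidean distance as the card states. The function names, the defaults for `n` and `size`, and the sample texts are all made up; other formulations of CNG may normalize the differences:

```python
import math
from collections import Counter

def ngram_profile(text, n=3, size=5):
    """Return the `size` most frequent character n-grams with normalized frequencies."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    top = Counter(grams).most_common(size)
    return {gram: count / total for gram, count in top}

def profile_distance(p1, p2):
    """Euclidean distance between two profiles; a missing n-gram counts as frequency 0."""
    keys = set(p1) | set(p2)
    return math.sqrt(sum((p1.get(k, 0.0) - p2.get(k, 0.0)) ** 2 for k in keys))

def attribute(text, author_texts, n=3, size=5):
    """Return the author whose profile is closest to the profile of `text`."""
    unknown = ngram_profile(text, n, size)
    return min(
        author_texts,
        key=lambda a: profile_distance(unknown, ngram_profile(author_texts[a], n, size)),
    )

author_texts = {"A": "abababababab", "B": "xyzxyzxyzxyz"}
guess = attribute("ababab", author_texts)
print(guess)  # A
```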
