Information Retrieval Flashcards
What is the vector space model?
A model that represents each document (and query) as a vector of term weights, with one dimension per term in the vocabulary
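A minimal sketch of the idea, assuming raw term counts as the weights and whitespace tokenization. Both are illustrative choices, and the function names `build_vocabulary` and `to_vector` are hypothetical:

```python
from collections import Counter

def build_vocabulary(documents):
    """Return a sorted list of all distinct terms in the collection."""
    return sorted({term for doc in documents for term in doc.lower().split()})

def to_vector(document, vocabulary):
    """Represent a document as raw term counts, one entry per vocabulary term."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]

docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(docs)          # one dimension per distinct term
vectors = [to_vector(d, vocab) for d in docs]
```

Other weighting schemes, such as tf-idf, would replace the raw counts without changing the structure.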
What is dimensionality reduction?
Reducing the number of dimensions (terms) by removing stop words and very rare words and selecting only the most distinctive terms, a process known as feature selection
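The first two steps can be sketched as follows, assuming a tiny illustrative stop-word list and a document-frequency threshold `min_doc_freq` (both are assumptions, not part of the card). Selecting the most distinctive terms would need an additional scoring step that is not shown:

```python
from collections import Counter

# Illustrative stop-word list; real systems use a much longer one.
STOP_WORDS = {"the", "a", "of", "and", "on"}

def select_terms(documents, min_doc_freq=2):
    """Keep terms that are not stop words and occur in at least min_doc_freq documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(t for t, df in doc_freq.items()
                  if t not in STOP_WORDS and df >= min_doc_freq)

docs = ["the cat sat on the mat", "the cat and the dog", "a dog sat"]
kept = select_terms(docs)
```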
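What is Latent Semantic Analysis?
A method for reducing the dimensionality of the term-by-document matrix, typically via a truncated singular value decomposition (SVD), so that terms and documents are compared in a lower-dimensional latent space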
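In its usual formulation, the reduction is a truncated singular value decomposition of the term-by-document matrix. As a sketch, with X an m-by-n term-by-document matrix and k the number of retained dimensions (the symbols are illustrative and do not come from the card):

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
V_k \in \mathbb{R}^{n \times k},\quad
k \ll \min(m, n)
```

Each document is then described by its k-dimensional row of V_k Σ_k instead of by m term weights.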
Precision
percentage of true positives out of all returned documents
Recall
percentage of true positives out of all relevant documents in the collection
F-measure
weighted harmonic mean of precision (P) and recall (R); with equal weights:
F = 2PR/(P+R)
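The three measures can be computed from the set of returned documents and the set of relevant documents. This is a minimal sketch; the function name and the `beta` weighting parameter are assumptions, and `beta=1` gives the balanced formula 2PR/(P+R):

```python
def precision_recall_f(returned, relevant, beta=1.0):
    """Compute precision, recall, and F-measure for collections of document IDs."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # Weighted harmonic mean; reduces to 2PR/(P+R) when beta == 1.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

# 4 documents returned, 6 relevant in the collection, 2 in common.
p, r, f = precision_recall_f(returned=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7, 8])
```

Here 2 of the 4 returned documents are relevant (P = 0.5) and 2 of the 6 relevant documents were found (R = 1/3).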
What are the types of text classification? Give examples.
topic categorization
sentiment classification
authorship attribution and plagiarism detection
spam detection and e-mail classification
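What are the evaluation measures for text classification?
Macro-averaged precision, recall, and F-measure (computed per class, then averaged over classes)
Micro-averaged precision, recall, and F-measure (computed once from counts pooled over all classes)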
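The difference between the two averages is easiest to see on precision. A minimal sketch, assuming per-class counts of true and false positives are already available (the function name, input layout, and class names are hypothetical):

```python
def macro_micro_precision(per_class_counts):
    """per_class_counts maps class -> (true_positives, false_positives)."""
    # Macro: compute precision per class, then average the per-class values.
    per_class = [tp / (tp + fp) if tp + fp else 0.0
                 for tp, fp in per_class_counts.values()]
    macro = sum(per_class) / len(per_class)
    # Micro: pool the counts over all classes, then compute a single precision.
    total_tp = sum(tp for tp, _ in per_class_counts.values())
    total_fp = sum(fp for _, fp in per_class_counts.values())
    micro = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    return macro, micro

# One large, well-classified class and one small, poorly classified class.
counts = {"sports": (90, 10), "politics": (1, 9)}
macro, micro = macro_micro_precision(counts)
```

Macro-averaging weights every class equally, so the small class pulls the score down; micro-averaging weights every decision equally, so the large class dominates.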
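What is similarity-based text classification?
Assigning a document to the class whose documents it most resembles, using a measure such as cosine similarity or Euclidean distance between document vectors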
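Both measures operate on document vectors of equal length. The sketch below assumes a nearest-neighbour rule (label of the single most similar training vector), which is one possible way to classify by similarity; the function names and toy data are hypothetical:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(vector, labeled_vectors):
    """Return the label of the training vector with the highest cosine similarity."""
    return max(labeled_vectors,
               key=lambda item: cosine_similarity(vector, item[0]))[1]

training = [([3, 0, 1], "sports"), ([0, 2, 2], "politics")]
label = classify([2, 0, 0], training)
```

Cosine similarity ignores vector length, so long and short documents on the same topic score as similar, whereas Euclidean distance is sensitive to length.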
What is the Common N-gram (CNG) method?
- create an n-gram profile for each author
- compare profiles using a Euclidean distance similarity measure
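The two steps above can be sketched as follows, assuming character n-grams, a profile made of the most frequent n-grams with normalized frequencies, and plain Euclidean distance as stated on the card. The function names, the defaults `n=3` and `size=5`, and the toy texts are hypothetical; practical profiles are far larger, and variants of the method normalize each frequency difference, which is not shown here:

```python
import math
from collections import Counter

def ngram_profile(text, n=3, size=5):
    """Return the `size` most frequent character n-grams with normalized frequencies."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.most_common(size)}

def profile_distance(p, q):
    """Euclidean distance between two profiles; missing n-grams count as 0."""
    return math.sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2
                         for g in set(p) | set(q)))

def attribute(text, author_texts, n=3, size=5):
    """Return the author whose profile is closest to the profile of `text`."""
    target = ngram_profile(text, n, size)
    return min(author_texts,
               key=lambda a: profile_distance(
                   target, ngram_profile(author_texts[a], n, size)))

authors = {"A": "abababababab", "B": "xyzxyzxyzxyz"}
guess = attribute("ababab", authors)
```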