Information Retrieval Flashcards by Mary Paterson

What are the components of information retrieval?

Documents
Index
Query
Matching

What is the formula for Zipf’s law?

F(r) = C / r^α, where F(r) is the frequency of the word with rank r, and C and α are constants

Taking logs: log(F(r)) = log(C) - α log(r)
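
The log form is a straight line in log(r), so α can be estimated as the negative slope of a least-squares fit. The sketch below is one possible way to do that in plain Python; the function name `zipf_fit` and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter

def zipf_fit(tokens):
    """Fit log F(r) = log C - alpha * log r by least squares; return (C, alpha)."""
    # Frequencies sorted from most to least common give F(1), F(2), ...
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Toy corpus (an assumption) whose frequencies are exactly 12 / r
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
C, alpha = zipf_fit(tokens)
```

On real text the points only roughly follow a line, so the fitted α is an estimate rather than an exact value.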

What are the two parts of text pre-processing?

Stop word removal

Stemming

What is stop word removal?

The removal of common ‘noise words’ from text (e.g. ‘the’, ‘and’)
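
A minimal sketch of stop word removal, assuming whitespace tokenisation and a tiny hand-picked stop list (real systems use much longer lists; `STOP_WORDS` and `remove_stop_words` are illustrative names):

```python
# Illustrative stop list; an assumption, not a standard list
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "is"}

def remove_stop_words(text):
    """Lower-case the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

tokens = remove_stop_words("The cat and the dog")
```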

What is stemming?

Removing irrelevant differences between different ‘versions’ of the same word (e.g. reducing ‘connected’ and ‘connecting’ to ‘connect’)
This reduces the number of unique words in a corpus but increases the number of instances of each word
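
The effect on the vocabulary can be seen with a deliberately crude suffix stripper. This is only a sketch: the rule set in `SUFFIXES` is a hypothetical stand-in, far simpler than a real stemmer such as Porter's.

```python
# Hypothetical suffix rules, checked in this order
SUFFIXES = ("ing", "ed", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["connect", "connected", "connecting", "connects"]
stems = [crude_stem(w) for w in words]
unique_before = len(set(words))  # four distinct word forms
unique_after = len(set(stems))   # one distinct stem, now seen four times
```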

What is the formula for the inverse document frequency?

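IDF(t) = log(ND / ND_t), where ND is the number of documents in the corpus and ND_t is the number of documents that contain term t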
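
A short sketch of the IDF calculation, assuming documents are already tokenised into lists and that the logarithm is natural (the card does not specify a base):

```python
import math

def idf(term, docs):
    """Return log(ND / ND_t) for a term over a list of tokenised documents."""
    nd = len(docs)
    nd_t = sum(1 for doc in docs if term in doc)
    if nd_t == 0:
        raise ValueError("term does not occur in any document")
    return math.log(nd / nd_t)

docs = [["cat", "sat"], ["cat", "ran"], ["dog", "ran"], ["cat", "dog"]]
rare = idf("sat", docs)    # appears in 1 of 4 documents
common = idf("cat", docs)  # appears in 3 of 4 documents
```

A term that occurs in every document gets an IDF of zero, which is why very common words contribute nothing to matching.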

What is the formula for the term frequency - inverse document frequency weight?

w_td = f_td · IDF(t), where f_td is the frequency of term t in document d
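
As a sketch, the weights for one document can be computed as below, assuming raw counts for f_td, a natural logarithm, and that the document itself belongs to the corpus (so every term occurs in at least one document). The name `tfidf_weights` is illustrative.

```python
import math
from collections import Counter

def tfidf_weights(doc, docs):
    """Return {term: f_td * log(ND / ND_t)} for one tokenised document."""
    nd = len(docs)
    weights = {}
    for term, f_td in Counter(doc).items():
        nd_t = sum(1 for other in docs if term in other)
        weights[term] = f_td * math.log(nd / nd_t)
    return weights

docs = [["cat", "cat", "sat"], ["cat", "ran"], ["dog", "ran"]]
w = tfidf_weights(docs[0], docs)
```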

What is the formula for the similarity between a document and a query?

sim(q,d) = [∑ over all terms t in both q and d of (w_td · w_tq)] / (||q|| · ||d||)
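
A minimal sketch of this similarity, assuming the query and the document are each stored as a dictionary from term to weight (the names `norm` and `cosine_similarity` are illustrative):

```python
import math

def norm(weights):
    """Length of a weight vector: square root of the sum of squared weights."""
    return math.sqrt(sum(w * w for w in weights.values()))

def cosine_similarity(q, d):
    """Sum w_tq * w_td over shared terms, divided by ||q|| * ||d||."""
    shared = set(q) & set(d)
    dot = sum(q[t] * d[t] for t in shared)
    denom = norm(q) * norm(d)
    return dot / denom if denom else 0.0

q = {"cat": 1.0, "dog": 1.0}
d = {"cat": 3.0, "ran": 4.0}
sim = cosine_similarity(q, d)  # 3 / (sqrt(2) * 5)
```

Only terms shared by the query and the document contribute to the numerator, while every term contributes to the lengths in the denominator.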

What’s the formula for document length?

||d|| = √(∑_t w_td^2)

What is the formula for recall?

recall = |retrieved ∩ relevant| / |relevant|

What is the formula for precision?

precision = |retrieved ∩ relevant| / |retrieved|
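
Precision and recall share a numerator and differ only in the denominator. A small sketch, assuming documents are identified by hypothetical string IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents retrieved, three relevant, two in common
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d5"})
```

Here precision is 2/4 and recall is 2/3. Returning 0.0 for an empty set is a convention chosen for this sketch.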

What is query expansion?

Adding terms to a query in order to increase the overlap between the query and relevant documents
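
One simple way to sketch this is to add related words from a lookup table. The `THESAURUS` below is a hypothetical hand-made table; the card does not say where the extra terms come from.

```python
# Hypothetical table of related terms, for illustration only
THESAURUS = {"car": ["automobile", "vehicle"], "fast": ["quick"]}

def expand_query(terms):
    """Return the original terms followed by any related terms not already present."""
    expanded = list(terms)
    for term in terms:
        for related in THESAURUS.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded

expanded = expand_query(["fast", "car"])
```

A document that says ‘automobile’ but never ‘car’ can now overlap with the query.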

What is term reweighting?

Increasing the weight of query terms that appear in relevant documents and decreasing the weights of terms that don’t appear in relevant documents
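
A rough sketch of the idea, assuming a simple multiplicative update. The factors `boost` and `decay` are arbitrary illustrative values, and this is not any particular published reweighting scheme.

```python
def reweight(query, relevant_docs, boost=0.5, decay=0.25):
    """Scale up query terms seen in relevant documents and scale down the rest."""
    seen = set()
    for doc in relevant_docs:
        seen |= set(doc)
    new_query = {}
    for term, weight in query.items():
        if term in seen:
            new_query[term] = weight * (1 + boost)
        else:
            new_query[term] = weight * (1 - decay)
    return new_query

query = {"jaguar": 1.0, "car": 1.0}
relevant_docs = [["jaguar", "cat", "jungle"], ["jaguar", "habitat"]]
new_query = reweight(query, relevant_docs)
```

After feedback, ‘jaguar’ counts for more than ‘car’, steering the query toward the sense the user marked as relevant.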

What is a hyponym?

A word whose meaning is a more specific subset of another word’s meaning (e.g. ‘spaniel’ is a hyponym of ‘dog’)

What is a hypernym?

A word whose meaning is a more general superset of another word’s meaning (e.g. ‘dog’ is a hypernym of ‘spaniel’)

What does the vector representation of a document contain?

The TF-IDF weight for each term in the corpus

What does latent semantic analysis do?

Discovers relationships between words automatically from the data

What is the formula for the word-document matrix?

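The singular value decomposition A = U S V^T, where A is the word-document matrix
U and V are orthogonal
S is a diagonal matrix of singular values

The largest singular value in S is the strength of the most significant correlation
The corresponding column of V is the direction of the most significant correlation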
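
To make the ‘most significant correlation’ concrete, the sketch below finds the largest singular value and its direction by power iteration on A^T A, using only the standard library. This is an illustrative stand-in for a full decomposition (in practice a routine such as `numpy.linalg.svd` would normally be used); the function name and the toy matrix are assumptions.

```python
import math

def top_singular_pair(A, iterations=100):
    """Return (sigma, v): the largest singular value of A and its right singular vector."""
    rows, cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(cols)] * cols  # arbitrary unit starting vector
    for _ in range(iterations):
        Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        AtAv = [sum(A[i][j] * Av[i] for i in range(rows)) for j in range(cols)]
        length = math.sqrt(sum(x * x for x in AtAv))
        v = [x / length for x in AtAv]
    Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in Av))
    return sigma, v

# Toy word-document matrix: rows are words, columns are documents.
# Words 0 and 1 both occur twice in document 0; word 2 occurs once in document 1.
A = [[2.0, 0.0],
     [2.0, 0.0],
     [0.0, 1.0]]
sigma, v = top_singular_pair(A)
```

For this matrix the dominant direction points at document 0, where the two co-occurring words live, with strength √8.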

What is the formula for usefulness?

U(t) = P(t|T)log(P(t|T)/P(t))

What is the formula for salience?

S(t) = P(T|t)log(P(T|t)/P(T))

S(t) = P(T)*U(t)/P(t)
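
The second form follows from Bayes' rule, P(T|t) = P(t|T)P(T)/P(t), which also gives P(T|t)/P(T) = P(t|T)/P(t). The sketch below checks the identity numerically, reading t as a term and T as a topic (an assumption; the cards do not define the symbols) and using made-up probabilities.

```python
import math

def usefulness(p_t_given_T, p_t):
    """U(t) = P(t|T) * log(P(t|T) / P(t))."""
    return p_t_given_T * math.log(p_t_given_T / p_t)

def salience(p_t_given_T, p_t, p_T):
    """S(t) = P(T|t) * log(P(T|t) / P(T)), with P(T|t) obtained from Bayes' rule."""
    p_T_given_t = p_t_given_T * p_T / p_t
    return p_T_given_t * math.log(p_T_given_t / p_T)

# Made-up probabilities for illustration
p_t_given_T, p_t, p_T = 0.2, 0.05, 0.1
u = usefulness(p_t_given_T, p_t)
s = salience(p_t_given_T, p_t, p_T)
s_from_u = p_T * u / p_t  # the second form of the salience formula
```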

What are the steps of Latent Dirichlet Allocation?

Make an initial estimate of N topics
Decompose each document into its component topics
Use the decomposition to re-estimate the topic word probabilities

What is the recursive page rank formula?

pr_(n+1)(d) = ∑_e pr_n(e) · w_ed

What is the Markov chain formula for page rank?

pr_(n+1) = W^T · pr_n

What is the damping factor?

The probability, denoted by δ, that a user keeps following links from the current page; with probability 1 - δ the user will exit at any given page and jump to a random one

What is the formula for page rank including damping?

pr_(n+1)(d) = (1-δ)/N + δ ∑_e pr_n(e) · w_ed, where N is the number of documents
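
The damped formula can be iterated from a uniform starting vector. The sketch below assumes W[e][d] holds the weight w_ed of the link from page e to page d, that each row of W sums to 1, and that δ = 0.85 (a commonly used value, not one given on the card). The three-page graph is made up.

```python
def pagerank(W, delta=0.85, iterations=100):
    """Iterate pr(d) = (1 - delta)/N + delta * sum over e of pr(e) * W[e][d]."""
    n = len(W)
    pr = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        pr = [
            (1 - delta) / n + delta * sum(pr[e] * W[e][d] for e in range(n))
            for d in range(n)
        ]
    return pr

# Made-up graph: page 0 links to pages 1 and 2, page 1 links to 2, page 2 links to 0
W = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
ranks = pagerank(W)
```

Setting `delta=1.0` drops the damping term and reduces the update to the plain recursive formula.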

(25 cards)