Information Retrieval Flashcards

1
Q

Precision

A

The fraction of retrieved results that are relevant to the information need: P(Relevant | Retrieved).

2
Q

Recall

A

The fraction of the relevant documents in the collection that the system retrieves: P(Retrieved | Relevant).
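
A minimal sketch of how recall sits next to precision, assuming documents are identified by simple ids. The function name `precision_recall` and the sample ids are illustrative, not part of the original card.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0  # P(Relevant | Retrieved)
    recall = hits / len(relevant) if relevant else 0.0       # P(Retrieved | Relevant)
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant in the collection.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d7"],
)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, so the two measures differ even though the number of hits is the same.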

3
Q

Term Frequency

A

For each document:

Number of times term i occurs in the document / Total number of terms in the document
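
The ratio above can be sketched as follows, assuming the document has already been split into tokens. The helper name `term_frequency` and the sample sentence are illustrative only.

```python
from collections import Counter


def term_frequency(tokens):
    """Map each term to (occurrences in the document) / (total terms in the document)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


# Hypothetical six-token document; "the" occurs twice.
tf = term_frequency("the cat sat on the mat".split())
```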

4
Q

Inverse Document Frequency

A

idf(i) = log2( |D| / |{d ∈ D : t(i) ∈ d}| )

The base-2 logarithm of: number of documents in the collection / number of documents the term appears in
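
A small sketch of this calculation, assuming the logarithm is base 2 and each document is represented as a set of terms. The function name and the toy collection are assumptions made for illustration.

```python
import math


def inverse_document_frequency(term, documents):
    """log2(|D| / number of documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        # The ratio is undefined for a term that occurs nowhere in the collection.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log2(len(documents) / containing)


# Hypothetical collection of four documents, each a set of terms.
docs = [{"cat", "sat"}, {"cat", "mat"}, {"dog"}, {"bird"}]
idf_cat = inverse_document_frequency("cat", docs)  # log2(4 / 2)
idf_dog = inverse_document_frequency("dog", docs)  # log2(4 / 1)
```

A term that appears in fewer documents gets a larger value, which is why "dog" scores higher than "cat" here.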

5
Q

Term Weighting

A

Term Frequency x Inverse Document Frequency
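
As a rough sketch, the product can be computed per document like this, assuming tokenised documents and a base-2 logarithm for the inverse document frequency. The name `tf_idf` and the toy corpus are illustrative.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Number of documents each term appears in (each document counted once per term).
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log2(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical corpus: "the" is in every document, "dog" in only one.
corpus = [["the", "cat"], ["the", "dog"], ["the", "bird"], ["the", "cat"]]
weights = tf_idf(corpus)
```

A term found in every document ("the") ends up with weight 0, while a term confined to one document ("dog") gets the highest weight.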

6
Q

TF-IDF

A

Basis for assigning weights to terms in documents. The weight rises with how often a term occurs within a document and falls with how many documents in the collection contain the term.

7
Q

Sequence Pointer

A

Leaves in a B+Tree are linked to each other in a linked list. This makes range queries and ordered iteration through the blocks simple and efficient. It is an advantage over B-Trees and causes no significant increase in space.
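
A simplified sketch of why the linked leaves help with range queries. It models only the leaf level: the `Leaf` class and `range_scan` function are hypothetical, and the descent through the inner nodes to find the starting leaf is left out.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf block
        self.next = None   # sequence pointer to the right sibling leaf


def range_scan(start_leaf, low, high):
    """Collect keys in [low, high], walking the leaf chain from start_leaf."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are sorted, so nothing further can match
            if key >= low:
                result.append(key)
        leaf = leaf.next       # follow the sequence pointer, no return to the root
    return result


# Three hypothetical leaf blocks chained left to right.
a, b, c = Leaf([1, 4]), Leaf([7, 9]), Leaf([12, 15])
a.next, b.next = b, c
keys = range_scan(a, 4, 12)
```

Once the first leaf is located, the scan only follows sibling pointers, which costs one extra pointer per leaf.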

8
Q

Logical Query Plan

A

Abstract algebraic representation of the query. Operators are taken from relational algebra.

9
Q

Physical Query Plan

A

An algorithm is selected for each operator in the plan, and the execution order of the operators is specified.

10
Q

Rocchio method

A

Query refinement through relevance feedback. Run the original query, present the results, and ask the user to mark them relevant or non-relevant. Then ‘push’ the query vector towards the relevant document vectors and away from the non-relevant ones.
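
One common way to write the 'push' is new query = alpha * query + beta * centroid(relevant) - gamma * centroid(non-relevant). The sketch below assumes that form; the weights alpha=1.0, beta=0.75 and gamma=0.15 are frequently quoted starting values and are an assumption here, not something the card specifies.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query vector towards relevant and away from non-relevant vectors."""

    def centroid(vectors):
        # Mean vector; a zero vector if the user marked nothing in this group.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


# Hypothetical three-term vocabulary; the user marked two results relevant, one not.
new_query = rocchio(
    query=[1.0, 0.0, 0.0],
    relevant=[[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    non_relevant=[[0.0, 0.0, 1.0]],
)
```

The second term gains weight because it appears in a relevant document, and the third term becomes negative because it appears only in the non-relevant one. Some systems clip negative weights to zero, which this sketch does not do.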

11
Q

Stop Word Removal

A

Remove extremely common words that appear on a ‘stop list’, e.g. a, of, the.
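
A minimal sketch of the filtering step, assuming whitespace-tokenised text. The stop list below is a tiny illustrative one, not a standard list.

```python
# Illustrative stop list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lower-cased form is on the stop list."""
    return [t for t in tokens if t.lower() not in stop_words]


kept = remove_stop_words("The history of the search engine".split())
```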

12
Q

Stemming

A

Reduce syntactic variations of a word to a common stem, e.g. by suffix-stripping or a lookup table.
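
A deliberately naive suffix-stripping sketch, to show the idea only. The suffix list and the `min_stem` length guard are assumptions; this is not the Porter algorithm or any other published stemmer.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["connected", "connecting", "connects", "connect"]]
```

All four variants collapse to the same index term. A rule set this small will mis-stem many English words, which is why practical systems use a fuller rule set or a lookup table.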

13
Q

Group Nouns

A

Nouns carry the most meaning. Use groups of adjacent nouns as index terms.
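
A short sketch of the grouping step, assuming the text has already been part-of-speech tagged elsewhere. The (word, tag) input format and the "NOUN" tag name are hypothetical. Single nouns are kept as one-word groups here, which is a design choice rather than part of the card.

```python
from itertools import groupby


def noun_groups(tagged_tokens):
    """Join each run of adjacent nouns into one index term."""
    groups = []
    for is_noun, run in groupby(tagged_tokens, key=lambda pair: pair[1] == "NOUN"):
        if is_noun:
            groups.append(" ".join(word for word, _tag in run))
    return groups


# Hypothetical pre-tagged sentence.
tagged = [
    ("the", "DET"), ("database", "NOUN"), ("query", "NOUN"),
    ("optimizer", "NOUN"), ("rewrites", "VERB"), ("plans", "NOUN"),
]
terms = noun_groups(tagged)
```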
