Chapter 2 Flashcards

1
Q

in what form are index terms stored?

A

in stem form

2
Q

what is BOW?

A

bag of words: a vector representation of a query or document built from the terms it contains, ignoring word order

3
Q

what is the vector space model representation of a collection of documents?

A

term document matrix
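
As a minimal sketch of what a term-document matrix looks like, the snippet below builds one from a tiny made-up collection. The documents, the stems, and the function name `term_document_matrix` are assumptions for illustration only; rows are terms, columns are documents, and each cell holds a raw term count.

```python
from collections import Counter

# Hypothetical toy collection; tokens are assumed to be already stemmed.
docs = {
    "d1": ["inform", "retriev", "system"],
    "d2": ["inform", "inform", "theori"],
    "d3": ["databas", "system"],
}

def term_document_matrix(docs):
    """Return (terms, doc_ids, matrix) where matrix[i][j] is the count of term i in doc j."""
    doc_ids = sorted(docs)
    terms = sorted({t for tokens in docs.values() for t in tokens})
    counts = {d: Counter(docs[d]) for d in doc_ids}
    matrix = [[counts[d][t] for d in doc_ids] for t in terms]
    return terms, doc_ids, matrix

terms, doc_ids, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:8s}", row)
```

Each column of the matrix is one document's vector in the term space, and a query can be represented the same way.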

4
Q

an n-dimensional space whose dimensions are the distinct terms used in the index of a set of documents is called __

A

term vector space

5
Q

what are the properties of binary weighting?

A

gives equal relevance to every term that occurs
useful when frequency is not important
records only whether a term is present in a document, not how often it occurs
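
A small sketch of binary weighting, assuming a fixed vocabulary and an already tokenized document (both made up here): the weight is 1 if the term occurs at all and 0 otherwise, so repeated occurrences make no difference.

```python
def binary_weights(tokens, vocabulary):
    """Return 1 for each vocabulary term present in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

# Hypothetical vocabulary and document.
vocabulary = ["databas", "inform", "retriev"]
doc = ["inform", "inform", "inform", "retriev"]

print(binary_weights(doc, vocabulary))  # [0, 1, 1]
```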

6
Q

why do we use term weighting

A

1. it allows partial matching
2. it retrieves documents that approximate the query
3. it improves the quality of the answer set
4. it enables ranking of the retrieved documents

7
Q

what is used to measure the general importance of a term across the collection?

A

IDF (inverse document frequency)

8
Q

what is the limitation of TF?

A

if used alone, it favors common words and long documents.

9
Q

when is IDF largest?

A

when a term is found in only one document
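
To see why, here is a minimal sketch assuming the common definition idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. The card does not state a formula, so this definition, the toy collection, and the function name `idf` are assumptions.

```python
import math

def idf(term, docs):
    """idf = log(N / df); assumes the term occurs in at least one document."""
    n_docs = len(docs)
    df = sum(1 for tokens in docs if term in tokens)
    return math.log(n_docs / df)

# Hypothetical collection of four tokenized documents.
docs = [
    ["common", "rare"],
    ["common"],
    ["common", "other"],
    ["common", "other"],
]

print(idf("rare", docs))    # df = 1, so idf = log(4), the largest possible value here
print(idf("other", docs))   # df = 2, so idf = log(2)
print(idf("common", docs))  # df = 4, so idf = log(1) = 0.0
```

Under this definition the value shrinks as df grows and reaches zero for a term found in every document.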

10
Q

which is the most widely used term weighting scheme?

A

TF*IDF

11
Q

what does a high tf*idf indicate

A

a term occurs frequently in one doc but rarely in others

12
Q

what does a low tf*idf indicate?

A

term occurs in all docs
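
The high and low cases can be checked with a short sketch. It assumes tf is the raw count of the term in the document and idf = log(N / df); these definitions, the toy documents, and the function name `tf_idf` are illustrative assumptions, not taken from the cards.

```python
import math
from collections import Counter

# Hypothetical tokenized collection.
docs = [
    ["inform", "retriev", "retriev", "retriev"],
    ["inform", "system"],
    ["inform", "databas"],
]

def tf_idf(term, doc_index, docs):
    """Raw term frequency in one document times log(N / df)."""
    tf = Counter(docs[doc_index])[term]
    df = sum(1 for tokens in docs if term in tokens)
    if df == 0:
        return 0.0  # term never occurs in the collection
    return tf * math.log(len(docs) / df)

# "retriev" is frequent in document 0 and absent elsewhere: high weight.
print(tf_idf("retriev", 0, docs))
# "inform" occurs in every document: idf is log(1) = 0, so the weight is 0.0.
print(tf_idf("inform", 0, docs))
```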

13
Q

what does a similarity measure measure?

A

how close a document is to the query, i.e. the distance between their vectors

14
Q

what factors do similarity measures take into account?

A
1. length of the document
2. number of terms in common
3. whether the terms are common or uncommon
4. frequency of the term
15
Q

which similarity measure uses the square root of the sum of squared differences?

A

Euclidean distance
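
A minimal sketch of Euclidean distance between a query vector and a document vector: the differences are squared per term, summed, and the square root is taken. The vectors and the function name `euclidean_distance` are made up for illustration.

```python
import math

def euclidean_distance(u, v):
    """Square root of the sum of squared component differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term-weight vectors over the same three terms.
query = [1, 0, 1]
doc = [1, 2, 1]

print(euclidean_distance(query, doc))  # 2.0
```

A smaller distance means the document is closer to the query.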

16
Q

what are the term operations? list them.

A
1. stemming
2. weighting
3. thesaurus
4. truncation
5. stoplist
17
Q

what are the steps of the index term selection process?

A

1. tokenization
2. stopword removal
3. stemming and normalization
4. term weighting
5. selecting index terms
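
The five steps above can be sketched end to end as follows. Everything here is a simplified assumption: the stoplist is a tiny hypothetical one, `toy_stem` is crude suffix stripping standing in for a real stemmer, the weight is the raw term frequency, and selection keeps terms whose weight reaches a hypothetical `min_weight` threshold.

```python
import re
from collections import Counter

STOPLIST = {"the", "a", "of", "and", "is", "in", "for"}  # hypothetical tiny stoplist

def toy_stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def select_index_terms(text, min_weight=1):
    # 1. tokenization (lowercasing doubles as a simple normalization)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # 2. stopword removal
    tokens = [t for t in tokens if t not in STOPLIST]
    # 3. stemming
    stems = [toy_stem(t) for t in tokens]
    # 4. term weighting (raw term frequency)
    weights = Counter(stems)
    # 5. selecting index terms: keep those that reach the threshold
    return {term: w for term, w in weights.items() if w >= min_weight}

print(select_index_terms("Indexing the indexes of documents"))
print(select_index_terms("Indexing the indexes of documents", min_weight=2))
```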