Chapter 2 Flashcards
in what form are terms put?
in stem form
what is BOW (bag of words)?
a vector representation of a query or document that keeps term counts but ignores word order
what is the vector space model representation of a collection of documents?
term-document matrix
an n-dimensional space whose dimensions are the distinct terms used in the index of a set of documents is called __
term vector space
properties of binary weight
gives equal relevance to terms
useful when frequency is not important
it only counts distinct words in a document
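Binary weighting can be sketched in a few lines. The two toy documents and the names `docs`, `terms`, and `binary_vector` are assumptions made for illustration; the point is only that a repeated word ("retrieval" in d1) still gets weight 1.

```python
# Hypothetical toy collection; the documents are illustrative only.
docs = {
    "d1": "information retrieval retrieval system",
    "d2": "database system",
}

# Index terms: every distinct word in the collection, sorted for a stable order.
terms = sorted({word for text in docs.values() for word in text.split()})

def binary_vector(text, terms):
    """Return 1 for each index term present in the text, else 0."""
    present = set(text.split())
    return [1 if term in present else 0 for term in terms]

# One row per document: presence or absence only, frequency is ignored.
binary_matrix = {name: binary_vector(text, terms) for name, text in docs.items()}
```

Here `terms` is `['database', 'information', 'retrieval', 'system']`, so d1 maps to `[0, 1, 1, 1]` even though "retrieval" appears twice.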
why do we use term weighting
1.it allows partial matching
2.retrieves documents that approximate the query
3.improves quality of answer set
4.enables ranking of retrieved documents
what is used to measure the general importance of a term
IDF
what is the limitation of TF?
if used alone, it favors common words and long documents.
when is IDF largest?
when a term is found in only one document
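A small sketch shows why. It assumes the common definition idf = log2(N / df), where N is the number of documents and df is the number of documents containing the term; the cards do not state a formula, so treat this as one possible variant. The collection size of 8 is hypothetical.

```python
import math

def idf(num_docs, doc_freq):
    """Inverse document frequency, assuming idf = log2(N / df)."""
    return math.log2(num_docs / doc_freq)

N = 8  # hypothetical collection size

# IDF for terms found in 1, 2, 4, and all 8 documents.
idf_values = {df: idf(N, df) for df in (1, 2, 4, 8)}
```

Under this assumption the value falls as df grows: it is highest at df = 1 and drops to 0 when the term is in every document.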
which is most used term weighting?
TF*IDF
what does a high tf*idf indicate
a term occurs frequently in one doc but rarely in others
what does a low tf*idf indicate
term occurs in all docs
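The high and low cases can be illustrated with a minimal sketch. It assumes raw term count for tf and log2(N / df) for idf, which is one common variant and not necessarily the one used in the chapter. The four toy documents and the helper names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: "apple" is frequent in d1 only,
# while "system" occurs in every document.
docs = {
    "d1": "apple apple apple system",
    "d2": "banana system",
    "d3": "cherry system",
    "d4": "date system",
}

N = len(docs)
counts = {name: Counter(text.split()) for name, text in docs.items()}

def doc_freq(term):
    """Number of documents that contain the term."""
    return sum(1 for c in counts.values() if term in c)

def tf_idf(term, name):
    """Raw term count times log2(N / df); an assumed variant."""
    return counts[name][term] * math.log2(N / doc_freq(term))

high = tf_idf("apple", "d1")   # frequent in d1, absent elsewhere
low = tf_idf("system", "d1")   # present in every document
```

With these assumptions, "apple" scores well above "system" in d1, and "system" scores 0 because its idf is 0.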
what does a similarity measure measure
distance between query and document
what are the considerations of similarity measures
1.length of document
2.number of terms in common
3.whether the terms are common or uncommon
4.frequency of the term
which similarity measure technique uses the square root of the sum of squared differences
Euclidean distance
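Euclidean distance between a query vector and a document vector is the square root of the sum of the squared differences of their term weights. The sketch below uses made-up three-term vectors; the names `query`, `doc_a`, and `doc_b` are illustrative only.

```python
import math

def euclidean_distance(query_vec, doc_vec):
    """Square root of the sum of squared differences between term weights."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vec, doc_vec)))

# Hypothetical term-weight vectors over the same three index terms.
query = [1, 0, 1]
doc_a = [1, 0, 1]   # identical to the query
doc_b = [4, 4, 1]   # differs by 3 and 4 on the first two terms

dist_a = euclidean_distance(query, doc_a)
dist_b = euclidean_distance(query, doc_b)
```

A smaller distance means the document is closer to the query, so `doc_a` (distance 0) would rank ahead of `doc_b` (distance 5).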