Indexing Text Flashcards
1
Q
Why do we index large collections?
A
Fast similarity computation
2
Q
What data structures are used?
A
Inverted Index, Forward index
3
Q
How do we index and process queries?
A
Inversion and query processing (QP) algorithms
4
Q
What are the two data structures for the Lexicon?
A
Hash-based and B+ tree-based
5
Q
What is an Inverted Index?
A
A structure that stores, for each term t, the list of all documents that contain t
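A minimal sketch of this idea, assuming whitespace tokenization and lowercase folding (both are simplifications chosen here, not part of the card). The function name `build_inverted_index` and the sample `docs` are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of doc-ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: terms are lowercase, whitespace-separated tokens.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sort each posting list so doc-ids appear in increasing order.
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical three-document collection keyed by doc-id.
docs = {1: "fast text indexing", 2: "text retrieval", 3: "fast retrieval"}
inverted = build_inverted_index(docs)
```

Looking up `inverted["text"]` then gives the doc-ids of every document containing that term, without scanning the collection.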
6
Q
What is a Forward Index?
A
Mapping of doc-ids to term-ids
7
Q
What is memory-based inversion?
A
Change from doc: [term, positions] to term: [document, positions] using the dictionary
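A rough sketch of memory-based inversion, assuming the whole collection fits in RAM and that the dictionary is a plain Python dict mapping terms to term-ids. The names `invert_in_memory`, `lexicon`, and `postings`, and the sample `forward` data, are hypothetical.

```python
def invert_in_memory(forward):
    """Turn doc -> {term: positions} into term -> [(doc_id, positions)]."""
    lexicon = {}   # dictionary: term -> term-id (assumed to be a hash map)
    postings = []  # postings[term_id] -> list of (doc_id, positions)
    for doc_id in sorted(forward):
        for term, positions in forward[doc_id].items():
            # A new term gets the next free term-id.
            term_id = lexicon.setdefault(term, len(lexicon))
            if term_id == len(postings):
                postings.append([])
            postings[term_id].append((doc_id, positions))
    return lexicon, postings

# Hypothetical forward data: doc-id -> {term: positions in that doc}.
forward = {1: {"fast": [0], "text": [1, 3]}, 2: {"text": [0]}}
lexicon, postings = invert_in_memory(forward)
```

Because documents are visited in increasing doc-id order, each posting list comes out already sorted by doc-id.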
8
Q
What is Sort-based Inversion?
A