InIndexes Implementation Flashcards

1
Q

How can we use an array to represent an index dictionary?

A

Create a list of structs sorted by the term name. Compress the dictionary and postings list.
Binary search to look up the term and then follow the pointer to the list

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the main 2 methods for implementing a dictionary for an index?

A
  1. Hashtables
  2. Search trees
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How does a hash table work for an index?

A

Each term is hashed to an integer key which is used to determine the spot in the array where the key-value pair will be stored

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the pros and cons of using hash tables for indexes?

A

Pros:
1. Lookup is faster than the search tree methof
Cons:
1. There is no easy way to find minor variants
2. Cannot perform prefix search
3. If the vocabulary grows, you have to do an expensive rehashing operation to avoid collisions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are some methods for dealing with hash table collisions?

A
  1. Linear probing
  2. Quadratic probing
  3. Double hashing
  4. Separate chaining
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is separate chaining?

A

A method of avoiding hash collisions where any time there is a collision, a new entry is added to a linked list at that spot. This is how the inverted index functions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are some issues associated with search trees?

A
  1. Not all search trees are balanced so some nodes will require near O(n) access time.
  2. Record deletions may leave some nodes in the tree nearly empty which results in performance deterioration
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Why is it important to have a balanced tree?

A

Guarantees nodes will be at a high level. Limits the disk accesses necessary for search

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a B-Tree?

A

A “balanced tree”. It ensures that the tree is always balanced and space wasted by deletions never becomes excessive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How many disk accesses does searching require for a B Tree?

A

Successful search: costs at most h+1 disk accesses for level h tree
Unsuccessful search: Always requires h disk accesses

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the pros and cons of using a B-Tree for an index?

A

Pros:
1. Solves the prefix search problem
Cons:
1. Slower O(log N)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly