Hashing (ALL) Flashcards
What is hashing?
Hashing is a dictionary data structure for the Table ADT
what are some desirable properties of a hash function?
- It should be computable quickly (constant time).
- if keys are drawn uniformly at random, then the hashed values should be uniformly distributed.
- keys that are “close” should have their hash values “spread out”.
What are the collision resolution policies?
- Open addressing
- Chaining
- Robin Hood hashing
What is open addressing?
Open addressing uses no extra space - every element is stored in the hash table. If it gets overfull, we can reallocate space and rehash.
When a collision occurs in open addressing, we can
- probe nearby for a free position (linear probing (easy to implement but leads to clustering), quadratic probing);
- go to a “random” position by using a second-level hash function (double hashing);
- try a position given by a second, third, . . . hash function (this plus LCFS gives cuckoo hashing).
What is chaining?
Chaining uses an “overflow” list for each element in the hash table
- Elements that hash to the same slot are placed in a list. The slot contains a pointer to the head of this list.
- Insertion can then be done in constant time.
- A drawback is the additional space overhead.
- The distribution of sizes of lists turns out to be very uneven.
Insertion has the same cost as an unsuccessful search, provided the table is not full
true or false?
true
The average cost of a successful search is the average of all insertions needed to construct the table from empty
true or false?
true
What is the simple uniform hashing model?
That is, each of the n keys is equally likely to hash into any of the m slots. So we are considering a “balls in bins” model
If n is much smaller than m, collisions will be few and most slots will be empty. If n is much larger than m, collisions will be many and no slots will be empty. The most interesting behaviour is when m and n are of comparable size
What is load factor, λ?
n/m
n = keys
m = slots
Balls in Bins
When do we expect the first collision?
E(m) ≈ sqrt(πm/2) + 2/3.
So collisions happen even in fairly sparse tables
Balls in Bins
When do we expect all boxes to be nonempty?
after about m log m balls. It takes a long time to use all lists when chaining
Balls in Bins
What proportion of boxes are expected to be empty when n is Θ(m)?
e−λ .
Many of the lists are just wasted space even for pretty full tables
Balls in Bins
When m is Θ(n), what is the expected maximum number of balls in a box?
about (log n)/(log log n). Some of the lists may be fairly long
What is the worst case for searching in chaining?
Θ(n), since there may be only one chain with all the keys
What is e average cost for unsuccessful search in chaining?
the average list length, namely λ.