Hashing (ALL) Flashcards
What is hashing?
Hashing is a dictionary data structure for the Table ADT
what are some desirable properties of a hash function?
- It should be computable quickly (constant time).
- if keys are drawn uniformly at random, then the hashed values should be uniformly distributed.
- keys that are “close” should have their hash values “spread out”.
What are the collision resolution policies?
- Open addressing
- Chaining
- Robin Hood hashing
What is open addressing?
Open addressing uses no extra space - every element is stored in the hash table. If it gets overfull, we can reallocate space and rehash.
When a collision occurs in open addressing, we can
- probe nearby for a free position (linear probing (easy to implement but leads to clustering), quadratic probing);
- go to a “random” position by using a second-level hash function (double hashing);
- try a position given by a second, third, . . . hash function (this plus LCFS gives cuckoo hashing).
What is chaining?
Chaining uses an “overflow” list for each element in the hash table
- Elements that hash to the same slot are placed in a list. The slot contains a pointer to the head of this list.
- Insertion can then be done in constant time.
- A drawback is the additional space overhead.
- The distribution of sizes of lists turns out to be very uneven.
Insertion has the same cost as an unsuccessful search, provided the table is not full
true or false?
true
The average cost of a successful search is the average of all insertions needed to construct the table from empty
true or false?
true
What is the simple uniform hashing model?
That is, each of the n keys is equally likely to hash into any of the m slots. So we are considering a “balls in bins” model
If n is much smaller than m, collisions will be few and most slots will be empty. If n is much larger than m, collisions will be many and no slots will be empty. The most interesting behaviour is when m and n are of comparable size
What is load factor, λ?
n/m
n = keys
m = slots
Balls in Bins
When do we expect the first collision?
E(m) ≈ sqrt(πm/2) + 2/3.
So collisions happen even in fairly sparse tables
Balls in Bins
When do we expect all boxes to be nonempty?
after about m log m balls. It takes a long time to use all lists when chaining
Balls in Bins
What proportion of boxes are expected to be empty when n is Θ(m)?
e−λ .
Many of the lists are just wasted space even for pretty full tables
Balls in Bins
When m is Θ(n), what is the expected maximum number of balls in a box?
about (log n)/(log log n). Some of the lists may be fairly long
What is the worst case for searching in chaining?
Θ(n), since there may be only one chain with all the keys
What is e average cost for unsuccessful search in chaining?
the average list length, namely λ.
What is the average cost for successful search in chaining?
The average cost for successful search is then
(n + 1)/2m ≈ λ/2
Provided load factor is kept bounded what is the runtime on average for basic operations?
constant time
O(1)
Linear probing average insertion cost?
Linear probing is not well described by the uniform hashing model (because of the clustering). A more detailed analysis can be done.
The average insertion cost is Θ(1 + 1/(1 − λ) 2 ).
Uniform hashing average insertion cost?
Θ(1/(1 − λ)) as m → ∞.
Uniform hashing unsuccessful search?
Θ(1/(1 − λ))
Deletion in hash tables
If we use chaining, deletion just involves deleting an element from the relevant linked list. However, we must still find it in order to delete it
open addressing, deletion can be trickier. One method is to mark elements as deleted, but not delete them. However this can cause space wastage
If we simply delete an item, then we may no longer be able to find other items that had previously collided with it
Effect of hashing on runtime of binary search and search trees?
O(1) dictionary operations instead of the O(lg n) needed for binary search (static tables) and/or balanced search trees (dynamic tables)
What happens the larger the load factor gets?
larger λ is the more chance of collisions.
OALP Example

OADH Example

What are the main desired properties for a good hashing scheme?
- The hash locations of the keys S are spread out to minimize collisions (for lookup, insert, deletes,…).
- Use a relatively small table size m = O(n) that doesn’t affect performance.
- The function h is fast to compute (polynomial in the size of a key x ∈ S, but constant time with respect to n).
NOTE: If we use separate chaining for collisions, the time needed for processing key x is O(|A[h(x)]|) where |A[i]| denotes the length of the chain/list at index i