Mod 8 Flashcards
When is it optimal to use a map data type?
When insertion, lookup, and removal are the **only ** operations we need, we can use the map data type.
What is a hash table?
A hash table is similar to an array, where the key is transformed into an index through a hash function. A hash function is a function that takes values of some type (e.g. string, struct, double, etc.) and maps them to an integer index value. We can then use this value to both store and retrieve data from an actual array.
What are the two steps to compute an index in a hash function?
- hash = hash_function(key)
This is finding the hash.
- index = hash % array_size
This is ensuring the hash fits within the array size. It is always better to use a prime number for array size as remainders of prime numbers have more uniformity and hence there will be less collisions.
What are the three important properties of a hash function?
- Determinism – a given input should always map to the same hash value.
- Uniformity– the inputs should be mapped as evenly as possible over the output range. A non-uniform function can result in many collisions, where multiple elements are hashed to the same array index. We’ll look more at this later.
- Speed – the function should have a low computational burden.
What is DJB2?
- A widely-used hash function:
def hash_djb2(s):
hash = 5381
for x in s:
# hash * 33 + c (bitshift left 5 places = * 32)
hash = ( (hash «_space;5) + hash ) + ord(x)
return hash & 0xFFFFFFFF
How is a minimally perfect hash function different from a perfect hash function?
Both perfect and minimally perfect hash functions map all keys to a distinct integer.
Minimally perfect hash functions take this one step further. They map N keys to exactly inters 0 to N-1, with each key getting precisely one value. Hence the output of a minimally perfect function is different in that the distinct integers are consecutive.
How can we handle hash table collisions without probing?
We can eliminate collisions entirely if we allow multiple keys to share the same table entry (i.e., array index). To accommodate multiple keys, linked lists can be used to store the individual keys that map to the same entry. The linked lists are commonly referred to as buckets or chains, and this technique of collision resolution is known as chaining.
What is the load factor of a hash table?
The load factor of a hash table is the average number of elements in each bucket:
𝝺=n/m
- n is the total number of elements stored in the table
- m is the number of buckets
Can the load factor be greater than 1 in a hash table?
Only in a chained hash table, where multiple items can go in one bucket.
Ina chained table, what is the average number of links traversed for searches?
For a linked list-based chained table, the average number of links traversed for successful searches is 𝝺 / 2. For unsuccessful searches, the average number of links traversed is equal to 𝝺.
What is the average-case complexity of a linked list-based chained hash table?
What about worst-case?
Assuming good distribution:
The average case for all operations is O(𝝺).
If the number of buckets is adjusted according to the load factor, then the number of elements is a constant factor of the number of buckets i.e.:
𝝺 = n/m = O(m)/m = O(1).
The worst-case complexity is O(n), since all of the elements might end up in the same bucket.
What is probing?
Probing occurs when the index from a hash function is already filled in a hash table with open addressing. Hence a new index must be calculated by probing until an empty index is found.
Is linear probing good?
- Linear probing: i = iinitial + j (where j = 1, 2, 3, …)
Linear probing can be problematic due to clustering. As a cluster becomes bigger, collisions become more likely and it takes longer and longer to find new empty spaces. Other probing methods may save time by looking further ahead and wrapping to the beginning once they reach the end.
What is quadratic probing? Does it have any drawbacks?
Quadratic probing: i = ( iinitial + j2) % m (where, j = 1, 2, 3, …)
Quadratic probing can be problematic if there are empty spaces in an array that cannot be returned by the quadratic probing function.
What is double hashing?
A form of probing that increases by the hash itself. It helps to reduce clustering (which bloats insertion times):
i = ( iinitial + j * h2(key) ) % m (where, j = 1, 2, 3, …)