Hashing and Hash Tables Flashcards
What are hash tables?
Hash tables are effective data structure for implementing dynamic sets that support only the dictionary operations INSERT, SEARCH and DELETE. Each element is stored in slot h(k). Where h: U -> {0, 1, …, m - 1}. It’s is useful when the universe U of all keys is too large to be stored in memory.
What are Direct-address tables?
Direct-address table is an array in which each position, or slot, corresponds to a key in the key universe U. Suitable for implementing dynamic set in which each element has a key drawn from the Universe U = {0, 1, …, m-1}, where m is not too large, assuming that no two elements have the same key.
What is a hash collision?
Hash collision is a situation when two keys have the same hash value.
What are common collision resolution techniques in hash tables?
- Collision resolution by chaining.
- Open addressing.
How does collision resolution by chaining work?
Collision resolution by chaining places all the elements with same hash into the same linked list. INSERT has the worst-running time O(1), SEARCH O(n), DELETE O(n) fro single linked lists, O(1) for double linked lists. (NOT SURE ABOUT O(1) for double linked lists).
What are three schemes for creating hash functions?
- Hashing by division
- Hashing by multiplication
- Universal hashing
What makes a good hash function?
A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.
What is the division method for creating hash functions?
Division method for creating hash functions maps a key k into one of m slots by taking remainder of k divided by m. That is, the hash function is: h(k) = k mod m.
m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k. It is better to design a hash function to depend on all the bits of the key.
What is the multiplication method for creating hash functions?
Multiplication method operates in two steps.
- Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA.
- Multiply the value by m and take the floor of the result.
In short, the hash function is:
h(k) = floor (m (kAmod1) )
What is universal hashing?
Universal hashing is an approach which helps to decrease the probability that someone is able to choose n keys which hash to the same slot, yielding average retrieval time of O(n). Any fixed hash function is vulnerable to such worst-case behavior.
In universal hashing, at the beginning of execution we select the hash function from a carefully designed class of functions. The algorithm can behave differently on each execution, even for the same input.
When is a collection of hash functions universal?
Let H be a finite collection of hash functions that map a given universe U of keys into the range {0, 1, …, m - 1}. Such a collection is said to be universal if for each pair of distinct keys k, l from U, the number of hash functions h from H for which h(k) = h(l) is at most | H | / m.
What is open addressing?
In open addressing, all elements occupy the hash table itself. Each entry contains an element of the dynamic set or NIL. When searching we systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. In open addressing hash table can “fill up” so that no further insertions can be made.
What is probe sequence in open addressing?
Hash functions are extended with the second argument of probe number when doing open addressing.
h: U x { 0, 1, …, m - 1 } -> { 0, 1, …, m - 1 }
Probe sequence is the order at which hash table slots are examined. It has to eventually consider all of the slots from 0 to m - 1.
What are commonly used techniques to compute the probe sequences required for open addressing?
- Linear probing - h(k,i) = (h’(k) + i) mod m
- Quadratic probing - h(k,i) = (h’(k) + c1 i + c2 i^2) mod m
- Double hashing - h(k,i) = (h1(k) + ih2(k)) mod m
What is perfect hashing?
Perfect hashing is a hashing technique where O(1) memory accesses are required to perform a search in the worst case.