Rehashing Flashcards
Choose size of table
- Too big will waste memory; too small will increase collisions and may eventually force rehashing
- Should be appropriate for the hash function used
Load factor
- A characteristic of the hash table is its load factor α
- Given a hash table T with m slots that stores n elements,
α = n/m
Alpha and collision resolution policies
§ Hashing with chaining: α can be greater than 1
o Performance degrades with length of chains
o Expected length of a linked list = load factor = α = n/m
§ Hashing with open addressing: 0 ≤ α ≤ 1
o The probability of collisions increases as the load factor increases
Load factor size
§ As the load factor grows larger the hash table becomes slower
o If the table is close to full, the search time grows and may become equal to the size of the table
§ A very low load factor is not beneficial either: as α approaches 0, the amount of unused space in the table increases
§ The expected constant time (for insert, search, and delete) of a hash table assumes that the load factor is kept below some bound
§ Good strategy: keep α < 0.5, i.e. the size of the table is more than twice the number of records
When do we do rehashing?
§ If the load factor exceeds a certain value (greater than 0.5) we do rehashing
§ Build a second table twice as large as the original and rehash there all the keys of the original table
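The rehashing policy above can be sketched as follows. This is an illustrative chaining table (class and method names are my own, not from the notes) that doubles its size and rehashes every key once α exceeds 0.5:

```python
# Sketch of rehashing with chaining: double the table and rehash all
# keys when the load factor n/m exceeds 0.5 (threshold from the notes).

class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m                              # number of slots
        self.n = 0                              # number of stored elements
        self.slots = [[] for _ in range(m)]

    def _index(self, key, m=None):
        return hash(key) % (m if m is not None else self.m)

    def insert(self, key):
        self.slots[self._index(key)].append(key)
        self.n += 1
        if self.n / self.m > 0.5:               # load factor check
            self._rehash()

    def _rehash(self):
        # O(n): every key is hashed again into a table twice as large
        new_m = self.m * 2
        new_slots = [[] for _ in range(new_m)]
        for chain in self.slots:
            for key in chain:
                new_slots[self._index(key, new_m)].append(key)
        self.m, self.slots = new_m, new_slots

    def contains(self, key):
        return key in self.slots[self._index(key)]
```

Note that `_rehash` must recompute every slot index with the new table size; the old indices are meaningless once m changes.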
Rehashing Big O analysis
§ Obviously, rehashing is an expensive operation: O(n)
o There are n elements to rehash and the table size is roughly 2n
Rehashing with Cuckoo Hashing
§A problem with cuckoo hashing is the probability of having a cycle that prevents the insertion from completing
§ If the table’s load factor is below 0.5, it has been shown that the probability of a cycle is very low
§ Rehashing is performed when the load factor is at 0.5 or higher
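A hypothetical sketch of how a cuckoo table can detect a likely cycle and rehash (the displacement bound and hash functions are illustrative choices, not from the notes):

```python
# Cuckoo hashing sketch: two tables with two hash functions; insertion
# evicts occupants back and forth. A bounded displacement loop stands in
# for cycle detection; on failure (or load factor >= 0.5) we rehash.

class CuckooHashTable:
    def __init__(self, m=11):
        self.m = m
        self.n = 0
        self.t1 = [None] * m
        self.t2 = [None] * m

    def _h1(self, key):
        return hash(key) % self.m

    def _h2(self, key):
        return (hash(key) // self.m) % self.m

    def contains(self, key):
        return self.t1[self._h1(key)] == key or self.t2[self._h2(key)] == key

    def insert(self, key):
        if self.contains(key):
            return
        if (self.n + 1) / (2 * self.m) >= 0.5:   # keep alpha below 0.5
            self._rehash()
        for _ in range(2 * self.m):              # bound on displacements
            i = self._h1(key)
            key, self.t1[i] = self.t1[i], key    # place key, evict occupant
            if key is None:
                self.n += 1
                return
            j = self._h2(key)
            key, self.t2[j] = self.t2[j], key    # try the second table
            if key is None:
                self.n += 1
                return
        self._rehash()                           # likely cycle: rebuild
        self.insert(key)

    def _rehash(self):
        old = [k for k in self.t1 + self.t2 if k is not None]
        self.m *= 2
        self.n = 0
        self.t1 = [None] * self.m
        self.t2 = [None] * self.m
        for k in old:
            self.insert(k)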
What is a good hash function?
o Be easy and quick to compute
o Achieve an even distribution of the key values that actually occur across the index range supported by the table
o Produce a minimal number of collisions
Perfect hashing
§ The best hash functions are those that map every unique key to a unique integer.
Some hash functions
§ Middle of square
-h(k) returns the middle digits of k²
§ Division
-h(k) returns k % m (m size of the table)
§ Multiplicative
-h(k) returns the first few digits of the fractional part of k·A, where A is a fraction
A = (√5 − 1)/2 ≈ 0.618
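The three hash functions above can be sketched as follows (the digit counts and table sizes are illustrative choices, not fixed by the notes):

```python
import math

def mid_square_hash(k, digits=3):
    """Middle-of-square: take the middle digits of k squared."""
    s = str(k * k)
    mid = len(s) // 2
    start = max(0, mid - digits // 2)
    return int(s[start:start + digits])

def division_hash(k, m):
    """Division method: k mod m, where m is the table size."""
    return k % m

def multiplicative_hash(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplicative method: scale the fractional part of k*A by m."""
    return int(m * ((k * A) % 1))
```

For example, `mid_square_hash(4567)` squares 4567 to get 20857489 and keeps the middle three digits, 574.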
Other hash functions (don’t look too much into it, not on exam)
§ Truncation
o Ignore part of the key and use the rest as the array index (converting non-numeric parts)
o If students have a 9-digit identification number, take the last 3 digits as the table position (e.g. 925371622 becomes 622)
§ Folding
◦ Partition the key into several parts and then combine them in any convenient way
◦ Unlike truncation, uses information from the whole key. Split a 9-digit number into three 3-digit numbers, and add them (e.g. 925371622 becomes 925 + 371 + 622 = 1918)
§ Digit analysis
◦ If all the keys are known in advance, we can discard the digit positions with the most skewed distributions and use the remaining digits as the hash address
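The truncation and folding schemes above can be sketched as follows, using the 9-digit ID example from the notes (function names are my own):

```python
def truncation_hash(key, digits=3):
    """Truncation: keep only the last `digits` decimal digits of the key."""
    return int(str(key)[-digits:])

def folding_hash(key, part_len=3):
    """Folding: split the decimal digits into fixed-size parts and add them."""
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts)
```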
Polynomial Hash function (similar to what I did on PA)
Polynomial hash function (or Horner’s method)
o Treat the binary representation of a key as a number
o How are the keys treated as numbers?
o Represent each character with p bits; then the string can be treated as a base-2^p number
AKEY : 00001 01011 00101 11001
= 1·32³ + 11·32² + 5·32¹ + 25·32⁰ = 44217
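A minimal Horner's-method sketch of the scheme above, assuming the A=1, ..., Z=26 letter codes used in the AKEY example:

```python
def poly_hash(s, base=32):
    """Evaluate the string as a base-32 number via Horner's method."""
    h = 0
    for ch in s:
        h = h * base + (ord(ch) - ord('A') + 1)   # Horner step: shift, then add
    return h
```

Horner's method avoids computing powers of the base explicitly: each step multiplies the running value by the base and adds the next character code.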
Universal hashing
To guard against the case where all keys map to the same slot, giving worst-case behavior,
use universal hashing:
o Use a different random hash function each time
o Ensure that the random hash function is independent of the keys that are actually going to be stored
o Ensure that the random hash function is “good” by carefully designing a universal class of functions to choose from
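One standard universal class (my choice of family, not specified in the notes) is h(k) = ((a·k + b) mod p) mod m, with p a prime larger than any key and a, b chosen at random per table:

```python
import random

def make_universal_hash(m, p=2_147_483_647):   # p = 2^31 - 1, a Mersenne prime
    """Pick a random member of the universal class ((a*k + b) mod p) mod m."""
    a = random.randrange(1, p)                  # a must be nonzero
    b = random.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m
```

Because a and b are chosen after (and independently of) the keys, no fixed set of keys can force every chosen function into the all-one-slot worst case.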
Why use hash tables?
§ Hash tables support fast insert and search
- O(1) average-case performance
- Deletion is possible, but degrades performance
- Not suited if the ordering of elements is important