hash tables Flashcards
what is a hash table
data structure where items are stored in a location determined by their content - content based inde
what is every hash table fundamentally composed of?
array of items
hash function: that converts item to an index
what is a hash function
is a mapping from an item to an integer
what are the requirements for a good hash function?
fast compute
deterministic
spread keys evenly in hash table - no large gaps
what is a perfect has function
maps an every distinct key onto a distinct integer
minimal perfect hash function has no holes in the array
how are collisions addressed?
open addressing - linear and quadratic probing
closed addressing - chaining
explain linear probing insert
generate hash code = h
while A[h] contains a key A[h] - {key, value}
explain linear probing find
generate hash code = h while A[h] contains a key: -- if A[h]{key} == key return value -- h = (h+1)%table size return not found
issues with linear probing
- primary clustering - multiple adjacent items and slow performance
- uneven gaps in the table
- full table
what to do when the table becomes full?
throw an error
create new and larger table
automatically make a larger array and then rehash every thing
what is the load factor
the proportion of the table that is fill (λ)
probability of a cell being empty
1 - load factor (lambda)
1-λ
average number of cells to be examined for insert
T(λ) = (1+1/(1-λ)^2)/2
average number of cells to be examined for successful find
T(λ) = (1+1/(1-λ))/2
quadratic probing insert
generate hash code h = hash(key), i =1
while a[h] contains a key
h = h+i*i%table size
a[h] = key, value
quadratic probing find
generate hash code h = hash(key), i =1 while a[h] contains a key -- if A[h]{key} == key return value --- h = h+i*i%table size --- i++
return not found
basic idea of collision resolution by chaining
use a table of pointers/references
each new item must be added to a linked list at that position in the table
insertion algorithm for chaining
generate hash code h = hash(key), p = new Node
p.data = {key, value}; p.next = A[h]
A[h] = p
find algorithm for chaining
generate hash code h = hash(key), p = A[h]
while p != null and p{key} != key
— p = p.next
return p
analyze the chaining technique
n items and m table size - even distribution of hash values and use of all possible hash values
on ave each linked list is of size n/m
average search time - 1/2 n/m
complexity analysis of chaining
best - O(n/m) ave - 1/2 n/m worst - O(n) m - tablesize n - number of items
All dependent on choice of M relative to n
handling deletions open addressing vs chaining
open addressing:
- deletion not easily possible
- add a flag to mark deleted items - skip deleted items during search, unmark and over write deleted items on insert
chaining:
- use linked list deletions
other variations to chaining
double hashing - if there is a collision because of due to secondary clustering use a second hash function when there is a collision
h = H1 +H2 - where H2 is the rehashing function
a different sequence is checked for each key
chaining using BST or other data structures
what is secondary clustering?
where a key generates the same sequence of locations to check