Hash Tables Flashcards
What is a dictionary?
A data structure that stores data by mapping keys to values
What are the methods that are implemented in dictionaries?
insert remove find size isEmpty keys values
Describe some possible implications of design decisions with dictionaries
Insert - overwrite existing keys?
Find - fail or return null for missing key?
What are the three possible implementations of associative arrays (dictionaries)?
List
Array
Binary search tree
Describe some features of list implementation of a dictionary?
Linear search complexity looking for keys
Simple
Describe some features of array implementation of a dictionary?
Same list of key/value pairs
Fixed size
O(n) search/add/remove
Describe some features of binary search tree implementation of a dictionary?
O(log n) search, addition and removal
Need an order on keys
What do hash tables provide?
Key-index mapping in a simple and uniform way
Describe a hash table
Data structure that implements a dictionary, by using a hash function to compute an index into an array of buckets from which the desired value can be found
What is a hash function?
Takes a key and produces an index/hash code
Describe the two stages in a hash function
- Hash key to its hash code l using the hash function h(k)
- Use the hash code as an index into a bucket containing the value associated with the key
What is a collision?
Two different keys will map to the same hash code
How is the has code further reduced to try and avoid collisions?
Using modulo arithmetic
Give some desirable properties of hash functions
Gives two far apart hash codes for values close together
Two keys are unlikely to have the same hash code
Maps the keys uniformly across the range of hash codes
What are two common methods of compression mapping?
Division method (modulo arithmetic) Multiple and Divide
When do compression mapping methods work best?
When m is prime
Describe the Multiple and Divide (MAD) method for compression mapping
c(l) = (al + b)mod m for some a and b
What is a cryptographic hash function?
Hash function that given a hash code l, it is very hard to generate a k such that h(k) = l
Also known as non-invertible function
What are cryptographic hash functions used for?
Tamper proof files
Pros of indexed storage classes
Very fast O(1) random access if you have an index, otherwise O(n)
Disadvantages of indexed storage classes
Generally not expandable without copying
Pros of sequential storage classes
Generally fast to expand by adding head or tail
Disadvantage of sequential storage classes
Everything built by traversal from end(s) so generally O(n)
Pros of associative storage classes
Use almost any value as key
Can address any key randomly
Disadvantages of associative storage classes
Dynamic memory management
How does storage get allocated in memory?
A pool of memory (the heap)
Structured as a block chain
Free and used blocks
Consequences of the setup of memory allocation
Dynamic storage is easy to use but expensive to manage
Allocating lots of small objects can cause heap fragmentation
Garbage collection is relatively expensive, so doing it a lot can cause performance problems
What is heap fragmentation?
Most of the memory is allocated in a large number of chunks, leaving a good percentage of your total memory un-allocated but un-usable for lots of circumstances
What is the key performance measure of a hash table?
The load factor
What is the load factor of a hash table/
Probability of a collision happening
What is open addressing?
Use hashing to find a candidate bucket
Have a strategy to re map colliding keys and to find them again afterwards
What is linear probing?
We hash the bucket and then step along the array until we find the key we are looking for
What is quadratic probing?
Rather than linearily stepping through the array, use quadratic stepping where each jump is quadratically bigger
What is secondary hashing?
Use a second hash function to pick the step, if we still get a collision, revert to linear probing
What does open addressing allow?
Fixed memory footprint and management of collisions
What is separate chaining?
Bucket is a list of key/value pairs
Hash to find the bucket then add it to the list
Benefits of separate chaining
Unlimited storage
No worries about collisions, since they are handled automatically
What is the worst case for separate chaining?
O(n)
Average case of separate chaining?
O(n/m)
What is extendible hashing?
Treats the hash code as a bit string that’s expanded piecemeal
What happens in extendible hashing when a collision occurs?
Bucket is split, consuming another bit of the hash code
Repeat until the hash codes differ
What are infinite hash codes?
Use the key as a seed to a pseudo-random number generator
Call the number generator repeatedly to get new bits as required, indefinitely