Hash Tables Flashcards
Hash tables
- store key value pairs
- usually are a fixed size array of some prime number size (that is larger than the num of elements)
Hash function
takes in a thing and returns an unsigned integer value which is then mod’ed by the size of the table and places there!
Three required properties of fast hash functions
- must be deterministic (same value every time for a key)
- must be fast
- must be evenly distributed
Integer overflow is fine , as long as the function is deterministic meaning??
Overflow will almost always occur, just make sure the function returns the same num every time!
Two ways to resolve collisions?
Separate chaining
Open Addressing
Three types of open addressing?
Linear probing
Quadratic Probing
Double Hashing
Separate Chaining
All keys map to a linked list, which can add additional values that are mapped
Load factor
ratio of number of elements divided by table size
In separate chaining, the load factor is….
the average number of elements in a bucket
Average time of a unsuccessful find?
Load factor!
Average time on a successful find?
1 + (loadfactor/2)
Smaller the load factor, the more memory it uses, but the few collisons!
True
Bigger load factor uses…
less memory but is slower
Why is it necessary to keep each key in the chain as well as the value?
Because otherwise you can’t be sure which collision value is the value you’re looking for!
Linear probing checks spots in what order starting with hash(k)
hash(k)+1 hash(k)+2 hash(k)+3 hash(k)+4 ......
Probs with linear probing?
large blocks of occupied cells you have to check
-holes when an element is removed…
quadratic probing order
hash(k)
hash(k)+4
+9
+16
Double hashing
2 hash functions!
hash(k)
hash(k) +1hash2(k)
hash(k)+2hash2(k)
Wjhy does the table size have to be prime with double hashing?
So that it will prevent thrashing, and also will lead to a better distribution
Rehashing
- when table is too full (load factor is too high), runninng time increases, so you should rehash
- when insert fails DEFINITELY rehash
Rehashing means
creating bigger array and rehashing each value already in the array
How to remove an element in open addressing?
- rehash upon every delete (not a good idea)
- put in a placeholder value that you know is fake!… however the table would get filled with placeholders on the faster side…
Pigeonhole principle
There would be at least one hash value with multiple keys… making an object unable to be backtraced to its key