Hash Tables Collision Resolution Flashcards
What are bad examples of solving collision resolution?
- Making the array much bigger than we expect to store
- collision likelyhood is still really high (50% in a table of size 500)
- Open Addressing
- after a colission probe a sequence of slots in the hash table (AKA closed hashing)
- suffers from clustering and collisions before more likely as more insertions happen
- after a colission probe a sequence of slots in the hash table (AKA closed hashing)
- Linear Probing
- when a collision occurs search the slots immediately following the array index hashed to until an empty slot is found
- Suffers from cookie monster affect - later searches become increasingly expensive
What is the difference between closed and open hashing
closed(aka open addressing) - table is a fixed amount of storage and every item is plces in a hash table slot in that limited storage
open - items are stored outside the table
What is separate chaining?
Store all items that hash to a particular index in that slot.
Each table slot could be a linkedlist of items that has to that slot
How do you implement the code for separate chaining?
How does an insert and search work with separate chaining and how does it take
- When a collision occurs it adds it to the front of the linked list at the location in the table, which makes it O(1).
- Searches have to search a linked list if there are key collisions
- You could keep the list ordered but inserts become less efficient
How does traversal work with a hash table both for ordered and unordered
It is one of the standard operations, print/traverse
- un-Ordered:
- using open addressing we traverse all items moving down the array, skipping over empty or deleted-flagged slots
- separate chaning we simply traverse each linked list
- Ordered(items in hash tables are not usually ordered:
- sort the hash table (usually costs O(nlogn)
- we could use O(n) extra space by copying into a sarapate array if we don’t want to lose the hash order in the hash table
- sort the hash table (usually costs O(nlogn)
What is the formula for probability of a collision?
What is load factor?
What needs to be taken into consideration when measuring effectiveness of hashing techniques
- assumes random data and a good hash function(that spreads and inserts around evently)
- real life data is often not random
- adjust your hash function to account for biases in the data
What is the comparison for linear probing, double hashing and separate chaining for load factor?