Hash Tables Flashcards
What is the name of the function a Hash Table uses to create its indexes?
A Hash Function.
Define a Hash Function.
A Hash Function will take all of the keys for a given map and strategically map them to certain index locations in an array so that they can eventually be retrieved easily.
What is the BigO of finding the index of a key?
O(1). Near instantaneous.
O(n) if there are hash collisions
What is the backing Data Structure of a HashTable?
An array. The keys map to indexes in an array.
What happens when our hash function comes up with the same index value for two separate keys? For example both the key “Steven” and the key “Steve” both compute to the index 9.
A “Hash Collision” occurs.
Why do we use a hash function to come up with a numerical index?
To allow the use of an Array to store our data.
To minimise the size of the array. Our hash function squashes our keys down to a minimised index range so that we can have arrays sized relatively small compared to the range of our key values. E.g. key is 9 Billion, we could hash it to within a range of 1,000.
What are the possible ways to handle a “Hash Collision”?
‘Collision Resolution Policy’
- Open Addressing
- Closed Addressing “Chaining”
- Robin Hood Hashing
What is Open Addressing?
Put the key in some other index location separate from the one returned to us by the hash function. Usually the next OPEN (or free/available/nil) index.
Open addressing uses no extra space - every element is stored in the hash table. If it gets overfull, we can reallocate space and rehash.
When a collision occurs in open addressing, we can…
• probe nearby for a free position (leads to clustering, quadratic probing).
• go to a “random” position by using a second-level hash function.
• try a position given by a second, third, . . . hash function (this plus LCFS gives cuckoo hashing).
What are the downsides of Open Addressing?
- Makes it harder to interact with the data in the map later on.
- Leads to issues when another key hashes to the (previously nil) used up index.
What is Closed Addressing? (Chaining)
Uses Linked Lists to chain together keys which hash to the same index value.
Chaining uses an “overflow” list for each element in the hash table
• Elements that hash to the same slot are placed in a list. The slot contains a pointer to the head of this list.
• Insertion can then be done in constant time.
• A drawback is the additional space overhead.
• The distribution of sizes of lists turns out to be very uneven.
What are the downsides of Closed Addressing?
To get to the key we want we have to traverse the LinkedList stored in the ‘Collision Index’. This slows down the access time from O(1) to O(n). n being the size of LinkedList, i.e. the number of collisions for that particular index.
Why is it difficult to do BigO calculations on HashTables?
Because we always assume the worst-case scenario. For a Hash Table this means every single key hashes to the same index value, which means (using Closed Addressing) we essentially end up with a single Linked List which contains all our elements. In this scenario all BigO calculations are that of the Linked List… O(n).
Why do we want a reliable Hash Function?
To minimise Hash Collisions.
Knowing that the worst-case scenario for Hash Tables is terrible, what are the BigO results for the average-case scenario for a Hash Table?
They are all O(1)!
To ACCESS, SEARCH for, INSERT, or DELETE a key/value pair from out dictionary, all we need to do. is run that key though the has function and it’ll tell us what index in the hash table to go to in order to perform that operation.
*they can be O(n) in the instance of hash collisions
What is a hash table?
a data structure which organizes data using hash functions in order to support quick insertion and search.