Hash Table Flashcards
What is the name of the function a Hash Table uses to create its indexes?
A Hash Function.
Define a Hash Function.
A Hash Function will take all of the keys for a given map and strategically map them to certain index locations in an array so that they can eventually be retrieved easily.
What is the BigO of finding the index of a key?
O(1). Near instantaneous.
What is the backing Data Structure of a HashTable?
An array. The keys map to indexes in an array.
What happens when our hash function comes up with the same index value for two separate keys? For example both the key “Steven” and the key “Steve” both compute to the index 9.
A “Hash Collision” occurs.
Why do we use a hash function to come up with a numerical index?
To allow the use of an Array to store our data.
To minimise the size of the array. Our hash function squashes our keys down to a minimised index range so that we can have arrays sized relatively small compared to the range of our key values. E.g. key is 9 Billion, we could hash it to within a range of 1,000.
What are the two possible ways to handle a “Hash Collision”?
- Open Addressing
2. Closed Addressing
What is Open Addressing?
Put the key in some other index location separate from the one returned to us by the hash function. Usually the next OPEN (or free/available/nil) index.
What are the downsides of Open Addressing?
- Makes it harder to interact with the data in the map later on.
- Leads to issues when another key hashes to the (previously nil) used up index.
What is Closed Addressing?
Uses Linked Lists to chain together keys which hash to the same index value.
What are the downsides of Closed Addressing?
To get to the key we want we have to traverse the LinkedList stored in the ‘Collision Index’. This slows down the access time from O(1) to O(n). n being the size of LinkedList, i.e. the number of collisions for that particular index.
Why is it difficult to do BigO calculations on HashTables?
Because we always assume the worst-case scenario. For a Hash Table this means every single key hashes to the same index value, which means (using Closed Addressing) we essentially end up with a single Linked List which contains all our elements. In this scenario all BigO calculations are that of the Linked List… O(n).
Why do we want a reliable Hash Function?
To minimise Hash Collisions.
Knowing that the worst-case scenario for Hash Tables is terrible, what are the BigO results for the average-case scenario for a Hash Table?
They are all O(1)!
To ACCESS, SEARCH for, INSERT, or DELETE a key/value pair from out dictionary, all we need to do. is run that key though the has function and it’ll tell us what index in the hash table to go to in order to perform that operation.