Hashing Data Structures Flashcards
What are buckets?
- A file is divided into a number of buckets (usually a prime number)
- Records are stored in buckets
What is the purpose of a hashing table?
Allows fast random access to <key,value> pair logical-records…
…Without any indexing overheads
What determines the bucket location a record is stored in?
A hashing function
How do hashing functions work?
- The function will use the key field assigned to the record
- Will return a number between 0 and the total number of buckets
How do collisions occur?
When two records with different key fields are assigned to the same bucket
What are 3 methods used to minimise collisions?
- Increase the number of buckets
- Have expandable buckets that can store a flexible amount of records
- Use overflow buckets
What are the downsides of method 1:
Increase the number of buckets
- A new hashing function is needed
- All current data will need to be rehashed
What are the downsides of method 2:
Have expandable buckets that can store a flexible amount of records
- Requires an in-bucket sequential search
- No longer an O(1) search
What are the downsides of method 3:
Use overflow buckets
- Requires pointers from buckets to overflow area
- Increases search time
What is the probability of a collision occurring after adding 4 records to a file with 10 buckets?
Probability a collision has not occurred:
(10/10) x (9/10) x (8/10) x (7/10) = 0.504
Probability a collision has occurred:
1 - 0.504 = 0.496
What are 2 examples of hashing tables?
- HashSets <u></u>
- HashMaps <k, v>
What is a hash table?
- A data structure that stores data via hashing
- The index of each element is determined by a hash code
- Hash code is calculated using a hashing function
How to search a hash table for an object?
- Compute the hash code and then reduce it to find the index/bucket.
(Hash code MOD table size) - Search the elements of the bucket
- If a match is not found, the object is not present
What is the best case scenario when searching and storing data in a hash table?
- Data will be distributed evenly among buckets, so that each bucket will only store 1 item
- Time complexity O(1)
What is the worst case scenario when searching and storing data in a hash table?
- All items have been stored in a single bucket
- Time complexity O(n) as it’s a sequential search through every item of data