121 Week 11 - Hashing Flashcards
What does a hashing function do
Converts a key (often a string) to an integer.
It is a deterministic function meaning it always gives the same output if the input remains the same.
Hash table
A data structure used for storing data.
Elements are stored in buckets.
Each table element has a unique part called the key.
The key is inputted into a hash function which returns an output. This output corresponds to which bucket the element will be stored in.
Why are hash tables used.
Hash tables allow for very fast retrieval of data as, assuming collisions are low, retrieval has big O of O(1).
Hash example: sum first 4 letters
The position in the alphabet of the first 4 letters of a word is added together to get the hash value of the word.
Limitations: collisions can be common, array is limited to 104 (4 * 26) elements, indicies above 75 and below 10 are uncommon.
Could be solved by adding all letters or including length.
Better hash example: multiply first 4 letters
Alpabetic position of first 4 letters are L1-4.
Index = L1 + L2×26 + L3×262 + L4×263.
This fixes having a limited array length and reduces collisions.
Limitations: Need a very long array to be able to store elements in the direct index of the hash function.
How to deal with large indices.
If you want to use an array with a size of Size you can use: newIndex = index%Size to get a new index which will be within the boundary 0-Size. (% = int remainder division).
Use a size of around double the number of expected inputs.
Bit shifting vs multiplication
Bit shifting is a lot faster to do than multiplication so by multiplying by powers of 2, we can use bit shifting instead and create more efficient hash functions.
How can collisions be reduced in hash tables.
Collisions are reduced if there is a wide range of index values and a uniform distribution of keys as this reduces the chance of 2 keys hashing to the same value.
How can collisions be dealt with in hash tables.
If a collision occurs we can use chaining to have a chain of elements starting at the first element in that bucket.
When retrieving an element, if the element in the bucket is not the correct element, complete a search on the chain to find the element or determine if it is in the hash table.
How can key-value pairs be used to store more data.
Key value pairs allows for more data than just the key to be stored in a hash table. The key will be used to find the hash value for the buckets location but other information can be stored in the bucket as well.