Hashing Flashcards
Hashing applications
- databases ( hash bashed index)
- compilers & interpreters ( ex. to store variables and addresses)
- Network router ( ip table), server ( ports)
- dictionary
- substring search
- string commonality
- cryptographic functions
Hash structure
- direct access table
- store n keys in array use key as index
- Hashing with chaining
- store n keys in m size table ( m < n)
- if 2 keys have hash value, store items in a list.
- Hashing with open addressing
Hash operations
add(key, value)
delete(key, value)
search(key, value)
time complexity - O(1) ( best case)
worst case - O(n) if hash function is uniformly distributed.
Good hash function should distribute keys randomly.
Simple hash function (most used hash) - key mod m
- will place item in one of m-1 slot.
The more you add randomness better the distribution.
Ex: (( a * bk) mod p ) mod m
a , b - random numbers
p - some large prime number
m - slot size
Rabin Karp Algorithm- Rolling Hash
For substring patterns, instead of checking char by char, we can get hash of substring and match that hash against the main string by taking hash of characters of same length of substring.
for i to len(string):
sh = hash(string(i.. len(substring))
if hash(substring) == sh :
compare char by char.
else
move to i to i+1 and end -> end +1 -> repeat the loop.
Spurious hits:
sometimes, h(substring) == h(match) even though characters do not match.
Then match has to happen for all spurious hits. In worst case , it can make O(m * n)
To avoid spurious hits, hash function has to be uniformly distributed.
One such example of hash function is to add,
pos = len(substring)
s[0] * ( pow(10, pos)) + s[1] * (pow(10, pos -1)…. + s[n] * (pow(10 * 0)
we can follow rolling hash where if no match found, we can subtract the first character value s[0] * ( pow(10, pos)) ) and multiply * 10 + add s[n+1] * (pow(10 *0)
- this way we don’t have to recalculate the hash for every m characters of string.