Hash tables Flashcards
What is hashing?
Mapping an object or number onto a range of integers
What is collision?
When multiple object hash to the same value
What are 4 necessary properties of a hash function?
- Fast, should take O(1)
- Hash function must be deterministic
- Equal objects should hash to the same slot
- If two objects are randomly chosen, there is a 1/2^{32} chance they have the same hash value
Give an example of a predetermined hash function.
Suppose you have a static member variable (available across all instances) to keep track of the hash value. Every time you create an object, the object gets the current hash value incremented by 1.
What is an issue with the predetermined hash function example?
The hash value assigned is not tied to the object itself, but to the construction process, i.e. if we save the object to a secondary memory (e.g. write to file or disk), when we reload the data, the constructor will run again and the object will be assigned a different hash value this time, even though it is the same object.
Using predetermined hash functions, each object gets a unique hash value. What is a scenario where this is not appropriate?
- When we have two strings with the same characters
- When we have two objects that are conceptually equal, e.g. two rationals 1/2 and 3/6
How to solve the issue of two logically equal rationals not hashing to the same value?
Divide both numerator and denominator by gcd to reduce the fraction to its simplest form. Then apply the hash function. One example is take the product of the numerator and the denominator.
Why is just adding the characters in a string a bad way to hash a string?
It takes a long runtime, hashes “ABC” to the same slot as “BCA” (ignores all ordering of words), poor distribution (long strings will always hash to bigger values and long strings might produce hash values that cluster near certain regions of the hash space)
Suggest a way to hash strings efficiently that also preserves ordering.
Represent the string as a polynomial p(x)=c_0x^{n-1}+c_1x^{n-2}+…+c_{n-1} where each coefficient is an individual character. Pick a prime number and let that be x to evaluate p(x). Use Horner’s rule to re-order the equation to compute p(x) more efficiently. The result is a single integer, which is the hash value.
Why is hashing strings this way good?
- Using a prime number ensures good distribution of hash values
- Order-sensitive, preserves information of the string
Why do we want to study asymptotic?
We want a way to analyze an algorithm and compare their performances regardless of the computer it is run on. It helps us learn about the scalability of algorithms, independent of hardware.
How can we think of f(n) ~ g(n) and f(n) < g(n)?
f(n)~g(n) means their performances are comparable for large values of n and improving hardware will affect both algorithms similarly. It is always possible to buy a faster computer so one algorithm will run faster than the other.
f(n)<g(n) means f(n) grows asymptotically slower than g(n). No matter how fast a computer becomes, f(n) will always run faster than g(n).
Is o(.) weaker than O(.)?
No, o(.) is stricter than O(.). Small-o requires that f(n) grows strictly slower than g(n) as n tends to infinity.
Big-o is an upper bound indicating that f(n) does not grow faster than g(n) for sufficiently large n.
Is small-omega stricter than big-omega?
For small-omega, f(n) grows strictly faster than g(n), while for big-omega, f(n) grows at least as fast as g(n), up to a constant factor.