Hash maps Flashcards
Why is it difficult to make an on-disk hashmap perform well?
it requires a lot of random access I/O, it is expensive to grow when it becomes full, and hash collisions require “fiddly” logic [DDIA p 75]
Why is it relevant to know it is difficult to implement hashmaps on disk?
…
Cuckoo hashing vs chained hashing performance
A study by Zukowski et al.[13] has shown that cuckoo hashing is much faster than chained hashing for small, cache-resident hash tables on modern processors. Kenneth Ross[14] has shown bucketized versions of cuckoo hashing (variants that use buckets that contain more than one key) to be faster than conventional methods also for large hash tables, when space utilization is high.
List the two techniques for avoiding rehashing latency spikes
incremental/background resizing (implemented, for example, in Redis and in PauselessHashMap) and extendible hashing.
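A minimal sketch of the incremental-resizing idea (hypothetical names, not Redis's actual code): keep the old and new bucket arrays side by side and migrate a small number of buckets on each operation, so no single insert pays the whole rehash cost.

# Hypothetical sketch of incremental resizing over a chained hash table.
class IncrementalHashMap:
    def __init__(self, capacity=8):
        self.new = [[] for _ in range(capacity)]   # active table
        self.old = None                            # table being drained during a resize
        self.migrate_pos = 0                       # next old bucket to move
        self.size = 0

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _step_migration(self):
        # Move one old bucket into the new table per operation.
        if self.old is not None:
            for k, v in self.old[self.migrate_pos]:
                self._bucket(self.new, k).append((k, v))
            self.old[self.migrate_pos] = []
            self.migrate_pos += 1
            if self.migrate_pos == len(self.old):
                self.old = None                    # migration finished

    def put(self, key, value):
        self._step_migration()
        if self.old is not None:                   # drop any stale copy left in the old table
            ob = self._bucket(self.old, key)
            before = len(ob)
            ob[:] = [(k, v) for k, v in ob if k != key]
            self.size -= before - len(ob)
        bucket = self._bucket(self.new, key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.old is None and self.size > len(self.new):   # kick off a gradual resize
            self.old, self.new = self.new, [[] for _ in range(2 * len(self.new))]
            self.migrate_pos = 0

    def get(self, key):
        self._step_migration()
        for table in (self.new, self.old):
            if table is not None:
                for k, v in self._bucket(table, key):
                    if k == key:
                        return v
        raise KeyError(key)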
Why does MySQL use B-trees for indexes rather than hash tables, even though access time for B-trees is O(log n) as opposed to O(1) for hash tables?
1) In a hash table you can only access elements by their primary key. That lookup is faster than with a tree algorithm (O(1) instead of O(log n)), but you cannot select ranges (everything between x and y). Tree algorithms support range queries in O(log n), whereas a hash index can force a full table scan, O(n). 2) Also, the constant overhead of hash indexes is usually bigger (which doesn't show up in big-O notation, but it still exists). 3) Also, tree algorithms are usually easier to maintain, grow with the data, scale, etc. (rehashing when the hash table reaches a certain occupancy can take a long time, days on very large datasets).
Much of the variation in hashing algorithms comes from what?
comes from how they handle collisions (multiple keys that map to the same array index)
List two open addressing hash tables
SwissTable and F14 (FB)
Hashmaps in a nutshell
Make a big array
Get a magic function that converts elements into integers (hashing)
Store elements in the array at the index specified by their magic integer
Do something interesting if two elements get the same index (hash conflict)
In theory, insert, search, and delete should be O(1). In practice that is hard to achieve, at least with a low constant factor
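A toy version of the above in Python, assuming separate chaining (a per-bucket list) as the "something interesting" for collisions; names are illustrative, not any real library's API:

class ToyHashMap:
    def __init__(self, capacity=16):
        self.buckets = [[] for _ in range(capacity)]   # the big array

    def _index(self, key):
        return hash(key) % len(self.buckets)           # the "magic integer"

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                               # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))                    # hash conflict: chain it in the bucket

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

m = ToyHashMap()
m.put("a", 1)
m.put("b", 2)
print(m.get("a"))   # 1

With a fixed capacity this stays O(1) only while the load factor is low; a real table also resizes (see the rehashing cards above).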
A perfect hash function results in what for hash tables?
No collisions
When to use a perfect hash function?
in situations where there is a frequently queried large set, S, which is seldom updated. This is because any modification of the set S may cause the hash function to no longer be perfect for the modified set.
Dynamic perfect hashing
Solutions which update the hash function any time the set is modified (so that it stays perfect), but these methods are relatively complicated to implement.
What are some simple implementations of order-preserving minimal perfect hash functions with constant access time?
use an (ordinary) perfect hash function or cuckoo hashing to store a lookup table of the positions of each key
First guess, I think: e.g. given a perfect hash function ph,

def initialize(A):                     # build the order-preserving lookup table for data set A
    positions = {}
    for i in range(len(A)):            # A is the array of records, in the desired key order
        positions[ph(A[i].key)] = i    # store each key's position in A
    return positions

def getHashCode(positions, key):
    return positions[ph(key)]

This will be order preserving with respect to the order of keys in A. At first I thought the underlying map could not have collisions, because ph is a perfect hash function and perfect hash functions have no collisions for the data set (IIRC), but actually it may IMO, because the underlying map will probably use its own non-perfect hash function internally.
Minimal perfect hash function
a perfect hash function that maps n keys to n consecutive integers – usually the numbers from 0 to n − 1 or from 1 to n. A more formal way of expressing this is: Let j and k be elements of some finite set S. Then F is a minimal perfect hash function if and only if F(j) = F(k) implies j = k (injectivity) and there exists an integer a such that the range of F is a..a + |S| − 1
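A toy way to see the definition in code (my sketch, not a standard construction): brute-force a seed until a simple string hash becomes injective onto 0..n − 1 for a fixed key set. Real MPHF constructions (CHD, BBHash, etc.) are much cleverer.

def make_mphf(keys, max_seed=1_000_000):
    # Find a seed so that h(., seed) maps the fixed key set injectively onto 0..n-1,
    # i.e. a minimal perfect hash function for exactly these keys.
    n = len(keys)
    def h(key, seed):
        acc = seed
        for c in key:
            acc = (acc * 31 + ord(c)) & 0xFFFFFFFF
        return acc % n
    for seed in range(max_seed):
        if len({h(k, seed) for k in keys}) == n:       # no collisions, range is 0..n-1
            return lambda k, s=seed: h(k, s)
    raise RuntimeError("no suitable seed found")

f = make_mphf(["apple", "banana", "cherry", "date"])
print(sorted(f(k) for k in ["apple", "banana", "cherry", "date"]))   # [0, 1, 2, 3]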
Perfect hash function
In computer science, a perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions.
What are the three factors to consider in the analysis of hash table algorithms?
1) non-memory-related CPU cost of a key search or an insertion 2) memory (several subfactors) 3) variability in efficiency along different factors
Open vs closed address table fundamental difference
Open: Collisions are dealt with by searching for another empty bucket within the hash table array itself.
Closed: A key is always stored in the bucket it’s hashed to. Collisions are dealt with using separate data structures on a per-bucket basis. This of course means there is a data structure or reference to a data structure in each bucket
List three techniques for collision resolution in closed address tables
Separate chaining using linked lists
Separate chaining using dynamic arrays
Using self-balancing binary search trees
List seven collision resolution techniques for open address tables
Linear probing
Quadratic probing
Double hashing
Hopscotch hashing
Robin Hood hashing
Cuckoo hashing
2-choice hashing
List three benefits of open addressing
1)Predictable memory usage
No allocation of new nodes when keys are inserted
2)Less memory overhead
No next pointers
3)Memory locality
A linear memory layout provides better cache characteristics
How does linear probing lookup work?
Hash the key to get the hash code (i.e. the index into the array). See if the key exists at that index. If not, and another key occupies that index, go to the next index, and so on. If no key exists at an index (the slot is empty), do not continue searching and return that the key was not found in the hash table
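A minimal sketch of that lookup, assuming the table is a flat Python list of (key, value) slots where None means empty:

def lp_get(slots, key):
    # Linear probing lookup: start at the hashed index and walk forward until
    # we find the key or hit an empty slot.
    n = len(slots)
    i = hash(key) % n
    for _ in range(n):                 # at most one full pass around the table
        slot = slots[i]
        if slot is None:               # empty slot: the key cannot be further along
            raise KeyError(key)
        if slot[0] == key:
            return slot[1]
        i = (i + 1) % n                # occupied by a different key: keep probing
    raise KeyError(key)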
How does linear probing deletion work?
Rather than removing the key/value pair from the index (bucket) it's found at, you have to mark that bucket as deleted. The deleted bucket is now called a "tombstone"
Otherwise, during a future search, a lookup chain might be broken and the lookup might report that nothing was found (because it hit an empty bucket), when really the key was a few buckets further down the chain
(Pretty sure this is only relevant for when 2+ keys hash to the same code)
Are tombstone buckets forever unusable?
No, they can still be overwritten during insertion. It's just that during insertion you still have to keep searching until you reach an empty bucket, to verify there are no duplicates of the key you are trying to insert. If the key does not already exist in the table, you can then insert it at the tombstone position
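A sketch covering this card and the previous one, assuming the same flat list-of-slots layout as in the lookup sketch above; TOMBSTONE is just a sentinel I'm inventing for illustration:

TOMBSTONE = object()                      # sentinel marking a deleted slot

def lp_delete(slots, key):
    n = len(slots)
    i = hash(key) % n
    for _ in range(n):
        slot = slots[i]
        if slot is None:                  # truly empty slot: the key is not present
            raise KeyError(key)
        if slot is not TOMBSTONE and slot[0] == key:
            slots[i] = TOMBSTONE          # mark deleted instead of emptying the slot
            return
        i = (i + 1) % n
    raise KeyError(key)

def lp_put(slots, key, value):
    n = len(slots)
    i = hash(key) % n
    first_tombstone = None
    for _ in range(n):
        slot = slots[i]
        if slot is None:                  # end of the probe chain: key is not a duplicate
            target = i if first_tombstone is None else first_tombstone
            slots[target] = (key, value)  # reuse a tombstone slot if we passed one
            return
        if slot is TOMBSTONE:
            if first_tombstone is None:
                first_tombstone = i       # remember it, but keep probing for duplicates
        elif slot[0] == key:
            slots[i] = (key, value)       # key already present: overwrite in place
            return
        i = (i + 1) % n
    if first_tombstone is not None:       # no empty slots left, but a tombstone is reusable
        slots[first_tombstone] = (key, value)
        return
    raise RuntimeError("table is full")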
Cuckoo hashing definition
a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches; analogously, inserting a new key into a cuckoo hashing table may push an older key to a different location in the table.
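A sketch of the two-table variant, assuming two hash functions derived by salting Python's hash(); names are illustrative:

class CuckooTable:
    # Each key lives in one of two candidate slots, one per table. Lookup checks
    # exactly two slots; insert may kick ("cuckoo") an older key to its other slot.
    def __init__(self, capacity=16):
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        return hash((which, key)) % len(self.tables[which])   # two salted hash functions

    def get(self, key):
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        raise KeyError(key)            # worst case: exactly two probes

    def put(self, key, value, max_kicks=32):
        for which in (0, 1):           # overwrite if the key is already present
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        entry, which = (key, value), 0
        for _ in range(max_kicks):
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry   # displace occupant
            if entry is None:
                return
            which = 1 - which          # reinsert the displaced key into the other table
        raise RuntimeError("too many kicks; a real implementation would rehash or grow")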