Mod 8 Flashcards by Dominic Fanta

When is it optimal to use a map data type?

When insertion, lookup, and removal are the **only** operations we need, we can use the map data type.
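
In Python, the built-in `dict` is one implementation of the map data type. A minimal sketch of the three operations; the names and ages are made-up example data:

```python
ages = {}                 # an empty map from names to ages

ages["alice"] = 30        # insertion
ages["bob"] = 25
ages["alice"] = 31        # inserting an existing key replaces its value

print(ages["alice"])      # lookup -> 31
print("carol" in ages)    # membership test -> False

del ages["bob"]           # removal
print(len(ages))          # -> 1
```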

What is a hash table?

A hash table stores its data in an ordinary array; each key is transformed into an array index by a hash function. A hash function takes values of some type (e.g., a string, struct, or double) and maps them to an integer index value. We then use that index both to store data in the array and to retrieve it.
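
A minimal sketch of the idea, assuming a hypothetical toy hash function `toy_hash` (the sum of character codes) and no collision handling at all; a real table needs one of the collision strategies covered in later cards.

```python
SIZE = 11                      # length of the backing array
table = [None] * SIZE          # each slot holds a (key, value) pair or None

def toy_hash(key: str) -> int:
    # Hypothetical hash: deterministic, but anagrams such as "cat"/"act" collide.
    return sum(ord(ch) for ch in key)

def put(key: str, value) -> None:
    index = toy_hash(key) % SIZE
    table[index] = (key, value)    # overwrites whatever was there (no collision handling)

def get(key: str):
    entry = table[toy_hash(key) % SIZE]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

put("cat", 1)
print(get("cat"))   # 1
print(get("dog"))   # None
```

Calling `put("act", 2)` afterwards would silently overwrite the slot holding "cat", since both strings produce the same hash; that is exactly the collision problem the later cards address.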

What are the two steps to compute an index for a hash table?

hash = hash_function(key)

This computes the hash value of the key.

index = hash % array_size

This ensures the hash fits within the array size. A prime number is often recommended for the array size: if many hash values share a common factor with the table size, their remainders bunch into a few slots, whereas a prime size spreads them more uniformly and so tends to produce fewer collisions.
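
A small illustration of why a prime size can help. The hash values below are hypothetical and all share the factor 4; with a power-of-two size most slots go unused, while a prime size spreads the same values across every slot.

```python
def index_for(hash_value: int, array_size: int) -> int:
    # Step two: reduce the hash to a valid array index.
    return hash_value % array_size

hashes = [4 * k for k in range(8)]    # 0, 4, 8, ..., 28

used_with_8 = sorted({index_for(h, 8) for h in hashes})
used_with_7 = sorted({index_for(h, 7) for h in hashes})

print(used_with_8)   # [0, 4]
print(used_with_7)   # [0, 1, 2, 3, 4, 5, 6]
```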

What are the three important properties of a hash function?

Determinism – a given input should always map to the same hash value.
Uniformity – the inputs should be mapped as evenly as possible over the output range. A non-uniform function can result in many collisions, where multiple elements are hashed to the same array index. We’ll look more at this later.
Speed – the function should have a low computational burden.

What is DJB2?

A widely used string hash function:

```python
def hash_djb2(s):
    h = 5381
    for x in s:
        # h * 33 + c  (bitshift left 5 places = * 32)
        h = ((h << 5) + h) + ord(x)
    return h & 0xFFFFFFFF  # keep only the low 32 bits
```
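
Python integers never overflow, so the function above masks to 32 bits only at the end. Because the loop only multiplies and adds, masking after every step (as 32-bit unsigned arithmetic would) gives the same result while keeping the numbers small. A sketch of that variant, combined with step two of the index computation; the names `hash_djb2_32` and `djb2_index` are hypothetical:

```python
def hash_djb2_32(s: str) -> int:
    h = 5381
    for ch in s:
        # Masking after every step mimics 32-bit unsigned overflow.
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h

def djb2_index(key: str, array_size: int) -> int:
    # Step one (hash) followed by step two (modulo the array size).
    return hash_djb2_32(key) % array_size

print(hash_djb2_32("a"))      # 177670  (5381 * 33 + 97)
print(djb2_index("a", 11))    # 9
```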

How is a minimally perfect hash function different from a perfect hash function?

Both perfect and minimally perfect hash functions map each key in a given set to a distinct integer, so there are no collisions.

Minimally perfect hash functions take this one step further: they map N keys to exactly the integers 0 to N-1, with each key getting precisely one value. The output of a minimally perfect hash function is therefore different in that the distinct integers are consecutive.
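
For a small, fixed set of distinct keys, one naive way to obtain a minimally perfect hash is brute force: try seeds of a seeded hash until the N keys land on N distinct indices in 0 to N-1. This sketch assumes a hypothetical seeded hash built from SHA-256; it only illustrates the definition, and far more efficient constructions exist for large key sets.

```python
import hashlib

def seeded_hash(seed: int, key: str) -> int:
    # Hypothetical seeded hash: first 8 bytes of SHA-256("seed:key") as an integer.
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def find_minimal_perfect_seed(keys, max_seed=100_000):
    n = len(keys)
    for seed in range(max_seed):
        indices = {seeded_hash(seed, key) % n for key in keys}
        if len(indices) == n:      # n distinct values in 0..n-1: every index used exactly once
            return seed
    raise ValueError("no seed found; try a larger max_seed")

keys = ["apple", "banana", "cherry", "date"]
seed = find_minimal_perfect_seed(keys)
print({key: seeded_hash(seed, key) % len(keys) for key in keys})
```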

How can we handle hash table collisions without probing?

We can avoid probing altogether if we allow multiple keys to share the same table entry (i.e., array index). Collisions still happen, but they no longer force us to search for another slot: the keys that map to the same entry are stored together, commonly in a linked list. These lists are commonly referred to as buckets or chains, and this technique of collision resolution is known as chaining.
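
A minimal sketch of chaining with singly linked lists. The class names are hypothetical, Python's built-in `hash` stands in for the hash function, and removal is omitted to keep it short.

```python
class Node:
    """One link in a bucket's singly linked list."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node

class ChainedHashTable:
    def __init__(self, num_buckets=11):
        self.buckets = [None] * num_buckets   # each entry is the head of a chain, or None

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:               # walk the chain looking for the key
            if node.key == key:
                node.value = value            # key already present: update it
                return
            node = node.next
        self.buckets[i] = Node(key, value, self.buckets[i])   # prepend a new link

    def get(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

table = ChainedHashTable(num_buckets=5)
table.put(3, "a")
table.put(8, "b")      # in CPython, hash(8) == 8, and 8 % 5 == 3: same bucket as key 3
print(table.get(3), table.get(8))   # a b
```

Prepending keeps insertion of a new key O(1) once the chain has been checked for duplicates.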

What is the load factor of a hash table?

The load factor of a hash table is the average number of elements in each bucket:

𝝺=n/m

n is the total number of elements stored in the table
m is the number of buckets
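
A worked example of the formula, assuming a hypothetical chained table whose chains are represented by Python lists; 6 elements in 4 buckets gives 𝝺 = 1.5:

```python
def load_factor(buckets):
    n = sum(len(bucket) for bucket in buckets)   # total number of elements stored
    m = len(buckets)                             # number of buckets
    return n / m

buckets = [["a", "b"], [], ["c"], ["d", "e", "f"]]
print(load_factor(buckets))   # 1.5
```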

Can the load factor be greater than 1 in a hash table?

Only in a chained hash table, where multiple items can go in one bucket. With open addressing, each slot holds at most one element, so n ≤ m and 𝝺 ≤ 1.

In a chained table, what is the average number of links traversed for searches?

For a linked list-based chained table, the average number of links traversed is roughly 𝝺 / 2 for successful searches and 𝝺 for unsuccessful searches.
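
A quick way to check these figures, assuming the chains are already built and counting each node examined as one link traversed: a miss walks an entire chain, and a hit on the k-th node examines k nodes. The chain lengths below are made up.

```python
def average_links(chain_lengths):
    n = sum(chain_lengths)           # total elements
    m = len(chain_lengths)           # number of buckets
    # Unsuccessful search: the key's bucket is equally likely to be any bucket,
    # and the whole chain is walked.
    unsuccessful = n / m
    # Successful search: each stored element is equally likely to be the target;
    # a chain of length L contributes 1 + 2 + ... + L examined nodes.
    successful = sum(L * (L + 1) / 2 for L in chain_lengths) / n
    return successful, unsuccessful

print(average_links([4, 4, 4, 4]))   # (2.5, 4.0)
```

With 𝝺 = 4, a miss costs exactly 𝝺, and a hit costs (𝝺 + 1)/2, which differs from 𝝺 / 2 only by a constant.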

What is the average-case complexity of a linked list-based chained hash table?

What about worst-case?

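Assuming the hash function distributes keys uniformly across the buckets:

The average case for all operations is O(1 + 𝝺): computing the hash takes constant time, and the chain to search holds 𝝺 elements on average.
If the number of buckets is adjusted according to the load factor, then the number of elements is at most a constant multiple of the number of buckets (n = O(m)), i.e.:

𝝺 = n/m = O(m)/m = O(1).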

The worst-case complexity is O(n), since all of the elements might end up in the same bucket.

What is probing?

In a hash table with open addressing, each slot holds at most one element. Probing occurs when the index produced by the hash function is already filled: we compute a sequence of alternative indices and check each in turn until an empty slot is found.
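
A minimal sketch of open-addressing insertion, assuming integer keys, a hypothetical stand-in hash (the key itself), and a pluggable `offset(key, j)` function that defines the probe sequence. The linear offset used here is the one described on the linear probing card.

```python
from typing import Callable, List, Optional

def probe_insert(table: List[Optional[int]], key: int,
                 offset: Callable[[int, int], int]) -> int:
    """Insert an integer key by open addressing and return the index it lands in."""
    m = len(table)
    i_initial = key % m                    # stand-in hash: the integer key itself
    for j in range(m):                     # j = 0 tries i_initial, then j = 1, 2, ...
        i = (i_initial + offset(key, j)) % m
        if table[i] is None:               # empty slot found: stop probing
            table[i] = key
            return i
    raise RuntimeError("no empty slot found along the probe sequence")

def linear_offset(key: int, j: int) -> int:
    return j                               # linear probing: step forward one slot at a time

table = [None] * 7
print(probe_insert(table, 3, linear_offset))    # 3
print(probe_insert(table, 10, linear_offset))   # 4: index 3 is taken, so the probe moves on
```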

Is linear probing good?

Linear probing: i = (i_initial + j) % m (where j = 1, 2, 3, …)

Linear probing can be problematic due to clustering: occupied slots tend to form contiguous runs. As a cluster becomes bigger, new keys become more likely to land in it, and it takes longer and longer to find an empty space. Other probing methods reduce clustering by jumping further ahead on each probe; like linear probing, they wrap around to the beginning of the array (the % m) once they pass the end.
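
A small demonstration of clustering, assuming a table of 10 slots and hypothetical initial indices: once keys pile up around index 3, keys whose initial index is 4 must also walk past the cluster.

```python
def linear_probe_counts(initial_indices, m):
    """Insert keys by linear probing; return how many slots each insertion examined."""
    occupied = [False] * m
    counts = []
    for i_initial in initial_indices:
        for j in range(m):
            i = (i_initial + j) % m
            if not occupied[i]:
                occupied[i] = True
                counts.append(j + 1)       # slots examined, including the empty one
                break
        else:
            raise RuntimeError("table is full")
    return counts

print(linear_probe_counts([3, 3, 3, 4, 4], m=10))   # [1, 2, 3, 3, 4]
```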

What is quadratic probing? Does it have any drawbacks?

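Quadratic probing: i = (i_initial + j²) % m (where j = 1, 2, 3, …)

Quadratic probing can be problematic because the probe sequence may never reach some of the empty spaces in the array, so an insertion can fail even though the table still has room. If m is prime and the table is less than half full, an empty slot is guaranteed to be found.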
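
A sketch (the helper name is hypothetical) that lists every slot the quadratic sequence can reach from a given initial index. Because j² mod m repeats with period m, checking j = 0 to m-1 is enough. With m = 8 most slots are unreachable; even with the prime m = 7 only about half are, which is why quadratic probing is usually paired with a prime table size and a load factor below 0.5.

```python
def quadratic_reachable(i_initial, m):
    """Sorted list of slots that (i_initial + j*j) % m can ever visit."""
    # (j + m)^2 is congruent to j^2 mod m, so j = 0 .. m-1 covers every possibility.
    return sorted({(i_initial + j * j) % m for j in range(m)})

print(quadratic_reachable(0, 8))   # [0, 1, 4]
print(quadratic_reachable(0, 7))   # [0, 1, 2, 4]
```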

What is double hashing?

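A form of probing whose step size comes from a second hash function, h2, applied to the key. Keys that collide at the same initial index usually get different step sizes, so their probe sequences diverge; this reduces clustering (which bloats insertion times):

i = (i_initial + j * h2(key)) % m (where j = 1, 2, 3, …)

h2(key) must never be 0, or the probe would never move.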
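
A sketch with hypothetical hash functions for integer keys: h1(key) = key % m for the initial index and h2(key) = 1 + key % (m - 1) for the step, which can never be 0. Keys 14 and 25 start at the same index in a table of size 11 but then follow different paths; because 11 is prime, every step size is coprime to it, so each sequence eventually visits every slot.

```python
def double_hash_sequence(key, m, num_probes):
    """First num_probes indices visited for an integer key (j = 0 is the initial index)."""
    i_initial = key % m               # hypothetical primary hash h1
    step = 1 + key % (m - 1)          # hypothetical h2: always in 1 .. m-1, never 0
    return [(i_initial + j * step) % m for j in range(num_probes)]

print(double_hash_sequence(14, 11, 4))   # [3, 8, 2, 7]
print(double_hash_sequence(25, 11, 4))   # [3, 9, 4, 10]
```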

What is a tombstone?

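When an element is removed from an open-addressing table, we put a special tombstone value in its slot instead of marking the slot empty.

This keeps removals from breaking the probe sequences of other keys: a search treats a tombstone as occupied and keeps probing, whereas an empty slot would end the search early and miss keys stored further along the sequence.

The tombstone value can be replaced when adding a new entry.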
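
A minimal sketch of tombstones in a linear-probing table. The class name and sentinel objects are hypothetical, and Python's built-in `hash` stands in for the hash function. Note that `put` remembers the first tombstone it passes but keeps probing until it reaches an empty slot, so an existing copy of the key further along is updated rather than duplicated.

```python
_EMPTY = None
_TOMBSTONE = object()      # unique sentinel marking a removed entry


class LinearProbingTable:
    """Hypothetical open-addressing table with linear probing and tombstones."""

    def __init__(self, capacity=11):
        self.slots = [_EMPTY] * capacity

    def _probe(self, key):
        m = len(self.slots)
        start = hash(key) % m
        for j in range(m):
            yield (start + j) % m

    def put(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                break                          # the key cannot appear past an empty slot
            if slot is _TOMBSTONE:
                if first_free is None:
                    first_free = i             # reusable, but keep looking for the key
            elif slot[0] == key:
                self.slots[i] = (key, value)   # key already present: update it
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return None                    # an empty slot ends the search
            if slot is not _TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def remove(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                return                         # key not present
            if slot is not _TOMBSTONE and slot[0] == key:
                self.slots[i] = _TOMBSTONE     # mark the slot, do not empty it
                return


table = LinearProbingTable(capacity=11)
table.put(3, "a")
table.put(14, "b")      # in CPython hash(14) == 14, and 14 % 11 == 3, so it probes on to index 4
table.remove(3)         # index 3 becomes a tombstone
print(table.get(14))    # b: the search continues past the tombstone
```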

What should be done when a hash table is getting close to full?

Just as a dynamic array is doubled in size when necessary, a common solution to a full hash table is to move all values into a new, larger table when the load factor exceeds some threshold, such as 0.75. A new table is created, and every entry in the old table is rehashed, this time taking the hash modulo the new table size to find the correct index in the new table.
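
A minimal sketch of resizing a chained table, using the 0.75 threshold from above. Python lists stand in for the chains, built-in `hash` stands in for the hash function, and the growth rule (new size 2m + 1) is a hypothetical choice.

```python
def rehash(buckets, new_size):
    """Move every (key, value) pair into a new list of new_size buckets."""
    new_buckets = [[] for _ in range(new_size)]
    for bucket in buckets:
        for key, value in bucket:
            new_buckets[hash(key) % new_size].append((key, value))   # modulo the NEW size
    return new_buckets

def put(buckets, key, value, max_load=0.75):
    """Insert or update key; returns the (possibly new) bucket list."""
    bucket = buckets[hash(key) % len(buckets)]
    for idx, (k, _) in enumerate(bucket):
        if k == key:
            bucket[idx] = (key, value)      # existing key: update in place, no growth
            return buckets
    n = sum(len(b) for b in buckets)
    if (n + 1) / len(buckets) > max_load:   # adding this key would exceed the threshold
        buckets = rehash(buckets, 2 * len(buckets) + 1)   # hypothetical growth rule
    buckets[hash(key) % len(buckets)].append((key, value))
    return buckets

buckets = [[] for _ in range(4)]
for i in range(10):
    buckets = put(buckets, i, str(i))
print(len(buckets))   # 19: grew from 4 to 9, then from 9 to 19
```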

What is the probability that the first probe (index for insertion) is successful?

p = (m−n)/m

There are m total slots and n filled slots, so m − n open spots.

What is the probability that the second probe (index for insertion) is successful?

If the first probe fails, the probability that the second probe succeeds is (m−n)/(m−1).

There are still m − n remaining open slots, but now we only have a total of m − 1 slots to look at, since we have examined one already.
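
A sketch of the conditional success probabilities for a hypothetical table with m = 10 slots and n = 7 filled, assuming each probe examines a slot not yet examined, chosen uniformly at random. After c failed probes the next one succeeds with probability (m − n)/(m − c), which never drops below the first-probe value p.

```python
from fractions import Fraction

def success_probabilities(m, n):
    """P(probe c+1 finds an empty slot | the first c probes all hit filled slots), c = 0..n."""
    if not 0 <= n < m:
        raise ValueError("need 0 <= n < m")
    return [Fraction(m - n, m - c) for c in range(n + 1)]

probs = success_probabilities(m=10, n=7)
print(probs[0], probs[1])   # 3/10 1/3
print(probs[-1])            # 1: after all 7 filled slots, the next probe must succeed
```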

Why do hash tables have O(1) operations on average?

For each probe, the probability of success is at least p, because after c slots have already been examined, (m−n)/(m−c) ≥ (m−n)/m = p.

Therefore, the expected number of probes until success is at most:
1/p = 1/((m−n)/m) = 1/(1−n/m) = 1/(1−𝝺)

Thus, the expected number of probes for any given operation is O(1/(1−𝝺)).

If we limit the load factor to a constant and reasonably small number, our operations will be O(1) on average.
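
A sketch that compares the bound with an exact expectation, under the same assumption as the argument above: each probe examines a new slot chosen uniformly from those not yet examined. The exact value is computed as the sum of P(more than c probes are needed); for a hypothetical table with m = 10 and n = 7, it is 11/4, below the bound 1/(1 − 𝝺) = 10/3.

```python
from fractions import Fraction

def expected_probes_exact(m, n):
    """Expected probes to reach an empty slot, if each probe picks an unexamined slot uniformly."""
    if not 0 <= n < m:
        raise ValueError("need 0 <= n < m")
    expected = Fraction(0)
    all_failed = Fraction(1)                    # P(the first c probes all hit filled slots)
    for c in range(n + 1):
        expected += all_failed                  # E[X] = sum over c >= 0 of P(X > c)
        all_failed *= Fraction(n - c, m - c)    # probe c+1 also hits a filled slot
    return expected

def expected_probes_bound(m, n):
    return Fraction(m, m - n)                   # 1 / (1 - n/m) = 1 / (1 - λ)

print(expected_probes_exact(10, 7), expected_probes_bound(10, 7))   # 11/4 10/3
```

Keeping 𝝺 at a fixed constant such as 0.75 caps the bound at 4 probes no matter how large the table grows.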

In chaining, which data structure is appropriate?

Singly linked list

Does every key in a hash table need to be unique?

Yes, that’s the point of storing key/value pairs: each key identifies exactly one value, so inserting an existing key replaces its value.

Does every key in a hash table need to hash to a unique value?

No, but we should use a hash function with as few collisions as possible.
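
A small Python illustration: the hypothetical key class below gives every instance the same hash value, yet `dict` still stores distinct keys separately because it falls back on `__eq__` to tell them apart. Every such lookup pays for the collision, though.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42                      # every instance collides

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.name == other.name


table = {CollidingKey("a"): 1, CollidingKey("b"): 2}
print(len(table))                   # 2: distinct keys coexist despite the shared hash
print(table[CollidingKey("b")])     # 2
```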

Does hash table performance increase or decrease as the number of buckets increases?

It should increase. More buckets distribute the elements more evenly across the indices of the hash table. Recall that reaching a bucket takes O(1) time, since we only need to run the hash function on the key to find out which bucket it belongs in. However, unless the hash function is perfect (no collisions, i.e., no index holding multiple elements), methods such as “contains” must still traverse the elements within that bucket. This is where more buckets, and therefore a smaller load factor, improve performance: more buckets generally mean fewer elements stored at any one index.

In open addressing, if the first probe fails, is the probability that the second probe succeeds (m-n)/(n-1)?

No. If the first probe fails, the probability that the second probe succeeds is **(m-n)/(m-1)**.

What is the worst-case time complexity to retrieve a value from a hash table with chaining?

In the worst-case scenario, all keys have hashed to the same bucket and, therefore, are all chained together in a single linked list. Further, the element you're searching for may be at the end of that list, meaning you have to traverse the full list from head to tail, which is **O(n).**
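
A sketch of this worst case, assuming a hypothetical hash function that sends every key to the same value. Python lists stand in for the linked-list chains, and the count is the number of stored entries compared before the search stops.

```python
def constant_hash(key):
    # Hypothetical worst-case hash function: every key maps to the same value.
    return 0

def build_buckets(keys, m):
    buckets = [[] for _ in range(m)]          # lists stand in for linked-list chains
    for key in keys:
        buckets[constant_hash(key) % m].append(key)
    return buckets

def nodes_examined(buckets, key):
    """Number of chain entries compared before finding key (or exhausting the chain)."""
    chain = buckets[constant_hash(key) % len(buckets)]
    count = 0
    for stored in chain:
        count += 1
        if stored == key:
            break
    return count

keys = list(range(100))
buckets = build_buckets(keys, m=11)
print(nodes_examined(buckets, 99))   # 100: the last key sits at the end of the only chain
```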

Mod 8 Flashcards

(26 cards)