Hashing (ALL) Flashcards

Question 1

Q

What is hashing?

Answer

A

Hashing is a dictionary data structure for the Table ADT

Question 2

Q

what are some desirable properties of a hash function?

Answer

A

It should be computable quickly (constant time).
if keys are drawn uniformly at random, then the hashed values should be uniformly distributed.
keys that are “close” should have their hash values “spread out”.

Question 3

Q

What are the collision resolution policies?

Answer

A

Open addressing
Chaining
Robin Hood hashing

Question 4

Q

What is open addressing?

Answer

A

Open addressing uses no extra space - every element is stored in the hash table. If it gets overfull, we can reallocate space and rehash.

When a collision occurs in open addressing, we can

probe nearby for a free position (linear probing (easy to implement but leads to clustering), quadratic probing);
go to a “random” position by using a second-level hash function (double hashing);
try a position given by a second, third, . . . hash function (this plus LCFS gives cuckoo hashing).

Question 5

Q

What is chaining?

Answer

A

Chaining uses an “overflow” list for each element in the hash table

Elements that hash to the same slot are placed in a list. The slot contains a pointer to the head of this list.
Insertion can then be done in constant time.
A drawback is the additional space overhead.
The distribution of sizes of lists turns out to be very uneven.

Question 6

Q

Insertion has the same cost as an unsuccessful search, provided the table is not full

true or false?

Question 7

Q

The average cost of a successful search is the average of all insertions needed to construct the table from empty

true or false?

Question 8

Q

What is the simple uniform hashing model?

Answer

A

That is, each of the n keys is equally likely to hash into any of the m slots. So we are considering a “balls in bins” model

If n is much smaller than m, collisions will be few and most slots will be empty. If n is much larger than m, collisions will be many and no slots will be empty. The most interesting behaviour is when m and n are of comparable size

Question 9

Q

What is load factor, λ?

Answer

A

n/m

n = keys

m = slots

Question 10

Q

Balls in Bins

When do we expect the first collision?

Answer

A

E(m) ≈ sqrt(πm/2) + 2/3.

So collisions happen even in fairly sparse tables

Question 11

Q

Balls in Bins

When do we expect all boxes to be nonempty?

Answer

A

after about m log m balls. It takes a long time to use all lists when chaining

Question 12

Q

Balls in Bins

What proportion of boxes are expected to be empty when n is Θ(m)?

Answer

A

e^−λ .

Many of the lists are just wasted space even for pretty full tables

Question 13

Q

Balls in Bins

When m is Θ(n), what is the expected maximum number of balls in a box?

Answer

A

about (log n)/(log log n). Some of the lists may be fairly long

Question 14

Q

What is the worst case for searching in chaining?

Answer

A

Θ(n), since there may be only one chain with all the keys

Question 15

Q

What is e average cost for unsuccessful search in chaining?

Answer

A

the average list length, namely λ.

Question 16

Q

What is the average cost for successful search in chaining?

Answer

A

The average cost for successful search is then

(n + 1)/2m ≈ λ/2

Question 17

Q

Provided load factor is kept bounded what is the runtime on average for basic operations?

Answer

A

constant time

O(1)

Question 18

Q

Linear probing average insertion cost?

Answer

A

Linear probing is not well described by the uniform hashing model (because of the clustering). A more detailed analysis can be done.

The average insertion cost is Θ(1 + 1/(1 − λ) 2 ).

Question 19

Q

Uniform hashing average insertion cost?

Answer

A

Θ(1/(1 − λ)) as m → ∞.

Question 20

Q

Uniform hashing unsuccessful search?

Answer

A

Θ(1/(1 − λ))

Question 21

Q

Deletion in hash tables

Answer

A

If we use chaining, deletion just involves deleting an element from the relevant linked list. However, we must still find it in order to delete it

open addressing, deletion can be trickier. One method is to mark elements as deleted, but not delete them. However this can cause space wastage

If we simply delete an item, then we may no longer be able to find other items that had previously collided with it

Question 22

Q

Effect of hashing on runtime of binary search and search trees?

Answer

A

O(1) dictionary operations instead of the O(lg n) needed for binary search (static tables) and/or balanced search trees (dynamic tables)

Question 23

Q

What happens the larger the load factor gets?

Answer

A

larger λ is the more chance of collisions.

Question 24

Q

OALP Example

Question 25

Q

OADH Example

Question 26

Q

What are the main desired properties for a good hashing scheme?

Answer

A

The hash locations of the keys S are spread out to minimize collisions (for lookup, insert, deletes,…).
Use a relatively small table size m = O(n) that doesn’t affect performance.
The function h is fast to compute (polynomial in the size of a key x ∈ S, but constant time with respect to n).

NOTE: If we use separate chaining for collisions, the time needed for processing key x is O(|A[h(x)]|) where |A[i]| denotes the length of the chain/list at index i