Hashing (ALL) Flashcards

1
Q

What is hashing?

A

Hashing is a dictionary data structure for the Table ADT

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

what are some desirable properties of a hash function?

A
  • It should be computable quickly (constant time).
  • if keys are drawn uniformly at random, then the hashed values should be uniformly distributed.
  • keys that are “close” should have their hash values “spread out”.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the collision resolution policies?

A
  • Open addressing
  • Chaining
  • Robin Hood hashing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is open addressing?

A

Open addressing uses no extra space - every element is stored in the hash table. If it gets overfull, we can reallocate space and rehash.

When a collision occurs in open addressing, we can

  • probe nearby for a free position (linear probing (easy to implement but leads to clustering), quadratic probing);
  • go to a “random” position by using a second-level hash function (double hashing);
  • try a position given by a second, third, . . . hash function (this plus LCFS gives cuckoo hashing).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is chaining?

A

Chaining uses an “overflow” list for each element in the hash table

  • Elements that hash to the same slot are placed in a list. The slot contains a pointer to the head of this list.
  • Insertion can then be done in constant time.
  • A drawback is the additional space overhead.
  • The distribution of sizes of lists turns out to be very uneven.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Insertion has the same cost as an unsuccessful search, provided the table is not full

true or false?

A

true

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

The average cost of a successful search is the average of all insertions needed to construct the table from empty

true or false?

A

true

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the simple uniform hashing model?

A

That is, each of the n keys is equally likely to hash into any of the m slots. So we are considering a “balls in bins” model

If n is much smaller than m, collisions will be few and most slots will be empty. If n is much larger than m, collisions will be many and no slots will be empty. The most interesting behaviour is when m and n are of comparable size

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is load factor, λ?

A

n/m

n = keys

m = slots

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Balls in Bins

When do we expect the first collision?

A

E(m) ≈ sqrt(πm/2) + 2/3.

So collisions happen even in fairly sparse tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Balls in Bins

When do we expect all boxes to be nonempty?

A

after about m log m balls. It takes a long time to use all lists when chaining

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Balls in Bins

What proportion of boxes are expected to be empty when n is Θ(m)?

A

e−λ .

Many of the lists are just wasted space even for pretty full tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Balls in Bins

When m is Θ(n), what is the expected maximum number of balls in a box?

A

about (log n)/(log log n). Some of the lists may be fairly long

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the worst case for searching in chaining?

A

Θ(n), since there may be only one chain with all the keys

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is e average cost for unsuccessful search in chaining?

A

the average list length, namely λ.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the average cost for successful search in chaining?

A

The average cost for successful search is then

(n + 1)/2m ≈ λ/2

17
Q

Provided load factor is kept bounded what is the runtime on average for basic operations?

A

constant time

O(1)

18
Q

Linear probing average insertion cost?

A

Linear probing is not well described by the uniform hashing model (because of the clustering). A more detailed analysis can be done.

The average insertion cost is Θ(1 + 1/(1 − λ) 2 ).

19
Q

Uniform hashing average insertion cost?

A

Θ(1/(1 − λ)) as m → ∞.

20
Q

Uniform hashing unsuccessful search?

A

Θ(1/(1 − λ))

21
Q

Deletion in hash tables

A

If we use chaining, deletion just involves deleting an element from the relevant linked list. However, we must still find it in order to delete it

open addressing, deletion can be trickier. One method is to mark elements as deleted, but not delete them. However this can cause space wastage

If we simply delete an item, then we may no longer be able to find other items that had previously collided with it

22
Q

Effect of hashing on runtime of binary search and search trees?

A

O(1) dictionary operations instead of the O(lg n) needed for binary search (static tables) and/or balanced search trees (dynamic tables)

23
Q

What happens the larger the load factor gets?

A

larger λ is the more chance of collisions.

24
Q

OALP Example

A
25
Q

OADH Example

A
26
Q

What are the main desired properties for a good hashing scheme?

A
  • The hash locations of the keys S are spread out to minimize collisions (for lookup, insert, deletes,…).
  • Use a relatively small table size m = O(n) that doesn’t affect performance.
  • The function h is fast to compute (polynomial in the size of a key x ∈ S, but constant time with respect to n).

NOTE: If we use separate chaining for collisions, the time needed for processing key x is O(|A[h(x)]|) where |A[i]| denotes the length of the chain/list at index i