hash tables Flashcards

1
Q

what is a hash table

A

data structure where items are stored in a location determined by their content - content based inde

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

what is every hash table fundamentally composed of?

A

array of items

hash function: that converts item to an index

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

what is a hash function

A

is a mapping from an item to an integer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

what are the requirements for a good hash function?

A

fast compute
deterministic
spread keys evenly in hash table - no large gaps

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what is a perfect has function

A

maps an every distinct key onto a distinct integer

minimal perfect hash function has no holes in the array

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

how are collisions addressed?

A

open addressing - linear and quadratic probing

closed addressing - chaining

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

explain linear probing insert

A

generate hash code = h

while A[h] contains a key A[h] - {key, value}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

explain linear probing find

A
generate hash code = h
while A[h] contains a key:
-- if A[h]{key} == key  return value
-- h = (h+1)%table size
return not found
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

issues with linear probing

A
  • primary clustering - multiple adjacent items and slow performance
  • uneven gaps in the table
  • full table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

what to do when the table becomes full?

A

throw an error
create new and larger table
automatically make a larger array and then rehash every thing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

what is the load factor

A

the proportion of the table that is fill (λ)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

probability of a cell being empty

A

1 - load factor (lambda)

1-λ

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

average number of cells to be examined for insert

A

T(λ) = (1+1/(1-λ)^2)/2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

average number of cells to be examined for successful find

A

T(λ) = (1+1/(1-λ))/2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

quadratic probing insert

A

generate hash code h = hash(key), i =1
while a[h] contains a key
h = h+i*i%table size
a[h] = key, value

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

quadratic probing find

A
generate hash code h = hash(key), i =1
while a[h] contains a key
-- if A[h]{key} == key  return value
--- h = h+i*i%table size
--- i++

return not found

17
Q

basic idea of collision resolution by chaining

A

use a table of pointers/references

each new item must be added to a linked list at that position in the table

18
Q

insertion algorithm for chaining

A

generate hash code h = hash(key), p = new Node
p.data = {key, value}; p.next = A[h]
A[h] = p

19
Q

find algorithm for chaining

A

generate hash code h = hash(key), p = A[h]
while p != null and p{key} != key
— p = p.next
return p

20
Q

analyze the chaining technique

A

n items and m table size - even distribution of hash values and use of all possible hash values
on ave each linked list is of size n/m
average search time - 1/2 n/m

21
Q

complexity analysis of chaining

A
best - O(n/m)
ave - 1/2 n/m
worst - O(n)
m - tablesize
n - number of items

All dependent on choice of M relative to n

22
Q

handling deletions open addressing vs chaining

A

open addressing:

  • deletion not easily possible
  • add a flag to mark deleted items - skip deleted items during search, unmark and over write deleted items on insert

chaining:
- use linked list deletions

23
Q

other variations to chaining

A

double hashing - if there is a collision because of due to secondary clustering use a second hash function when there is a collision
h = H1 +H2 - where H2 is the rehashing function
a different sequence is checked for each key

chaining using BST or other data structures

24
Q

what is secondary clustering?

A

where a key generates the same sequence of locations to check