Hash Tables Flashcards

1
Q

What is a dictionary?

A

A data structure that stores data by mapping keys to values

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the methods that are implemented in dictionaries?

A
insert
remove
find
size
isEmpty
keys
values
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Describe some possible implications of design decisions with dictionaries

A

Insert - overwrite existing keys?

Find - fail or return null for missing key?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the three possible implementations of associative arrays (dictionaries)?

A

List
Array
Binary search tree

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Describe some features of list implementation of a dictionary?

A

Linear search complexity looking for keys

Simple

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Describe some features of array implementation of a dictionary?

A

Same list of key/value pairs
Fixed size
O(n) search/add/remove

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Describe some features of binary search tree implementation of a dictionary?

A

O(log n) search, addition and removal

Need an order on keys

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What do hash tables provide?

A

Key-index mapping in a simple and uniform way

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Describe a hash table

A

Data structure that implements a dictionary, by using a hash function to compute an index into an array of buckets from which the desired value can be found

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a hash function?

A

Takes a key and produces an index/hash code

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Describe the two stages in a hash function

A
  • Hash key to its hash code l using the hash function h(k)

- Use the hash code as an index into a bucket containing the value associated with the key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a collision?

A

Two different keys will map to the same hash code

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How is the has code further reduced to try and avoid collisions?

A

Using modulo arithmetic

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Give some desirable properties of hash functions

A

Gives two far apart hash codes for values close together
Two keys are unlikely to have the same hash code
Maps the keys uniformly across the range of hash codes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are two common methods of compression mapping?

A
Division method (modulo arithmetic)
Multiple and Divide
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

When do compression mapping methods work best?

A

When m is prime

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Describe the Multiple and Divide (MAD) method for compression mapping

A

c(l) = (al + b)mod m for some a and b

18
Q

What is a cryptographic hash function?

A

Hash function that given a hash code l, it is very hard to generate a k such that h(k) = l
Also known as non-invertible function

19
Q

What are cryptographic hash functions used for?

A

Tamper proof files

20
Q

Pros of indexed storage classes

A

Very fast O(1) random access if you have an index, otherwise O(n)

21
Q

Disadvantages of indexed storage classes

A

Generally not expandable without copying

22
Q

Pros of sequential storage classes

A

Generally fast to expand by adding head or tail

23
Q

Disadvantage of sequential storage classes

A

Everything built by traversal from end(s) so generally O(n)

24
Q

Pros of associative storage classes

A

Use almost any value as key

Can address any key randomly

25
Q

Disadvantages of associative storage classes

A

Dynamic memory management

26
Q

How does storage get allocated in memory?

A

A pool of memory (the heap)
Structured as a block chain
Free and used blocks

27
Q

Consequences of the setup of memory allocation

A

Dynamic storage is easy to use but expensive to manage
Allocating lots of small objects can cause heap fragmentation
Garbage collection is relatively expensive, so doing it a lot can cause performance problems

28
Q

What is heap fragmentation?

A

Most of the memory is allocated in a large number of chunks, leaving a good percentage of your total memory un-allocated but un-usable for lots of circumstances

29
Q

What is the key performance measure of a hash table?

A

The load factor

30
Q

What is the load factor of a hash table/

A

Probability of a collision happening

31
Q

What is open addressing?

A

Use hashing to find a candidate bucket

Have a strategy to re map colliding keys and to find them again afterwards

32
Q

What is linear probing?

A

We hash the bucket and then step along the array until we find the key we are looking for

33
Q

What is quadratic probing?

A

Rather than linearily stepping through the array, use quadratic stepping where each jump is quadratically bigger

34
Q

What is secondary hashing?

A

Use a second hash function to pick the step, if we still get a collision, revert to linear probing

35
Q

What does open addressing allow?

A

Fixed memory footprint and management of collisions

36
Q

What is separate chaining?

A

Bucket is a list of key/value pairs

Hash to find the bucket then add it to the list

37
Q

Benefits of separate chaining

A

Unlimited storage

No worries about collisions, since they are handled automatically

38
Q

What is the worst case for separate chaining?

A

O(n)

39
Q

Average case of separate chaining?

A

O(n/m)

40
Q

What is extendible hashing?

A

Treats the hash code as a bit string that’s expanded piecemeal

41
Q

What happens in extendible hashing when a collision occurs?

A

Bucket is split, consuming another bit of the hash code

Repeat until the hash codes differ

42
Q

What are infinite hash codes?

A

Use the key as a seed to a pseudo-random number generator

Call the number generator repeatedly to get new bits as required, indefinitely