Hashing Flashcards
1
Q
Hash Table
A
- Data structure used to implement sets and dictionaries
- Implemented with an array (often an array of linked lists) and a hash function
- Good performance time
- But ineffective for ordering/sorting
2
Q
Collisions
A
- Collisions occur when multiple elements map to the same index
- 2 ways to resolve collissions
- Open addressing
- Separate Chaining
3
Q
Clustering
A
- Clustering is a situation in which collisions cause elements to group around the same index
- Clustering is the result of open addressing
- Clustering slows down performance speed
- The larger the cluster, the more likely more collisions will occur
4
Q
Hash Functions
A
- Function used to compute index of item in hash table
Often consists of 2 things
- Hash function generates a new #
- Then you modulo that # by array_size to compute index
5 things make a good hash function
- Use all data being hashed
- Use only data being hashed
- Should be deterministic
- Map similar keys to different values
- Hash values should be distributed evenly
5
Q
Load Factor
A
- Ratio of elements in hash table to number of buckets
- N/M
- Where M is the size of the array
- As load factor grows, hash table becomes slower
- With open addressing, it’s the percentage of occupied cells
- Good load factor is between 1/8 and 1/2
- With separate chaining, it’s the average length of the linked lists
6
Q
Hashing
Folding Method
A
- A type of hash function for integers
Steps
- Break #’s into equal chunks (groups of 2 or 3, etc.)
- Sum chunks to produce hash value
- Mod the result
7
Q
Hashing
Multiplication Method
A
- A type of hash function for numbers
- Uses the following formula
h(k) = floor[m(kA mod 1)]
- m ⇒ size of the array
- k ⇒ key
- A ⇒ number between 0 and 1
- A good choice is the golden ratio
- (root(5) - 1)/2
8
Q
Hashing
Strings
A
- A common approach to hashing strings is to convert each char to a # and sum
Steps
- Loop through char, adding up ascii values
- Mod the total
9
Q
Open Addressing
A
- An implementation of hash functions that involves “probing” for free slots when collisions occur
- If slot is occupies, insert into next free slot
- If you reach end of array, loop through to beginning and search from top
- This approach can cause clustering
- Also, hash table is of fixed size, so must resize (and rehash) periodically
- 2 versions
- Linear probing
- Quadratic probing
10
Q
Linear Probing
A
- A form of open addressing
- Inervals between probes is a fixed constant (usually 1)
- Uses formula
h(x) + c
- Where size of interval c >= 1
- Using larger intervals reduces clustering
- Ex: if index 3 is filled, and interval == 2, then check 5, 7, 9, etc.
11
Q
Quadratic Probing
A
- A form of open addressing
- Intervals between probes increase quadratically
- This approach decreases clustering
- Uses formula:
i(k) + c1p+ c2p2
- Where k is the key, and i(k) is the hash index, and p the number of probes
- Note you can include a linear component, but c1 is often 0
- Ex: if index 1 is filled, and c1 = 0 and c2 = 1, then check slots 2, 5, 10, 17, etc.
12
Q
Separate Chaining
A
- An implementation of hash tables that uses an array of linked lists
- Each index is the head of its own list
- Prevents clustering
13
Q
Hash Table
Complexity
A
- Applies to both linear probing and separate chaining
- Applies to all operations (insertion, deletion, and search (get))
Complexity
Time
Average/Amortized: O(1) ♦ Worst: O(n)
Space
O(1)
*Worst case delete/search from looping through elements that hash to same value, worst case insert from rehashing operation (reallocating array)