Ch 8: Hash Tables Flashcards
what is a hash table?
a data structure that stores unordered items by mapping (hashing) each item to a location in an array (or vector)
what is the main advantage of hash tables?
an item may require only O(1) insert, remove and search
what is a key?
a value used to map to an index
what is a bucket?
each hash table array element
what is a hash function?
computes a bucket index from the item’s key
what is collision?
occurs when an item is inserted into a hash table maps to the same bucket as an existing item in the hash table
what is chaining?
a collision resolution technique where each bucket has a list of items
what is open addressing?
a collision resolution technique where collisions are resolved by looking for an empty bucket elsewhere in the table
what is linear probing?
handles a collision by starting at the key’s mapped bucket, then linearly searches subsequent buckets until an empty bucket is found
what are two types of buckets? define them
empty-since-start: has been empty since the hash table was created
empty-after-removal: had an item removed that caused the bucket to now be empty
why does search stop at an empty-since-start but not an empty-after-removal?
stops because if previously inserting the item had reached that bucket, then it would have put it there. but since it has been empty since the start, it must not have been inserted
explain removal with linear probing
- determine the initial bucket
- probe each bucket until a matching item is found, an empty-since-start bucket is found, or all buckets have been probed
what is quadratic probing?
starting at the key’s mapped bucket, and then quadratically searches subsequent buckets until an empty bucket is found
what is the hash table function for quadratic probing
(H + c1 *i + c2&(i^2)) mod (tablesize)
H: item’s initial mapped bucket
c1 & c2: programmer definined constants for quadratic probing
i: increment
explain insert for quadratic probing
- use formula, starting with i = 0 to find bucket
- insert if bucket is empty
- otherwise, increment i and use the formula to find a new bucket, repeat
what is a probing sequence?
iterating through sequential i values to obtain the desired table values
when is a hash table fast?
if it minimizes collision
what are perfect hash functions?
maps items to buckets with no collisions
when can a perfect hash function be created?
if the number of items and all possible key item keys are known before hand
what do good hash functions do?
uniformly distribute items into buckets
chaining and a good hash result in what?
short bucket lists so faster inserts, searches and removes
what is the average & worst case operation run times on good hash functions?
average: O(1)
worst: O(n)
what is a modulo hash?
uses remainder from division of key by hash table size
what is a midsquare hash?
squares the key, extracts R digits from the results middle and returns the remainder of the middle digits divide by hash table size
what is a multiplicative string hash?
repeatedly multiplies the hash value and adds the ASCII value of each character in the string
what is a direct hash function?
uses the item’s keys as the bucket index
what is a direct access table?
a hash table with a direct hash function
what is the advantage of a direct access table?
no collision
what are the two main limitations of a direct access table?
- all keys must be non-negative integers, but for some applications, keys may be negative
- the hash table’s size equals the largest key values plus 1, which may be very large
what is double hashing?
an open-addressing collision resolution technique that uses 2 different hash functions to compute bucket indices
what is the key index for double hashing?
(h1(key) + i*h2(key))mod(tablesize)
what size is a table resized to?
the next prime number >= N*2
what is the run time complexity of resizing and why?
O(n) because the old array is reindexed to new array
what is a load factor?
if of items in the hash table divided by the # of buckets
when does resizing often occur?
when 1+ values exceed a certain threshold:
- load factor
- when using open-addressing, the # of collisions during an insertion
- when using chaining, the size of bucket’s linked list