Hash Tables - Chapter 4 Flashcards
A blank is a data structure that stores unordered items by mapping (or hashing) each item to a location in an array (or vector).
hash table
A hash table’s main advantage is that searching (or inserting / removing) an item may require only blank, in contrast to O(N) for searching a list or to O(log N) for binary search.
O(1)
In a hash table, an item’s blank is the value used to map to an index.
key
For all items that might possibly be stored in the hash table, every key is ideally blank, so that the hash table’s algorithms can search for a specific item by that key.
unique
Each hash table array element is called a blank
bucket
A hash function computes a bucket index from the blank.
item’s key
A common hash function uses the blank, which computes the integer remainder when dividing two numbers.
modulo operator %
A hash table’s operations of insert, remove, and search each use the blank to determine an item’s bucket
hash function
The approach for a hash table algorithm determining whether a cell is blank depends on the implementation. For example, if items are simply non-negative integers, empty can be represented as -1.
empty
More commonly, items are each an object with multiple fields (name, age, etc.), in which case each hash table array element may be a pointer. Using pointers, empty can be represented as blank.
null
A blank occurs when an item being inserted into a hash table maps to the same bucket as an existing item in the hash table.
collision
Blank is a collision resolution technique where each bucket has a list of items (so bucket 5’s list would become 55, 75).
Chaining
Blank is a collision resolution technique where collisions are resolved by looking for an empty bucket elsewhere in the table (so 75 might be stored in bucket 6).
Open addressing
Blank handles hash table collisions by using a list for each bucket, where each list may store multiple items that map to the same bucket
Chaining
The blank operation first uses the item’s key to determine the bucket, and then inserts the item in that bucket’s list
insert
Searching also first determines the bucket, and then searches the bucket’s blank
list
A hash table with blank handles a collision by starting at the key’s mapped bucket, and then linearly searches subsequent buckets until an empty bucket is found.
linear probing
Actually, linear probing distinguishes two types of empty buckets
empty-since-start
empty-after-removal
An blank bucket has been empty since the hash table was created.
empty-since-start
An blank bucket had an item removed that caused the bucket to now be empty
empty-after-removal
Using linear probing, a hash table blank algorithm uses the item’s key to determine the initial bucket, linearly probes (or checks) each bucket, and inserts the item in the next empty bucket (the empty kind doesn’t matter).
insert
Using linear probing, a hash table blank algorithm uses the sought item’s key to determine the initial bucket. The algorithm probes each bucket until either a matching item is found, an empty-since-start bucket is found, or all buckets have been probed. If the item is found, the item is removed, and the bucket is marked empty-after-removal.
remove
Note that if the algorithm encounters an blank bucket, the algorithm keeps probing, because the sought item may have been placed in a subsequent bucket before this bucket’s item was removed.
empty-after-removal
A hash table with blank handles a collision by starting at the key’s mapped bucket, and then quadratically searches subsequent buckets until an empty bucket is found.
quadratic probing
If an item’s mapped bucket is H, the formula blank is used to determine the item’s index in the hash table.
(H + c1 * i + c2* i^2) mod (tablesize)
Each time an empty bucket is not found, i is incremented by 1. Iterating through sequential i values to obtain the desired table index is called the blank.
probing sequence
Blank is an open-addressing collision resolution technique that uses 2 different hash functions to compute bucket indices
Double hashing
Using hash functions h1 and h2, a key’s index in the table is computed with the formula
blank .
(h1(key) + i * h2(key)) mod (tablesize)
Inserting a key uses the formula, starting with i = 0, to repeatedly search hash table buckets until an empty bucket is found. Each time an empty bucket is not found, i is incremented by 1. Iterating through sequential i values to obtain the desired table index is called the blank.
probing sequence
A hash table blank operation increases the number of buckets, while preserving all existing items.
resize
A hash table with N buckets is commonly resized to the next prime number ≥ blank.
N * 2
A hash table resize operation s time complexity blank.
O(N)
A hash table’s blank is the number of items in the hash table divided by the number of buckets.
load factor
An implementation may choose to resize the hash table when one or more of the following values exceeds a certain threshold:
Load factor
When using open-addressing, the number of collisions during an insertion
When using chaining, the size of a bucket’s linked-list
A hash table is fast if the hash function minimizes blank.
collisions
A blank maps items to buckets with no collisions. A perfect hash function can be created if the number of items and all possible item keys are known beforehand. The runtime for insert, search, and remove is O(1) with a perfect hash function.
perfect hash function
A good hash function should uniformly distribute items into buckets. With chaining, a good hash function results in short bucket lists and thus fast inserts, searches, and removes. With linear probing, a good hash function will avoid hashing multiple items to consecutive buckets and thus minimize the average linear probing length to achieve fast inserts, searches, and removes. On average, a good hash function will achieve blank inserts, searches, and removes, but in the worst-case may require blank
O(1)
O(N)
A blank uses the remainder from division of the key by hash table size N.
modulo hash
A blank squares the key, extracts R digits from the result’s middle, and returns the remainder of the middle digits divided by hash table size N. Ex: For a hash table with 100 entries and a key of 453, the decimal (base 10) mid-square hash function computes 453 * 453 = 205209, and returns the middle two digits 52.
mid-square hash
The mid-square hash function is typically implemented using blank, and not decimal, because a binary implementation is faster. A decimal implementation requires converting the square of the key to a string, extracting a substring for the middle digits, and converting that substring to an integer. A binary implementation only requires a few shift and bitwise AND operations.
binary (base 2)
A binary mid-square hash function extracts the middle R bits, and returns the remainder of the middle bits divided by hash table size N, where R is greater than or equal to blank
Log2 N
A blank repeatedly multiplies the hash value and adds the ASCII (or Unicode) value of each character in the string.
multiplicative string hash
A multiplicative hash function for strings starts with a large initial value. For each character, the hash function blank the current hash value by a multiplier (often prime) and adds the character’s value. Finally, the function returns the remainder of the sum divided by the hash table size N.
multiplies
A blank uses the item’s key as the bucket index.
direct hash function
A hash table with a direct hash function is called a blank.
direct access table
Given a key, a direct access table blank returns the item at index key if the bucket is not empty, and returns null (indicating item not found) if empty.
search algorithm
A direct access table has the advantage of blank:
no collisions
However, a direct access table has two main limitations.
All keys must be non-negative integers, but for some applications keys may be negative.
The hash table’s size equals the largest key value plus 1, which may be very large.
Blank is a field of study focused on transmitting data securely.
Cryptography
Secure data transmission commonly starts with blank: alteration of data to hide the original meaning.
encryption
The counterpart to encryption is blank: reconstruction of original data from encrypted data.
decryption
A blank can be used to produce a hash value for data in contexts other than inserting the data into a hash table.
hash function
A hashing algorithm called blank produces a 128-bit hash value for any input data. The hash value cannot be used to reconstruct the original data, but can be used to help verify that data isn’t corrupt and hasn’t been altered.
MD5
A blank is a hash function designed specifically for cryptography. Such a function is commonly used for encrypting and decrypting data.
cryptographic hash function
A blank is a cryptographic hashing function that produces a hash value for a password. Databases for online services commonly store a user’s password hash as opposed to the actual password. When the user attempts a login, the supplied password is hashed, and the hash is compared against the database’s hash value. Because the passwords are not stored, if a database with password hashes is breached, attackers may still have a difficult time determining a user’s password.
password hashing function
Blank is a base class that declares methods for hashKey(), insert(), remove(), and search(). The hashKey() method returns a non-negative integer hash code for a key. Each of the insert(), remove(), and search() methods has only a pass statement and will be overridden in a derived class.
HashTable
HashTable’s blank method uses Python’s built-in hash() function. hash() may return a negative integer, but a non-negative integer is needed to compute a bucket index. So hashKey() returns the absolute value of the integer returned from hash().
hashKey()
lass HashTable:
# Returns a non-negative hash code for the specified key.
def hashKey(self, key):
return abs(hash(key))
# Inserts the specified key/value pair. If the key already exists, the # corresponding value is updated. If inserted or updated, True is returned. # If not inserted, then False is returned. def insert(self, key, value): pass # Searches for the specified key. If found, the key/value pair is removed # from the hash table and True is returned. If not found, False is returned. def remove(self, key): pass # Searches for the key, returning the corresponding value if found, None if # not found. def search(self, key): pass
HashTable base class.
The blank class represents an item with a key, a value, and a reference to the next item in the chain. ChainingHashTable derives from the HashTable class, and uses a list of ChainingHashTableItem objects for the table attribute. Initially each item in the table list is None, indicating an empty bucket.
ChainingHashTableItem
ChainingHashTable’s blank computes the key’s bucket index, then searches the bucket’s linked list for a matching key. If found, the corresponding value is updated. If not found, a new ChainingHashTableItem is appended to the linked list.
insert() method
ChainingHashTable’s blank computes the key’s bucket index, then searches the bucket’s linked list for a matching key. If found, the ChainingHashTableItem object with the key is removed from the bucket’s linked list.
remove() method
ChainingHashTable’s blank computes the key’s bucket index, then searches the bucket’s linked list for a matching key. If found, the key’s corresponding value is returned. If not found, None is returned.
search() method
class ChainingHashTableItem:
def __init__(self, itemKey, itemValue):
self.key = itemKey
self.value = itemValue
self.next = None
class ChainingHashTable(HashTable):
def __init__(self, initialCapacity = 11):
self.table = [None] * initialCapacity
# Inserts the specified key/value pair. If the key already exists, the # corresponding value is updated. If inserted or updated, True is returned. # If not inserted, then False is returned. def insert(self, key, value): # Hash the key to get the bucket index bucket_index = self.hashKey(key) % len(self.table) # Traverse the linked list, searching for the key. If the key exists in # an item, the value is replaced. Otherwise a new item is appended. item = self.table[bucket_index] previous = None while item != None: if key == item.key: item.value = value return True previous = item item = item.next # Append to the linked list if self.table[bucket_index] == None: self.table[bucket_index] = ChainingHashTableItem(key, value) else: previous.next = ChainingHashTableItem(key, value) return True # Searches for the specified key. If found, the key/value pair is removed # from the hash table and True is returned. If not found, False is returned. def remove(self, key): # Hash the key to get the bucket index bucket_index = self.hashKey(key) % len(self.table) # Search the bucket's linked list for the key item = self.table[bucket_index] previous = None while item != None: if key == item.key: if previous == None: # Remove linked list's first item self.table[bucket_index] = item.next else: previous.next = item.next return True previous = item item = item.next return False # key not found # Searches for the key, returning the corresponding value if found, None if # not found. def search(self, key): # Hash the key to get the bucket index bucket_index = self.hashKey(key) % len(self.table) # Search the bucket's linked list for the key item = self.table[bucket_index] while item != None: if key == item.key: return item.value item = item.next return None # key not found
ChainingHashTableItem and ChainingHashTable classes.
Some hash table implementations distinguish an insert operation from an blank. In such implementations, insert fails if the key already exists and update fails if the key doesn’t already exist. The implementation of insert() in this section serves both purposes, inserting if the key doesn’t exist and updating if the key does exist.
Update operation
Each bucket in an open addressing hash table stores a single item, rather than a linked list of items. So the OpenAddressingBucket class has only two members: blank and blank.
key and value
blank is a base class that derives from HashTable and adds the probe() method. Each derived class implements the probing sequence via the probe() method, allowing for the insert(), remove(), and search() methods to be implemented in the OpenAddressingHashTable class.
OpenAddressingHashTable