Maps, Hash Tables & Binary Search Trees Flashcards
What is a Map ADT?
A map models a collection of key-value entries that is searchable ‘by the key’
The main operations of a map are for searching, inserting and deleting items
Multiple entries with the same key are not allowed
What is the difference between a Map and Priority queue?
Map - access any key
Priority Queue - access only the minimum key
How does a map work as a doubly linked list?
Store the items of the map in a list S, in arbitrary order, as well as a size counter, so that getSize is O(1)
What is the performance of a list-based Map?
put() - takes O(n) as you have to check if the key occurs in the map already
get and remove take O(n) time since in the worst case, you traverse the whole list
What is a Hash Table and what are its time complexities for looking up keys, inserting and deleting them?
A CDT which is suitable for implementing maps
Convert each key into an index into a (big) array
Look-up, insertion and deletion of keys usually runs in O(1) time
What is a hash function?
Maps keys of a given type to integers in a fixed interval [0, N-1]
Example - H(k) = k mod N
The integer h(k) is called the hash value of key k
What does a hash table consist of?
A hash function (h)
Array (called table) of size N
What is a compression function?
A compression function is the 2nd part of a hash function. The goal is to be applied directly on top of the hash function, to ensure the keys gets ‘dispersed’ in an ‘apparently random’ way.
What is Separate Chaining?
Let each cell in the table point to a linked list of entries that map there e.g. if something collides with an entry that is already there, then it just gets added to the linked list
What are the problems with Separate Chaining?
Requires additional memory outside of the table
What is open addressing and what are its problems?
The colliding item is placed in a different cell of the table
Linear probing handles collisions by placing the next colliding item in the next available table cell
Each table cell inspected is referred to as a ‘probe’
Disadvantage - colliding items lump together, causing future collisions to cause a longer sequence of probes
How do you delete an element from a hash table safely?
Use ‘lazy deletion’ - don’t mark the entry as a blank, but as a ‘deleted’ cell and fix the entries later
What is double hashing?
Double hashing uses a secondary hash function d(k), which handles collisions by placing an item in the first available cell of the series.
For reference, linear probing is d(k) = 1 i.e. take the next available spot. If it doesn’t exist, then move yet 1 more space forward.
The table size N must be a prime to allow probing of all the cells
What is the performance of hashing?
In the worst case, searches, insertion and removals on a hash table take O(n) time. Otherwise, normally O(1)
Reference - worst case occurs only when all keys collide
What does ‘load factor’ mean?
This affects the performance of a hash table. It is a measure of how ‘full’ the array for the hash table is. In Java, the highest this is allowed to go is 75%, and then it gets re-hashed.