Hash Tables Flashcards by Shane P

What is the name of the function a Hash Table uses to create its indexes?

A Hash Function.

How well did you know this?

Not at all

Perfectly

Define a Hash Function.

A Hash Function will take all of the keys for a given map and strategically map them to certain index locations in an array so that they can eventually be retrieved easily.

How well did you know this?

Not at all

Perfectly

What is the BigO of finding the index of a key?

O(1). Near instantaneous.

O(n) if there are hash collisions

How well did you know this?

Not at all

Perfectly

What is the backing Data Structure of a HashTable?

An array. The keys map to indexes in an array.

How well did you know this?

Not at all

Perfectly

What happens when our hash function comes up with the same index value for two separate keys? For example both the key “Steven” and the key “Steve” both compute to the index 9.

A “Hash Collision” occurs.

How well did you know this?

Not at all

Perfectly

Why do we use a hash function to come up with a numerical index?

To allow the use of an Array to store our data.
To minimise the size of the array. Our hash function squashes our keys down to a minimised index range so that we can have arrays sized relatively small compared to the range of our key values. E.g. key is 9 Billion, we could hash it to within a range of 1,000.

How well did you know this?

Not at all

Perfectly

What are the possible ways to handle a “Hash Collision”?

‘Collision Resolution Policy’

Open Addressing
Closed Addressing “Chaining”
Robin Hood Hashing

How well did you know this?

Not at all

Perfectly

What is Open Addressing?

Put the key in some other index location separate from the one returned to us by the hash function. Usually the next OPEN (or free/available/nil) index.

Open addressing uses no extra space - every element is stored in the hash table. If it gets overfull, we can reallocate space and rehash.

When a collision occurs in open addressing, we can…
• probe nearby for a free position (leads to clustering, quadratic probing).
• go to a “random” position by using a second-level hash function.
• try a position given by a second, third, . . . hash function (this plus LCFS gives cuckoo hashing).

How well did you know this?

Not at all

Perfectly

What are the downsides of Open Addressing?

Makes it harder to interact with the data in the map later on.
Leads to issues when another key hashes to the (previously nil) used up index.

How well did you know this?

Not at all

Perfectly

What is Closed Addressing? (Chaining)

Uses Linked Lists to chain together keys which hash to the same index value.

Chaining uses an “overflow” list for each element in the hash table
• Elements that hash to the same slot are placed in a list. The slot contains a pointer to the head of this list.
• Insertion can then be done in constant time.
• A drawback is the additional space overhead.
• The distribution of sizes of lists turns out to be very uneven.

How well did you know this?

Not at all

Perfectly

What are the downsides of Closed Addressing?

To get to the key we want we have to traverse the LinkedList stored in the ‘Collision Index’. This slows down the access time from O(1) to O(n). n being the size of LinkedList, i.e. the number of collisions for that particular index.

How well did you know this?

Not at all

Perfectly

Why is it difficult to do BigO calculations on HashTables?

Because we always assume the worst-case scenario. For a Hash Table this means every single key hashes to the same index value, which means (using Closed Addressing) we essentially end up with a single Linked List which contains all our elements. In this scenario all BigO calculations are that of the Linked List… O(n).

How well did you know this?

Not at all

Perfectly

Why do we want a reliable Hash Function?

To minimise Hash Collisions.

How well did you know this?

Not at all

Perfectly

Knowing that the worst-case scenario for Hash Tables is terrible, what are the BigO results for the average-case scenario for a Hash Table?

They are all O(1)!
To ACCESS, SEARCH for, INSERT, or DELETE a key/value pair from out dictionary, all we need to do. is run that key though the has function and it’ll tell us what index in the hash table to go to in order to perform that operation.

*they can be O(n) in the instance of hash collisions

How well did you know this?

Not at all

Perfectly

What is a hash table?

a data structure which organizes data using hash functions in order to support quick insertion and search.

How well did you know this?

Not at all

Perfectly

What are some desirable properties of a hash function?

It should be computable quickly (constant time).
if keys are drawn uniformly at random, then the hashed values should be uniformly distributed.
keys that are “close” should have their hash values “spread out”.

What is the simple uniform hashing model?

Each of the n keys is equally likely to hash into any of the m slots.
So we are considering a “balls in bins” model
If n is much smaller than m, collisions will be few and most slots will be empty.
If n is much larger than m, collisions will be many and no slots will be empty.
The most interesting behavior is when m and n are of comparable size

What is load factor, λ?

n/m | keys/slots

Provided load factor is kept bounded what is the runtime on average for basic operations?

constant time O(1)

Uniform hashing average insertion cost?

Θ(1/(1 − λ)) as m → ∞

Uniform hashing unsuccessful search cost?

Θ(1/(1 − λ))

Deletion in hash tables

If we use chaining, deletion just involves deleting an element from the relevant linked list. However, we must still find it in order to delete it

open addressing, deletion can be trickier. One method is to mark elements as deleted, but not delete them. However this can cause space wastage

If we simply delete an item, then we may no longer be able to find other items that had previously collided with it

Effect of hashing on runtime of binary search and search trees?

O(1) dictionary operations instead of the O(lg n) needed for binary search (static tables) and/or balanced search trees (dynamic tables)

What happens the larger the load factor gets?

larger λ is the more chance of collisions.

What are the main desired properties for a good hashing scheme?

* The hash locations of the keys S are spread out to minimize collisions (for lookup, insert, deletes,...). * Use a relatively small table size m = O(n) that doesn’t affect performance. * The function h is fast to compute (polynomial in the size of a key x ∈ S, but constant time with respect to n).