A2 Theory W7 Flashcards

Question 1

Q

Briefly describe Linear Probing in a Hash Table giving details about

What is Linear Probing?

Why and when is it used?

What is the basic idea behind linear probing and how is it implemented?

How does it handle a collision?

How does it handle a conflict?

Answer

A

Linear probing is a collision resolution technique used in hash tables designed for efficient key-value storage and retrieval .

What is linear probing?
Linear probing is a method for handling collisions in hash tables. When two or more keys hash to the same index, linear probing aims to resolve the collision by placing the collided key-value pair in the next avaialble slot in the table, moving sequentially through the table until an empty slot is found.

Why and when is it used?
Linear probing is used when you want a simple and straightforward way to hand collisions in a hash table. It is often chosen for simplicity and low overhead, making it suitable for scenarios where memory is a concern

Basic idea and implementation:
The basic idea behind linear probing is to insert a collided key-value pair into the next available slot in the hash table.
Implementation:
- when inserting a key,value pair, hash the key to determine the intial index in the table
- if the slot at the computed index is empty, insert the key-value pair there
- if the slot is occupied, move linearly to the next slot (index +1), and repeat the process until and empty slot is found
- during retrieval, search for the key by hashing it and checking each successive slot until you wither find the key or determine that it’s not in the table

Handling collions:
when a collision occurs in linear probing, the algorithm simply searches for the next available slot in a linear fashion until it finds an empty slot. This means that collided elements are placed sequentially in the table, which can lead to clustering if many colliosions occur in a small region of the hash tables

Handling conflicts:
In the context of linear probing, conflicts refer to situations where you need to insert or retrieve elements with colliding hash values. The conflict resolution strategy is built into linear probing algorithm, where it sequentially checks and updates slots until it finds an empty one during insertion or a matching key during retrieval.

Question 2

Q

Describe how one of the below operations work for a Hash Table which uses Linear Probing to handle collisions and give examples. What is the best-case and worst-case complexity of the below operations. Explain the reason for the best and worst case. No explanation no marks.

search

add

delete

Answer

A

Search operation:
- operation description: when searching for a key in a hash table that uses linear probing for collision resolution, you hash the key to find the initial index in the table. If the key is found at that index or any subsequent index through linear probing, the search is successful, and you can return the associated value.
Example:
Say you have a hash table with with [‘A’, ‘B’, ‘C’]
- the key you want is C
- you hash the key and find its initial index, which is 2
- you check index 2 and find ‘C’ and the search has succeeded

Best-case: O(1), key is found at the initial hashed index and no linear probing required
Worst-case: O(n) where n is the index in the hash table, in worst case, may need to linear probe through the whole table until you find an empty slot or determine the key is not present

Add operation:
Description: when adding a key, value pair to a hash table with LP, you hash the key to determine the initial index. If that index is empty, you insert the key, value pair there. If the index is occupied, you LP through the table until you find an empty slot to insert the pair.
Example:
-You want to add key, value pair (“E”, 5) into the following hash table [“A”, “B”, “c”]
- Index 1 and 2 are occupied so you linearly probe until find an empty slot at index 4 so you insert (“E”, 5) there

Best-case: O(1) - the key is inserted at the initial hashed index without any collisions
Worst-case: O(n), where n is the index in the hash table - you may need to probe through all the slots in the table until you find an empty slot. This occurs when the table is densely populated, leading to a lot of collisions and sequential probing

Delete operation:
Description: when deleting a key from a hash table with linear probing, you search for the key in the same way as during a search operation. Once you find the key, you mark the slot as ‘deleted’ or ‘empty’ to remove it from the table
Example:
Lets say you want to delete “C” from the following hash table [“A”, “B”, “c”]
- you hash the key “C” and find it’s initial index, which is 2
- you check index 2, find “C” and mark it as deleted

Best-case complexity: O(1) - the key to be deleted is found at the initial hashed index, and no LP is needed
Wosrt-case complexity: O(n) where n is the index in the hash table - you may need to probe through all the slots in the table until you find the key to deleted or determine that its not present

Question 3

Q

Contrast the use of Linear Probing and Separate Chaining in Hash tables. Provide at least 1 advantage and disadvantage for each.

Answer

A

Linear probing:
Advantage:
- memory efficiency: LP tends to be more memory efficient compared to seperate chaining. This is because it stores key, value pairs directly in the array without the need for additional data structures like linked lists or arrays of buckets
- cache-friendly: linear probing exhibits good cache locality. when accessing elements sequentially, it tends to keep elements close together in memory. This can result in better cache performance, especially on modern processors with caching mechanisms

Disadvantage:
- clustering: linear probing is prone to clustering, where consecutive slots in the table become filled with collided items. This can lead to slower performance and increased search items as the table becomes densley populated
- difficulty in deletion: deleting elements in linear probing can be complex and may require additional bookkeeping to mark deleted slots as empty or distinguish them from never-occupied slots. This complexity can increase the overhead of deletion operations.

Seperate chaining:
Advantage:
- handles high collision rates: separate chaining handles high collision rates more gracefully. It allows multiple key, value pairs with the same hash to be stored in a linked list or another data structure within each slot. This means that even in scenarios with frequent collisions, the performance remains relatively stable, and retrieval times dont degrade significantly
- Dynamic resizing: seperate chaining simplifies dynamic resizing of the hash table. When the load factor exceeds a certain threshold, it’s easier to resize the table by creating a new, larger array and rehashing all the elements into the new structure. This can lead to more efficient resizing compared to linear probing, which mat require rehasing and probing of all elements.

Disadvantage:
- memory overhead: seperate chaining can consume more memory than linear probing because it requires additional data structures within each slot to handle collisions. This overhead can be substantial, especially when many collisions occur
- less predictable performance: while seperate chaining handles collisions well, the performance can be less predictable in scenarios where collision well, the performance can be less predictable in scenarios where collision chains vary significantly in length. A few chains may be very long, leading to slower lookups in those specific cases, even if most chains are short. This can result in less consistent performance in practice.

Question 4

Q

Distinguish between a collision and a conflict in hashing. How are they handled in a hash table implemented with the below collision handling techniques. Provide an example for each.

Linear Probing

Separate Chaining

Answer

A

A collision in hashing occurs when a key is hashed to the same position as a previous key. A conflict is when a key is hashed to a position but another key is already occupying the position, either due to being hashed to that position or it was inserted into that position. There can only be one conflict per insert, probing for a position after the initial conflict does not create more conflicts.

Linear probing:
- Collision handling: In LP when a collision occurs, the algorithm tries to place the collided key, value in the next available slot by sequentially probing through the table. It continues to probe until an empty slot is found.
- Conflict handling: In LP conflicts are handled by searching for the key sequentially in the table, following the same probing sequence used during insertion. If the key is found, you can now perform the desired operation.

Seperate chaining:
- handling collisions: In SC, each slot of the hash table contains a data structure, typically a linked list or an array, that can hold multiple key, value pairs. When a collision occurs, a new key, value pair is simply added to the data structure at the corresponding slot wihtout overwriting existing entries.
- handling conflicts: conflicts are inherent in SC, as multiple key,value pairs can coexist in the same slot’s data structure. To handle conflicts, you must search within the oinked list or array at the slot for the specific key, value pair you are looking for.

Example for collision in SC:
Consider a hash table with 7 slots, and you want to insert two key, value pairs:
1, Key “A”, Value: 30, hashed to index 2
2, Key “B”, value: 45, hashes to index 2, collides with “A”
In separate chaining, both A and B can be stored in the same slots linked list at index 2

Example of conflict in SC:
If you want to search for “B” in the hash table, you would go to index 2, traverse the linked list, and find “B” in the list, resolving the conflict

Question 5

Q

Define the concept of a load factor in a hash table and explain how it is calculated. How does the load factor affects the performance of below collision handling techniques. Give an example of how adjusting the load factor can improve the performance for each.

Linear Probing.

Separate Chaining

Answer

A

The load factor in a hash table is a measure of how full the table is or the ratio of the number of stored elements (key, value) pairs to the total number of slots in the hash table. Commonly denoted by the symbol alpha and is calculated:

Load factor (alpha) = Number of stored elements/Total number of slots

The load factor is used to assess the efficiency and performance of a hash table. It influences how likely collisions are to occur and impact the tables overall performance.

Effect of load factor on LP:
- higher load factor: when the load factore is high, meaning the hash table is denesly populated, LP is more liekly to encounter collisions, leading to increased clustering and a higher probability of having to probe through many slots to find an empty one. Can result in slower insertion times and search times.
- lower load factor: when the load factor is low, LP is less likely to encounter collisions and there will be more empty slots in the table. This leads to faster insertion and search operations because LP has a higher chance of immediately finding an empty slot.
Example for LP: lets consider a hash table with 10 slots and 8 (key, value) pairs. If all 8 pairs are inserted the load factor will be 8/10 = 0.8. In this case, the high load factor may lead to clustering and slower performance. If you increase the number of slots to 20 while keep the same amount of pairs, the load factor becomes 8/20 = 0.4. This load factor can improve performance and reduce clustering.

Effect of load factor on SC:
Consider a hash table with 10 slots and 20 (key, value) pairs. In this case the load factor would be 10/20 - 2.0 meaning that on average each slot holds two key, value pairs. This high load fator might lead to longer linked lists and slower search operations. To improve performance, you can increas the number of slots to 30 and keep the same amount of pairs, the new load factor will be 20/30 which is approx 0.67 resulting in shorted linked lists and potenitally faster search times.

Question 6

Q

Describe the problem of clustering in a hash table that uses the below collision handling techniques. Does it pertain to both Linear Probing and Separate Chaining collision resolution mechanisms? Explain how it can impact the performance of a hash table.

Answer

A

Clustering is a common problem that pertains to both Linear Probing.

LP clustering:
- in LP, clustering occurs when consecutive slots in the hash table become filled with collided pairs. When one collision happens, LP places the collided pair in the available slot and can continue to do this.
- overtime this creates a cluster of pairs around certain positions, these clusters act as barriers for future insertions as any new collision will have to probe through the cluster to find an empty slot.
- as a result, LP clustering can significantly impact performance causing longer insertion and search times as more slots need to be checked for an available slot in the clustered region.

Question 7

Q