A2 Theory W6 Flashcards

1
Q

Briefly describe Dictionary ADT giving details about

Main property of a Dictionary ADT

Key operations of the Dictionary ADT

Name a few applications where dictionaries are of use.

A

The main properties of a dictionary ADT is that it stores a collection of key-value pairs, where each key is associated with a unique value. The keys are used to look up and retrieve the corresponding values. Dictionaries provide efficient retrieval and storage of data by using a hashing mechanism.

Key operations include:
1, insertion(add): add a new key, value pair to the dictionary
2, deletion(remove): remove a key-value pair form the dictionary based on the key
3, lookup(get): retrieve the value associated with a specific key
4, update(set): change the value associated with a specific key
5, size(count): determine the number of key-value pairs in the dictionary

Applications where dictionaries are of use:
- data cahcing: dicitonaries are used for caching frequently accessed data, such as web page content , to reduce the need for repeated requests to a server.
- symbol tables in compilers: dicitonaries are employed to store variables, functions, and other identifiers during the compilation process
- databases: in database systems, dictionaries are used to store metadata about tables, columns, and their data types
natural language processing: dictionaries are utilised for storing words, their meanings, and other linguistic information in language processing tasks
- address books: personal information management systems often use dictionaries to store contact information with names(keys) and contact(values) details

Dictionaries provide and efficient and felxible way to organise and retrieve data, making them fundamental data structures in computer science and various applications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Briefly describe a Hash Function by elaborating on the below points

What is its purpose?

Properties of a Hash Function

How to define a Hash Function

Give an example of a Hash Function.

A

A hash function is a mathematical function that takes an input and returns a fixed-size string of characters, which is typically a numerical representation of the input. The primary purpose of a hash function is to map data of arbitrary size to fixed-size values is a way that is efficient and deterministic. Hash functions are commonly used in computer science for tasks like data retrieval, data storage and data varificiation.

Properties of a hash function:
1, Determinstic: for the same input, a hash function should always produce the same hash value
2, efficient: hash functions return hash values quickly
3, fixed output size: hash functions should compute hash values of a fixed size, regardless of the input size
4, avalance effect: a small change in the input should result in a significantly different hash value
5, pre-image resistance: it should be computationally infeasible to reverse the hash function and determine the original input from the hash value
6, collision resistance: it should be unlikely for two different inputs to produce the same hash value (collison)

How to define a hash function:
- a hash fucntion can be defined using a mathematical formula or algortihm that takes an input and produces a hash value. the specifics of defining a hash function dpeend on the desired properties and the application. Common hash functions include those for hash table, cryptographic applications and checksums.

Give an example of a hash function:
- a simple hash function for strings could be one that calculates the sum of the ASCII values of all characters in the input string and returns the remainder when divided by a prime number

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Give an example of a good/bad hash function targeting string-typed keys. Explain what it is good/bad.

A

Good hash function example:
- hash function iterates through each character in the input string, multiplies the current hash value by a prime number (usually 31), and adds the ASCII value of each character. finally, it takes the remainder when divided by 101.
- the choice of a prime number for both multiplication and modulo helps achieve better distribution of hash values.
- it exhibits properties of a good hash function, such as determinism, efficiency, fixed out-out size, and an avalanche effect
- it minimizes collisions by using modulo with a prime number and a multiplier, which helps spread the keys more evenly across the available hash buckets

Bad hash function example:

  • hash function simply returns the length of the string of the input string as the hash value
  • it is a poor choice for a hash function because it doesn’t efficiently distribute keys across hash buckets. Many strings of different content will have the same length, leading to a high likelihood of collisions
  • it does not exhibit the properties desireable in a hash function, such as the avalanche effect and collision resistance
  • collisions would be frequent, making it unsuitable for hash tables, and other data structures that rely on an even distribution of keys

in summary, a good hash function should produce hash values that are well distributed, minimising collisions and ensuring efficient data retrieval. A bad hash function, on the other hand, may produce hash values that a re poorly distributed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a collision? How can it be resolved? Name 2 collision handling techniques studied in the unit. Provide one line describing them.

A

A collision in hasing occurs when two keys produce the same hash value when processed by a hash function. Multiple keys are mapped to the same location in a hash table. Collisions lead to data retrieval issues and reduced performance in hash based data structures like hash tables.

Collision handlind techniques:
seperate chaining:
- in seperate chaining, each location in the hash table is associated with a linked list or another data structure that can store multiple key-value pairs
- when a collision occurs, the new key-value pair is inserted into the appropriate locations linked list
- during retrieval, the hash table first computes the hash value for the key and then searches the linked list in the corresponding bucket for the desired key
- seperate chaining is easy to implement and allows for an arbitrary number of collisions to be handled effectively

Open addressing/linear probing:
- in linear probing, when a collision occurs, the algorithm searches for the next available slot in the hash table by linearly probing through the table
- the probing sequence is determined by a probing function which is typically of the form ‘hash(key) + i’ (where i is the number of probes)
- when inserting a key-value pair, the algorithm searches for the next available slot and inserts the pair there
- during retrieval, the algorithm follows the same probing sequence to locate the desired key
- linear probing can result in fewer memory overhead compared to seperate chaining but requires careful handling of deletions and probing sequences

Both techniques are used to handle collisions in hash based data structures and the choice between them depends on factors such as the expected number of collisions, memory usage constraints and the specific requirements of the application.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Briefly describe Separate Chaining giving details about

What is Separate Chaining

Why is it used?

How is it implemented?

What data structures can be used to implement separate chaining?

A

Seperate chaining is a collision resolution technique used in hash-based data structures such as hash tables. It involves associating each bucket or slot in the hash table with a seperate data structure, often a linked list or an array. When collision occurs, new key-value pairs are stored in this associated data structure rather than overwriting the existing value.

Sepearate chaining is used to handle collisions efficiently in hash tables. It helps prevent data loss and ensures that all key-value pairs are stored and retrievable, even when multiple leys map to the same hash location. This techniqiue allows hash tables to accomodate a varibale number of collisions gracefully.

How is it implemented?
Typically you follow these steps for implementation:
- create an array, the main hash table, with a fixed number of locations.
- choose a hash function that maps keys to indexes in the array
- when inserting a key-value pair, compute the hash value of the key to determine the target location.
- if the location is empty, insert the pair there
- if the location is not empty, sotre the new pair in a data structure associated with the bucket, such as a linked list or an array
- during retrieval, compute the hash value of the key to locate the target location and then search the associated data structure for the key.

Data structures that can be used:
1, linked lists: each location is associated with a linked list. Collisions are resolved by appending key-value pairs to the linked list of the corresponding location.
2, dynamic arrays: similar to livked lists, but dynamic arrays are used instead of linked lists for storing the pairs associated with the location
3, binary search trees: each location can be associated with a binary search tree to maintain efficient search and insertion while handling collisions
4, Nested hash tables: in some implementations, another hash table can be used as the associated data structure for each locations, allowing for efficient lookups.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Describe how each of the below operations work for a Hash Table which uses separate chaining to handle collisions and give examples. What is the best-case and worst-case complexity of one of the below operations? Explain the reason for the best and worst case. No explanation no marks.

search

add

delete

A

Search operation:

How it works:
- computer the hash value of the search key to determine the target location
- search the linked list associated with the target bucket for the key
- if the key is found, return the associated value. otherwise, indicate that the key is not in the hash table
Example:
- suppose you have a hash table with seperate chaining and you want to search for the key ‘John’
- compute the hash value for ‘John’ and look in the linked list associated with the bucket 5
- traverse the linked list and if you find ‘John’ return the associated value, otherwise, indicate that ‘John’ is not in the hash table

Add operation:

How it works:
- compute the hash value of the key to determine the target location
- insert the key-value pair into the linked list associate with the target bucket
- if the key already exists in the linked list, update it’s value. otherwise, append the new key-value pair to the list
Example:
- you want to add the key-value pair “alice”“555” to the hash table
- compute the hash value for ‘Alice’ and insert the key-value pair into the linked list associated with that value
- if ‘Alice’ already exists in the list, update the value to ‘555’. otherwise append the pair to the list

Delete operation:

How it works:
- compute the hash value of the key to determine the target location
- search the linked list associated with the target location for the key
- if the key is found in the list, remove it from the list
- if the key is not found, no action is needed
Example:
- you want to delete the key ‘Bob’ from the hash table
- compute the hash value for ‘Bob’, and search the linked list associated with that location
- if ‘Bob’ is found in the list, remove it
- if Bob is not found, no action required

Complexity:
Best case complexity(search, add, delete):
- O(1) when there are no collisions
- the best case occurs when the hash function distributes the keys evenly across locations and there are no collisions
Worst case complexity(searcg, add, delete):
- O(n) is the worst case, where n is the number of keys
- the worst case occurs when all keys collide and are placed in the same location
- in such cases, searching, deleting or adding a key may require traversing the entire linked list associated with the location, resulting in linear time complexity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly