Algorithm design, sorting and hashing Flashcards

1
Q

Techniques

A

Good algorithm designers understand several fundamental algorithm design techniques, including data structures, dynamic programming, depth-first search, backtracking, and heuristics. Perhaps the single most important design technique is modeling, the art of abstracting a messy real-world application into a clean problem suitable for algorithmic attack.

2
Q

Resources

A

Good algorithm designers stand on the shoulders of giants. Rather than laboring from scratch to produce a new algorithm for every task, they can figure out what is known about a particular problem. Rather than re-implementing popular algorithms from scratch, they seek existing implementations to serve as a starting point. They are familiar with many classic algorithmic problems, which provide sufficient source material to model most any application.

3
Q

Characteristics of sorting

A

Time Complexity: Time complexity, a measure of how long it takes to run an algorithm, is used to categorize sorting algorithms. The worst-case, average-case, and best-case performance of a sorting algorithm can be used to quantify the time complexity of the process.

Space Complexity: Sorting algorithms also have space complexity, which is the amount of memory required to execute the algorithm.

Stability: A sorting algorithm is said to be stable if the relative order of equal elements is preserved after sorting. This is important in certain applications where the original order of equal elements must be maintained.

In-Place Sorting: An in-place sorting algorithm is one that does not require additional memory to sort the data. This is important when the available memory is limited or when the data cannot be moved.

Adaptivity: An adaptive sorting algorithm is one that takes advantage of pre-existing order in the data to improve performance.

4
Q

Sorting

A

defined as the process of arranging a collection of data elements in a specific order, usually in ascending or descending order based on a specific attribute of the data elements.

6
Q

Based on how they manage equal elements throughout the sorting process, sorting algorithms can be broadly categorized into two types:

Stable sorting algorithm
Unstable sorting algorithm.

A

The relative order of equal elements is preserved by stable sorting algorithms but is not ensured by unstable sorting algorithms.
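
As a quick illustration (assuming Python, whose built-in sorted() is documented to be stable; the data here is made up):

records = [("alice", 2), ("bob", 1), ("carol", 2), ("dave", 1)]

# A stable sort keeps the original order among equal keys.
by_count = sorted(records, key=lambda r: r[1])
# -> [('bob', 1), ('dave', 1), ('alice', 2), ('carol', 2)]
# "alice" still precedes "carol" among the items with equal key 2.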

7
Q

Applications of sorting

A

Searching and retrieval of data

Data analysis and visualization

Database management and optimization

Sorting and organizing files and directories

Image and signal processing

8
Q

Advantages of sorting

A

Improved search efficiency: Sorting helps to organize data in a way that improves search efficiency.

Better data analysis: Sorting can help in data analysis by identifying patterns, outliers, and trends, and it can also help to summarize large datasets by grouping similar items together.

Improved database performance: Sorting can improve the performance of databases by reducing the time required to perform searches and data retrieval.

Easier data management: Sorting can also make data easier to manage by organizing it in a way that is easy to understand and work with.

9
Q

Disadvantages of Sorting in DSA

A

Sorting can be computationally expensive, especially for large datasets.
It may require additional memory to perform the sorting operation.
Maintaining the sort order of data can add complexity to updates and insertions, which can be a disadvantage when dealing with dynamic datasets.

10
Q

Bubble Sort algorithm

A

In this algorithm, we

traverse the array from the left, comparing adjacent elements and swapping them whenever the left one is larger, so the larger element moves toward the right.

In this way, the largest element is moved to the rightmost end on the first pass.

The process is then repeated to find the second largest element and place it, and so on, until the data is sorted.
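
A minimal Python sketch of this idea (illustrative; the early-exit check is a common optimization, not part of the card above):

def bubble_sort(arr):
    """Sort arr in place in ascending order."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # After pass i, the last i elements are already in their final place.
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # no swaps on a full pass: array is already sorted
            break
    return arr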

11
Q

Selection Sort

A

The algorithm repeatedly selects the smallest (or largest) element from the unsorted portion of the list and swaps it with the first element of the unsorted part. This process is repeated for the remaining unsorted portion until the entire list is sorted.
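
A minimal Python sketch of the process described above (illustrative):

def selection_sort(arr):
    """Sort arr in place by repeatedly selecting the minimum."""
    n = len(arr)
    for i in range(n - 1):
        # Find the smallest element in the unsorted part arr[i:].
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # Swap it into position i, the first slot of the unsorted part.
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr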

12
Q

Advantages of selection sort algorithm

A

Simple and easy to understand.
Works well with small datasets.

13
Q

Disadvantages of selection sort algorithm

A

Selection sort has a time complexity of O(n^2) in the worst and average case.

Does not work well on large datasets.

Does not preserve the relative order of items with equal keys, which means it is not stable.

14
Q

Time Complexity Analysis of Selection Sort

A

The time complexity of Selection Sort is O(N^2), as there are two nested loops:

One loop to select an element of the array one by one = O(N)
Another loop to compare that element with every other array element = O(N)
Therefore overall complexity = O(N) * O(N) = O(N^2)

15
Q

Auxiliary Space

A

O(1), as the only extra memory used is for temporary variables while swapping two values in the array. Selection sort never makes more than O(N) swaps and can be useful when memory writes are costly.

16
Q

Insertion sort

A

simple sorting algorithm that works similarly to the way you sort playing cards in your hands. The array is virtually split into a sorted and an unsorted part. Values from the unsorted part are picked and placed at the correct position in the sorted part.
To sort an array of size N in ascending order, iterate over the array and compare the current element (the key) to its predecessor. If the key is smaller than its predecessor, compare it to the elements before that as well, moving each greater element one position up to make space for the key.
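
A minimal Python sketch of this procedure (illustrative):

def insertion_sort(arr):
    """Sort arr in place, growing a sorted prefix one element at a time."""
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift elements of the sorted part that are greater than key
        # one position to the right.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key  # insert key into the gap left behind
    return arr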

17
Q

Time complexity of insertion sort

A

The worst-case time complexity of the Insertion sort is O(N^2)

The average case time complexity of the Insertion sort is O(N^2)

The time complexity of the best case is O(N).

18
Q

The auxiliary space complexity of Insertion Sort is ____

A

O(1)

19
Q

Characteristics of insertion sort

A

Insertion sort is one of the simplest algorithms, with a straightforward implementation.

Insertion sort is efficient for small data values.

Insertion sort is adaptive in nature, i.e., it is appropriate for data sets that are already partially sorted.

20
Q

merge sort

A

a sorting algorithm that works by dividing an array into smaller subarrays, sorting each subarray, and then merging the sorted subarrays back together to form the final sorted array.

21
Q

How does merge sort work?

A

Merge sort is a recursive algorithm that continuously splits the array in half until it cannot be further divided, i.e., the array has only one element left (an array with one element is always sorted). Then the sorted subarrays are merged into one sorted array.
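
A minimal Python sketch of split-then-merge (illustrative; it returns a new list rather than sorting in place):

def merge_sort(arr):
    if len(arr) <= 1:               # base case: already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # recursively sort each half
    right = merge_sort(arr[mid:])
    # Merge the two sorted halves in linear time.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])         # append whichever half has leftovers
    merged.extend(right[j:])
    return merged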

22
Q

Complexity analysis of merge sort

A

Time Complexity: O(N log N). Merge Sort is a recursive algorithm, and its time complexity can be expressed as the following recurrence relation:

T(n) = 2T(n/2) + Θ(n)

The above recurrence can be solved either using the Recurrence Tree method or the Master method. It falls in case II of the Master method, and the solution of the recurrence is Θ(N log N). The time complexity of Merge Sort is Θ(N log N) in all 3 cases (worst, average, and best), as merge sort always divides the array into two halves and takes linear time to merge the two halves.

Auxiliary Space: O(N). In merge sort, all elements are copied into an auxiliary array, so O(N) auxiliary space is required.

23
Q

Applications of merge sort

A

Sorting large datasets: Merge sort is particularly well-suited for sorting large datasets due to its guaranteed worst-case time complexity of O(n log n).

External sorting: Merge sort is commonly used in external sorting, where the data to be sorted is too large to fit into memory.

Custom sorting: Merge sort can be adapted to handle different input distributions, such as partially sorted, nearly sorted, or completely unsorted data.

Inversion Count Problem: Merge sort can be modified to count inversions in an array.

24
Q

Advantages of merge sort

A

Stability: Merge sort is a stable sorting algorithm, which means it maintains the relative order of equal elements in the input array.

Guaranteed worst-case performance: Merge sort has a worst-case time complexity of O(N log N), which means it performs well even on large datasets.

Parallelizable: Merge sort is a naturally parallelizable algorithm, which means it can be easily parallelized to take advantage of multiple processors or threads.

25
Q

Disadvantages of merge sort

A

Space complexity: Merge sort requires additional memory to store the merged sub-arrays during the sorting process.

Not in-place: Merge sort is not an in-place sorting algorithm, which means it requires additional memory to store the sorted data. This can be a disadvantage in applications where memory usage is a concern.

Not always optimal for small datasets: For small datasets, Merge sort has a higher time complexity than some other sorting algorithms, such as insertion sort. This can result in slower performance for very small datasets.

26
Q

QuickSort

A

sorting algorithm based on the Divide and Conquer algorithm that picks an element as a pivot and partitions the given array around the picked pivot by placing the pivot in its correct position in the sorted array.
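
A minimal Python sketch using the Lomuto partition scheme with the last element as pivot (one common choice; the card above does not fix a pivot rule):

def partition(arr, lo, hi):
    """Place the pivot (arr[hi]) in its final sorted position."""
    pivot = arr[hi]
    i = lo - 1
    for j in range(lo, hi):
        if arr[j] <= pivot:          # grow the <= pivot region
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
    return i + 1

def quick_sort(arr, lo=0, hi=None):
    """Sort arr[lo..hi] in place."""
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        p = partition(arr, lo, hi)   # pivot is now in its correct position
        quick_sort(arr, lo, p - 1)   # recurse on the two sides
        quick_sort(arr, p + 1, hi)
    return arr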

27
Q

Hashing

A

refers to the process of generating a fixed-size output from an input of variable size using the mathematical formulas known as hash functions. This technique determines an index or location for the storage of an item in a data structure.

28
Q

The amount of data on the internet is growing exponentially every day, making it difficult to store it all effectively. In day-to-day programming, this amount of data might not be that big, but still, it needs to be stored, accessed, and processed easily and efficiently. A very common data structure that is used for such a purpose is the Array data structure.

A

Now the question arises: if the Array was already there, what was the need for a new data structure? The answer lies in the word "efficiency". Though storing in an array takes O(1) time, searching in it takes at least O(log n) time (binary search on a sorted array; an unsorted array needs an O(n) linear scan). This time appears to be small, but for a large data set it can cause a lot of problems, and this, in turn, makes the Array data structure inefficient.

So now we are looking for a data structure that can store the data and search in it in constant time, i.e. in O(1) time. This is how the Hashing data structure came into play. With the introduction of the Hash data structure, it is now possible to easily store data in constant time and retrieve it in constant time as well.
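
To make the contrast concrete, a small Python comparison (illustrative; the asymptotics, not the exact timings, are the point):

data = list(range(1_000_000))          # array-like storage
lookup = {x: True for x in data}       # hash-based storage

# Array membership test: O(n) linear scan of the list.
print(999_999 in data)
# Hash membership test: O(1) on average, one hash computation.
print(999_999 in lookup)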

29
Q

Three components of hashing

A

Key: A key can be anything, such as a string or an integer, that is fed as input to the hash function, the technique that determines an index or location for storage of an item in a data structure.

Hash Function: The hash function receives the input key and returns the index of an element in an array called a hash table. The index is known as the hash index.

Hash Table: A hash table is a data structure that maps keys to values using a special function called a hash function. It stores the data in an associative manner in an array, where each data value has its own unique index.
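
The three components in a toy Python sketch (the table size and the modular hash_function are illustrative choices, not prescribed by the card):

table_size = 10
hash_table = [None] * table_size       # component 3: the hash table

def hash_function(key):
    # Component 2: maps a key to a hash index. A simple modular hash;
    # real implementations use more careful functions.
    return hash(key) % table_size

key, value = "apple", 42               # component 1: the key (plus a value)
index = hash_function(key)             # the hash index
hash_table[index] = (key, value)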

30
Q

Advantages of hashing in data structures

A

Key-value support: Hashing is ideal for implementing key-value data structures.

Fast data retrieval: Hashing allows for quick access to elements with constant-time complexity.

Efficiency: Insertion, deletion, and searching operations are highly efficient.

Memory usage reduction: Hashing requires less memory as it allocates a fixed space for storing elements.

Scalability: Hashing performs well with large data sets, maintaining constant access time.

Security and encryption: Hashing is essential for secure data storage and integrity verification.

31
Q

Computational thinking

A

Computational thinking is a method for solving any problem using four fundamental patterns. If we effectively comprehend and use the four fundamental patterns, computational thinking for programming becomes simple.

Decomposition: The first step in effectively understanding a problem is to break it down into smaller components. When we divide a problem into smaller, more manageable pieces, we can apply the other components of computational thinking more effectively.

Pattern recognition: The second component. The problems are reviewed to see whether there is any sequence; if any patterns are found, they are categorized appropriately. If no patterns are found, further simplification of the issue is not necessary.

Abstraction: The third component, a generalization of the issue. When you stand back from the specifics of a problem, you can develop a more general answer that can be useful in a number of different ways.

Algorithm: The fourth and final component, where problems are incrementally addressed. Making a plan for your solution is crucial; an algorithm is a method for figuring out step-by-step directions for tackling any problem.

32
Q

The purpose of collision resolution during insertion is to locate an open location in the hash table when the record’s home position is already taken. Any collision resolution technique may be thought of as creating a series of hash table slots that may or may not contain the record.

A

The first position in the probe sequence is the key's home position. If the home position is already occupied, the collision resolution policy shifts to the following location in the sequence. If that slot is also taken, another slot must be sought, and so on. The probe sequence is the series of slots produced by a probe function, which we will refer to as p. This is how insertion operates.

33
Q

Collision in hashing

A

In hashing, the hash function is used to find the index into the array: the hash value is used to create an index for the key in the hash table. The hash function may return the same hash value for two or more keys. When two or more keys have the same hash value, a collision happens. To handle such collisions, we use collision resolution techniques.

34
Q

Collision resolution technique:
Separate chaining (open hashing)

A

This method involves making a linked list out of the slot where the collision happened, then adding the new key to that list. Separate chaining is the term used because this linked list of slots resembles a chain. It is most frequently used when we are unsure of the number of keys to be added or removed; a minimal sketch follows the lists below.

Time complexity

Its worst-case complexity for searching is O(n).
Its worst-case complexity for deletion is O(n).

Advantages of separate chaining

It is easy to implement.
The hash table never fills up, so we can always add more elements to a chain.
It is less sensitive to the quality of the hash function.

Disadvantages of separate chaining

The cache performance of chaining is poor, because the chains are linked lists.
Memory wastage can be high in this method.
It requires extra space for the links between elements.
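
A minimal chained hash table in Python (illustrative; the class and method names are my own, and Python lists stand in for linked lists):

class ChainedHashTable:
    """Separate chaining: each slot holds a chain of (key, value) pairs."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: update in place
                chain[i] = (key, value)
                return
        chain.append((key, value))     # collision or empty slot: extend chain

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)            # worst case scans a whole chain: O(n)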

35
Q

Collision resolution technique:
Open addressing (closed hashing)

A

Open addressing is employed as a collision resolution technique to resolve collisions in the hash table. All keys are stored in the hash table itself, and no key is kept anywhere else; as a result, the size of the hash table is never less than the number of keys. It is also known as closed hashing.

The following techniques are used in open addressing:

Linear probing
Quadratic probing
Double hashing

36
Q

Linear probing

A

This involves probing the following slot when a collision occurs, and continuing to do so until an empty slot is found.
The worst-case time to search for an element under linear probing is O(n), where n is the size of the table. Linear probing gives the best cache performance, but clustering is a concern. This method's key benefit is that it is simple to compute; a sketch follows the list below.

Disadvantages of linear probing:

The main problem is clustering.
It can take a long time to find an empty slot.
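
A minimal Python sketch of linear-probing insertion (illustrative; assumes the table is a fixed-size list whose slots are None or (key, value)):

def linear_probe_insert(table, key, value):
    """Insert into an open-addressed table by probing successive slots."""
    m = len(table)
    start = hash(key) % m
    for i in range(m):
        idx = (start + i) % m          # on collision, try the next slot
        if table[idx] is None or table[idx][0] == key:
            table[idx] = (key, value)
            return idx
    raise RuntimeError("hash table is full")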

37
Q

Quadratic probing

A

When a collision happens in this method, we probe for the (i^2)-th slot in the i-th iteration, continuing to do so until an empty slot is found. In comparison to linear probing, quadratic probing has worse cache performance. However, clustering is less of a concern with quadratic probing.

38
Q

Double hashing

A

In this, you employ a second hash function, and in the i-th iteration you look at the slot offset by (i * hash2(x)). Computing two hash functions requires more time. Although there is no clustering issue, the cache performance of double hashing is relatively poor.

39
Q

hash data structures

A

a fundamental building block of computer science and are used in a wide range of applications such as databases, caches, and programming languages. They are a way to map data of any type, called keys, to a specific location in memory called a bucket. These data structures are incredibly fast and efficient, making them a great choice for large and complex data sets.

Whether you’re building a database, a cache, or a programming language, Hash data structures are like a superpower for your data. They allow you to perform basic operations like insertions, deletions, and lookups in the blink of an eye, and they’re the reason why your favorite apps and websites run so smoothly.

40
Q

A hash data structure is a type of data structure that allows for efficient insertion, deletion, and retrieval of elements. It is often used to implement associative arrays or mappings, which are data structures that allow you to store a collection of key-value pairs.

A

In a hash data structure, elements are stored in an array, and each element is associated with a unique key. To store an element in a hash, a hash function is applied to the key to generate an index into the array where the element will be stored. The hash function should be designed such that it distributes the elements evenly across the array, minimizing collisions where multiple elements are assigned to the same index.

41
Q

When retrieving an element from a hash, the hash function is again applied to the key to find the index where the element is stored. If there are no collisions, the element can be retrieved in constant time, O(1). However, if there are collisions, multiple elements may be assigned to the same index, and a search must be performed to find the correct element.

A

To handle collisions, there are several strategies that can be used, such as chaining, where each index in the array stores a linked list of elements that have collided, or open addressing, where collisions are resolved by searching for the next available index in the array.

Hash data structures have many applications in computer science, including implementing symbol tables, caches, and databases. They are especially useful in situations where fast retrieval of elements is important, and where the number of elements to be stored may be large.

42
Q

collision resolution

A

Collision resolution in hash can be done by two methods:

Open addressing and
Closed addressing.

43
Q

Open Addressing

A

Open addressing collision resolution involves generating a sequence of locations, called a probe sequence, for storing or searching the data. It can be done in the following ways:

Linear Probing: If there is a collision on the i-th probe, we use the hash function H(k, i) = [H'(k) + i] % m,
where i is the probe number, m is the size of the hash table, and H(k, i) and H'(k) are hash functions.

Quadratic Probing: If there is a collision on the i-th probe, we use the hash function H(k, i) = [H'(k) + c1 * i + c2 * i^2] % m,
where i is the probe number, m is the size of the hash table, H(k, i) and H'(k) are hash functions, and c1 and c2 are constants.

Double Hashing: If there is a collision on the i-th probe, we use the hash function H(k, i) = [H1(k) + i * H2(k)] % m,
where i is the probe number, m is the size of the hash table, and H1(k) = k % m and H2(k) = k % m' are hash functions.
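
The three probe functions, transcribed into Python (illustrative; h, h1, h2 stand for the precomputed hash values H'(k), H1(k), H2(k)):

def linear_probe(h, i, m):
    return (h + i) % m

def quadratic_probe(h, i, m, c1=1, c2=1):
    return (h + c1 * i + c2 * i * i) % m

def double_hash_probe(h1, h2, i, m):
    # h2 should be nonzero (and ideally coprime with m) so that the
    # probe sequence can eventually reach every slot.
    return (h1 + i * h2) % m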

44
Q

Closed addressing

A

Closed addressing collision resolution involves chaining. Chaining in hashing uses both an array and linked lists. In this method, we compute an index with the help of the hash function and link the keys that hash to the same index one after another at that index, hence resolving the collision.

45
Q

Applications of hash

A

Hashing is used in databases for indexing.

Hashing is used in disk-based data structures.

In some programming languages, such as Python and JavaScript, hashes are used to implement objects.

Hash tables are commonly used to implement caching systems.

Hashing is used in various cryptographic algorithms.

Hash tables are used to implement various data structures.

Hash tables are used in load balancing algorithms.

Databases: Hashes are commonly used in databases to store and retrieve records quickly. For example, a database might use a hash to index records by a unique identifier such as a social security number or customer ID.

Caches: Hashes are used in caches to quickly look up frequently accessed data. A cache might use a hash to store recently accessed data, with the keys being the data itself and the values being the time it was accessed or other metadata.

Symbol tables: Hashes are used in symbol tables to store key-value pairs representing identifiers and their corresponding attributes. For example, a compiler might use a hash to store the names of variables and their types.

Cryptography: Hashes are used in cryptography to create digital signatures, verify the integrity of data, and store passwords securely. Hash functions are designed such that it is difficult to reconstruct the original data from the hash, making them useful for verifying the authenticity of data.

Distributed systems: Hashes are used in distributed systems to assign work to different nodes or servers. For example, a load balancer might use a hash to distribute incoming requests to different servers based on the request URL or other criteria.

File systems: Hashes are used in file systems to quickly locate files or data blocks. For example, a file system might use a hash to store the locations of files on a disk, with the keys being the file names and the values being the disk locations.

46
Q

Advantages of hash

A

Hash provides better synchronization than other data structures.

Hash tables are more efficient than search trees or other data structures.

Hash provides constant time for searching, insertion and deletion operations on average.

Hash tables are space-efficient.

Most hash table implementations can automatically resize themselves.

Hash tables are easy to use.

Hash tables offer high-speed data retrieval and manipulation.

Fast lookup: Hashes provide fast lookup times for elements, often in constant time O(1), because they use a hash function to map keys to array indices. This makes them ideal for applications that require quick access to data.

Efficient insertion and deletion: Hashes are efficient at inserting and deleting elements because they only need to update one array index for each operation. In contrast, linked lists or arrays require shifting elements around when inserting or deleting elements.

Space efficiency: Hashes use space efficiently because they only store the key-value pairs and the array to hold them. This can be more efficient than other data structures such as trees, which require additional memory to store pointers.

Flexibility: Hashes can be used to store any type of data, including strings, numbers, and objects. They can also be used for a wide variety of applications, from simple lookups to complex data structures such as databases and caches.

Collision resolution: Hashes have built-in collision resolution mechanisms to handle cases where two or more keys map to the same array index. This ensures that all elements are stored and retrieved correctly.

47
Q

Disadvantages of Hash

A

Hashing is inefficient when there are many collisions.

Hash collisions are practically unavoidable for a large set of possible keys.

Some hash table implementations do not allow null keys or values.

Hash tables have a limited capacity and will eventually fill up.

Hash tables can be complex to implement.

Hash tables do not maintain the order of elements, which makes it difficult to retrieve elements in a specific order.