LN13 Space-efficient sequence alignment, Union-and-Find Flashcards
What is the space-efficient method for sequence alignment in dynamic programming?
By storing only the current and previous rows of the DP matrix, we reduce space complexity to O(m) for scores, but we use divide-and-conquer with midpoint identification to trace the optimal path efficiently.
How does divide-and-conquer help in space-efficient sequence alignment?
By finding the middle point on the optimal path and recursively solving subproblems, we reduce the memory requirement while retaining the optimal solution structure.
What is the time complexity of the space-efficient sequence alignment algorithm?
The time complexity is O(nm) due to the recursive structure and midpoint computation, but it achieves a space complexity of O(n) due to the divide-and-conquer approach.
Define clustering with maximum spacing.
It is a clustering problem where points are partitioned into k clusters such that the minimum distance between points in different clusters (spacing) is maximized.
How can the clustering with maximum spacing problem be solved using Kruskal’s MST algorithm?
Run Kruskal’s algorithm until n - k edges are included. The resulting clusters (disconnected components) maximize the spacing between any two clusters.
Why does Kruskal’s MST algorithm yield optimal spacing for clustering?
Any partition into k clusters must have a minimum inter-cluster edge cost greater than or equal to the largest edge removed, ensuring maximum spacing.
What is the significance of spacing in clustering problems?
Spacing measures the minimum distance between any two points in different clusters, ensuring clusters are well-separated and increasing clarity in classification tasks.
How does the midpoint approach in sequence alignment work to find the optimal alignment?
By calculating the score at the midpoint of one sequence and summing scores from both ends, the algorithm identifies the best cut point, dividing the problem into smaller, solvable subproblems.
What is the recurrence relation for the space-efficient sequence alignment algorithm?
The recurrence is T(n, m) = 2T(n, m/2) + O(nm), yielding an O(nm log m) bound, simplified to O(nm) in time and O(n) in space after optimization.
What is the role of Kruskal’s algorithm in clustering problems, specifically with maximum spacing?
Kruskal’s MST construction clusters data by stopping after enough edges are selected to form the desired number of clusters, ensuring maximum spacing between them.