Unit 5: Reductions and branch and bound Flashcards
Hashing
Alternative data structure for the dictionary problem.
Observation: Usually only a small subset of all possible keys is stored.
Idea: Instead of searching through a set of data elements using key comparisons, one can calculate the (exact) position of an element in storage (usually an array) by using arithmetic operations.
Hashing: basics
Hash table
- We assume that the hash table T has a fixed size m and uses an array with indices 0, ..., m-1.
- For a hash table of size m that currently stores n keys (elements), we denote α = n/m its load factor.
Load factor
For a hash table of size m that currently stores n keys (elements), we denote its load factor:
α = n/m
Hashing: basics
Hash function
We choose a hash function h: K → {0, ..., m-1} that assigns every key k ∈ K a unique hash value.
(however, more than one key can be assigned the same hash value)
Advantages of hashing
Runtime for the operations search, insert and remove is constant Θ(1) in the ideal case, assuming that h(k) can be calculated in constant time.
3 Limitations of hashing
- Whenever h(k) = h(k') for k ≠ k', i.e. two distinct keys obtain the same hash value, we say that we have a collision.
- Θ(1) only holds if we assume that the number of collisions is negligible.
- Worst-case runtime is still O(n).
2 Characteristics of a “good” hash function
- The aim is a uniform distribution of the used keys to the buckets 0, ..., m-1 of the hash table.
- Small changes to the keys should lead to a distinct and possibly independent hash value.
Hash functions
Modulo method
Assumption: k ∈ N (keys are natural numbers).
Calculation: h(k) = k mod m
Hash functions
Properties of the Modulo method
Properties:
- the hash function can be calculated very efficiently.
- the right choice of m is very important. To ensure uniformity, a good choice for m is usually a prime number.
Modulo method
Bad choices for m
- m = 2ⁱ: Only the last i bits play a role.
- m = 10ⁱ: Analogously for decimal numbers.
- m = rⁱ: Analogously for r-adic numbers.
- m = rⁱ ± j for small j can be problematic.
E.g. m = 2⁷ - 1 = 127, with "p" = 112 and "t" = 116 in ASCII:
h("pt") = (112 · 128 + 116) mod 127 = 14452 mod 127 = 101
h("tp") = (116 · 128 + 112) mod 127 = 14960 mod 127 = 101
(Keys with different ordering of the same letters obtain the same hash value.)
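A minimal Python sketch of this pitfall, assuming keys are strings encoded as base-128 numbers as in the example above:

```python
def h(key: str, m: int) -> int:
    """Modulo method: interpret the string as a base-128 number, reduce mod m."""
    value = 0
    for ch in key:
        value = value * 128 + ord(ch)  # "pt" -> 112 * 128 + 116 = 14452
    return value % m

m = 2**7 - 1  # 127: a bad choice of the form r^i - 1 for the radix r = 128
print(h("pt", m), h("tp", m))    # 101 101 -- permuted letters collide
print(h("pt", 97), h("tp", 97))  # 96 22 -- a prime far from a power of 2
```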
Hash functions
Multiplication method
Assume k ∈ N. Let A be an irrational number.
Calculation: h(k) = ⌊m (k · A - ⌊k · A⌋)⌋
- The key is multiplied with A and only the fractional part is kept.
- This provides a value in [0, 1), which is multiplied with the table size m and rounded down.
Observation that confirms uniformity: For a sequence 1, 2, 3, …, i of keys, the value k · A - ⌊k · A⌋ of the next key k = i+1 always lies in a largest interval between the values of the previous keys, 0, and 1.
Multiplication method: choice for A
In general: the choice of m is not of importance as long as A is an irrational number.
Best choice for A: The golden ratio
A = ɸ⁻¹ = (√5 - 1) / 2
ɸ⁻¹ = limₙ→∞ Fib(n) / Fib(n+1)
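A sketch of the multiplication method in Python; note that A = ɸ⁻¹ is necessarily approximated by a floating-point (hence rational) value here:

```python
import math

def h_mult(k: int, m: int) -> int:
    """Multiplication method with A = the golden ratio inverse."""
    A = (math.sqrt(5) - 1) / 2        # ɸ⁻¹ ≈ 0.6180339887 (floating-point approximation)
    frac = k * A - math.floor(k * A)  # fractional part of k·A, in [0, 1)
    return math.floor(m * frac)       # scale by the table size and round down

m = 13  # unlike the modulo method, m need not be prime
print([h_mult(k, m) for k in range(1, 9)])  # successive keys spread out over the table
```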
Open hashing: idea
All elements are directly placed into the array; every slot in the array is marked with a flag fᵢ ∈ {free, used, free again}.
Handling of collisions:
If a slot is used, consider other slots in a certain predefined order (probing).
Open hashing: probing
- Every slot i = 0, 1, ..., m-1 is marked by a flag fᵢ ∈ {free, used, free again}.
- fᵢ = free for every i in the initial hash table.
- At insertion, the new element will be added at the first slot marked with free or free again; the flag is then updated to used.
- The search iterates the slots until the key is found or a slot is marked with free (in which case the key is not there).
- After removal of an element, the flag is set to free again.
Linear probing
Given: a normal hash function h': K → {0, 1, ..., m-1}
Linear probing: For every i = 0, 1, ..., m-1, we set: h(k, i) = (h'(k) + i) mod m
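A runnable sketch combining the flag scheme above with linear probing (the class layout and the use of Python's built-in hash as h' are assumptions):

```python
FREE, USED, FREE_AGAIN = 0, 1, 2  # the three slot flags

class OpenHashTable:
    """Open hashing with linear probing: h(k, i) = (h'(k) + i) mod m."""

    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m
        self.flags = [FREE] * m

    def _probe(self, key):
        h = hash(key) % self.m  # stands in for the "normal" hash function h'
        for i in range(self.m):
            yield (h + i) % self.m  # h(k, 0), h(k, 1), ..., h(k, m-1)

    def insert(self, key):
        for j in self._probe(key):
            if self.flags[j] != USED:  # first slot marked free or free again
                self.slots[j], self.flags[j] = key, USED
                return
        raise RuntimeError("table full")

    def search(self, key):
        for j in self._probe(key):
            if self.flags[j] == FREE:  # a free slot ends the probe sequence
                return None
            if self.flags[j] == USED and self.slots[j] == key:
                return j
        return None

    def remove(self, key):
        j = self.search(key)
        if j is not None:
            self.flags[j] = FREE_AGAIN  # keeps probe sequences of other keys intact
```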
Linear probing: problems
Problems:
- After insertions, the probability for a new key to be inserted at a certain slot becomes non-uniform.
- After a slot is used, the probability for the insertion of a key at the next slot changes.
- Long consecutive sequences of used slots have a tendency to grow faster than shorter ones.
- This effect amplifies, since long sequences combine more easily to create even longer ones.
- As a consequence of this phenomenon (primary clustering), the efficiency of the hash table deteriorates drastically once the load factor α gets closer to 1.
Quadratic probing
Idea: To avoid the primary clustering of linear probing, quadratic probing uses a quadratically growing distance to probe a new slot.
Given: a normal hash function h': K → {0, 1, ..., m-1}
Quadratic probing: probing function h(k, i) = (h'(k) + c₁i + c₂i²) mod m
Here c₁ and c₂ are carefully chosen constants.
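A small sketch; c₁ = c₂ = 1/2 is one well-known choice (an assumption here), which probes every slot when m is a power of two:

```python
def quadratic_probe(h_prime: int, i: int, m: int) -> int:
    # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m with c1 = c2 = 1/2;
    # (i + i*i) // 2 is always an integer, so no floating point is needed
    return (h_prime + (i + i * i) // 2) % m

m = 8  # a power of two: the first m probes then visit every slot exactly once
print(sorted(quadratic_probe(3, i, m) for i in range(m)))  # [0, 1, ..., 7]
```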
Double hashing
Idea: The efficiency of uniform probing can already be almost achieved if one uses a second hash function instead of a randomized probing order.
Given: 2 hash functions: h₁, h₂: K → {0, 1, ..., m-1}
Double hashing: We set (for i = 0, 1, ..., m-1): h(k, i) = (h₁(k) + i·h₂(k)) mod m
Choice of h₂(k): The probing order must reach all slots 0, ..., m-1 for every key k. This implies that h₂(k) ≠ 0 and that it must not divide m; in fact, h₂(k) and m must be coprime, which is guaranteed e.g. when m is prime.
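A sketch with concrete example choices for h₁ and h₂ (these particular functions are assumptions, picked so that h₂(k) ≠ 0 and a prime m makes them coprime):

```python
def double_hash_probe(k: int, i: int, m: int) -> int:
    h1 = k % m               # example first hash function
    h2 = 1 + (k % (m - 1))   # example second hash: in {1, ..., m-1}, never 0
    return (h1 + i * h2) % m

m = 13  # prime, so gcd(h2(k), m) = 1 and the probes reach all slots
print(sorted(double_hash_probe(42, i, m) for i in range(m)))  # [0, 1, ..., 12]
```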
À la Brent
Idea: If at insertion of a key k, a probed slot j is used with k' = T[j].key, then set:
j₁ = (j + h₂(k)) mod m
j₂ = (j + h₂(k')) mod m
If j₁ is used but j₂ is free, move k' to j₂ to make space for k at j.
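A sketch of Brent's insertion rule in Python (the table layout, None as a free slot, and the h₁/h₂ choices are assumptions; h₂ as in the double hashing card):

```python
def brent_insert(T, key, h1, h2):
    """Insert key into T (a list where None marks a free slot), à la Brent."""
    m = len(T)
    j = h1(key) % m
    while T[j] is not None:
        other = T[j]                  # k' = T[j].key
        j1 = (j + h2(key)) % m        # next probe position for the new key k
        j2 = (j + h2(other)) % m      # next probe position for the resident k'
        if T[j1] is not None and T[j2] is None:
            T[j2] = other             # move k' out of the way ...
            T[j] = key                # ... so k takes its place at j
            return
        j = j1                        # otherwise keep probing for k
    T[j] = key

m = 13
T = [None] * m
for k in (5, 18, 31):                 # all three keys hash to slot 5
    brent_insert(T, k, h1=lambda k: k, h2=lambda k: 1 + k % (m - 1))
```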
Analysis of Brent’s improvement
Number of probes: On average (for large m and n):
- Unsuccessful search ≃ 1 / (1 - α)
- Successful search < 2.5 (independent of α for α ≤ 1)
Advantage: Average time for a successful search is constant even in the extreme event that α = 1.
Open Hashing
Suitability and reorganization
- Open hashing methods only allow a load factor of at most 1.
- Load factors close to 1 are in general bad, since the number of probing steps can get very large.
- If necessary, a reorganisation is required, i.e. a completely new hash table (e.g. with double the size) is populated if more elements need to be stored than originally anticipated.
- Open hashing methods are usually well-suited whenever the number of elements to be stored is known in advance and elements are rarely or not at all removed.
Tractable problems
We say that a problem is efficiently solvable, if it can be solved in polynomial time.
I.e. for such a problem, there is an O(n^c) algorithm, where:
- n = input size
- c = constant
Efficiently solvable problems are also referred to as tractable problems.
Cobham-Edmonds Assumption
- The assumption that tractability can be considered the same as solvability in polynomial time.
- Pioneered by Alan Cobham and Jack Edmonds, who proposed this assumption in the 1960s.
- The assumption has become the predominant assumption in computer science over the last 50 years.
Cobham-Edmonds Assumption
Justification
- In practice, polynomial time algorithms usually have small constants and exponents (e.g. linear, quadratic, or cubic)
- Overcoming the exponential bound of brute-force algorithms usually goes hand-in-hand with significant structural and combinatorial insights into the problem.
Cobham-Edmonds Assumption
Exceptions and Criticism
- Some polynomial time algorithms have huge constants and exponents and are considered useless in practice.
- Some exponential time algorithms are considered highly useful in practice since their worst-case behaviour only occurs rarely or the problem instances are sufficiently small.
- For huge datasets, even linear time algorithms may be too slow.
Decision problem
- A problem whose solution is either YES or NO
- In contrast to functional or optimization problems that return a solution or set of solutions.
- Typically: every functional or optimization problem has a natural corresponding decision problem and vice versa.
Define
Polynomial time reduction
A polynomial time reduction from a problem X to a problem Y is an algorithm A that, for every instance x of X, computes an instance y of Y such that:
- x is a YES-instance of X if and only if y is a YES-instance of Y (equivalence).
- A runs in polynomial time, i.e. there is a constant c such that, given an instance x of X with size n, A computes the corresponding instance y of Y in time O(n^c).
We write X ≤ₚ Y if there is a polynomial time reduction from X to Y.
X ≤ₚ Y intuition
If X ≤ₚ Y, "X can be polynomially reduced to Y", then intuitively "Y is at least as hard as X".
Showing Intractability
We show that a problem is intractable by reducing to it from some other intractable problem.
Lemma:
If X ≤ₚ Y and X cannot be solved in polynomial time, then Y cannot be solved in polynomial time.
Reductions
Transitivity
If X ≤ₚ Y and Y ≤ₚ Z, then X ≤ₚ Z.
Graphs
Independent Set
A set of vertices that are pairwise not adjacent.
Independent Set problem
Input: A graph G and an integer k.
Question: Does G have an independent set of size at least k?
Graphs
Vertex Cover
A set of vertices that contains at least one endpoint from every edge.
Vertex cover problem
Input: A graph G and an integer k.
Question: Does G have a vertex cover of size at most k?
Vertex cover vs Independent set
Let X be a vertex cover of a graph G = (V, E). Then V \ X is an independent set.
Moreover, a set X is a vertex cover if and only if V \ X is an independent set.
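This equivalence is easy to test; a quick sketch (the edge-list representation is an assumption):

```python
def is_vertex_cover(E, X):
    return all(u in X or v in X for (u, v) in E)          # covers every edge

def is_independent_set(E, S):
    return all(u not in S or v not in S for (u, v) in E)  # no internal edge

V = {1, 2, 3, 4}
E = [(1, 2), (2, 3), (3, 4)]
for X in ({2, 3}, {1, 3}, {1, 4}):
    # X is a vertex cover  <=>  V \ X is an independent set
    print(is_vertex_cover(E, X) == is_independent_set(E, V - X))  # always True
```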
Graphs
Non-blocker
A non-blocker of G is a subset N ⊆ E of edges such that G contains a path between any pair of vertices that does not contain an edge in N.
I.e. N is a non-blocker if the graph G minus all edges in N is connected.
Weight of a non-blocker N
The sum of the weights of all edges in N, i.e. Σₑ∈N w(e)
Maximum non-blocker
A non-blocker of maximum weight
Non-blocker Problem
Input: A connected graph G = (V, E) with edge weights w: E → Q and a number W ∈ Q.
Question: Does G have a non-blocker of weight at least W?
Spanning Tree problem
Input: A connected graph G = (V, E) with edge weights w: E → Q and a number W ∈ Q.
Question: Does G have a spanning tree T = (V, F) of weight at most W?
I.e. Σₑ∈F w(e) ≤ W
Set Cover problem
Input: A set of elements U, a collection S₁, ..., Sₘ of subsets of U, and an integer k.
Question: Does there exist a collection of at most k of these sets whose union is equal to U?
Logic & Satisfiability
a literal
Either a propositional variable (e.g. x) or its negation (e.g. ¬x).
Logic & Satisfiability
a clause
A disjunction of literals, i.e. Cⱼ = l₁ ∨ l₂ ∨ ... ∨ lᵣ
Logic & Satisfiability
Conjunctive normal form
ɸ is in conjunctive normal form if it is a conjunction of clauses, i.e.:
C₁ ∧ C₂ ∧ ... ∧ Cₘ
Satisfiability
- Φ is a set of clauses, and each clause is a set of literals
- V(Φ) is the set of variables of Φ
The aim is to decide whether Φ is satisfiable, i.e. whether there is an assignment τ : V(Φ) → {0, 1} satisfying Φ.
The assignment τ satisfies Φ if every clause of Φ contains either:
- a literal x with τ(x) = 1 or
- a literal ¬x with τ(x) = 0.
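A sketch of this check, assuming clauses are sets of literals encoded as signed integers (an assumed encoding: x₁ as 1, ¬x₁ as -1):

```python
def satisfies(phi, tau):
    """tau maps each variable to 0/1; every clause must contain a literal x
    with tau(x) = 1 or a literal ¬x with tau(x) = 0."""
    return all(
        any(tau[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
        for clause in phi
    )

phi = [{1, -2}, {2, 3}, {-1, -3}]          # (x1 ∨ ¬x2) ∧ (x2 ∨ x3) ∧ (¬x1 ∨ ¬x3)
print(satisfies(phi, {1: 1, 2: 1, 3: 0}))  # True
print(satisfies(phi, {1: 0, 2: 1, 3: 1}))  # False: first clause unsatisfied
```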
3-CNF formula
A CNF-formula that has exactly 3 literals per clause.
NP: Polynomial time certifier
Certifier
Algorithm C(s, t) is a certifier for problem X if every string s satisfies s ∈ X iff there exists a string t such that C(s, t) = yes.
The class NP
Decision problems for which there exists a polynomial time certifier.
Intuition:
- P ≈ problems that can be solved in polynomial time.
- NP ≈ problems for which a (polynomial-sized) solution can be verified in polynomial time.
NP-complete
A problem Y in NP with the property that for every problem X in NP, X ≤ₚ Y.
NP-complete problems are the hardest problems in NP.
Combinatorial optimization
Combinatorial optimization is about finding a subset of elements that:
- satisfies certain constraints and
- is optimal with respect to a given cost function, e.g. smallest weight, shortest path, etc.
6 Examples of combinatorial optimization problems
- Minimum spanning tree in a graph
- Shortest path in a graph
- Minimum vertex or set cover
- Maximum independent set
- Minimum vertex coloring
- Knapsack
Coping with NP-completeness
Impossible trinity
Sacrifice one of 3 desired features:
- Solve the problem to optimality (heuristic & approximation algorithms)
- Solve problem in polynomial time → algorithms with exponential runtime (e.g. branch and bound and parameterized algorithms)
- Solve arbitrary instances of the problem → identify special cases solvable in polynomial time.
Branch and bound
Restrict the search space (of a systematic enumeration of all solutions) with the help of lower bounds and upper bounds and provide an optimal solution.
It still requires exponential time in the worst case, but can be much more efficient on practical instances.
Knapsack problem
Input: n items with positive rational weights w₁, ..., wₙ and values v₁, ..., vₙ, as well as a positive rational capacity C.
Task: Find a subset S of items with weight ≤ C that maximizes the total value.
Maximization: Approach
- The problem is divided into subproblems, e.g. by fixing variables or adding additional constraints. This leads to a partitioning of the solution space.
- If the calculated upper bound U' is not larger than the best current lower bound L (= the value of the current best solution) for one (or more) of these subproblems (subsets), then one no longer needs to consider solutions for this subproblem.
- Otherwise one needs to further divide the current subproblem (subset of solutions).
- One continues to divide the solution space until the upper bound is no longer larger than the current best lower bound for all current subproblems (see the sketch below).
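A compact sketch of this scheme for the knapsack problem defined above; using the fractional (greedy) relaxation for the upper bound U' is a standard but assumed choice:

```python
def knapsack_branch_and_bound(weights, values, C):
    n = len(weights)
    # sort items by value density so the fractional bound is easy to compute
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    w = [weights[i] for i in order]
    v = [values[i] for i in order]
    best = 0  # lower bound L: value of the best solution found so far

    def upper_bound(i, cap):
        """U': greedily fill the remaining capacity, splitting the last item."""
        total = 0.0
        while i < n and w[i] <= cap:
            cap -= w[i]
            total += v[i]
            i += 1
        if i < n:
            total += v[i] * cap / w[i]
        return total

    def branch(i, cap, value):  # subproblem: items i..n-1, remaining capacity cap
        nonlocal best
        best = max(best, value)
        if i == n or value + upper_bound(i, cap) <= best:
            return  # prune: U' is not larger than the current L
        if w[i] <= cap:
            branch(i + 1, cap - w[i], value + v[i])  # fix "take item i"
        branch(i + 1, cap, value)                    # fix "skip item i"

    branch(0, C, 0)
    return best

print(knapsack_branch_and_bound([3, 4, 5], [30, 50, 60], C=8))  # 90
```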
Maximization Approach: Suitability & Efficiency
- Branch and bound is a general principle (meta-approach).
- It can be applied to a wide variety of discrete optimization problems.
- Crucial for its efficiency are:
  - The choice of heuristics for the computation of U' and L'.
  - How the branching is accomplished.
  - The order in which the subinstances are considered.
Branch and bound:
Choice of the next subproblem
- The choice of the next subproblem does not change the general functionality and correctness of branch and bound.
- Nevertheless, the choice can have a huge impact on practical running-time.
Branch and bound: choice of problems
Best-first vs Depth-first
Best-first:
- Always choose a subproblem with the best dual bound (upper bound).
- This way we always process the smallest number of subproblems.
Depth-first:
- Always continue with a most recently created subproblem (similar to depth-first search for searching graphs).
- Leads to a first valid solution very quickly.
- One can also combine the two strategies: start with depth-first search and switch to best-first search after an initial solution has been found.
Independent Set Problem
Given a graph G and an integer K, does G contain an independent set of size K?
Prove that the Independent Set Problem is in NP
Proof:
A solution to the independent set problem is a subset V' of the vertices of G.
Let n be the number of vertices of G and m be the number of edges.
Given V', we first verify that it contains K elements.
For each pair u, v ∈ V', check that (u, v) ∉ E.
There are O(n²) pairs of vertices in V', so this takes O(mn²) time, which is polynomial in n and m.
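This verifier is short enough to write out; a sketch, with the graph as a vertex set plus edge list (an assumed representation):

```python
from itertools import combinations

def certify_independent_set(V, E, K, Vprime):
    """Polynomial time certifier C(s, t): s = (V, E, K), certificate t = Vprime."""
    if not Vprime <= V or len(Vprime) < K:     # verify V' ⊆ V and |V'| >= K
        return False
    edges = {frozenset(e) for e in E}
    return all(frozenset((u, v)) not in edges  # check (u, v) ∉ E for each pair
               for u, v in combinations(Vprime, 2))

V = {1, 2, 3, 4}
E = [(1, 2), (2, 3), (3, 4)]
print(certify_independent_set(V, E, 2, {1, 3}))  # True: 1 and 3 are not adjacent
print(certify_independent_set(V, E, 2, {1, 2}))  # False: (1, 2) is an edge
```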
3-SAT Problem
Given a Boolean expression in 3-CNF, can it be satisfied?
Reduce 3-SAT to the Independent Set Problem in polynomial time.
Proof:
Let φ be a Boolean expression in 3-CNF with n variables and m clauses.
Construct G and k as follows:
- G contains 3 vertices for each clause of φ (one for each literal).
- Connect the three literals in each clause to a triangle.
- Connect each literal with all its negations.
- Set k = m.
φ is satisfiable if and only if G has an independent set of size at least k.
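A sketch of this construction in Python, using the signed-integer literal encoding from the satisfiability card (the helper name is hypothetical):

```python
from itertools import combinations

def three_sat_to_independent_set(phi):
    """phi: list of 3-literal clauses, literals as signed ints (-2 means ¬x2).
    Returns (vertices, edges, k): the Independent Set instance with k = m."""
    vertices = [(j, lit) for j, clause in enumerate(phi) for lit in clause]
    edges = []
    for j, clause in enumerate(phi):
        # connect the three literal-vertices of clause j to a triangle
        edges += [((j, a), (j, b)) for a, b in combinations(clause, 2)]
    # connect each literal with every occurrence of its negation
    edges += [(u, w) for u, w in combinations(vertices, 2) if u[1] == -w[1]]
    return vertices, edges, len(phi)

phi = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]
V, E, k = three_sat_to_independent_set(phi)
print(len(V), k)  # 9 vertices (3 per clause), k = 3
```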
Prove Correctness:
φ is satisfiable if and only if G has an independent set of size at least k.
Recall: k is set to m, the number of clauses in φ.
Proof:
Let S be an independent set of size k = m.
- S contains exactly one vertex per triangle (at most one because S is independent, and at least one because |S| = k).
- Moreover, the edges between complementary literals enforce that S contains at most one literal, i.e. x or ¬x, from any variable.
- Hence, setting the literals in S to true (and all remaining variables arbitrarily) yields a well-defined assignment satisfying every clause. Conversely, given a satisfying assignment, picking one true literal per clause yields an independent set of size m.
2 Ways to picture a 3-SAT instance
- You have to make an independent 0/1 decision for each of the n variables, and you succeed if you manage to achieve one of three ways of satisfying each clause.
- You have to choose one term from each clause, and then find a truth assignment that causes all these terms to evaluate to 1, thereby satisfying all clauses. You succeed if you can select a term from each clause in such a way that no two selected terms "conflict".
We say that two terms conflict if one is equal to a variable xᵢ and the other is equal to its negation ¬xᵢ.
Hamiltonian Cycle
Given a directed graph G = (V, E), we say that a cycle C in G is a Hamiltonian cycle if it visits each vertex exactly once.
I.e. it constitutes a “tour” of all the vertices, with no repetitions.
Hamiltonian Path
Given a directed graph G = (V, E), we say that a path P in G is a Hamiltonian path if it contains each vertex exactly once.
(The path is allowed to start at any node and end at any node).
6 Genres of NP-complete problems
- Packing
- Covering
- Partitioning
- Sequencing
- Numerical
- Constraint Satisfaction
6 Genres of NP-complete problems
Packing problems
You’re given a collection of objects. You want to choose at least k of them.
Making your life difficult is a set of conflicts among the objects, preventing you from choosing certain groups simultaneously.
E.g.
- Independent Set. Given a graph G and a number k, does G contain an independent set of size at least k?
- Set Packing. Given a set U of n elements, a collection S₁, ..., Sₘ of subsets of U and a number k, does there exist a collection of at least k of these sets with the property that no two of them intersect?
6 Genres of NP-complete problems
Covering problems
Covering problems form a natural contrast to packing problems.
You’re given a collection of objects, and you want to choose a subset that collectively achieves a certain goal. The challenge is to achieve this goal while only choosing k of the objects.
E.g.
- Vertex Cover. Given a graph G and a number k, does G contain a vertex cover of size at most k?
- Set Cover. Given a set U of n elements, a collection S₁, ..., Sₘ of subsets of U, and a number k, does there exist a collection of at most k sets whose union is equal to all of U?
6 Genres of NP-complete problems
Partitioning problems
Involve a search over all ways to divide up a collection of objects into subsets so that each object appears in exactly one of the subsets.
- 3-Dimensional Matching. Given disjoint sets X, Y, and Z, each of size n, and given a set T ⊆ X × Y × Z of ordered triples, does there exist a set of n triples in T so that each element of X ∪ Y ∪ Z is contained in exactly one of these triples?
- Graph Coloring. Given a graph G and a bound k, does G have a k-colouring?
6 Genres of NP-complete problems
Sequencing Problems
Involve searching over the set of all permutations of a collection of objects.
E.g.
- Hamiltonian Cycle.
- Hamiltonian Path.
- Travelling Salesman. Given a set of distances on n cities and a bound D, is there a tour of length at most D?