CS6515_Exam3 Flashcards
LP (Linear Programming) is in P? (T/F)
True
ILP (Integer Linear Programming) is in P? (T/F)
False. It is known to be NP-Complete
NP-Complete problems are the hardest problems in what class? What assumption are we making?
NP, ie the outer layer of the donut (from the lecture diagram). The assumption we’re making is that P != NP; if that is the case, then no NP-Complete problem is in P.
In some sense, you can think of NP-Complete problems as the “essence” of the class NP.
If a NP-complete problem can be solved in polytime then what does this mean?
Then ALL problems in NP can be solved in polytime.
Describe the P vs. NP question? What is the historical background of it?
Note: not necessarily important for exam, but useful information to help contextualize P vs. NP.
In the 1960s-70s scientists began realizing that certain problems had efficient algorithms for solving them, but many other problems did not seem to have such an algorithm. Many of these thorny latter problems had important practical applications, such as vehicle routing and network design. Scientists began sorting the problems based on their “hardness”. One thing scientists noticed is that while some of these hard problems did not appear to have polytime algorithms for SOLVING, a given solution could still be checked quickly (ie in polytime). This came to be the class NP.
While most modern researchers think that P != NP, a decisive answer to the question has been incredibly elusive. Doubts arise because sometimes a problem that was not known to be in P turns out to be in P after someone devises a clever polytime algorithm (eg primality testing). THIS IS REALLY THE CORE QUESTION OF P vs NP: Will we eventually find polytime algorithms for all NP problems, in which case P = NP, or is it just the case that some really hard problems simply don’t have efficient algorithms for solving them?
Previous ponderings and correspondence by people like Nash, Godel, and Von Neumann in the 1950s hinted that something was afoot. More formal descriptions followed. In 1965, Cobham and Edmonds appear to be the first to identify the class of P problems [1][2], ie ones that can be solved in polynomial time. Edmonds published a paper (also in 1965) that provided a template for what would later become the class NP.
The formal theory of NP-Completeness is generally traced back to a few individuals (some working independently) in the early 1970s, namely Stephen Cook, Richard Karp, and Leonid Levin.
Sources:
[1] A. Cobham. The intrinsic computational difficulty of functions. In Y. Bar-Hillel, editor, Proc. 1964 International Congress for Logic, Methodology and Philosophy of Science, pages 24–30, Amsterdam, 1964. North Holland.
[2] J. Edmonds. Paths, trees, and flowers. Canad. J. Math., 17:449–467, 1965.
[3] J. Edmonds. Minimum partition of a matroid into independent subsets. J. Res. Nat. Bur. Standards Sect. B, 69:67–72, 1965.
[4] S. Cook. The complexity of theorem proving procedures. In Proc. 3rd Ann. ACM Symp. on Theory of Computing, pages 151–158, New York, 1971. Association for Computing Machinery.
Describe the difference between the following problems in terms of computational complexity:
- Multiplication and sorting
- Sudoku
- The game of Chess or Go
- These are examples of problems that are in the class P because we have very good fast algorithms for solving them, eg Karatsuba algorithm for multiplication, merge-sort for sorting, etc.
- Sudoku is a classic example of an NP problem (it is also NP-Complete). If I give you a solved Sudoku grid, it’s trivial for you to check if it is correct (or at least trivial for a computer). However, solving a Sudoku grid is hard, and it gets exponentially harder as the grid gets bigger. There is no known algorithm for solving Sudoku puzzles in polytime.
- Chess or Go have massive possible state spaces. Solving a game like this is an example that goes outside NP, to the realm of EXPTIME algorithms. No modern superfast computer (or even one we could reasonably expect to develop in the near future) could solve problems like these, even if we let them run for thousands of years. Even just checking whether a move in chess is the best takes exponential time. We’re basically helpless (from a computational perspective) when it comes to problems like these.
What is “Co-NP”?
Similar to NP, but instead of it being easy to check a correct solution, Co-NP problems are easy to EXCLUDE WRONG ANSWERS
Define a “search problem”.
Hint: Specify the form of the problem, and then the requirement that has to be met
Form:
Given instance T
- find a solution S for T if one exists
- output ‘NO’ if T has no solutions
Requirement (to be a search problem):
- if given an instance T and solution S, then we can verify that S is a solution to T in polytime (ie polynomial in the size |T|)
How do we show that a problem is in NP?
We show that there is a polytime algorithm for checking solutions to the problem
SAT is in NP? (T/F) Why or why not?
True. Because we can check solutions to SAT in polynomial time: each clause contains at most n literals (one per variable in the worst case), and there are m clauses, so verifying a solution takes O(nm) time, which is polynomial.
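As a minimal sketch of such a verifier (the clause encoding here, signed integers in the style of the DIMACS format, is my own convention, not from the lecture):

```python
# Hypothetical sketch: verify an assignment against a CNF formula in O(nm).
# A formula is a list of clauses; each clause is a list of signed literals
# (positive i means x_i, negative -i means NOT x_i).

def verify_sat(clauses, assignment):
    """assignment maps variable index -> bool. True iff every clause
    contains at least one satisfied literal."""
    for clause in clauses:  # m clauses, each with at most n literals
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False    # found an unsatisfied clause
    return True

# (x1 v !x2) ^ (x2 v x3)
formula = [[1, -2], [2, 3]]
print(verify_sat(formula, {1: True, 2: True, 3: False}))   # True
print(verify_sat(formula, {1: False, 2: True, 3: False}))  # False
```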
The k-Colorings problem is not in NP? (T/F) Why or why not?
False, it is in NP because we can verify solutions in polytime. All we have to do is iterate over all the edges in the graph and verify that color(u) != color(v), which takes O(m) time, ie polynomial.
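The O(m) check can be sketched like so (the edge-list and color-map representations are illustrative):

```python
def verify_coloring(edges, color):
    """O(m) check: every edge must have differently colored endpoints."""
    return all(color[u] != color[v] for u, v in edges)

# A triangle needs 3 distinct colors
tri = [(0, 1), (1, 2), (0, 2)]
print(verify_coloring(tri, {0: "r", 1: "g", 2: "b"}))  # True
print(verify_coloring(tri, {0: "r", 1: "g", 2: "r"}))  # False
```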
MST is in NP? (T/F) Why or why not?
True, because we can verify solutions in polytime. The steps we take to prove this are as follows:
- Verify the output is of proper form (ie MST is a search problem): there will always be some tree output by MST; a ‘NO’ instance isn’t possible (although it would still be of the correct form if it was possible). So yes, the output is of the proper form.
- Verify that the output is a tree: simply run DFS/BFS and make sure that no cycles are present
- Run Kruskal’s/Prim’s algorithm to check that tree T has minimum weight, which takes polytime.
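The tree-verification step can be sketched as follows. This uses the equivalent check that a graph on n vertices with exactly n − 1 edges is a tree iff it is connected (the representation, with vertices 0..n−1, is my own assumption):

```python
from collections import defaultdict

def is_spanning_tree(n, tree_edges):
    """A connected graph on n vertices with exactly n - 1 edges is a tree
    (no cycles possible); DFS checks connectivity in O(n + m)."""
    if len(tree_edges) != n - 1:
        return False
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:                      # iterative DFS from vertex 0
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n             # spanning iff every vertex reached

print(is_spanning_tree(4, [(0, 1), (1, 2), (2, 3)]))  # True
print(is_spanning_tree(4, [(0, 1), (1, 2), (0, 2)]))  # False (cycle, 3 unreached)
```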
MST is not in P? (T/F) Why or why not?
False, it is in P because we can SOLVE it in polytime. The steps for this proof are:
- Verify the output is of proper form (ie MST is a search problem): there will always be some tree output by MST; a ‘NO’ instance isn’t possible (although it would still be of the correct form if it was possible). So yes, the output is of the proper form.
- We can find a tree of minimum weight in polytime using Kruskal’s or Prim’s algorithm
Knapsack is in NP? (T/F) Why or why not?
False (more precisely, not known to be in NP; see the next card), because we can’t verify that a solution is a maximum in polynomial time. This is tied to the pseudo-polynomial nature of knapsack that we discussed in previous lectures. The budget B is a MAGNITUDE; it isn’t bounded by the input size, and it can be an arbitrarily large number.
Knapsack is definitely NOT in NP? (T/F) Why or why not?
False. This may seem to contradict a previous question about whether knapsack is in NP (which is also False). We aren’t actually violating the “Law of the Excluded Middle” though; the reason the answer here is False is an epistemic problem: we simply don’t know whether knapsack is in NP or not, so we can’t conclusively say one way or the other currently.
As best as we know right now, Knapsack is not known to be in NP? (T/F)
This is True. This admits the current epistemic gap that exists, namely, that we don’t know for sure if Knapsack is outside NP.
How do we modify knapsack so that it falls under NP? What is the consequence of this modification?
We get rid of the ‘max’ operation and instead set a goal value g, asking for a solution whose total value is at least g. The consequence is that this turns the original Knapsack optimization problem into a Search Problem, for which we DO have a polytime algorithm for checking solutions, which makes this Knapsack variant fall into the class NP.
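A sketch of the verifier for this search variant (parameter names are illustrative):

```python
def verify_knapsack(weights, values, B, g, chosen):
    """chosen is a list of item indices. Verify in polytime that the
    chosen items fit within budget B and reach total value >= g."""
    total_w = sum(weights[i] for i in chosen)
    total_v = sum(values[i] for i in chosen)
    return total_w <= B and total_v >= g

# items: (weight, value) = (3, 10), (4, 6), (2, 7); pick items 0 and 2
print(verify_knapsack([3, 4, 2], [10, 6, 7], B=5, g=17, chosen=[0, 2]))  # True
print(verify_knapsack([3, 4, 2], [10, 6, 7], B=5, g=20, chosen=[0, 2]))  # False
```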
If any NP-complete problem is solvable in P, then every problem in NP is solvable in P? (T/F)
True.
“Can we say anything general about when a computational complexity perspective is helpful in philosophy, and when it isn’t?”
Note: not necessarily important for exam, but useful information to help contextualize P vs. NP.
“Extrapolating from the examples in this essay, I would say that computational complexity tends to be helpful when we want to know whether a particular fact does any explanatory work : Sections 3.2, 3.3, 4, 6, and 7 all provided examples of this.”
Source:
“Why Philosophers Should Care About Computational Complexity”. Scott Aaronson. 2011. https://arxiv.org/abs/1108.1791
NP stands for “Non-Polynomial”? (T/F)
False. It stands for “Non-Deterministic Polynomial Time”.
The explanation in the lectures is that “non-deterministic” as used in this context means something like “allowed to guess at each step”. I like to think of it more as a “branching” of possibilities, and not committing to any single one; all that is required is that there exists >= 1 branch that is an accepting/terminating state.
P is a subset of NP? (True/False)
True. Note that this is True even if it turns out that P = NP; in that case P would still be a subset of NP, just not a proper subset (the two classes would be one and the same set).
Assuming P != NP, NP-Complete problems are the hardest problems in the class NP? (T/F)
True
What does it mean for SAT to be NP-Complete?
(a) SAT is in NP
(b) If we can solve SAT in polytime, then we can solve EVERY problem in NP in polytime, ie it would be the case that P = NP.
Let A=Colorings problem and B=SAT problem. What does the notation “A –> B” mean?
It means that we’re REDUCING A to B; ie if we can solve problem B in polytime, then we can use that algorithm to solve A in polytime.
When writing a proof that some problem B is NP-Complete, what are we saying?
We are saying that:
- B is in NP, ie a polytime algorithm exists for checking solutions
- For all A (ie KNOWN NP problems) in NP, we can reduce A –> B; ie if a polytime algorithm existed for us to solve B (the UNKNOWN problem), then we could use it as a subroutine to solve ANY problem A in NP in polytime
3SAT is in NP? (T/F) Why or why not? If it is, how long does it take to verify solutions?
True, because we can verify solutions in polynomial time. Since each clause contains at most 3 literals, this means we can check each clause in O(1) time. There are m clauses, so total runtime is O(m), which is polynomial.
For 3SAT, If we have a clause of size k (where k > 3), how many new auxiliary variables do we create? How many new clauses?
k - 3 new variables, k - 2 new clauses
See NP3_9 end of video
For 3SAT, if we have a lengthy clause of size k (where k > 3) and want to reduce it to size 3, what auxiliary variables would we have to create? How many clauses would C’ contain? Are these variables distinct?
Create k -3 new variables y1, y2, … y_(k - 3). We would replace the original clause C with k - 2 clauses. Yes, these auxiliary variables would all be distinct.
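A sketch of this transformation (signed-integer literals in the DIMACS style; the encoding and function name are my own, not the lecture's):

```python
def split_clause(literals, next_var):
    """Split a clause (a1 v ... v ak), k > 3, into k - 2 clauses of size 3
    using k - 3 fresh variables y_1, ..., y_(k-3) numbered from next_var.
    Literals are signed ints: i means x_i, -i means NOT x_i."""
    k = len(literals)
    ys = list(range(next_var, next_var + k - 3))       # k - 3 new variables
    clauses = [[literals[0], literals[1], ys[0]]]      # (a1 v a2 v y1)
    for i in range(1, k - 3):                          # (!y_(i) v a_(i+2) v y_(i+1))
        clauses.append([-ys[i - 1], literals[i + 1], ys[i]])
    clauses.append([-ys[-1], literals[-2], literals[-1]])  # (!y_(k-3) v a_(k-1) v a_k)
    return clauses

# (a1 v a2 v a3 v a4 v a5): variables 1..5, fresh variables start at 6
out = split_clause([1, 2, 3, 4, 5], 6)
print(out)  # [[1, 2, 6], [-6, 3, 7], [-7, 4, 5]]  -> k - 2 = 3 clauses
```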
What is the worst case number of variables we might add when taking a big clause C of length k (where k > 3) and convert it to an input to 3SAT?
Each clause can contain up to O(n) literals, so a single clause might require O(n) new variables; with m clauses, that is O(nm) new variables in total.
What is the pattern for converting a big clause into one acceptable for 3SAT?
See slide
For the FORWARD proof from C –> C’ for 3SAT, how should we set the variables? (Let a_i be min i where a_i is satisfied)
We take an assignment that satisfies C using a_1, a_2, … a_k
- Let a_i = True –> This satisfies the (i - 1)st clause of C’
- Set y_1 = y_2 = … = y_(i - 2) = True –> This satisfies the 1st through (i - 2)nd clauses
- Set y_(i - 1) = y_i = … = y_(k - 3) = False –> The negated y literals satisfy the remaining clauses
For the REVERSE proof from C’ –> C for 3SAT, how should we set the variables? (Let a_i be min i where a_i is satisfied)
Take assignment to a_1, …, a_k, and y_1, …, y_(k - 3) satisfying C’
When proving the reverse claim for 3SAT, we can set a_1, a_2, …, a_k = F and still get a valid assignment? (T/F)
False. At least one of the literals has to be satisfied.
See NP2_12 for details. I still don’t quite get this one.
The red colored vertices below are an example of what? Why? How do we check for this feature?
They form an independent set. Definition: Subset S ⊂ V is an independent set if NO EDGES are contained in S, ie no edge has both of its endpoints in S.
To check that S is an independent set, for every pair of vertices x, y in S, verify that (x, y) is not an edge:
∀ x, y ∈ S, (x, y) ∉ E
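A direct translation of this check into Python (the edge-list representation is illustrative):

```python
from itertools import combinations

def is_independent_set(edges, S):
    """True iff no pair of vertices in S is joined by an edge."""
    E = {frozenset(e) for e in edges}                      # undirected edges
    return all(frozenset(p) not in E for p in combinations(S, 2))

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle
print(is_independent_set(cycle4, {0, 2}))   # True  (opposite corners)
print(is_independent_set(cycle4, {0, 1}))   # False (adjacent vertices)
```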
In a graph G=(V,E), the empty set ∅ forms an independent set? (T/F) Why or why not?
True. Because the definition of an independent set requires that no edge has both endpoints in S, and the empty set contains no vertices at all, so it satisfies this trivially.
In a graph G=(V, E), a singleton vertex does NOT form an independent set? (T/F) Why or why not?
False, it does form an independent set. This is because an independent set only requires that no edge has both endpoints in S, and obviously a singleton vertex (trivially) satisfies this condition.
The max independent set problem is in NP? (T/F)
Trick question. We can’t really state whether or not something is definitely in NP or not, because until we have a proof for P=NP (or P != NP), we can’t make a definitive assignment. So the correct way to phrase this question would be: “The max independent set problem is known to be in NP?”
This is a subtle distinction, but Dr. Brito notes in the lectures that HW/Exam problems will be presented in the proper form “known to be”, so it’s probably a useful thing to remember in case they try to trip us up with a trick question like the one here.
The max independent set problem is known to be in NP? (T/F) Why or why not?
False, because we have no way of checking in polytime that a given solution is of maximum size. This is similar to the knapsack problem we encountered earlier.
How can we convert the max independent set problem into a problem that is in NP?
Similar to what we did for the knapsack problem, we get rid of the max operation and replace it with a goal g. This turns the optimization problem into a search problem.
Say you have a 3SAT clause C = (x_1 v !x_3 v x_2). How would you represent this as a graph G = (V, E). What is the max size of an independent set you could have for this graph?
We simply draw edges between x_1, !x_3, and x_2 to form a triangle. These are the clause edges. The max size IS would be a single vertex; if you added a second vertex, then one of the edges is guaranteed to be in E, violating the definition of IS.
This question is pretty pivotal for understanding reductions from 3SAT –> Independent Sets. See NP3_7 for more details on this.
In the context of 3SAT and graph algorithms, what is the difference between “clause edges” and “variable edges”?
Clause Edges: edges between the literals inside a clause. For example, if you had the clause C = (x_1 v !x_3 v x_2), we simply draw edges between x_1, !x_3, and x_2 to form a triangle.
Variable Edges: edges between all x_i and !x_i; this prevents logical inconsistencies where we try to set, eg x_1 = T and x_1 = F, which obviously can’t happen
What does it mean to be “NP-Hard”? What is one consequence of this? How can we show that an NP-Hard problem is in NP-Complete?
It means the problem is at least as hard as everything in NP, ie we can reduce any problem in NP to it. One consequence is that to prove a problem is NP-Hard, we reduce a known NP-Complete problem to it, eg prove that Max-IS is NP-Hard by applying the reduction IS –> Max-IS.
To show that an NP-Hard problem is also NP-Complete, we additionally have to prove that we can check solutions for the problem in polytime (ie that it is in NP). Remember that NP-Complete = NP ∩ NP-Hard; every NP-Complete problem is NP-Hard, but not every NP-Hard problem is in NP.
NP-Complete problems are the hardest problems in NP, and NP-Hard problems are at least as hard as everything in NP? (T/F)
True
The clique problem is only defined for directed graphs? (T/F)
False, it is only defined for undirected graphs.
What is the definition of a “clique” in a graph?
A clique is a fully connected subgraph, ie subset S ⊂ V is a clique if every pair of vertices in S is joined by an edge: ∀ x, y ∈ S with x ≠ y, (x, y) ∈ E
Say you have 5 vertices on a graph. How many different pairs of vertices are there?
5 choose 2, so using formula in image below we have 5! / (2! * (5 - 2)!) = 10
What are the red vertices in the graph below an example of?
They form a clique. Definition: a clique is a fully connected subgraph, ie subset S ⊂ V is a clique if every pair of vertices in S is joined by an edge: ∀ x, y ∈ S with x ≠ y, (x, y) ∈ E
Why couldn’t the vertex circled in blue in the graph below belong to the clique of red vertices?
Because it is missing the edge shown in the image below that would make it fully connected with all of the red vertices.
What is the definition of a “Vertex Cover”?
A subset of vertices S ⊂ V is a vertex cover if it “covers every edge”, ie ∀ (x,y) ∈ E, x ∈ S or y ∈ S (or both)
What is the graph in the image below an example of? Why?
A vertex cover. Definition: S ⊂ V is a vertex cover if it “covers every edge”, ie ∀ (x,y) ∈ E, x ∈ S or y ∈ S (or both)
What is the key idea behind the proof (the one shown in the lectures) for the Clique problem being NP-Complete?
That a clique is the opposite of an Independent Set
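One way to see this concretely: S is a clique in G exactly when S is an independent set in the complement graph (same vertices, exactly the missing edges). A small illustrative sketch, not the lecture's code:

```python
from itertools import combinations

def complement_graph(V, edges):
    """Edges of the complement: all vertex pairs NOT joined in G."""
    E = {frozenset(e) for e in edges}
    return [tuple(p) for p in combinations(sorted(V), 2) if frozenset(p) not in E]

def is_clique(edges, S):
    E = {frozenset(e) for e in edges}
    return all(frozenset(p) in E for p in combinations(S, 2))

def is_independent_set(edges, S):
    E = {frozenset(e) for e in edges}
    return all(frozenset(p) not in E for p in combinations(S, 2))

V = {0, 1, 2, 3}
edges = [(0, 1), (0, 2), (1, 2)]     # triangle on 0,1,2 plus isolated vertex 3
Gbar = complement_graph(V, edges)    # edges (0,3), (1,3), (2,3)
S = {0, 1, 2}
print(is_clique(edges, S), is_independent_set(Gbar, S))  # True True
```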
What is the key idea behind the proof (the one shown in the lectures) for the Vertex Cover problem being NP-Complete?
Key Idea of Proof: S is a VC ⇔ complement S’ is an IS
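A small sketch of this equivalence on a toy graph (the example graph is my own, not from the lecture):

```python
from itertools import combinations

def is_vertex_cover(edges, S):
    """Every edge must have at least one endpoint in S."""
    return all(u in S or v in S for u, v in edges)

def is_independent_set(edges, S):
    E = {frozenset(e) for e in edges}
    return all(frozenset(p) not in E for p in combinations(S, 2))

V = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (2, 3)]     # a path on 4 vertices
# S is a VC  <=>  its complement V - S is an IS
for S in ({1, 2}, {0, 2}, {0, 3}):
    assert is_vertex_cover(edges, S) == is_independent_set(edges, V - S)
print("equivalence holds on all three sample sets")
```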
What are the inputs to a Max-Flow problem using LP?
Inputs: directed graph G=(V, E) with capacities c_e > 0 for e in E
LP: m variables: f_e for every e in E
Objective function: maximizing flow out of source vertex or maximizing flow in to sink vertex
Constraints:
Capacity constraints: ∀ e ∈ E, 0 <= f_e <= c_e
Conservation of flow: ∀ v ∈ V - {s,t}, Σ_{e into v} f_e = Σ_{e out of v} f_e
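A sketch of how these pieces become LP matrices for a tiny made-up graph (the edge list, vertex names, and variable ordering are all my own assumptions):

```python
# Sketch: assembling the max-flow LP data for a toy graph.
edges = [("s", "a", 3), ("a", "t", 2), ("s", "t", 4)]   # (u, v, capacity)

# Objective: maximize total flow leaving the source s
c = [1 if u == "s" else 0 for u, v, cap in edges]        # -> [1, 0, 1]

# Capacity constraints 0 <= f_e <= c_e, expressed as per-variable bounds
bounds = [(0, cap) for u, v, cap in edges]

# Conservation: for each internal vertex w, (flow in) - (flow out) = 0
internal = ["a"]
A_eq = [[(1 if v == w else 0) - (1 if u == w else 0) for u, v, cap in edges]
        for w in internal]                               # -> [[1, -1, 0]]
b_eq = [0] * len(internal)

print(c, bounds, A_eq, b_eq)
```

Feeding these matrices to any standard-form LP solver would return the max flow (6 for this toy graph: 2 routed through a, 4 directly s to t).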
Say you have the LP shown in the following image. From a geometric standpoint, what do the x_i constraints represent?
Each constraint defines a half-plane (a half-space in higher dimensions); the feasible region is the intersection of all of these half-planes.
The graph below shows a geometric representation of the constraints in a linear program. What does the brown shaded region represent?
The feasible region, ie the area where solutions exist that satisfy the constraints.
When using LP, the optimum solution will always lie on integer values? (T/F) Why or why not? What is an important consequence of this?
False. The feasible region is continuous, ie the optimum may very well occur at a non-integer point. The important consequence is that while LP is in P, Integer Linear Programming (ILP) is known to be NP-Complete.
In LP, the optimum will always lie at a vertex (ie corner) of the feasible region? (T/F) Why or why not?
True. The feasible region is a convex polytope and the objective function is linear, so (assuming the LP is feasible and bounded) some vertex of the region always achieves the optimum value.
In LP, the only optimal solutions are guaranteed to lie on a vertex of the feasible region? (T/F) Why or why not?
False. While it is true that a vertex at the maximum of the feasible region is guaranteed to be at least as good as any other optimal solution, it may not be the only one. Think about the case of a horizontal line that goes across the feasible region and through that vertex. Any point along that line is also an optimal solution. See the image below.
In geometric terms, what is one important consequence of the linear constraints we impose on an LP?
These linear constraints form half-planes. When we stack up all these linear constraints, they intersect to form a convex shape. This means that the feasible region is convex, so the optimum solution is guaranteed to fall on its boundary, in particular at a vertex. This is what enables the Simplex algorithm to work by walking from vertex to vertex of the region. (Note: Simplex is fast in practice but can take exponential time in the worst case; LP is solvable in polynomial time by other methods, such as the ellipsoid algorithm.)
What is the size of the constraint matrix A for an LP? What is the size of b (ie the inequality values)?
Matrix A is of size (M x N), where N is the number of variables x1, x2, …, x_N, and M is the number of constraints.
The size of b is (M x 1)
What is the standard form for an LP?
See image below.
How is an LP represented from a linear algebra approach? (Give sizes of matrices, objective function, constraints, etc.)
Variables: X –> (N x 1)
Objective function C –> (N x 1)
Constraint Matrix A –> (M x N)
Constraints B –> (M x 1)
Objective function: max[C.T @ X], where @ denotes matrix multiplication
Such that: A @ X <= B
Non-negativity constraints: X >= 0
For the LP below, what should the values for the objective function matrix C be? Constraint matrix A? Constraints b?
C = [[1], [6], [10]]
A = [[1, 0, 0], [0, 1, 0], [1, 3, 2], [0, 1, 3]]
b = [[300], [200], [1000], [500]]
Say you have the following objective function for an LP:
min [C.T @ X]
How would you convert this objective function into standard form?
Note: .T represents the matrix transpose operation and @ denotes matrix multiplication.
We turn it into a maximization problem by multiplying by -1 and then maximizing, ie:
min [C.T @ X] <==> max [-1 * C.T @ X]
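A toy numeric check of this identity over a finite set of sample points (not an LP solver; the candidate points are arbitrary):

```python
# min of c.T @ x equals -(max of (-c).T @ x) over any finite candidate set.
c = [2, -1]
candidates = [(0, 0), (1, 0), (0, 1), (1, 1)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

lo = min(dot(c, x) for x in candidates)                    # direct minimization
hi = -max(dot([-ci for ci in c], x) for x in candidates)   # negate, maximize, negate
print(lo, hi)  # -1 -1
```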
Say you have an LP with a constraint like:
a1*x1 + a2*x2 >= b
How would you write this in standard form?
In standard form, we should always have the constraints such that they are <= b. We can do this by simply multiplying both sides of the inequality by -1, ie:
a1*x1 + a2*x2 >= b ⇐⇒ -a1*x1 - a2*x2 <= -b
Say you have an LP that has an equality constraint like:
a1*x1 + a2*x2 = b
How would we represent this in standard form?
We simply use two inequalities, <= b and >= b, the latter of which we convert to standard form by multiplying both sides by -1:
a1*x1 + a2*x2 = b ⇐⇒ a1*x1 + a2*x2 <= b AND -a1*x1 - a2*x2 <= -b
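A small helper sketching both conversions (the function name and row format are my own, purely for illustration):

```python
def to_standard(rows_ge=(), rows_eq=()):
    """Convert constraints to standard <= form. Each row is (coeffs, rhs).
    a.x >= b  ->  -a.x <= -b
    a.x == b  ->  a.x <= b  AND  -a.x <= -b"""
    A, b = [], []
    for a, rhs in rows_ge:
        A.append([-ai for ai in a]); b.append(-rhs)
    for a, rhs in rows_eq:
        A.append(list(a)); b.append(rhs)
        A.append([-ai for ai in a]); b.append(-rhs)
    return A, b

# x1 + 2*x2 >= 3  and  4*x1 + 5*x2 = 6
A, b = to_standard(rows_ge=[([1, 2], 3)], rows_eq=[([4, 5], 6)])
print(A, b)  # [[-1, -2], [4, 5], [-4, -5]] [-3, 6, -6]
```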
Say you have an LP with a strict inequality constraint like:
a1*x1 + a2*x2 < b
How would you represent this in standard form?
Trick question. Strict inequalities aren’t valid to use for an LP. This is because the feasible region must form a closed set. If we use a strict inequality, then it becomes an open set, and the problem is ill-defined.
Say you’re creating an LP and you want x to be able to take on an unconstrained range of values, ie you want x to be able to take on positive or negative values. How would you write this in standard form?
We would need to drop the non-negativity constraint and then add in two new variables, x_plus and x_minus. We would then add in two new constraints:
x_plus >= 0 and x_minus >= 0, (or in standard form: -x_plus <= 0 and -x_minus <= 0)
Then we would replace x with the following equation:
x = x_plus - x_minus
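A tiny sketch of this substitution: any value of x, positive or negative, decomposes into two non-negative parts whose difference recovers x (function name is my own):

```python
def split_free(x):
    """Represent a free variable x as (x_plus, x_minus), both >= 0,
    with x = x_plus - x_minus."""
    return (x, 0.0) if x >= 0 else (0.0, -x)

for x in (2.5, -4.0, 0.0):
    xp, xm = split_free(x)
    assert xp >= 0 and xm >= 0 and xp - xm == x
print(split_free(-4.0))  # (0.0, 4.0)
```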