Modelling Flashcards
What is an optimisation problem?
An optimization problem is of the form
minimize f(x)
subject to x ∈ Ω
where f : R^n → R is a real-valued objective function and Ω ⊆ R^n is the constraint set (the set of points allowed as candidates). If Ω = R^n, then the problem is an unconstrained optimization problem.
What is the feasible set?
The set F = {x ∈ R^n : f_1(x) ≤ 0, . . . , f_m(x) ≤ 0, g_1(x) = 0, . . . , g_p(x) = 0} is called the feasible set.
I.e., the set of points x that satisfy all the inequality and equality constraints.
What are global and local minimizers?
A vector x∗ satisfying the constraints is called a solution or a global minimizer of the optimisation problem if f(x∗) ≤ f(x) for all other x that satisfy the constraints.
Besides global minimizers, there are also other types of minimizers. A point x∗ ∈ Ω is a local minimizer if there is an open neighbourhood U of x∗ such that f(x∗) ≤ f(x) for all x ∈ U ∩ Ω. A local minimizer is called strict if f(x∗) < f(x) for all x ∈ U ∩ Ω with x ≠ x∗, and isolated if it is the only local minimizer in a small enough neighbourhood.
What is the Inner Product?
Essentially dot product.
⟨x, y⟩ = x · y = x⊺y = ∑_{i=1}^n x_i y_i.
What are the 3 important norms?
Three important examples of norms are the following:
1. The 1-norm: ∥x∥_1 = ∑_{i=1}^n |x_i|;
2. The 2-norm: ∥x∥_2 = √(∑_{i=1}^n x_i^2);
3. The ∞-norm: ∥x∥_∞ = max_{1≤i≤n} |x_i|.
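A quick numerical check of the three norms, as a minimal sketch assuming numpy is available (the vector x is just an illustrative choice):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

print(np.linalg.norm(x, 1))       # 1-norm: |3| + |-4| + |1| = 8
print(np.linalg.norm(x, 2))       # 2-norm: sqrt(9 + 16 + 1) ~ 5.099
print(np.linalg.norm(x, np.inf))  # infinity-norm: max(|3|, |-4|, |1|) = 4
```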
When is a matrix positive definite and semidefinite?
A symmetric matrix A is called positive semidefinite, written A ⪰ 0, if for all non-zero x ∈ R^n, x⊺Ax ≥ 0, and positive definite, written A ≻ 0, if x⊺Ax > 0 for all x ≠ 0. Equivalently, a symmetric matrix is positive semidefinite if all its eigenvalues are non-negative, and positive definite if they are all positive.
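A minimal sketch of the eigenvalue criterion, assuming numpy; the helper names and the tolerance tol are illustrative choices, not part of the definition:

```python
import numpy as np

def is_positive_semidefinite(A, tol=1e-10):
    # Eigenvalue criterion for a symmetric matrix A: all eigenvalues non-negative.
    return np.all(np.linalg.eigvalsh(A) >= -tol)

def is_positive_definite(A, tol=1e-10):
    # All eigenvalues strictly positive.
    return np.all(np.linalg.eigvalsh(A) > tol)

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3
print(is_positive_definite(A))             # True
B = np.array([[1.0, 2.0], [2.0, 1.0]])     # eigenvalues -1 and 3
print(is_positive_semidefinite(B))         # False
```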
What is a Convex Set?
A set C ⊆ R^n is a convex set, if for all x, y ∈ C and λ ∈ [0, 1], λx + (1 − λ)y ∈ C. A compact (closed and bounded) convex set is called a convex body.
What do we know about the intersection, sum and scalar product of convex sets?
Let C, D ⊆ R^n be convex sets. Then the following are also convex:
* C ∩ D;
* C + D = {x + y | x ∈ C, y ∈ D};
* AC = {Ax | x ∈ C}, where A ∈ R^m×n (taking A = λI gives the scalar multiple λC).
What is the Convex hull?
The convex hull conv S of a set S is the intersection of all convex sets containing S. Clearly, if S is convex, then S = conv S.
We have
conv S = {x ∈ R^n | x = ∑_{i=1}^k λ_i x_i, x_i ∈ S, ∑_{i=1}^k λ_i = 1, λ_i ≥ 0}.
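A small illustration of this formula, assuming numpy; the triangle S and the Dirichlet draw for the weights are just one convenient way to generate non-negative weights summing to 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three points in R^2; their convex hull is a triangle.
S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Draw weights lambda_i >= 0 with sum 1 and form the convex combination.
lam = rng.dirichlet(np.ones(len(S)))
x = lam @ S
print(lam, lam.sum())  # non-negative weights summing to 1
print(x)               # a point inside the triangle conv S
```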
What can we tell about the distance between points in and not in a convex set?
Let C be a non-empty closed convex set and x ∉ C. Then there exists a point y ∈ C that minimizes the distance ∥x − y∥. Moreover, for all z ∈ C we have
⟨z − y, x − y⟩ ≤ 0.
In words, the vectors z − y and x − y form an obtuse angle.
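A small numerical illustration, assuming numpy, with C taken to be the closed Euclidean unit ball (for which the distance-minimizing y is simply x/∥x∥ when x lies outside):

```python
import numpy as np

rng = np.random.default_rng(1)

def project_onto_unit_ball(x):
    # Nearest point of the closed Euclidean unit ball C to x.
    n = np.linalg.norm(x)
    return x if n <= 1 else x / n

x = np.array([3.0, 4.0])          # a point outside C
y = project_onto_unit_ball(x)     # the distance-minimizing point y in C

# The inequality <z - y, x - y> <= 0 holds for every z in C.
for _ in range(5):
    z = project_onto_unit_ball(rng.uniform(-1.0, 1.0, size=2))
    print(np.dot(z - y, x - y) <= 1e-12)
```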
How can we find a hyperplane between a point and convex set?
In what follows, write int S for the interior of a set S.
Let C be a closed convex set and x ∉ C. Then there exists a hyperplane H, with associated half-spaces H^− and H^+, such that
C ⊂ int H^− and x ∈ int H^+.
What is a Convex function?
Let S ⊆ R^n. A function f : S → R is called convex if S is convex and for all v, w ∈ S
and λ ∈ [0, 1], f(λv + (1 − λ)w) ≤ λf(v) + (1 − λ)f(w). The function f is called strictly convex if f(λv + (1 − λ)w) < λf(v) + (1 − λ)f(w) for all v ≠ w and λ ∈ (0, 1). A function f is called concave if −f is convex.
What is any local minimizer of a convex function also?
Let f : R^d → R be a convex function. Then any local minimizer of f is a global minimizer.
Proof.
Let v∗ be a local minimizer and assume that it is not a global minimizer. Then there exists a vector w ∈ R^d such that f(w) < f(v∗). Since f is convex, for any λ ∈ [0, 1] and v = λw + (1 − λ)v∗ we have
f(v) ≤ λf(w) + (1 − λ)f(v∗) < λf(v∗) + (1 − λ)f(v∗) = f(v∗).
This holds for all v on the line segment connecting w and v∗. Since every open neighbourhood U of v∗ contains a bit of this line segment, every open neighbourhood U of v∗ contains a v ≠ v∗ such that f(v) < f(v∗), in contradiction to the assumption that v∗ is a local minimizer. It follows that v∗ has to be a global minimizer.
How can we characterise convex functions using differentiability?
- Let f ∈ C^1(R^n). Then f is convex if and only if for all v, w ∈ R^n,
f(w) ≥ f(v) + ∇f(v)⊺(w − v).
- Let f ∈ C^2(R^n). Then f is convex if and only if ∇^2 f(v) is positive semidefinite for all v. If ∇^2 f(v) is positive definite for all v, then f is strictly convex.
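One illustrative check of the second-order criterion, assuming numpy: for f(x) = ∥Ax − b∥_2^2 the Hessian is the constant matrix 2A⊺A, which is always positive semidefinite, so f is convex.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# f(x) = ||Ax - b||_2^2 has gradient 2 A^T (Ax - b) and constant Hessian H = 2 A^T A.
H = 2 * A.T @ A

eigvals = np.linalg.eigvalsh(H)
print(np.all(eigvals >= -1e-10))  # True: H is positive semidefinite, so f is convex
print(np.all(eigvals > 1e-10))    # True here: this random A has full column rank,
                                  # so H is positive definite and f is strictly convex
```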
What are iterative algorithms?
Most modern optimization methods are iterative: they generate a sequence of points x_0, x_1, . . . in R^d in the hope that this sequence converges to a local or global minimizer x∗ of a function f(x). A typical rule for generating such a sequence is to start with a vector x_0, chosen by an educated guess, and then for k ≥ 0, move from step k to k + 1 by
x_{k+1} = x_k + α_k p_k, in a way that ensures that f(x_{k+1}) ≤ f(x_k). The parameter α_k is called the step length (or learning rate in machine learning), while p_k is the search direction.
What is Gradient Descent?
In the method of gradient descent, the search direction is chosen as
p_k = −∇f(x_k).
To see why this makes sense, let p be a direction and consider the Taylor expansion
f(x_k + αp) = f(x_k) + αp⊺∇f(x_k) + O(α^2).
Considering this as a function of α, the rate of change in direction p at x_k is the derivative of this function at α = 0,
d/dα f(x_k + αp)|_{α=0} = ⟨p, ∇f(x_k)⟩,
also known as the directional derivative of f at x_k in the direction p. The rate of change is negative, and we have a descent direction, if ⟨p, ∇f(x_k)⟩ < 0. In particular, the choice p = −∇f(x_k) gives ⟨p, ∇f(x_k)⟩ = −∥∇f(x_k)∥_2^2 < 0 whenever ∇f(x_k) ≠ 0.
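A minimal sketch of the scheme, assuming numpy; the fixed step length, tolerance, and the quadratic example are illustrative choices rather than part of the method:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=10_000):
    # Gradient descent with a fixed step length alpha.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        p = -grad_f(x)                 # search direction p_k = -grad f(x_k)
        if np.linalg.norm(p) < tol:    # stop when the gradient is (nearly) zero
            break
        x = x + alpha * p              # x_{k+1} = x_k + alpha_k p_k
    return x

# Example: f(x) = (x1 - 1)^2 + 2*(x2 + 3)^2, with minimizer (1, -3).
grad_f = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 3)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))   # approx [ 1. -3.]
```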
What are the different types of convergence?
Assume that a sequence of vectors {x_k}, k ≥ 0, converges to x∗. Then the sequence {x_k}, k ≥ 0, is said to converge:
(a) linearly (or Q-linearly, Q for quotient), if there exists an r ∈ (0, 1) such that for sufficiently large k,
∥x_{k+1} − x∗∥ ≤ r∥x_k − x∗∥;
(b) superlinearly, if
lim_{k→∞} ∥x_{k+1} − x∗∥ / ∥x_k − x∗∥ = 0;
(c) with order p, with p > 1, if there exists a constant M > 0, such that for sufficiently large k,
∥x_{k+1} − x∗∥ ≤ M∥x_k − x∗∥^p.
The case p = 2 is called quadratic convergence.
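A small numerical illustration in Python: the sequence x_{k+1} = 0.5 x_k converges linearly to 0 (with r = 0.5), while x_{k+1} = x_k^2 converges quadratically (with M = 1); both sequences and the printed ratios are illustrative choices.

```python
# Compare the error ratios that define linear and quadratic convergence (x* = 0).
x_lin, x_quad = 0.5, 0.5
for k in range(5):
    x_lin_next, x_quad_next = 0.5 * x_lin, x_quad**2
    print(f"k={k}: linear ratio {x_lin_next / x_lin:.3f}, "
          f"quadratic ratio {x_quad_next / x_quad**2:.3f}")
    x_lin, x_quad = x_lin_next, x_quad_next
```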
What is Newton’s Method?
Let f ∈ C^2(R^n) and let’s look again at the unconstrained problem
minimize f(x).
Newton’s method starts with a guess x_0 and then proceeds to compute a sequence of points {x_k}, k ≥ 0, in R^n by the rule
x_{k+1} = x_k − ∇^2 f(x_k)^{−1} ∇f(x_k), k ≥ 0.
The algorithm stops when ∥x_{k+1} − x_k∥ < ε for some predefined tolerance ε > 0. In the context of the general scheme x_{k+1} = x_k + α_k p_k, the step length is α_k = 1, and the search direction is the inverse of the Hessian multiplied with the negative gradient.
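A minimal sketch, assuming numpy; solving the linear system instead of forming the inverse is a standard implementation choice, and the example function f(x) = cosh(x_1) + x_2^2 is illustrative:

```python
import numpy as np

def newton(grad_f, hess_f, x0, eps=1e-10, max_iter=100):
    # Newton's method: x_{k+1} = x_k - hess_f(x_k)^{-1} grad_f(x_k).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess_f(x), grad_f(x))  # avoid forming the inverse
        x_new = x - step
        if np.linalg.norm(x_new - x) < eps:           # stop when the update is tiny
            return x_new
        x = x_new
    return x

# Example: f(x) = cosh(x1) + x2^2, minimizer (0, 0), Hessian positive definite everywhere.
grad_f = lambda x: np.array([np.sinh(x[0]), 2 * x[1]])
hess_f = lambda x: np.array([[np.cosh(x[0]), 0.0], [0.0, 2.0]])
print(newton(grad_f, hess_f, x0=[1.0, 1.0]))   # converges to approx [0, 0]
```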
What neighbourhood can we find around a point where ∇^2 f is positive definite?
Let x∗ ∈ R^n be such that ∇^2 f(x∗) is positive definite. Then there exists an open neighbourhood U of x∗ such that for all x ∈ U, ∇^2 f(x) is positive definite.
When is a function Lipschitz continuous?
A function f : R^n → R^m is Lipschitz continuous on a domain Ω ⊆ R^n with respect to a pair of norms on R^n and R^m if there is a constant L > 0 such that for all x, y ∈ Ω,
∥f(x) − f(y)∥ ≤ L∥x − y∥.
The constant L is called the Lipschitz constant of the map.
In particular, the Hessian of a function f ∈ C^2(R^n), considered as a map from R^n
to R^n×n, is Lipschitz continuous with respect to a norm on R^n and the corresponding operator norm on R^n×n, if for any x, y we have
∥∇^2f(x) − ∇^2f(y)∥ ≤ L∥x − y∥.
When will Newton’s method have quadratic convergence?
Let f ∈ C^2(R^n) with Lipschitz continuous Hessian, and let x∗ ∈ R^n be a local minimizer with ∇f(x∗) = 0 and ∇^2 f(x∗) ≻ 0. Then for x_0 sufficiently close to x∗, Newton’s method has quadratic convergence.
What is a Linear Programming problem?
Linear programming is about problems of the form
maximize ⟨c, x⟩
subject to Ax ≤ b,
where A ∈ R^m×n, x ∈ R^n, c ∈ R^n, and b ∈ R^m, and the inequality sign means inequality in each row.
The feasible set is the set of all possible candidates,
F = {x ∈ R^n | Ax ≤ b}.
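A small worked example, assuming scipy is available; note that scipy.optimize.linprog minimizes, so we pass −c to maximize ⟨c, x⟩, and the particular A, b, c here are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# maximize x1 + 2*x2 subject to x1 + x2 <= 4, x1 <= 3, x2 <= 3, -x1 <= 0, -x2 <= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 3.0, 3.0, 0.0, 0.0])

# linprog minimizes, so maximize <c, x> by minimizing <-c, x>.
# bounds=(None, None) keeps the problem in pure Ax <= b form.
res = linprog(-c, A_ub=A, b_ub=b, bounds=(None, None))
print(res.x, -res.fun)   # optimal x (approx [1, 3]) and the maximal value 7
```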
How can we check a linear programming problem has a solution?
A first important task is to find out if there exist any feasible solutions at all. That is, we would like to know if the system of linear inequalities Ax ≤ b has a solution. If F is not empty, we can certify that by producing a vector from F. Verifying that the feasible set is empty is trickier: we are asked to show that no vector lies in F. What we can try to do, however, is to show that the assumption of a solution to Ax ≤ b would lead to a contradiction.
Denote by a_i⊺ the rows of A, so that the entries of Ax are given by a_i⊺x for 1 ≤ i ≤ m. Assuming x ∈ F, then given a vector 0 ≠ λ = (λ_1, . . . , λ_m)⊺ with λ_i ≥ 0, the linear combination satisfies
⟨λ, Ax⟩ = ∑_{i=1}^m λ_i a_i⊺x ≤ ∑_{i=1}^m λ_i b_i = ⟨λ, b⟩.
If we can find parameters λ such that the left-hand side is identically 0 (as a function of x) and the right-hand side is strictly negative, then we have found a contradiction and can conclude that no x satisfies Ax ≤ b. A condition that ensures this is
∑_{i=1}^m λ_i a_i = 0, ⟨λ, b⟩ < 0.
In matrix form,
∃λ ≥ 0, A⊺λ = 0, ⟨λ, b⟩ < 0.
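A sketch of searching for such a λ, assuming scipy is available; the normalization ∑ λ_i = 1 is an added device (not part of the condition above) to exclude λ = 0 and keep the auxiliary LP bounded:

```python
import numpy as np
from scipy.optimize import linprog

# Infeasible system: x <= 1 and -x <= -2 (i.e. x >= 2) has no solution.
A = np.array([[1.0], [-1.0]])
b = np.array([1.0, -2.0])

m = A.shape[0]
# Look for lambda >= 0 with A^T lambda = 0 and <lambda, b> < 0,
# normalized so that sum(lambda) = 1.
A_eq = np.vstack([A.T, np.ones((1, m))])
b_eq = np.concatenate([np.zeros(A.shape[1]), [1.0]])
res = linprog(b, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))

if res.status == 0 and res.fun < 0:
    print("certificate of infeasibility, lambda =", res.x)  # e.g. [0.5, 0.5]
else:
    print("no certificate found")
```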
What is a Polyhedron?
A polyhedron (plural: polyhedra) is a set defined as the solution of linear equalities and inequalities,
P = {x ∈ R^n | Ax ≤ b},
where A ∈ R^m×n, b ∈ R^m.