Probabilistic algorithms Flashcards
approximation algorithm
an algorithm for an optimization problem that runs in polynomial time and returns a solution with a provable guarantee on its distance from the optimum
randomized algorithm
an algorithm that consults random numbers during execution, so its behavior (and possibly its output) can differ between runs on the same input; see Probabilistic Algorithm
stochastic
randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely
Probability density function
a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given point in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample
pivot
the element chosen to partition the array in quicksort: elements smaller than the pivot go to one side, larger ones to the other; RandQSORT chooses it uniformly at random
Probabilistic Algorithm paradigm
for each input make a random choice and hope that it is good.
Non-Deterministic execution of algorithm
for the same input, different sequences of operations are possible
vs. deterministic execution: for the same input, the same sequence of operations is always executed
Probabilistic Algorithm
A Probabilistic (or Randomized) Algorithm is an algorithm which employs a degree of randomness as part of its logic
-> during execution, it makes random choices based on generated random numbers
-> the behavior (output) can vary if the algorithm is run multiple times on the same input
=> probabilistic algorithms are often simpler and faster than their deterministic counterparts
=> however, in the worst execution case, the algorithm may be very slow
positive vs negative probability
??
probabilistic algorithm
+ gain in execution time
- possible loss of correctness of the answer
Las Vegas algorithms
a randomized algorithm that
- always gives correct/optimal results
- the execution time is variable
- example: RandQSORT
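The RandQSORT example can be sketched in a few lines of Python (a minimal illustration, not the lecture's exact pseudocode; the function name is mine):

```python
import random

def rand_qsort(a):
    """Las Vegas randomized quicksort: the output is always correctly
    sorted; only the running time depends on the random pivot choices."""
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)                      # random pivot choice
    return (rand_qsort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + rand_qsort([x for x in a if x > pivot]))
```

Whatever pivots are drawn, the result is identical; a run of unlucky pivots only makes it slower (worst case O(n^2), expected O(n log n)).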
Monte Carlo algorithms
a randomized algorithm that
- the running time is deterministic
- the output may be incorrect with a certain probability
- example: Check matrices equality
- for decision problems (Yes/No): one- or two-sided error
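The "check matrices equality" example is the classic Freivalds' algorithm; a minimal Python sketch (function and parameter names are mine):

```python
import random

def freivalds(A, B, C, trials=10):
    """Monte Carlo equality check A*B == C in O(n^2) time per trial.
    One-sided error: a "No" answer is always correct; a "Yes" answer
    is wrong with probability at most 2**-trials."""
    n = len(A)
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]   # random 0/1 vector
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False      # witness found: definitely A*B != C
    return True               # probably equal
```

Note the deterministic running time (fixed number of trials) with a possibly incorrect "Yes", exactly the Monte Carlo pattern.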
one-sided error
Monte Carlo algorithm in which one of the two answers (Yes or No) is always correct; errors can occur only on the other side
two-sided error
Monte Carlo algorithm: a non-zero probability to err for both outputs
E[X]
Expected value of a discrete random variable X: E[X] = Σ_x x · P(X = x)
The expected value is the probability-weighted average of the possible values of X; it is often also written as E(X)
sl. 15 lecture_01
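The discrete E[X] formula as a one-line helper (an illustrative sketch; the function name is mine):

```python
def expected_value(dist):
    """E[X] = sum of x * P(X = x) over a discrete distribution, given
    as a list of (value, probability) pairs."""
    return sum(x * p for x, p in dist)

# Example: a fair six-sided die has E[X] = (1 + 2 + ... + 6) / 6 = 3.5
die = [(k, 1 / 6) for k in range(1, 7)]
```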
Random number generation
- the kernel of Monte Carlo simulation
- the heart of many standard statistical methods (bootstrap, Bayesian analysis)
- cryptography
bisection method
an interval-halving method: keep an interval that is guaranteed to contain the point sought (e.g. a sign change of the function) and repeatedly halve it, keeping the half that still contains it
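A minimal sketch of bisection in its root-finding form (the function name and tolerance are mine; in optimization the same idea is applied to the derivative of the loss):

```python
def bisect(f, a, b, tol=1e-8):
    """Bisection: f(a) and f(b) must have opposite signs, so a root is
    bracketed; the interval is halved until it is shorter than tol."""
    fa = f(a)
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:
            b = m                 # the root lies in [a, m]
        else:
            a, fa = m, fm         # the root lies in [m, b]
    return (a + b) / 2
```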
successive quadratic estimation method
a line-search method that fits a parabola through three trial points and takes the minimum of that parabola as the next estimate of the minimizer
Powell’s algorithm
a derivative-free minimization method that performs successive line searches along a maintained set of directions, updating the direction set as the search proceeds
inflection point
An inflection point is a point on a curve at which the sign of the curvature (i.e., the concavity) changes. Inflection points may be stationary points, but are not local maxima or local minima.
http://mathworld.wolfram.com/InflectionPoint.html
first order condition
The first order condition means that the local maximum or local minimum of some continuous function within some range (not including the end points) must occur where the derivative of that function is equal to 0.
-> at the highest and lowest points of a curve, the tangent to the curve at such points is horizontal
-> the slope of the curve is zero
http://users.etown.edu/p/pauls/ec309/lectures/lec04_unconst.html
second order condition
SOC for a local max is that f''(x) < 0
SOC for a local min is that f''(x) > 0
loss function
a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event.
An optimization problem seeks to minimize a loss function
Hessian matrix?
useful for characterizing the shape of L and for distinguishing, among the zeros of the gradient g(d), the minimum points, the maximum points and the inflection points
why an inverse matrix?
we can multiply by an Inverse, which achieves the same thing as “dividing by a number”
remember: with Matrices the order of multiplication matters. AB is almost never equal to BA.
https://www.mathsisfun.com/algebra/matrix-inverse.html
Jacobian matrix
is the matrix of all first-order partial derivatives of a vector-valued function. When the matrix is square, both the matrix and its determinant are referred to as the Jacobian in the literature
The Jacobian matrix is important because if the function f is differentiable at a point x (this is a slightly stronger condition than merely requiring that all partial derivatives exist there), then the Jacobian matrix defines a linear map ℝ^n → ℝ^m, which is the best (pointwise) linear approximation of the function f near the point x. This linear map is thus the generalization of the usual notion of derivative, and is called the derivative or the differential of f at x.
Differenzierbarkeit / differentiable
In mathematics, differentiability is the property of a function of being locally approximable around a point, in a unique way, by a linear map.
TSP
find a Hamiltonian cycle with minimal path length in G
TSP extension
- Asymmetric TSP
- optimum configuration for the ATSP
- time dependent TSP
- Vehicle Routing Problem
- probabilistic VRP
Branch & Bound
an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.
Branch & Bound TSP
in B & B we try to reduce the search space by pruning branches whose bound is worse than the best solution found so far
linear programming
LP, also called linear optimization
a method to achieve the best outcome (e.g. maximum profit or lowest cost) of a mathematical model whose requirements are represented by linear constraints
convex set
a convex set is a subset of an affine space that is closed under convex combinations.
in a Euclidean space, a convex region is a region where, for every pair of points within the region, every point on the straight line segment that joins the pair of points is also within the region
Convex polytope
a special case of a polytope, having the additional property that it is also a convex set of points in the n-dimensional space R^n
Steepest Descent algorithm
an iterative first-order method: from the current point, step in the direction of the negative gradient (the direction of steepest descent), d_{k+1} = d_k - a_k * g(d_k)
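The update rule can be sketched for the noise-free case (a minimal illustration with a constant gain; names are mine):

```python
def steepest_descent(grad, d0, a=0.1, iters=100):
    """Minimize L by stepping against its (noise-free) gradient g:
    d_{k+1} = d_k - a * g(d_k), with a constant gain a."""
    d = list(d0)
    for _ in range(iters):
        g = grad(d)
        d = [di - a * gi for di, gi in zip(d, g)]
    return d

# Example: L(d) = d1^2 + d2^2 has g(d) = (2*d1, 2*d2) and minimum (0, 0)
```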
Newton-Raphson
an iterative second-order method that uses gradient and Hessian (in one dimension: first and second derivative): d_{k+1} = d_k - H(d_k)^(-1) * g(d_k); converges very fast near the optimum
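The one-dimensional Newton step, as a sketch (names are mine; g and h are the first and second derivative of the loss):

```python
def newton_raphson(g, h, d0, iters=20):
    """One-dimensional Newton step for optimization:
    d_{k+1} = d_k - g(d_k) / h(d_k)."""
    d = d0
    for _ in range(iters):
        d = d - g(d) / h(d)
    return d

# Example: L(d) = (d - 3)^2, g(d) = 2*(d - 3), h(d) = 2 -> minimum at d = 3,
# reached in a single step because L is quadratic.
```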
Localized Random Search
a direct random search that samples a new candidate near the current point (current point plus a random perturbation) and moves there only if the loss decreases
Direct Random Search
gradient-free search methods that use only (possibly noisy) measurements of the loss function L, e.g. blind random search and localized random search
convex
convex describes a surface that curves outward, or is thicker in the middle than on the edges.
https://writingexplained.org/concave-vs-convex-difference
concave
Concave is an adjective that describes a surface that curves inward, or is thinner in the middle than on the edges.
hypercube
a hypercube is an n-dimensional analogue of a square (n = 2) and a cube (n = 3)
multivariate
involving two or more variable quantities.
multivariate normal distribution
is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions
One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution
Enhanced Localized Random Search
localized random search extended with a bias term: directions of recently successful perturbations are reused to bias subsequent perturbations
Simple Random (“Blind”) Search
sample candidate solutions from a fixed distribution (e.g. uniformly) over the whole search domain, independently of all previous samples, and keep the best candidate found
Direct methods for stochastic search
e.g. simple random (“blind”) search, localized random search, enhanced localized random search, pattern search
Pattern search
evaluate a fixed pattern of trial points around the current point (e.g. along the coordinate directions); move to an improving trial point, and shrink the pattern’s step size when no trial point improves
Random search with noise-free measurements
random search under the assumption that L can be measured exactly: a candidate replaces the current point iff its measured loss is truly smaller, so the best-so-far loss never increases
slack variable
a slack variable is a variable that is added to an inequality constraint to transform it into an equality
Introducing a slack variable replaces an inequality constraint with an equality constraint and a non-negativity constraint
By introducing the slack variable y >= 0, the inequality Ax <= b can be converted to the equation Ax + y = b
n-simplex
is a generalization of the notion of a triangle to arbitrary dimensions
Basic Pattern Search Algorithm
move through the search space by evaluating a fixed pattern of probe points around the current iterate; halve the step size whenever no probe point improves the loss
Compass search
a pattern search whose probe points lie along the 2n coordinate directions ±e_i from the current point
Nonlinear Simplex (Nelder-Mead) Algorithm
- derivative-free optimization algorithm -> useful when derivatives are unavailable or expensive to compute
- recommended for solving optimization problems with noisy objective functions
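The reflection/expansion/contraction/shrink steps can be sketched for a 2-D objective (a compact illustration with the conventional coefficient values; all names are mine):

```python
def nelder_mead(f, simplex, iters=200):
    """Minimal 2-D Nelder-Mead sketch (reflection 1.0, expansion 2.0,
    contraction 0.5, shrink 0.5); `simplex` is a list of 3 start points."""
    alpha, gamma, rho, sigma = 1.0, 2.0, 0.5, 0.5
    pts = [list(p) for p in simplex]
    for _ in range(iters):
        pts.sort(key=f)                         # best first, worst last
        best, worst = pts[0], pts[-1]
        # centroid of all vertices except the worst
        c = [sum(p[i] for p in pts[:-1]) / (len(pts) - 1) for i in range(2)]
        refl = [c[i] + alpha * (c[i] - worst[i]) for i in range(2)]
        if f(refl) < f(best):                   # try to expand further
            expd = [c[i] + gamma * (refl[i] - c[i]) for i in range(2)]
            pts[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(pts[-2]):              # plain reflection helps
            pts[-1] = refl
        else:                                   # contract toward the worst
            contr = [c[i] + rho * (worst[i] - c[i]) for i in range(2)]
            if f(contr) < f(worst):
                pts[-1] = contr
            else:                               # shrink toward the best
                pts = [best] + [[best[i] + sigma * (p[i] - best[i])
                                 for i in range(2)] for p in pts[1:]]
    pts.sort(key=f)
    return pts[0]
```

Only values of f are used, never derivatives, which is why the method also tolerates noisy objectives.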
convex hull
a polygon which is the smallest perimeter fence enclosing a set of N points
Constraints
limitation of the allowed values of the degrees of freedom of the system
e.g. boundaries (localisation constraints)
e.g. deadlines for event occurrence (temporal constraints)
hard & soft constraints
penalty function
a solution which violates a constraint is not rejected, but “punished”
heuristics
is any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals.
Heuristic optimization algorithms
Two natural ways to design a heuristic algorithm:
- Construction heuristics
- Markovian improvement heuristics
configuration
one complete assignment of the decision variables, i.e. a single point of the search space (for TSP: one complete tour)
construction heuristics vs. improvement heuristics
construction heuristics build a solution from scratch, element by element; improvement heuristics start from a complete solution and repeatedly modify it with local moves to decrease the loss
Insertion heuristic
the new node is inserted somewhere inside the sequence (not appended!)
- How to choose the new node?
- Where is the optimum place to insert it?
Construction heuristics for TSP
e.g. Nearest Neighbor Heuristic
Best insertion heuristic
insert the next (e.g. randomly chosen) node between those two consecutive nodes of the current tour where the increase in total length is minimal; repeat until all nodes are inserted
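A sketch of best insertion on Euclidean coordinates (names are mine; the node order is randomized as in the card above):

```python
import math
import random

def best_insertion_tour(coords):
    """Best-insertion construction heuristic: take the nodes in random
    order and insert each one at the position in the current cycle
    where the increase in total tour length is minimal."""
    def d(i, j):
        return math.dist(coords[i], coords[j])
    order = list(range(len(coords)))
    random.shuffle(order)
    tour = order[:2]                          # start from a 2-node cycle
    for node in order[2:]:
        # length increase when inserting between tour[k] and its successor
        def cost(k):
            succ = tour[(k + 1) % len(tour)]
            return d(tour[k], node) + d(node, succ) - d(tour[k], succ)
        best_k = min(range(len(tour)), key=cost)
        tour.insert(best_k + 1, node)
    return tour
```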
Nearest Neighbor Heuristic
choose a random start node; then repeatedly choose the unvisited node closest to the last added node and append it, until all nodes are visited (the new node is appended!)
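The nearest-neighbor rule as a sketch (names are mine; for simplicity the start node is fixed at 0 instead of random):

```python
import math

def nearest_neighbor_tour(coords):
    """Nearest Neighbor construction heuristic: start at node 0 and
    repeatedly append the closest not-yet-visited node."""
    unvisited = set(range(1, len(coords)))
    tour = [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(coords[last], coords[j]))
        unvisited.remove(nxt)
        tour.append(nxt)                 # the new node is appended
    return tour                          # the cycle closes back at node 0
```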
Best-Best insertion / cheapest insertion heuristic
among all nodes not yet in the tour, insert the one whose cheapest insertion position causes the smallest increase in tour length
Worst-best insertion / farthest insertion algorithm
among all nodes not yet in the tour, pick the one whose cheapest insertion cost is largest (the "farthest" node) and insert it at its best position
nearest insertion heuristic
insert the node that is closest to some node already in the tour, at its best position
Shortest edge heuristic
sort all edges by length and repeatedly add the shortest edge that gives no node degree three and closes no premature subcycle
Ruin & Recreate
ruin: destroy a part of the current solution (e.g. remove a subset of its elements); recreate: rebuild the destroyed part with a construction heuristic, keeping the result if it improves
Ruin & Recreate for TSP
remove a set of nodes from the tour (e.g. a random subset, or all nodes within some radius) and reinsert them, e.g. with best insertion
energy landscape
The values of the loss function over the search space form the energy landscape of the system
improvement heuristics
start from a complete configuration and repeatedly apply local moves that (on the whole) decrease the loss
Markov improvement heuristic
an improvement heuristic in which the next configuration depends only on the current one (memoryless): propose a random move from the current configuration and accept or reject it based on the resulting change of the loss
local search approach
explore the neighborhood of the current configuration (all configurations reachable by one move) and step to a neighbor; iterate until no improving neighbor exists (a local minimum)
Swap move algorithm TSP
pick two nodes of the tour at random and exchange their positions in the sequence
Translation move algorithm
remove a node from the sequence and reinsert it at another position
Inversion move algorithm
reverse the order of the nodes between two randomly chosen positions of the sequence (the classical 2-opt move)
Improvement heuristics for TSP
Swap move algorithm TSP
Translation move algorithm
Inversion move algorithm
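The inversion move as an improvement heuristic (a 2-opt sketch; names are mine, and for brevity all improving inversions are tried exhaustively instead of sampled at random):

```python
import math

def tour_length(tour, coords):
    """Total length of the closed tour over the given coordinates."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, coords):
    """Improvement heuristic built on the inversion move (2-opt):
    reverse a segment of the tour whenever that shortens the total
    length, until no improving inversion remains (a local minimum)."""
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, coords) + 1e-12 < tour_length(tour, coords):
                    tour, improved = cand, True
    return tour
```

Reversing a segment removes two edges and replaces them by two others, which is exactly how crossing edges get untangled.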
Greedy algorithm for TSP
an improvement heuristic that accepts a proposed move only if it does not increase the tour length; it stops in the first local minimum reached
Simulated Annealing
an improvement heuristic that also accepts worsening moves, with probability exp(-delta/T); the temperature T is gradually lowered (annealing schedule), so the search can escape local minima early on and freezes later
Threshold Accepting (TA) is considered a “first approximation” of SA
much faster than SA (no need to generate a random number uniformly on (0,1) for each acceptance test)
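The SA acceptance rule and cooling schedule, sketched for a generic neighbor function (names, the geometric cooling and the constants are mine):

```python
import math
import random

def simulated_annealing(loss, neighbor, d0, T0=1.0, cooling=0.99, iters=2000):
    """Simulated Annealing sketch: a worsening move (delta > 0) is
    accepted with probability exp(-delta / T); the temperature T is
    lowered geometrically, so the search gradually freezes."""
    d = best = d0
    T = T0
    for _ in range(iters):
        cand = neighbor(d)
        delta = loss(cand) - loss(d)
        if delta <= 0 or random.random() < math.exp(-delta / T):
            d = cand                     # move (possibly uphill)
        if loss(d) < loss(best):
            best = d                     # remember the best-so-far
        T *= cooling
    return best
```

For TSP, `neighbor` would apply one of the swap/translation/inversion moves above.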
i.i.d.
independent and identically distributed
a sequence or other collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent
Noise: independent, identically distributed (i.i.d.) measurement errors
Steepest Descent with Noisy/Noise-Free Gradient
_09
Random Search Algorithms with Noisy Loss Function
the basic implementation of random search assumes perfect (noise-free) values of L
Statistical and Engineering Convergence Conditions
_09
Stochastic Approximation
_09 ???
stochastic gradient
_09 sl. 31
Stochastic Gradient Measurement and Algorithm (SGA)
_09
Gradient-Free Algorithms
_09
Finite-Difference Algorithm
_09
FDSA algorithm
_09
gain value?
_09
Simultaneous Perturbation Stochastic Approximation Algorithm (SPSA)
_09
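The core SPSA step can be sketched as follows (a simplified illustration with constant gains instead of the decaying gain sequences a_k, c_k; all names are mine):

```python
import random

def spsa(loss, d0, a=0.1, c=0.1, iters=500):
    """SPSA sketch: all components of d are perturbed SIMULTANEOUSLY by
    a random Bernoulli ±1 vector Delta, and the whole gradient is
    estimated from only two loss measurements:
        g_hat[i] = (L(d + c*Delta) - L(d - c*Delta)) / (2 * c * Delta[i])"""
    d = list(d0)
    n = len(d)
    for _ in range(iters):
        delta = [random.choice([-1.0, 1.0]) for _ in range(n)]
        plus = [di + c * Di for di, Di in zip(d, delta)]
        minus = [di - c * Di for di, Di in zip(d, delta)]
        diff = loss(plus) - loss(minus)              # two measurements only
        g_hat = [diff / (2 * c * Di) for Di in delta]
        d = [di - a * gi for di, gi in zip(d, g_hat)]
    return d
```

Unlike a finite-difference scheme (2n measurements per step), the cost per iteration is independent of the dimension n.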
gain sequence
the sequence of gain values a_k used in the update of d̂_{k+1}, as the factor multiplying Y_k(d̂_k)
d*
d* ∈ D* -> a global optimal solution
D* is the solution set, the set of all (globally optimal) solutions of the optimization problem
d̂ ∈ D (problem space) is a solution candidate
The loss function L()
the function L : D -> R to be minimized; it assigns a real-valued cost to each solution candidate d
The gradient g(d)
The gradient g(d) of L(d) is the vector of first order derivatives
SP
Simultaneous Perturbation