Final Flashcards
Function? x,y:
n: 1, 2, 3, …
time: e^1, e^2, e^3, …
straight line on a linear-log plot
Complexity, exponential
y = a.b^x → O(a.b^x)
log y = log(a) + x.log(b)
^y = ^a + ^b.x
Function? x,y:
h: e^1, e^2, e^3, …
error: e^1, e^2, e^3, …
straight line on a log-log plot
Error, power function
y = a.x^b → O(a.x^b)
log y = log(a) + b.log(x)
^y = ^a + b.^x
Algebraic convergence O(n^-a) / growth O(n^a)
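A quick way to recover the order b in practice is a straight-line fit in log-log space; a minimal sketch, assuming numpy and synthetic data with known a and b:

```python
import numpy as np

h = np.array([0.1, 0.05, 0.025, 0.0125])
error = 3.0 * h**2                               # synthetic: a = 3, b = 2

# slope = b, intercept = log(a) for the model log(error) = log(a) + b*log(h)
b, log_a = np.polyfit(np.log(h), np.log(error), 1)
print(f"b ≈ {b:.3f}, a ≈ {np.exp(log_a):.3f}")   # b ≈ 2.000, a ≈ 3.000
```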
Pseudo-random numbers
- Random pattern
- Long period
- Efficiency
- Repeatability
- Portability
Monte Carlo Methods
- Approximation based on repeated randomized sampling
- Useful for nondeterministic problems, or problems that are complicated or high-dimensional
- Slow error convergence O(n^{-1/2}), but efficiency does not degrade with the dimension of the problem (sketch below)
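A minimal sketch of the O(n^{-1/2}) behavior, assuming numpy (the π-from-random-points problem is my example, not from the cards):

```python
import numpy as np

rng = np.random.default_rng(0)             # seeded for repeatability
for n in (10**2, 10**4, 10**6):
    pts = rng.random((n, 2))               # uniform samples in [0,1]^2
    inside = (pts**2).sum(axis=1) <= 1.0   # point falls in the quarter circle?
    est = 4.0 * inside.mean()
    print(n, est, abs(est - np.pi))        # error drops ~10x per 100x samples
```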
Relative error bound
|x-^x|/|x| ≤ 10^{-n+1} if ^x agrees with x to n significant digits
Taylor series approximation about x0, of degree n
^f(x) = sum(i=0:n) f^(i)(x0)/i! (x-x0)^i
Taylor error of degree n
O(h^{n+1}) where h=(x-x0)
Indeed R = f(x) - ^f(x) = sum(i=n+1:∞) f^(i)(x0)/i! (x-x0)^i = f^(n+1)(ξ)/(n+1)! h^{n+1} for some ξ between x0 and x
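To see the O(h^{n+1}) rate numerically, a sketch with f = exp and x0 = 0 (my example, pure stdlib): halving h should divide the error by about 2^{n+1}.

```python
import math

def taylor_exp(h, n):
    # degree-n Taylor polynomial of e^x at x0 = 0, evaluated at h
    return sum(h**i / math.factorial(i) for i in range(n + 1))

n = 2
for h in (0.1, 0.05, 0.025):
    print(h, abs(math.exp(h) - taylor_exp(h, n)))   # errors shrink by ≈ 2^3 = 8
```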
Normalized floating point numbers representation
x = ± 1.b1b2…bn ⨉ 2^m
m in [L,U]
precision p = n+1
UFL, OFL, ϵm
UFL 2^L
OFL 2^{U+1}(1-2^{-p})
ϵm = distance between 1 and the next larger floating point number = 2^-n
Subnormal floating point numbers representation
m = L, leading digit = 0
Hence smallest positive subnormal = 2^-n . 2^L = 2^{L-n}
Roundoff error
e_rel = |fl(x)-x|/|x| ≤ ϵm
e_abs = |fl(x)-x| ≤ ϵm.|x|
Single/double precision ϵm
Single: ϵm = 2^{-23} ≈ 1e-7
Double: ϵm = 2^{-52} ≈ 1e-16
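These values can be checked directly, assuming numpy:

```python
import numpy as np
print(np.finfo(np.float32).eps)   # 2**-23 ≈ 1.19e-07
print(np.finfo(np.float64).eps)   # 2**-52 ≈ 2.22e-16
```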
Loss of significance
Subtraction of nearly equal numbers amplifies rounding errors, leading to a loss of significant digits.
1.000001000 - 1.000000000 with 10 sig digits gives 1.000 x 10^-6 with only 4 sig digits. Division does not suffer from this! (sketch below)
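A sketch of both points (my example, pure stdlib): sqrt(x+1) - sqrt(x) cancels badly, while the algebraically equivalent division form keeps its digits.

```python
import math

x = 1e12
naive = math.sqrt(x + 1) - math.sqrt(x)            # subtracts nearly equal numbers
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))   # same value, no cancellation
print(naive)    # only about 5 correct digits survive the cancellation
print(stable)   # ≈ 4.9999999999999999e-07, accurate to machine precision
```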
Vector norm properties
||x|| > 0 iff x ≠ 0
||ax|| = |a| ||x|| for all scalars a
||x+y|| ≤ ||x|| + ||y||
Matrix norm properties
||A|| > 0 iff A ≠ 0
||aA|| = |a| ||A|| for all scalars a
||A+B|| ≤ ||A|| + ||B||
+ submultiplicativity: ||Ax|| ≤ ||A|| ||x||, ||AB|| ≤ ||A|| ||B||
||v||_p, ||v||_1, ||v||2, ||v||∞
||v||_p = (|v1|^p + ... + |vn|^p)^{1/p}
||v||_1 = |v1| + ... + |vn|
||v||_2 = √(v1^2 + ... + vn^2)
||v||_∞ = max(i) |vi|
||A||, ||A||1, ||A||∞, ||A||_2
||A|| = max(||x||=1) ||Ax|| = max(x≠0) ||Ax||/||x||
||A||_1 = max |column sum|
||A||_∞ = max |row sum|
||A||_2 = max σi (largest singular value)
Linear system of equations LU solve?
Factorize A = LU O(n^3)
LUx = b (forward sub Ly=b O(n^2))
Ux = y (backward sub O(n^2))
L has ones on its diagonal (sketch below)
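A sketch with scipy's lu_factor/lu_solve (assumed available): factor once, reuse the factorization for each right-hand side.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
lu, piv = lu_factor(A)            # PLU factorization, O(n^3), done once
b = np.array([10.0, 12.0])
x = lu_solve((lu, piv), b)        # forward + backward substitution, O(n^2)
print(x, np.allclose(A @ x, b))   # [1. 2.] True
```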
Linear system of equations partial pivoting?
Avoids division by zero (and small pivots): find the largest |entry| in the current column and swap its row with the pivot row. If no nonzero pivot exists, the matrix is singular.
A = PLU
PLUx = b
Solve y: Ly=P^Tb
Solve x: Ux = y
(a11 a12; a21 A22) = (1 0; l21 L22)(u11 u12; 0 U22) = (u11 u12; u11.l21 l21.u12 + L22U22), hence l21 = a21/u11
Solving KU=F with K (100x100): LU factorization ≈ 1 sec, each solve (forward + backward substitution) ≈ 0.01 sec. How long to find 10 different U when K is (1000x1000)?
factorization: (1000/100)^3 x 1 sec = 1e3 sec
solves: 10 x (1000/100)^2 x 0.01 sec = 10 sec
total = 1e3 + 10 ≈ 1e3 sec
Condition number
Characterizes error amplification.
cond(A) = ||A^-1||.||A||
||Δx||/||x|| ≤ cond(A) ||Δb||/||b|| (perturbed right-hand side)
||Δx||/||x|| ≤ cond(A) ||E||/||A|| (perturbed matrix, E = ΔA)
Condition number relative residual
r = b-A^x
For well-conditioned matrices, small r implies small relative error
||Δx||/||x|| ≤ cond(A) ||r||/(||A||.||x||)
Rule of thumb: accuracy vs cond(A)
A and b accurate to S decimal digits, cond(A)=10^W, solution will have (S-W) accurate digits (IEEE double = 16 digits)
LU partial pivoting guarantee?
||r||/(||A||.||x||) ≤ c.ϵm (small relative residual, NOT relative error) regardless of conditioning of the system
Markov chain
xn = A x_{n-1}
A transition matrix: each entry is a non-negative probability, columns sum to 1, “from” along the top, “to” along the left.
COO format
row  = [i0 i1 ...] (nnz int)
col  = [i0 i1 ...] (nnz int)
data = [# # ...] (nnz double)
CSR format
rowptr = [i0 i1 ... nnz] (nrow+1 int)
col    = [i0 i1 ...] (nnz int)
data   = [# # ...] (nnz double)
CSR example A =
[ 0    0    1.3]
[-1.5  0.2  0  ]
[ 5    0    0  ]
[ 0    0    0  ]
[ 0    0.3  3  ]
rowptr = [0 1 3 4 4 6]
col    = [2 0 1 0 1 2]
data   = [1.3 -1.5 .2 5 .3 3]
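The example can be checked with scipy.sparse (assumed available), whose CSR attributes match the rowptr/col/data arrays above:

```python
import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[ 0.0,  0.0, 1.3],
              [-1.5,  0.2, 0.0],
              [ 5.0,  0.0, 0.0],
              [ 0.0,  0.0, 0.0],
              [ 0.0,  0.3, 3.0]])
S = csr_matrix(A)
print(S.indptr)    # rowptr: [0 1 3 4 4 6]
print(S.indices)   # col:    [2 0 1 0 1 2]
print(S.data)      # data:   [ 1.3 -1.5  0.2  5.   0.3  3. ]
```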
Eigenvalue
A nxn, x≠0 eigenvector, λ eigenvalue: Ax = λx
A = UDU^{-1} diagonalizable, n linearly independent eigenvectors (U = [u1 u2 … ])
Power iteration
finds the largest-magnitude eigenvalue λ1
x_{k+1} = A x_k (normalize each iteration)
x_k = λ1^k [⍺1.u1 + ⍺2(λ2/λ1)^k.u2 + ... + ⍺n(λn/λ1)^k.un]
x_k converges to the direction of ⍺1.u1
cost per iteration O(n^2)
Power iteration convergence
ek+1/ek ≈ |λ2|/|λ1| (linear)
Faster when |λ2/λ1| is small (sketch below)
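A minimal power iteration sketch, assuming numpy and a small symmetric test matrix of my choosing:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 0.0])          # any start with a component along u1 works
for _ in range(50):
    x = A @ x                     # O(n^2) matrix-vector product
    x /= np.linalg.norm(x)        # normalize to avoid overflow/underflow
lam = x @ A @ x / (x @ x)         # Rayleigh quotient estimate of λ1
print(lam, np.linalg.eigvalsh(A).max())   # both ≈ 3.618
```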
Rayleigh quotient
λ = x.T@A@x / x.T@x
Inverse power iteration
finds the smallest-magnitude eigenvalue λn
x_{k+1} = A^{-1} x_k
Factorize A = PLU once, then each iteration solve LUx_{k+1} = P^T x_k:
solve Ly = P^T x_k
solve Ux_{k+1} = y
O(n^2) per iteration (after the one-time O(n^3) factorization)
Inverse power iteration convergence
ek+1/ek ≈ |λn|/|λn-1| (linear)
Faster when λn/λn-1 is small
Shifted Inverse power iteration
finds the eigenvalue λ closest to σ
x_{k+1} = (A-σI)^{-1} x_k
B = A - σI; factorize B = PLU once, then each iteration:
solve Ly = P^T x_k
solve Ux_{k+1} = y
O(n^2) per iteration
Shifted Inverse power iteration convergence
ek+1/ek ≈ |λc1-σ|/|λc2-σ| (linear), where λc1 is the eigenvalue closest to σ and λc2 the 2nd closest
Faster when (λc1-σ)/(λc2-σ) is small
Rayleigh Quotient Iteration
Shifted Inverse power iteration + update of σ.
Bk = A - σk@I
Compute xk+1…
Update σk = xk+1.T@A@xk+1 / xk+1.T@xk+1
At least quadratic convergence (cubic for symmetric A), cost O(n^3) per iteration since A - σk@I must be refactorized each step (sketch below)
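A sketch of the iteration, assuming numpy (np.linalg.solve stands in for the PLU solve; refactorizing the shifted matrix every step is what makes each iteration O(n^3)):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 0.0])
sigma = x @ A @ x / (x @ x)                         # initial shift
for _ in range(4):
    x = np.linalg.solve(A - sigma * np.eye(2), x)   # shifted inverse step
    x /= np.linalg.norm(x)
    sigma = x @ A @ x / (x @ x)                     # update the shift
print(sigma, np.linalg.eigvalsh(A))                 # σ ≈ 1.382, an eigenvalue of A
```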
SVD
A = UΣV.T, cost O(n^3)
- columns of V: eigenvectors of A.T@A
- columns of U: eigenvectors of A@A.T
- σi^2: eigenvalues of A.T@A (σi the singular values)
Singular values are always non-negative! (A.T@A is positive semi-definite)
sizes: (mxn) = (mxm)(mxn)(nxn)
Reduced SVD
sizes: (mxn) = (mxk)(kxk)(kxn), k = min(m,n)
Low-rank approximation / error
Ak = Σ(i=1:k) σi.ui@vi.T
error = ||A - Ak||_2 = σ{k+1}
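Sketch, assuming numpy and a random test matrix: the 2-norm error of the rank-k truncation equals σ_{k+1}.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD
k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]         # best rank-k approximation
print(np.linalg.norm(A - Ak, 2), s[k])             # the two numbers agree
```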
Linear least squares problem
Given m data points, find n coeff (n < m) of the function f(t) that best fits the data (overdetermined system)
Linear least squares system
Ax ≈ b, min_x ||Ax-b||_2^2
A Vandermonde matrix, e.g. f(t) = x0 + x1.t + x2.t^2:
[1 t1 t1^2]
[1 t2 t2^2]
[...]
Linear least squares normal equations
A.T@A@x = A.T@b
factorization n^3, solving n^2
can be ill-conditioned (cond(A.T@A) = cond(A)^2), unique solution only if rank(A) = n
Linear least squares SVD
x = V@Σ^+@U.T@b
factorization n^3, solving mn
Always a solution (unique, via Σ^{-1}, if rank(A) = n); sketch below
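A sketch comparing the two routes, assuming numpy and made-up data (np.linalg.lstsq is SVD-based; both give the same x here since rank(A) = n):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 2.9, 7.2, 12.8, 21.1])       # made-up measurements
A = np.vander(t, 3, increasing=True)            # columns: 1, t, t^2
x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based solver
x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # normal equations
print(x_svd, x_normal)                          # same coefficients
```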
Interpolation
m data points, find f(t) = Σ(j=0:n-1) xj.ɸj(t), with n linearly independent basis functions
Ax = b, A the generalized Vandermonde matrix (mxn):
[ɸ0(t0)   ɸ1(t0)   ...  ɸn-1(t0)  ]
[ɸ0(t1)   ɸ1(t1)   ...  ɸn-1(t1)  ]
[...                              ]
[ɸ0(tm-1) ɸ1(tm-1) ...  ɸn-1(tm-1)]
Interpolation monomials
p{n-1}(t) = Σ(j=0:n-1) xj.t^j (degree n-1)
Interpolation monomials error
Points equally spaced on interval length h, function sufficiently smooth, error of the interpolant of degree n-1 is O(h^n) = O(h^{degree+1})
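Sketch of monomial interpolation, assuming numpy and made-up data: with m = n the Vandermonde system is square and the interpolant matches the data exactly.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 7.0])       # made-up data (here y = 1 + t + t^2)
V = np.vander(t, increasing=True)   # rows [1, ti, ti^2]
x = np.linalg.solve(V, y)           # coefficients x0, x1, x2
print(x, V @ x - y)                 # [1. 1. 1.], residual ≈ 0
```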
Nonlinear equation 1D solve f(x)=0
- Bisection method (no derivatives, one eval per it, linear convergence hk+1 = .5hk)
- Newton’s method (xk+1 = xk - f(xk)/f’(xk), 2 evals per it, quadratic convergence; sketch below)
- Secant method (2 initial guesses, ^f’(xk) = (f(xk)-f(x{k-1}))/(xk-x{k-1}), no need for derivatives, one eval per it, superlinear convergence)
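A sketch of Newton's method on f(x) = x^2 - 2 (my example): the number of correct digits roughly doubles each iteration.

```python
f  = lambda x: x**2 - 2.0
df = lambda x: 2.0 * x

x = 1.0                      # initial guess
for _ in range(5):
    x -= f(x) / df(x)        # Newton update
    print(x)                 # → 1.414213562373095... = sqrt(2)
```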
System nonlinear equations solve f(x)=0
- Newton’s method (solve J(xk)sk = -f(xk) where J(xk)ij = ∂fi/∂xj, xk+1 = xk + sk, typically quadratic local convergence, costly; sketch below)
- Secant method (Broyden): approximate the Jacobian s.t. ^J(xk+1 - xk) = f(xk+1) - f(xk)
- Finite difference method (approximate partial derivatives)
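A sketch of Newton for a 2x2 system (my example), assuming numpy:

```python
import numpy as np

def f(v):                    # f(x, y) = (x^2 + y^2 - 1, x - y)
    x, y = v
    return np.array([x**2 + y**2 - 1.0, x - y])

def J(v):                    # Jacobian of f
    x, y = v
    return np.array([[2.0 * x, 2.0 * y],
                     [1.0, -1.0]])

v = np.array([1.0, 0.5])
for _ in range(6):
    s = np.linalg.solve(J(v), -f(v))   # one linear solve per iteration
    v = v + s
print(v)                               # ≈ [0.7071 0.7071] = (1/√2, 1/√2)
```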
Optimization 1D
- 1st ord necess cdt: f'(x) = 0
- 2nd ord suff cdt: f''(x) > 0
- Golden section search (guaranteed to converge to the global min for a unimodal fct, no derivatives, 1 eval per it, hk+1 = 0.618 hk, linear convergence; sketch below)
- Newton’s method (xk+1 = xk - f'(xk)/f''(xk), typically quadratic local convergence, 2 evals per it, may fail to converge or land on a max/saddle…)
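A minimal golden section search sketch (pure stdlib, my example f): one new function evaluation per iteration, bracket shrinks by 0.618 each step.

```python
import math

def golden_section(f, a, b, tol=1e-8):
    gs = (math.sqrt(5) - 1) / 2              # ≈ 0.618
    x1, x2 = b - gs * (b - a), a + gs * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                          # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + gs * (b - a)
            f2 = f(x2)                       # the single new evaluation
        else:                                # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - gs * (b - a)
            f1 = f(x1)                       # the single new evaluation
    return (a + b) / 2

print(golden_section(lambda x: (x - 1.5)**2, 0.0, 4.0))   # ≈ 1.5
```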
Optimization ND
- 1st ord necess cdt: ∇f(x) = 0
- 2nd ord suff cdt: Hf(x) positive definite (negative definite: max, indefinite: saddle…)
- Steepest descent (sk = -∇f(xk), ⍺k = argmin_⍺ f(xk + ⍺.sk), xk+1 = xk + ⍺k.sk, linear convergence; sketch below)
- Newton’s method (solve Hf(xk)@sk = -∇f(xk), xk+1 = xk + sk, typically quadratic local conv, needs the Hessian, high comp cost)
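A sketch of steepest descent, assuming numpy, on the quadratic f(x) = 0.5 x.T@Q@x - b.T@x (my example), where the exact line search has the closed form ⍺ = (s.T@s)/(s.T@Q@s) for s = -∇f(x) = b - Qx:

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)
for _ in range(15):
    s = b - Q @ x                     # steepest-descent direction -∇f(x)
    alpha = (s @ s) / (s @ (Q @ s))   # exact line search step
    x = x + alpha * s
print(x, np.linalg.solve(Q, b))       # both ≈ [0.2, 0.4]
```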