Smoothing Parameter Flashcards
The Interpolating Spline
- a spline passing through every data point
- it therefore minimises Σ [yi - μi^]² (the residual sum of squares is exactly zero)
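A quick R illustration of this (a sketch using base R's splinefun with made-up data, rather than the construction in these notes):

    # interpolating natural cubic spline through every data point (base R)
    ti <- c(1, 2, 3, 4, 5)
    y  <- c(2.1, 3.9, 3.2, 5.0, 4.7)
    f_hat <- splinefun(ti, y, method = "natural")   # passes through all (ti, yi)
    sum((y - f_hat(ti))^2)                          # residual sum of squares is ~0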
Smoothing Spline Definition
- given data: D = {(ti, yi), i = 1, …, n}
- with model: yi = f(ti) + εi, εi ~ N(0, σ²), where f(t) is smooth
- given knot positions ti, i = 1, …, n, we can estimate f with the smoothing spline fλ^(t), calculated using the matrix solution to the smoothing spline
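A minimal R sketch of fitting a smoothing spline, using stats::smooth.spline (with knots at every data point) as a stand-in for the matrix solution above; the simulated data and the value of λ are purely illustrative:

    set.seed(1)
    n  <- 100
    ti <- sort(runif(n))
    y  <- sin(2 * pi * ti) + rnorm(n, sd = 0.3)       # yi = f(ti) + εi

    fit   <- smooth.spline(ti, y, lambda = 1e-4, all.knots = TRUE)  # fλ^ for a fixed λ
    f_hat <- predict(fit, ti)$y                       # fitted values fλ^(ti)
    head(f_hat)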
Methods for Choosing the Optimal Lambda
1) training and test
2) cross-validation / ‘leave-one-out’
3) generalised cross-validation
Training and Test
1) partition the indices I = {1, …, n} into disjoint subsets I1 and I2 such that I1 ∪ I2 = I
- this gives a training set: D1 = {(ti, yi), i ∈ I1}
- and a test set: D2 = {(ti, yi), i ∈ I2}
2) fit the smoothing spline to D1 to obtain fλ,I1^ for some specified λ
3) calculate the goodness of fit to D2: QI1:I2(λ) = Σ [yi - fλ,I1^(ti)]²
- sum over i in I2 and choose λ to minimise QI1:I2(λ) (see the R sketch below)
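A hedged R sketch of the training/test procedure, again using smooth.spline as the fitting routine and a made-up grid of λ values:

    set.seed(2)
    n  <- 200
    ti <- sort(runif(n))
    y  <- sin(2 * pi * ti) + rnorm(n, sd = 0.3)

    I1 <- sort(sample(n, 140))                    # training indices
    I2 <- setdiff(seq_len(n), I1)                 # test indices

    lambdas <- 10^seq(-7, -1, length.out = 25)    # candidate λ grid
    Q <- sapply(lambdas, function(lam) {
      f1 <- smooth.spline(ti[I1], y[I1], lambda = lam)    # fλ,I1^ fitted to D1
      sum((y[I2] - predict(f1, ti[I2])$y)^2)              # Q_{I1:I2}(λ) on D2
    })
    lambdas[which.min(Q)]                         # λ minimising the test criterion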
Cross Validation / Leave One Out
- same as the training and test approach, but with only one observation in the test set:
– training set: D1 = D-j = {(ti, yi), i ∈ I, i ≠ j}
– test set: D2 = {(tj, yj)} for a given j
-then calculate:
Q-j:j(λ) = [yj - fλ,-j^(tj)]²
-repeat for each j∈{1,…,n} then average to form the ordinary cross-validation criterion:
Qocv(λ) = 1/n Σ [yj - fλ,-j^(tj)]²
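A brute-force R sketch of Qocv(λ) for a single, illustrative λ (the spline is refitted n times, which is what makes this expensive):

    set.seed(3)
    n  <- 60
    ti <- sort(runif(n))
    y  <- sin(2 * pi * ti) + rnorm(n, sd = 0.3)
    lam <- 1e-4                                   # one illustrative λ

    # refit the spline n times, leaving out one observation each time
    Q_ocv <- mean(sapply(seq_len(n), function(j) {
      f_mj <- smooth.spline(ti[-j], y[-j], lambda = lam)  # fλ,-j^
      (y[j] - predict(f_mj, ti[j])$y)^2
    }))
    Q_ocv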
Disadvantage of Cross Validation / Leave One Out
- very computationally intensive
- can write in terms of the smoothing matrix instead
Matrix Form of the Smoothing Spline Description
- for given λ and order ν ≥ 1, the fitted value fλ^(tk) at each knot tk may be written as a linear combination of the observations y1, …, yn
Matrix Form of the Smoothing Spline Coefficients
- the smoothing spline which minimises the penalised sum of squares for a given λ has coefficients b^ and a^ given by:
[b^ a^]^t = Mλ^(-1) [y 0]^t
Matrix Form of the Smoothing Spline f
f = [f1 … fn]^t = K b^ + L a^ = [K L] [Mλ(11) Mλ(21)]^t y
- where Mλ(11) and Mλ(21) are the corresponding blocks of Mλ^(-1)
Matrix Form of the Smoothing Spline Smoothing Matrix
-can show:
S = Sλ = [K L] [Mλ(11) Mλ(21)]^t
=>
f = S y
-where S, the smoothing matrix, is a symmetric, positive definite matrix
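One way to see the smoothing matrix numerically: since the smoother is linear in y, column j of Sλ is the fit obtained by smoothing the j-th unit vector. A rough R sketch (smooth.spline with all.knots = TRUE standing in for the notes' construction; λ and data are illustrative):

    set.seed(4)
    n  <- 40
    ti <- sort(runif(n))
    y  <- sin(2 * pi * ti) + rnorm(n, sd = 0.3)
    lam <- 1e-4

    # column j of S is the smoother applied to the j-th unit vector
    S <- sapply(seq_len(n), function(j) {
      e_j <- replace(numeric(n), j, 1)
      predict(smooth.spline(ti, e_j, lambda = lam, all.knots = TRUE), ti)$y
    })

    max(abs(S - t(S)))                            # ~0: S is symmetric
    f_full <- predict(smooth.spline(ti, y, lambda = lam, all.knots = TRUE), ti)$y
    max(abs(f_full - drop(S %*% y)))              # ~0: f = S y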
Cross Validation / Leave One Out Smoothing Matrix
- to speed up cross-validation, Qocv(λ) can be computed directly from the spline fλ^ fitted once to the full dataset:
Qocv(λ) = 1/n Σ [(yj - fλ^(tj))/(1-sjj)]²
-where fλ^ is the full data fitted spline at tj and sjj is the jth diagonal element of Sλ
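An R sketch comparing the shortcut with the brute-force computation; smooth.spline's lev component is used for the diagonal of Sλ, and the two values should agree closely (not exactly, because of smooth.spline's internal rescaling of the t values):

    set.seed(5)
    n  <- 60
    ti <- sort(runif(n))
    y  <- sin(2 * pi * ti) + rnorm(n, sd = 0.3)
    lam <- 1e-4

    fit   <- smooth.spline(ti, y, lambda = lam, all.knots = TRUE)
    f_hat <- predict(fit, ti)$y        # fλ^(tj) from the full-data fit
    s_jj  <- fit$lev                   # diagonal elements of Sλ

    Q_shortcut <- mean(((y - f_hat) / (1 - s_jj))^2)

    # brute-force leave-one-out for comparison
    Q_brute <- mean(sapply(seq_len(n), function(j) {
      f_mj <- smooth.spline(ti[-j], y[-j], lambda = lam, all.knots = TRUE)
      (y[j] - predict(f_mj, ti[j])$y)^2
    }))
    c(shortcut = Q_shortcut, brute_force = Q_brute)   # should agree closely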
Generalised Cross-Validation
- a computationally efficient approximation to cross-validation
- replaces sjj with the average of the diagonal elements of Sλ:
Qgcv(λ) = 1/n Σ [(yj - fλ^(tj)) / (1 - (1/n) trace(Sλ))]²
- this criterion is used for smoothing parameter selection in the mgcv package in R
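A rough R sketch of choosing λ by minimising Qgcv over a grid, computing trace(Sλ) from smooth.spline's leverages (grid and data are illustrative; this is not how mgcv is implemented internally):

    set.seed(6)
    n  <- 100
    ti <- sort(runif(n))
    y  <- sin(2 * pi * ti) + rnorm(n, sd = 0.3)

    lambdas <- 10^seq(-7, -2, length.out = 30)
    Q_gcv <- sapply(lambdas, function(lam) {
      fit   <- smooth.spline(ti, y, lambda = lam, all.knots = TRUE)
      f_hat <- predict(fit, ti)$y
      tr_S  <- sum(fit$lev)                          # trace(Sλ)
      mean(((y - f_hat) / (1 - tr_S / n))^2)         # Qgcv(λ)
    })
    lambdas[which.min(Q_gcv)]                        # λ chosen by GCV on this grid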
How many degrees of freedom are in a smoothing spline? Outline
- there are (n + ν) parameters in (b, a), but not all are completely free
How many degrees of freedom are in a smoothing spline? λ -> ∞
-smoothing spline f^(t) becomes the least squares regression solution for model formula y~1 when ν=1, OR y~1+t when ν=2
How many degrees of freedom are in a smoothing spline? λ -> 0
- the number of degrees of freedom becomes n, since the smoothing spline f^(t) becomes the interpolating spline as λ -> 0
Ordinary Least Squares Regression Fitted Values
y^ = X [X^t X]^(-1) X^t y
Ordinary Least Squares Regression Hat Matrix
y^ = H y
- where H, the hat matrix, linearly maps the data y onto the fitted values y^:
H = X [X^t X]^(-1) X^t
Ordinary Least Squares Regression Hat Matrix & DoF
-for ordinary least squares regression:
trace(H) = p
- the trace of the hat matrix is equal to the number of model parameters (the number of degrees of freedom)
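A small R check of this fact for a made-up design matrix with p = 3 columns:

    set.seed(7)
    n <- 50
    x <- runif(n)
    X <- cbind(1, x, x^2)                    # design matrix with p = 3 columns
    H <- X %*% solve(t(X) %*% X) %*% t(X)    # hat matrix
    sum(diag(H))                             # trace(H) = 3 = p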
Smoothing Matrix Hat Matrix
- for the smoothing spline, the smoothing matrix takes on the role of the hat matrix
- it linearly maps the data onto the fitted values
Smoothing Matrix Effective Degrees of Freedom
edf_λ = trace(Sλ)
-can show that:
edf_∞ = ν
edf_0 = n
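A quick R illustration of the two limits, taking trace(Sλ) as the sum of smooth.spline's leverages (the extreme λ values are illustrative; the exact limits are only reached asymptotically):

    set.seed(8)
    n  <- 80
    ti <- sort(runif(n))
    y  <- sin(2 * pi * ti) + rnorm(n, sd = 0.3)

    edf <- function(lam) sum(smooth.spline(ti, y, lambda = lam, all.knots = TRUE)$lev)

    edf(1e-9)   # λ near 0: edf approaches n
    edf(1e+4)   # λ very large: edf approaches ν = 2 for a cubic smoothing spline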
Penalised Sum of Squares
Rλ(f) = Σ[yi - f(ti)]² + λ J(f)
-sum from i=1 to i=n
When can the penalised sum of squares be used?
- the penalised sum of squares is fine for Gaussian data BUT for non-Gaussian responses or non-identity link functions it needs to be replaced with the penalised deviance
Penalised Deviance
Definition
Rλ(f,β) = D(y,f,β) + λ J(f)
- where D is the deviance for the vector y of observations modelled by a linear predictor comprising a spline function f of order ν (and possibly covariate main effects and interactions with parameters β)
- the penalised deviance is then minimised with respect to the spline coefficients b and a (and the regression parameters β, if any)
Penalised Deviance Roughness Penalty
- when there are several smooth terms f1, …, fm of order ν in the model, each may be assigned its own roughness penalty:
Rλ1,…,λm(y, f1, …, fm, β) = D(y, f1, …, fm, β) + Σ λk J(fk)
- sum from k = 1 to k = m
- or the same one can be used for all of them
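A hedged mgcv sketch: two smooth terms in a Poisson model, each given its own smoothing parameter, estimated automatically when the penalised deviance is minimised (the data-generating model below is made up for illustration):

    library(mgcv)
    set.seed(9)
    n  <- 300
    x1 <- runif(n); x2 <- runif(n)
    mu <- exp(0.5 + sin(2 * pi * x1) + 0.5 * x2^2)
    y  <- rpois(n, mu)

    # each smooth term gets its own smoothing parameter λk
    fit <- gam(y ~ s(x1) + s(x2), family = poisson)
    fit$sp         # the estimated smoothing parameters (λ1, λ2)
    sum(fit$edf)   # total effective degrees of freedom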
Penalised Sum of Square Residuals
Rλ(f) = Σ [yi - f(ti)]² + λ J(f)
- where the first term is the sum of squared residuals, summed from i = 1 to i = n
- and λ≥0 is the smoothness parameter
- and J(f) is the roughness penalty
Which spline minimises the penalised sum of squares?
-f^, the function that minimises Rλ(f), is a natural spline:
f^(t) = Σ bi^ |t - ti|^p + {a0^ if ν=1 OR (a0^ + a1^ t) if ν=2}
- where p=(2ν-1)
- and IF ν=1 then Σ bi^ = 0
- or IF ν=2 then Σ bi^ = Σ ti bi^ = 0
Penalised Sum of Squares
λ->0
-f^ is rougher and converges to the interpolating spline
Penalised Sum of Squares
λ->∞
- f^ becomes smoother; regardless of where the data points lie, f^ converges to the least squares fit (a straight line when ν = 2)