The Interpolating Spline
-curve passing through all data points
-so minimises: Σ [yi - μi^]² (sum over i = 1,…,n), attaining the minimum value 0
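-a minimal sketch in R of this zero-residual property, using stats::splinefun with a natural interpolating spline on hypothetical toy data (t, y are reused by the later sketches):

```r
## Hypothetical toy data, reused in the sketches below
set.seed(1)
t <- sort(runif(100))
y <- sin(2 * pi * t) + rnorm(100, sd = 0.2)

## A natural interpolating spline passes through every (ti, yi),
## so its residual sum of squares is exactly 0 (up to floating point)
f_interp <- splinefun(t, y, method = "natural")
sum((y - f_interp(t))^2)   # ~0
```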
Smoothing Spline Definition
-for given λ > 0 the smoothing spline fλ^ is the function f minimising the penalised sum of squares Rλ(f) = Σ [yi - f(ti)]² + λ J(f) (defined in full later)
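-a minimal sketch, assuming stats::smooth.spline (which fits cubic smoothing splines, so ν=2) and an illustrative λ:

```r
## Fit a cubic smoothing spline to the toy data for a fixed lambda
fit  <- smooth.spline(t, y, lambda = 1e-4)  # lambda chosen for illustration
fhat <- predict(fit, x = t)$y               # fitted values f_lambda^(t_i)
```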
Methods for Choosing the Optimal Lambda
1) training and test
2) cross-validation / ‘leave-one-out’
3) generalised cross-validation
Training and Test
1) partition the indices I = {1,…,n} into disjoint subsets I1 & I2 such that I1⋃I2 = I
- this gives a training set: D1 = { (ti,yi), i∈I1}
- and a test set: D2 = { (ti,yi), i∈I2}
2) fit a smoothing spline to D1 to find fλ,I1^ for some specified λ
3) calculate the goodness of fit to D2: QI1:I2(λ) = Σ [yi - fλ,I1^(ti)]²
- sum over i in I2 and choose λ to minimise QI1:I2(λ)
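-a minimal sketch of this procedure (reusing the toy t, y; the 70/30 split and λ grid are illustrative):

```r
## Train/test selection of lambda with stats::smooth.spline
I1 <- sample(seq_along(t), size = 70)   # training indices
I2 <- setdiff(seq_along(t), I1)         # test indices

Q_test <- function(lambda) {
  fit  <- smooth.spline(t[I1], y[I1], lambda = lambda)  # fit to D1 only
  fhat <- predict(fit, x = t[I2])$y                     # evaluate at test ti's
  sum((y[I2] - fhat)^2)                                 # Q_{I1;I2}(lambda)
}

lambdas <- 10^seq(-8, 0, length.out = 50)
lambdas[which.min(sapply(lambdas, Q_test))]             # chosen lambda
```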
Cross Validation / Leave One Out
-same as the training and test set but use only one observation in the test set:
-training set: D1 = D-j = {(ti,yi), i∈I-j}, where I-j = {1,…,n} \ {j}
-test set: D2 = {(tj,yj)} for given j
-then calculate:
Q-j:j(λ) = [yj - fλ,-j^(tj)]²
-repeat for each j∈{1,…,n} then average to form the ordinary cross-validation criterion:
Qocv(λ) = 1/n Σ [yj - fλ,-j^(tj)]²
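-a minimal sketch of the naive computation, refitting the spline n times (reusing the toy t, y):

```r
## Ordinary (leave-one-out) cross-validation by brute force
Q_ocv_naive <- function(lambda) {
  sq_err <- vapply(seq_along(t), function(j) {
    fit <- smooth.spline(t[-j], y[-j], lambda = lambda)  # fit without obs j
    (y[j] - predict(fit, x = t[j])$y)^2                  # held-out squared error
  }, numeric(1))
  mean(sq_err)                                           # Q_ocv(lambda)
}
```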
Disadvantage of Cross Validation / Leave One Out
-the spline must be refitted n times (once per left-out observation), which is computationally expensive for large n; the smoothing-matrix shortcut below avoids this
Matrix Form of the Smoothing Spline Description
-for given λ and order ν≥1 the fitted value fλ^(tk) at each knot tk may be written as a linear combination of the observations y1,…,yn
Matrix Form of the Smoothing Spline Coefficients
-a smoothing spline which minimises the penalised sum of squares for given λ has coefficients a^ and b^ given by:
[a^ b^]^t = {Mλ}^(-1) [y 0]^t
Matrix Form of the Smoothing Spline f
f = [f1 … fn]^t = K b^ + L a^ = [K L] [Mλ(11) Mλ(21)]^t y
-where Mλ(11) and Mλ(21) denote the corresponding blocks of {Mλ}^(-1)
Matrix Form of the Smoothing Spline Smoothing Matrix
-can show:
S = Sλ = [K L] [Mλ(11) Mλ(21)]^t
=>
f = S y
-where S, the smoothing matrix, is a symmetric, positive definite matrix
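-a minimal sketch verifying this numerically: since the smoother is linear in y, column j of Sλ can be recovered by smoothing the unit vector ej (reusing the toy t, y; λ illustrative):

```r
## Recover S_lambda column by column, then check the claimed properties
lambda <- 1e-4
n <- length(t)
S <- sapply(seq_len(n), function(j) {
  e <- numeric(n); e[j] <- 1                 # unit vector e_j
  predict(smooth.spline(t, e, lambda = lambda, all.knots = TRUE), x = t)$y
})
max(abs(S - t(S)))                                    # ~0: S is symmetric
min(eigen((S + t(S)) / 2, symmetric = TRUE)$values)   # > 0: positive definite
```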
Cross Validation / Leave One Out Smoothing Matrix
-to speed up cross-validation, Qocv can be computed directly from the single spline fλ^ fitted to the full dataset:
Qocv(λ) = 1/n Σ [(yj - fλ^(tj))/(1-sjj)]²
-where fλ^(tj) is the full-data fitted spline evaluated at tj and sjj is the jth diagonal element of Sλ
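-a minimal sketch of this shortcut, assuming smooth.spline's lev component (the diagonal of the smoother matrix) and the toy t, y:

```r
## Q_ocv from a single full-data fit, with no refitting
Q_ocv_fast <- function(lambda) {
  fit <- smooth.spline(t, y, lambda = lambda, all.knots = TRUE)
  res <- y - predict(fit, x = t)$y   # full-data residuals y_j - f_lambda^(t_j)
  mean((res / (1 - fit$lev))^2)      # fit$lev = diag(S_lambda), i.e. the s_jj
}
```

-note: smooth.spline(t, y, cv = TRUE) selects λ by this ordinary CV score itself, reporting it as cv.crit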
Generalised Cross-Validation
Qgcv(λ) = 1/n Σ [(yj - fλ^(tj)) / (1 - (1/n) trace(Sλ))]²
-this is the criterion used by default for smoothing-parameter selection by gam in the mgcv package in R
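-a minimal sketch: since smooth.spline returns df = trace(Sλ), Qgcv needs only one full-data fit (toy t, y as above):

```r
## GCV: replace each s_jj in Q_ocv by the average trace(S_lambda)/n
Q_gcv <- function(lambda) {
  fit <- smooth.spline(t, y, lambda = lambda, all.knots = TRUE)
  res <- y - predict(fit, x = t)$y
  mean((res / (1 - fit$df / length(t)))^2)   # fit$df = trace(S_lambda)
}
```

-note: smooth.spline's default cv = FALSE chooses λ by minimising exactly this GCV score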
How many degrees of freedom are in a smoothing spline? Outline
-there are (n+ν) parameters in (b^, a^) but not all are completely free
How many degrees of freedom are in a smoothing spline? λ -> ∞
-the smoothing spline f^(t) becomes the least squares regression solution for the model formula y~1 when ν=1, or y~1+t when ν=2
How many degrees of freedom are in a smoothing spline? λ -> 0
-the number of degrees of freedom becomes n, since the smoothing spline f^(t) becomes the interpolating spline when λ=0
Ordinary Least Squares Regression Fitted Values
y^ = X [X^t X]^(-1) X^t y
Ordinary Least Squares Regression Hat Matrix
y^ = H y
-where H, the hat matrix, linearly maps data y onto fitted values y^:
H = X [X^t X]^(-1) X^t
Ordinary Least Squares Regression Hat Matrix & DoF
-for ordinary least squares regression:
trace(H) = p
-the trace of the hat matrix is equal to the number of model parameters (the number of degrees of freedom)
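-a minimal sketch checking trace(H) = p on an assumed toy design built from t:

```r
## Hat matrix for OLS with p = 3 parameters (intercept, linear, quadratic)
X <- cbind(1, t, t^2)                     # n x p design matrix
H <- X %*% solve(crossprod(X)) %*% t(X)   # H = X (X'X)^{-1} X'
sum(diag(H))                              # trace(H) = p = 3
```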
Smoothing Matrix Hat Matrix
-the smoothing matrix Sλ plays the same role for a smoothing spline as the hat matrix H does for ordinary least squares: each maps y linearly onto the fitted values
Smoothing Matrix Effective Degrees of Freedom
edf_λ = trace(Sλ)
-can show that:
edf_∞ = ν
edf_0 = n
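-a minimal sketch of the two limits for a cubic smoothing spline (ν=2), using smooth.spline's df component (= trace(Sλ)) at extreme illustrative λ values:

```r
## Effective degrees of freedom at the two extremes of lambda
smooth.spline(t, y, lambda = 1e10,  all.knots = TRUE)$df   # -> nu = 2
smooth.spline(t, y, lambda = 1e-10, all.knots = TRUE)$df   # -> n = 100
```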
Penalised Sum of Squares
Rλ(f) = Σ[yi - f(ti)]² + λ J(f)
-sum from i=1 to i=n
When can the penalised sum of squares be used?
-the penalised sum of squares is fine for Gaussian data BUT for non-Gaussian responses or non-identity link functions it needs to be replaced with the penalised deviance
Penalised Deviance Definition
Rλ(f,β) = D(y,f,β) + λ J(f)
Penalised Deviance Roughness Penalty
-when there are several smooth terms of order ν in the model, f1,…,fm, each may be assigned its own roughness penalty (sum over k = 1,…,m):
Rλ1,…,λm(y,f1,…,fm,β) = D(y,f1,…,fm,β) + Σ λk J(fk)
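-a minimal sketch with mgcv, where gam gives each smooth term its own λ and selects them jointly by GCV (toy data; all names illustrative):

```r
library(mgcv)

## Two smooth terms, each carrying its own penalty lambda_1, lambda_2
set.seed(1)
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- sin(2 * pi * d$x1) + d$x2^2 + rnorm(200, sd = 0.2)

fit <- gam(y ~ s(x1) + s(x2), data = d, method = "GCV.Cp")
fit$sp   # the selected smoothing parameters, one per smooth term
```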