Smoothing Parameter Flashcards
The Interpolating Spline
- a curve passing through all data points
- so it minimises Σ [yi - f^(ti)]², achieving a residual sum of squares of zero
Smoothing Spline Definition
- given data: D = { (ti,yi), i=1,…,n }
- with model: yi = f(ti) + εi , εi~N(0,σ²)
- where f(t) is smooth
- given knot positions ti, i=1,…,n, we can estimate f with the smoothing spline fλ^(t), calculated using the matrix solution to the smoothing spline problem
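For reference, the criterion that fλ^ minimises is the penalised sum of squares; a standard form (stated here as a reminder, with a roughness penalty of order ν, rather than copied from the notes) is:

```latex
\hat{f}_\lambda = \operatorname*{arg\,min}_{f}\;
  \sum_{i=1}^{n} \left[\, y_i - f(t_i) \,\right]^2
  \;+\; \lambda \int \left[\, f^{(\nu)}(t) \,\right]^2 \, dt
```

Setting λ = 0 recovers the interpolating spline; λ → ∞ forces f^(ν) = 0, i.e. a degree-(ν-1) polynomial fit.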
Methods for Choosing the Optimal Lambda
1) training and test
2) cross-validation / ‘leave-one-out’
3) generalised cross-validation
Training and Test
1) partition the indices I = {1,…,n} into subsets I1 & I2 such that I1⋃I2 = I and I1∩I2 = ∅
- this gives a training set: D1 = { (ti,yi), i∈I1 }
- and a test set: D2 = { (ti,yi), i∈I2 }
2) fit a smoothing spline to D1 to obtain fλ,I1^ for some specified λ
3) calculate the goodness of fit to D2: QI1:I2(λ) = Σi∈I2 [yi - fλ,I1^(ti)]²
- choose λ to minimise QI1:I2(λ)
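As a sketch, steps 1)-3) can be run end-to-end with a discrete penalised-least-squares smoother (a Whittaker-style stand-in for the smoothing spline; the data, test indices and λ grid below are illustrative assumptions, not from the notes):

```python
import math
import random

def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for small systems)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def penalty(n, lam):
    # lam * D'D, where D is the (n-2) x n second-difference matrix
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        d = [0.0] * n
        d[i], d[i + 1], d[i + 2] = 1.0, -2.0, 1.0
        for a in range(n):
            for b in range(n):
                P[a][b] += lam * d[a] * d[b]
    return P

def fit(y, w, lam):
    # minimise sum_i w_i (y_i - f_i)^2 + lam * f'D'Df over f
    n = len(y)
    P = penalty(n, lam)
    A = [[P[a][b] + (w[a] if a == b else 0.0) for b in range(n)] for a in range(n)]
    return solve(A, [w[i] * y[i] for i in range(n)])

# illustrative noisy observations of a smooth curve
random.seed(1)
t = [i / 9 for i in range(10)]
y = [math.sin(2 * math.pi * ti) + random.gauss(0, 0.1) for ti in t]

# 1) partition the indices: I2 is the test set, the rest train (weight 0 drops a point)
I2 = [3, 6]
w = [0.0 if i in I2 else 1.0 for i in range(10)]

# 2)-3) fit on the training set for each candidate lambda, score on the test set
scores = {}
for lam in [1e-4, 1e-2, 1.0, 100.0]:
    f = fit(y, w, lam)
    scores[lam] = sum((y[j] - f[j]) ** 2 for j in I2)

best = min(scores, key=scores.get)
print(best, scores[best])
```

The weight vector trick (w = 0 at test indices) keeps the fitted vector the same length as the data, so the test residuals can be read off directly.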
Cross Validation / Leave One Out
- the same as training and test, but with only one observation in the test set:
- training set: D1 = D-j = {(ti,yi), i≠j}
- test set: D2 = {(tj,yj)} for a given j
-then calculate:
Q-j:j(λ) = [yj - fλ,-j^(tj)]²
-repeat for each j∈{1,…,n}, then average to form the ordinary cross-validation criterion:
Qocv(λ) = (1/n) Σj [yj - fλ,-j^(tj)]²
Disadvantage of Cross Validation / Leave One Out
- very computationally intensive: it requires fitting n separate splines, one per left-out observation
- can instead be written in terms of the smoothing matrix
Matrix Form of the Smoothing Spline Description
-for given λ and index ν≥1, the fitted value fλ^(tk) at each knot tk may be written as a linear combination of the observations y1,…,yn
Matrix Form of the Smoothing Spline Coefficients
-the smoothing spline which minimises the penalised sum of squares for given λ has coefficients a^ and b^:
[a^ b^]^t = {Mλ}^(-1) [y 0]^t
Matrix Form of the Smoothing Spline f
f = [f1 … fn]^t = K b^ + L a^ = [K L] [Mλ(11) Mλ(21)]^t y
Matrix Form of the Smoothing Spline Smoothing Matrix
-can show:
S = Sλ = [K L] [Mλ(11) Mλ(21)]^t
=>
f = S y
-where S, the smoothing matrix, is a symmetric, positive definite matrix
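The symmetry and positive definiteness of S are easy to check numerically for a discrete analogue. Below, S = (I + λD'D)⁻¹ is a Whittaker-style smoothing matrix (an illustrative stand-in; the true spline S has the same properties via Mλ):

```python
import random

def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for small systems)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def penalty(n, lam):
    # lam * D'D, where D is the (n-2) x n second-difference matrix
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        d = [0.0] * n
        d[i], d[i + 1], d[i + 2] = 1.0, -2.0, 1.0
        for a in range(n):
            for b in range(n):
                P[a][b] += lam * d[a] * d[b]
    return P

n, lam = 6, 0.5
P = penalty(n, lam)
A = [[P[a][b] + (1.0 if a == b else 0.0) for b in range(n)] for a in range(n)]

# column k of S solves (I + lam * D'D) x = e_k
S = [solve(A, [1.0 if i == k else 0.0 for i in range(n)]) for k in range(n)]

# symmetry: S[a][b] should equal S[b][a] up to rounding
sym_err = max(abs(S[a][b] - S[b][a]) for a in range(n) for b in range(n))

# positive definiteness: x'Sx > 0 for arbitrary non-zero x
random.seed(0)
quads = []
for _ in range(5):
    x = [random.uniform(-1, 1) for _ in range(n)]
    quads.append(sum(x[a] * S[a][b] * x[b] for a in range(n) for b in range(n)))
print(sym_err, min(quads))
```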
Cross Validation / Leave One Out Smoothing Matrix
-to speed up cross-validation, Qocv can be computed directly from the single spline fλ^ fitted to the full dataset:
Qocv(λ) = 1/n Σ [(yj - fλ^(tj))/(1-sjj)]²
-where fλ^(tj) is the full-data fitted spline at tj and sjj is the jth diagonal element of Sλ
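The shortcut is exact for any linear smoother fitted by penalised least squares, so it can be verified against the naive n-refits computation. A sketch using a Whittaker-style discrete smoother as a stand-in (the data values are illustrative assumptions):

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for small systems)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def penalty(n, lam):
    # lam * D'D, where D is the (n-2) x n second-difference matrix
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        d = [0.0] * n
        d[i], d[i + 1], d[i + 2] = 1.0, -2.0, 1.0
        for a in range(n):
            for b in range(n):
                P[a][b] += lam * d[a] * d[b]
    return P

n, lam = 8, 1.0
y = [0.2, 0.8, 1.4, 1.3, 0.9, 0.3, -0.4, -0.6]    # illustrative data
P = penalty(n, lam)

def fit(w):
    # minimise sum_i w_i (y_i - f_i)^2 + penalty; setting w[j] = 0 drops point j
    A = [[P[a][b] + (w[a] if a == b else 0.0) for b in range(n)] for a in range(n)]
    return solve(A, [w[i] * y[i] for i in range(n)])

f = fit([1.0] * n)                                 # single full-data fit

# diagonal of the smoothing matrix S (column k of S solves A x = e_k)
A = [[P[a][b] + (1.0 if a == b else 0.0) for b in range(n)] for a in range(n)]
s = [solve(A, [1.0 if i == k else 0.0 for i in range(n)])[k] for k in range(n)]

naive, shortcut = 0.0, 0.0
for j in range(n):
    w = [1.0] * n
    w[j] = 0.0
    fj = fit(w)                                    # expensive: one refit per point
    naive += (y[j] - fj[j]) ** 2
    shortcut += ((y[j] - f[j]) / (1.0 - s[j])) ** 2

Q_naive, Q_shortcut = naive / n, shortcut / n
print(Q_naive, Q_shortcut)
```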
Generalised Cross-Validation
- a computationally efficient approximation to cross-validation
- replaces sjj with the average of the diagonal elements of Sλ:
Qgcv(λ) = (1/n) Σ [(yj - fλ^(tj)) / (1 - (1/n) trace(Sλ))]²
-this is the criterion used by the mgcv package in R to choose the optimal λ
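Since GCV only needs the full-data fit and trace(Sλ), it is cheap to compute alongside the shortcut Qocv. A sketch with the same Whittaker-style stand-in smoother (data values are illustrative assumptions):

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for small systems)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def penalty(n, lam):
    # lam * D'D, where D is the (n-2) x n second-difference matrix
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        d = [0.0] * n
        d[i], d[i + 1], d[i + 2] = 1.0, -2.0, 1.0
        for a in range(n):
            for b in range(n):
                P[a][b] += lam * d[a] * d[b]
    return P

n, lam = 8, 1.0
y = [0.2, 0.8, 1.4, 1.3, 0.9, 0.3, -0.4, -0.6]    # illustrative data
P = penalty(n, lam)
A = [[P[a][b] + (1.0 if a == b else 0.0) for b in range(n)] for a in range(n)]

f = solve(A, y)                                   # full-data fit: f = S y
s = [solve(A, [1.0 if i == k else 0.0 for i in range(n)])[k] for k in range(n)]
tr = sum(s)                                       # trace(S)

# OCV uses each s_jj; GCV replaces every s_jj with their average tr / n
Q_ocv = sum(((y[j] - f[j]) / (1.0 - s[j])) ** 2 for j in range(n)) / n
Q_gcv = sum(((y[j] - f[j]) / (1.0 - tr / n)) ** 2 for j in range(n)) / n
print(tr, Q_ocv, Q_gcv)
```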
How many degrees of freedom are in a smoothing spline? Outline
-there are (n+ν) parameters in (b^, a^) but not all are completely free
How many degrees of freedom are in a smoothing spline? λ -> ∞
-the smoothing spline f^(t) becomes the least squares regression solution for the model formula y~1 when ν=1, or y~1+t when ν=2
How many degrees of freedom are in a smoothing spline? λ -> 0
-the number of degrees of freedom becomes n, since the smoothing spline f^(t) becomes the interpolating spline when λ=0
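The effective degrees of freedom of a linear smoother are usually measured as trace(Sλ), and both limits above can be seen numerically. A sketch using the Whittaker-style discrete smoother with a second-difference penalty (the analogue of ν = 2) as a stand-in:

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting (fine for small systems)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def penalty(n, lam):
    # lam * D'D, where D is the (n-2) x n second-difference matrix
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        d = [0.0] * n
        d[i], d[i + 1], d[i + 2] = 1.0, -2.0, 1.0
        for a in range(n):
            for b in range(n):
                P[a][b] += lam * d[a] * d[b]
    return P

def df(n, lam):
    # effective degrees of freedom = trace of the smoothing matrix
    P = penalty(n, lam)
    A = [[P[a][b] + (1.0 if a == b else 0.0) for b in range(n)] for a in range(n)]
    return sum(solve(A, [1.0 if i == k else 0.0 for i in range(n)])[k]
               for k in range(n))

n = 8
df_small = df(n, 1e-8)   # lambda -> 0: df -> n (interpolation)
df_big = df(n, 1e8)      # lambda -> inf: df -> 2 (straight-line fit, nu = 2 analogue)
print(df_small, df_big)
```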