Splines Flashcards
Spline function definition
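A function [a,b] -> R with m knots (m-2 internal) is a spline of degree l if:
- it is continuously differentiable up to order l-1;
- on each of the m-1 intervals between consecutive knots it is a polynomial of degree l (possibly a different one per interval).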
Number of free parameters in splines
d = l + m - 1
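The count can be checked by a short derivation: each of the m-1 polynomial pieces carries l+1 coefficients, and each of the m-2 internal knots imposes l continuity constraints (on the function and its first l-1 derivatives). The numerical case in the comment is only an illustration.

```latex
d = (m-1)(l+1) - (m-2)\,l = l + m - 1
% e.g. cubic spline (l = 3) with m = 6 knots: d = 5 \cdot 4 - 4 \cdot 3 = 8
```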
Truncated power basis
B_j(z) =
- z^{j-1}, j = 1, …, l+1
- (z - k_{j-l})^l_+, j = l+2, …, d, where k_2, …, k_{m-1} are the internal knots and (u)_+ = max(u, 0)
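A minimal sketch of how this basis could be built in R, assuming degree l ≥ 1 and a knot vector that includes both boundary knots. The helper name `trunc_power_basis` and the example values are hypothetical, chosen only for illustration.

```r
# Truncated power basis of degree l for knots k_1 = a < k_2 < ... < k_m = b.
# `trunc_power_basis` is a hypothetical helper, not a library function.
trunc_power_basis <- function(z, knots, l) {
  internal <- knots[-c(1, length(knots))]            # k_2, ..., k_{m-1}
  poly_part <- outer(z, 0:l, `^`)                     # columns 1, z, ..., z^l
  trunc_part <- outer(z, internal, function(x, k) pmax(x - k, 0)^l)
  cbind(poly_part, trunc_part)
}

z <- seq(0, 1, length.out = 50)
knots <- seq(0, 1, length.out = 6)   # m = 6 knots, 4 of them internal
l <- 3
Z <- trunc_power_basis(z, knots, l)
dim(Z)                               # n rows and d = l + m - 1 columns
```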
B-splines definition
Defined recursively
- Each B_j(z) is > 0 only on a local range of adjacent knots (l+2 knots for degree l);
- Σ_j B_j(z) = 1 ∀ z ∈ [a,b];
- ∀ z ∈ [a,b] at most (l+1) basis functions are > 0.
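These properties can be checked numerically with `bs()` from the splines package. This is a sketch on an arbitrary grid and arbitrary knots; `intercept = TRUE` is used so that the full basis of d functions is returned.

```r
library(splines)

z <- seq(0, 1, length.out = 101)
l <- 3
internal <- c(0.2, 0.4, 0.6, 0.8)     # m - 2 = 4 internal knots, so m = 6
B <- bs(z, knots = internal, degree = l, intercept = TRUE,
        Boundary.knots = c(0, 1))

ncol(B)                 # d = l + m - 1 basis functions
range(rowSums(B))       # partition of unity: every row sums to 1
max(rowSums(B > 0))     # number of positive bases at any z never exceeds l + 1
```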
MLE for B-splines
β̂ = (Z'Z)^{-1} Z'y
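As a quick check, the normal-equations formula can be compared with `lm()` on simulated data. The data-generating function and noise level below are assumptions made only for this sketch.

```r
library(splines)
set.seed(1)
n <- 200
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)          # simulated data, purely illustrative

Z <- bs(z, df = 8, degree = 3, intercept = TRUE)   # n x d B-spline design matrix
beta_hat <- solve(crossprod(Z), crossprod(Z, y))   # (Z'Z)^{-1} Z'y

fit <- lm(y ~ Z - 1)                               # same estimate through lm()
all.equal(as.numeric(beta_hat), unname(coef(fit)))
```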
Model selection for splines
Likelihood ratio test (LRT): only for adding or removing knots with the truncated power basis, where the models are nested.
All other cases (B-splines, moving knots): use model selection criteria such as AIC or BIC.
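A sketch of both routes on simulated data, assuming cubic truncated power terms and Gaussian errors. The helper `tp` and the knot positions are hypothetical; the LRT uses the asymptotic chi-squared reference distribution.

```r
set.seed(1)
n <- 200
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)   # simulated data, purely illustrative

# hypothetical helper: cubic truncated power terms for given internal knots
tp <- function(z, internal) outer(z, internal, function(x, k) pmax(x - k, 0)^3)

small <- lm(y ~ poly(z, 3, raw = TRUE) + tp(z, 0.5))
big   <- lm(y ~ poly(z, 3, raw = TRUE) + tp(z, c(0.25, 0.5, 0.75)))

# Nested models (two knots added): likelihood ratio test with 2 extra parameters
lrt <- as.numeric(2 * (logLik(big) - logLik(small)))
p_value <- pchisq(lrt, df = 2, lower.tail = FALSE)

# Criteria that remain usable for non-nested fits (e.g. moved knots)
c(AIC(small), AIC(big), BIC(small), BIC(big))
```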
P-splines definition
- We start with a high number m-2 of internal knots;
- Instead of removing knots, we damp their effect by penalizing differences between adjacent coefficients, maximizing the penalized log-likelihood pl(β,σ²) = l(β,σ²) - (λ/2) J_i(β) = -(n/2) ln(σ²) - 1/(2σ²) (y-Zβ)'(y-Zβ) - (λ/2) β'K_iβ.
If λ = 0 we have B-splines with no penalization.
If λ -> +∞ the fit tends to a very smooth function: a constant for i = 1, a straight line for i = 2.
Order differences
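For B-spline coefficients, we can penalize differences of order i:
J_1(β) = Σ_{j=2}^{d} (β_j - β_{j-1})² = β'K_1β
J_2(β) = Σ_{j=3}^{d} (β_j - 2β_{j-1} + β_{j-2})² = β'K_2β
K_i = D_i'D_i is a d×d matrix built from difference matrices: D_1 = D_1^{(d)}, of size (d-1)×d, and D_2 = D_1^{(d-1)} D_1^{(d)}, of size (d-2)×d.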
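The difference matrices and penalties can be built in base R with `diff()` applied to an identity matrix. The sketch below uses an arbitrary d and an arbitrary coefficient vector to confirm that the sum and the quadratic form agree.

```r
d <- 6
D1 <- diff(diag(d), differences = 1)   # (d-1) x d first-difference matrix
D2 <- diff(diag(d), differences = 2)   # (d-2) x d second-difference matrix
K1 <- crossprod(D1)                    # K_1 = D1'D1, d x d
K2 <- crossprod(D2)                    # K_2 = D2'D2, d x d

# D2 is the product of two first-difference matrices of sizes (d-2)x(d-1) and (d-1)xd
all(D2 == diff(diag(d - 1)) %*% diff(diag(d)))

beta <- c(1, 3, 2, 5, 4, 6)            # arbitrary coefficients for the check
J1 <- c(sum(diff(beta)^2), drop(t(beta) %*% K1 %*% beta))                    # J_1 two ways
J2 <- c(sum(diff(beta, differences = 2)^2), drop(t(beta) %*% K2 %*% beta))  # J_2 two ways
```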
MLE for P-splines
β̂ = (Z'Z + ΛK_i)^{-1} Z'y
- Λ = λσ²
Equivalent to minimizing the penalized least squares: PLS = (y-Zβ)'(y-Zβ) + Λβ'K_iβ
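A short derivation of the closed form, assuming σ² is held fixed and Z'Z + ΛK_i is invertible. Multiplying the penalized log-likelihood by -2σ² turns its maximization into the minimization of PLS with Λ = λσ²; setting the gradient to zero then gives the estimator (K_i is symmetric, being of the form D'D).

```latex
\begin{aligned}
\mathrm{PLS}(\beta) &= (y - Z\beta)'(y - Z\beta) + \Lambda\,\beta' K_i \beta \\
\frac{\partial\,\mathrm{PLS}}{\partial \beta} &= -2Z'(y - Z\beta) + 2\Lambda K_i \beta = 0 \\
(Z'Z + \Lambda K_i)\,\hat\beta &= Z'y
\quad\Longrightarrow\quad
\hat\beta = (Z'Z + \Lambda K_i)^{-1} Z'y
\end{aligned}
```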
Fitted values P-splines
f̂ = Zβ̂ = Z(Z'Z + ΛK_i)^{-1}Z'y = Sy
S = Z(Z'Z + ΛK_i)^{-1}Z': smoother matrix (the residuals are y - f̂ = [I - S]y)
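A minimal hand-rolled P-spline fit, as a sketch under stated assumptions: simulated data, a basis of d = 20 cubic B-splines with the quantile-based knots that `bs()` picks by default, a second-order penalty, and an arbitrary value Λ = 1.

```r
library(splines)
set.seed(1)
n <- 200
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)          # simulated data, purely illustrative

d <- 20
Z <- bs(z, df = d, degree = 3, intercept = TRUE)   # rich B-spline basis, n x d
K <- crossprod(diff(diag(d), differences = 2))     # second-order penalty matrix K_2
Lambda <- 1                                        # arbitrary smoothing value

A <- crossprod(Z) + Lambda * K
beta_hat <- solve(A, crossprod(Z, y))              # (Z'Z + Lambda K)^{-1} Z'y
S <- Z %*% solve(A, t(Z))                          # smoother matrix
f_hat <- drop(S %*% y)                             # fitted values, equal to Z beta_hat
df_S <- sum(diag(S))                               # effective number of parameters
```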
Effective number of parameters P-splines
df(S) = trace(S)
(reported as edf in the R output)
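To see how df(S) responds to the penalty, the trace can be evaluated over a few values of Λ. This is a sketch with an assumed basis of d = 20 cubic B-splines and a second-order penalty; `edf` is a hypothetical helper name. It uses trace(Z A⁻¹ Z') = trace(A⁻¹ Z'Z) to avoid forming the n×n matrix.

```r
library(splines)
set.seed(1)
z <- sort(runif(200))
d <- 20
Z <- bs(z, df = d, degree = 3, intercept = TRUE)
K <- crossprod(diff(diag(d), differences = 2))     # second-order penalty

# hypothetical helper: trace of the smoother matrix for a given Lambda
edf <- function(Lambda) {
  sum(diag(solve(crossprod(Z) + Lambda * K, crossprod(Z))))
}

# from d (no penalty) down towards the penalty order (2) as Lambda grows
edf_values <- sapply(c(0, 0.01, 1, 100, 1e6), edf)
edf_values
```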
Estimator for σ2 P-splines
σ̂² = [Σ_i (y_i - f̂(z_i))²] / [n - df(S)]
If ML (for AIC and BIC): σ̂²_ML = (1/n) Σ_i (y_i - f̂(z_i))²
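Both estimators computed for a hand-built smoother matrix, as a sketch on simulated data (the true error variance, basis size and Λ = 1 are assumptions of the example).

```r
library(splines)
set.seed(1)
n <- 200
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)      # simulated data, true sigma^2 = 0.09
d <- 20
Z <- bs(z, df = d, degree = 3, intercept = TRUE)
K <- crossprod(diff(diag(d), differences = 2))
S <- Z %*% solve(crossprod(Z) + 1 * K, t(Z))   # smoother matrix for Lambda = 1 (arbitrary)

rss <- sum((y - drop(S %*% y))^2)              # residual sum of squares
sigma2_hat <- rss / (n - sum(diag(S)))         # divides by n - df(S)
sigma2_ml  <- rss / n                          # ML version, used in AIC and BIC
c(sigma2_hat, sigma2_ml)
```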
Choice of best Λ
- CV = (1/n) Σ_i [(y_i - f̂(z_i)) / (1 - S_ii)]²
- GCV = (1/n) Σ_i [(y_i - f̂(z_i)) / (1 - df(S)/n)]²
- AIC or BIC, computed with σ̂²_ML and number of parameters = df(S) + 1
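A grid search over Λ using the CV and GCV formulas, as a sketch on simulated data. The grid, the basis size and the helper name `crit` are assumptions made for the example.

```r
library(splines)
set.seed(1)
n <- 200
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)      # simulated data, purely illustrative
d <- 20
Z <- bs(z, df = d, degree = 3, intercept = TRUE)
K <- crossprod(diff(diag(d), differences = 2))

# hypothetical helper: CV and GCV scores for one value of Lambda
crit <- function(Lambda) {
  S <- Z %*% solve(crossprod(Z) + Lambda * K, t(Z))
  res <- y - drop(S %*% y)
  c(CV  = mean((res / (1 - diag(S)))^2),
    GCV = mean((res / (1 - sum(diag(S)) / n))^2))
}

grid <- 10^seq(-4, 4, by = 0.5)                # candidate values of Lambda
scores <- t(sapply(grid, crit))                # one row per Lambda, columns CV and GCV
best <- grid[apply(scores, 2, which.min)]      # Lambda chosen by CV and by GCV
best
```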
Inference basis for P-splines
Y|Z ∼ N_n(f, σ²I_n), which leads to:
f̂|Z ∼ N_n(Sf, σ²SS')
- P-splines are biased estimators, since E[f̂|Z] = Sf ≠ f in general;
- The p-values produced for hypothesis tests are anticonservative: they tend to be smaller than the true values.
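The bias can be made visible by applying S to a known function. The sketch below assumes a true f, an error variance and Λ = 100, all chosen only for illustration.

```r
library(splines)
set.seed(1)
n <- 200
z <- sort(runif(n))
f <- sin(2 * pi * z)                             # assumed true regression function
sigma2 <- 0.09                                   # assumed error variance
d <- 20
Z <- bs(z, df = d, degree = 3, intercept = TRUE)
K <- crossprod(diff(diag(d), differences = 2))
S <- Z %*% solve(crossprod(Z) + 100 * K, t(Z))   # Lambda = 100, chosen arbitrarily

bias <- drop(S %*% f) - f                        # E[f_hat | Z] - f = (S - I) f
cov_f_hat <- sigma2 * S %*% t(S)                 # Cov[f_hat | Z]
max(abs(bias))                                   # nonzero: smoothing pulls the fit away from f
```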
B-splines in R
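library(splines)
internal = m - 2  # number of internal knots
# Internal knots at equally spaced quantiles of z
location = quantile(z, probs = (1:internal) / (internal + 1))
bsp = bs(z, knots = location, degree = l, intercept = FALSE)
# Shortcut with the same quantile-based knots: bsp = bs(z, df = d - 1, degree = l)
# Equidistant knots instead: location = seq(min(z), max(z), length.out = m)[-c(1, m)]
fit = lm(y ~ bsp)  # use y ~ bsp - 1 if intercept = TRUE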
P-splines in R
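library(mgcv)
# k is the basis dimension; m = c(basis order, penalty order), where basis order l - 1 means degree l
fit = gam(y ~ s(z, bs = "ps", k = d, m = c(l - 1, i), sp = lambda))
# edf = df(S), the effective number of parameters
# edf + 1 = total number of coefficients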
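A runnable version on simulated data, as a sketch: the basis dimension (20), the cubic basis and the second-order penalty are assumptions of the example, and `sp` is omitted so that `gam()` estimates the smoothing parameter itself.

```r
library(mgcv)
set.seed(1)
n <- 200
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)   # simulated data, purely illustrative

# cubic basis (m[1] = 2) of dimension 20 with a second-order difference penalty (m[2] = 2)
fit <- gam(y ~ s(z, bs = "ps", k = 20, m = c(2, 2)))

smooth_edf <- summary(fit)$edf   # effective degrees of freedom of the smooth term
smooth_edf
fit$sp                           # estimated smoothing parameter
```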