Chapter 7 - Non-Linear Regression Flashcards
Strategy for Non-linear regression
define basis functions: Y = b0 + b1*f1(X) + b2*f2(X) + … + bK*fK(X)
fit this model by least squares regression (it is linear in the transformed features); see the sketch below
options for the f_j include polynomial functions, indicator functions (giving a step function), piecewise polynomials (splines), and local regression
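A minimal sketch of this strategy, assuming toy data and a cubic polynomial basis (f1(X)=X, f2(X)=X^2, f3(X)=X^3); the point is simply that the model is fit by ordinary least squares on the transformed features:

```python
# Basis-function regression on hypothetical data: cubic polynomial basis + OLS
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # toy response

# Design matrix: columns are the basis functions f1(X)=X, f2(X)=X^2, f3(X)=X^3
X = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Ordinary least squares on the transformed features
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
```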
Cubic Splines
1) define a set of knots
2) we want the function f(X) to be
- cubic polynomial between each set of knots
- be continuous at each knot
- have continuous first and second derivatives at each knot
result: f can be written in terms of K + 3 basis functions (plus an intercept), so the fit uses K + 4 degrees of freedom
Y = B0 + B1*X + B2*X^2 + B3*X^3 + B4*h(X, knot_1) + … + B_{K+3}*h(X, knot_K)
where h(X, knot) is the truncated power basis function: (X - knot)^3 if X > knot, otherwise 0 (see the sketch below)
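A sketch of building the truncated power basis and fitting the cubic spline by least squares; the toy data and the knot locations are assumptions, not from the source:

```python
# Truncated power basis for a cubic spline with K knots, fit by least squares
import numpy as np

def truncated_power(x, knot):
    """h(X, knot) = (X - knot)^3 for X > knot, else 0."""
    return np.where(x > knot, (x - knot) ** 3, 0.0)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

knots = np.quantile(x, [0.25, 0.5, 0.75])             # K = 3 knots at quantiles of X
basis = [np.ones_like(x), x, x**2, x**3]              # intercept and B1..B3 terms
basis += [truncated_power(x, k) for k in knots]       # B4..B_{K+3} terms
X = np.column_stack(basis)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # least squares fit of the spline
y_hat = X @ beta
```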
Natural cubic spline
a cubic spline with the extra constraint that f is linear before the first knot and after the last knot, which reduces the variance of the fit at the boundaries
Choosing number and location of knots
knots are usually placed at quantiles of X (with K < n); given the knots, fit the spline that minimizes RSS
number of knots K is chosen by CV
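A hedged sketch of choosing the number of knots by cross-validation, using scikit-learn's SplineTransformer (which builds a B-spline basis rather than a natural spline; requires scikit-learn >= 1.0). The data and the candidate grid of knot counts are assumptions:

```python
# Pick the number of knots by 5-fold CV over a spline basis + linear regression
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=x.shape[0])

scores = {}
for n_knots in [4, 6, 8, 12, 16]:                      # candidate numbers of knots
    model = make_pipeline(
        SplineTransformer(n_knots=n_knots, degree=3, knots="quantile"),
        LinearRegression(),
    )
    # 5-fold CV mean squared error for each candidate
    scores[n_knots] = -cross_val_score(
        model, x, y, scoring="neg_mean_squared_error", cv=5
    ).mean()

best_n_knots = min(scores, key=scores.get)
```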
Natural cubic splines v polynomial regression
- splines can fit complex functions with a few parameters
- polynomials require high-degree terms to be flexible, but high-degree polynomials can be erratic near the boundaries of the data
Smoothing splines - definition and 2 facts.
find the function f which minimizes:
sum_i (y_i - f(x_i))^2 + lambda * integral f''(t)^2 dt
i.e., the RSS of the model plus a penalty on roughness; lambda is chosen by CV
it turns out that the minimizer of this criterion is a natural cubic spline with knots at the unique values of the training points x_1, …, x_n
NOTE: without the penalty we could drive RSS to 0 by interpolating every training point (overfitting); the roughness penalty shrinks the fit toward a smooth function, analogous to shrinkage in ridge regression. see the sketch below
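A minimal sketch of fitting a smoothing spline, assuming SciPy >= 1.10 (make_smoothing_spline solves the penalized criterion above for a given lambda) and toy data:

```python
# Smoothing spline: penalized criterion solved for a fixed lambda (`lam`)
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# lam plays the role of lambda: larger values give a smoother (more linear) fit
spline = make_smoothing_spline(x, y, lam=1.0)
y_hat = spline(x)
```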
deriving a smoothing spline
- write it out -
Choosing the regularization parameter in smoothing splines
choose lambda by CV; since the smoothing spline is a linear smoother (yhat = S_lambda * y), one matrix decomposition gives the fit for every lambda, and LOOCV has a shortcut that needs no refitting: RSS_cv(lambda) = sum_i [(y_i - fhat_lambda(x_i)) / (1 - {S_lambda}_ii)]^2 (see the sketch below)
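A small sketch of that LOOCV shortcut for any linear smoother yhat = S @ y; it assumes the smoother matrix S (e.g., S_lambda for a smoothing spline) has already been computed:

```python
# LOOCV error for a linear smoother without refitting n times
import numpy as np

def loocv_rss(S, y):
    """sum_i ((y_i - yhat_i) / (1 - S_ii))^2 for a smoother matrix S."""
    y_hat = S @ y
    resid = (y - y_hat) / (1.0 - np.diag(S))
    return np.sum(resid ** 2)
```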
Local Linear Regression
Span is the fraction of training samples used in each regression (span is chosen by CV)
to predict the regression fcn f at an input x:
1) assign each training point x_i a weight K_i = K(x_i, x), where K_i = 0 if x_i is not one of the k nearest neighbors of x, and K_i decreases as the distance between x_i and x increases
2) perform a weighted least squares regression; i.e., find (Beta0, Beta1) which minimize sum_i K_i*(y_i - Beta0 - Beta1*x_i)^2
3) predict fhat(x) = Beta0_hat + Beta1_hat*x (see the sketch below)
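A sketch of local linear regression at a single query point, assuming tricube weights on the k nearest neighbors and toy data (the span value is arbitrary):

```python
# Local linear regression: weighted least squares fit around each query point
import numpy as np

def local_linear_predict(x, y, x0, span=0.3):
    n = len(x)
    k = max(2, int(np.ceil(span * n)))                 # neighborhood size from the span
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                         # k nearest neighbors of x0
    d_max = dist[idx].max()
    w = np.zeros(n)
    w[idx] = (1 - (dist[idx] / d_max) ** 3) ** 3       # tricube weights, 0 outside
    # Weighted least squares: minimize sum_i w_i * (y_i - b0 - b1*x_i)^2
    X = np.column_stack([np.ones(n), x])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0] + beta[1] * x0

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
f_hat = np.array([local_linear_predict(x, y, x0) for x0 in np.linspace(0.5, 9.5, 50)])
```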
GAMs - algorithm
extension of non-linear models to multiple predictors: Y = B0 + f1(X1) + f2(X2) + … + fp(Xp) + error
If functions f have a basis representation, we can use least squares to fit.
Otherwise we can use backfitting:
1) keep f2 .. .fp fixed and fit f1 using partial residuals as the response
2) keep f1, f3, … fp fixed and fit f2 using partial residuals as the response
3) …
4) iterate
This works for smoothing splines and local regression; see the sketch below
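A sketch of backfitting for a two-predictor additive model, using SciPy smoothing splines (>= 1.10) for each f_j; the data, the fixed lambda, and the fixed number of iterations are assumptions:

```python
# Backfitting: cycle through the predictors, fitting each f_j to partial residuals
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(5)
n = 300
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = np.sin(x1) + 0.1 * (x2 - 5) ** 2 + rng.normal(scale=0.3, size=n)

f1 = np.zeros(n)
f2 = np.zeros(n)
alpha = y.mean()                                       # intercept

for _ in range(20):                                    # iterate until roughly converged
    # fit f1 to the partial residuals with f2 held fixed
    r1 = y - alpha - f2
    order1 = np.argsort(x1)
    s1 = make_smoothing_spline(x1[order1], r1[order1], lam=1.0)
    f1 = s1(x1) - s1(x1).mean()                        # center each fitted function
    # fit f2 to the partial residuals with f1 held fixed
    r2 = y - alpha - f1
    order2 = np.argsort(x2)
    s2 = make_smoothing_spline(x2[order2], r2[order2], lam=1.0)
    f2 = s2(x2) - s2(x2).mean()

y_hat = alpha + f1 + f2
```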
GAMs - properties (4)
1) GAMs are a step from linear regression toward a fully non-parametric method
2) The only constraint is additivity. This can be partially addressed by adding interaction terms of the form Xi*Xj (or low-dimensional interaction functions f_ij(Xi, Xj))
3) We can report the degrees of freedom for most non-linear functions
4) As in linear regression, we can examine the significance of each predictor