Regression Flashcards
What is the basic model for linear regression?
Y = f(X) + ε, where f is a linear function modeling E[Y|X], and ε is a noise term.
In the Bayesian framework, how are parameters typically estimated?
Using the posterior distribution, often with the maximum a posteriori (MAP) estimate.
What is the ordinary least squares (OLS) estimate?
θ̂ = arg min_θ Σ_i (y_i - f_θ(x_i))^2, which minimizes the squared error between predictions and observations (writing f_θ to make the dependence on θ explicit).
How is the OLS solution calculated when X^T X has full rank?
θ̂ = (X^T X)^-1 X^T y
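The closed form above can be sketched in a few lines of NumPy. The toy data here (an exact line y = 2x + 1) is a made-up example; note that solving the normal equations with `np.linalg.solve` is preferred over forming the explicit inverse.

```python
import numpy as np

# Hypothetical toy data lying exactly on y = 2x + 1.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])  # first column is the intercept feature
y = np.array([3.0, 5.0, 7.0])

# OLS closed form: theta = (X^T X)^{-1} X^T y,
# computed by solving the normal equations rather than inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # → [1. 2.]  (intercept 1, slope 2)
```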
What is ridge regression and how does it differ from OLS?
Ridge regression adds a penalty term: θ̂(λ) = arg min_θ ||Xθ - y||^2 + λ||θ||^2, where λ is the regularization strength.
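A minimal sketch of the ridge closed form, θ̂(λ) = (X^T X + λI)^{-1} X^T y, on assumed synthetic data; the λ value is arbitrary. One visible effect of the penalty is that the ridge solution has a smaller norm than the OLS solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

lam = 1.0                      # regularization strength λ (an assumed value)
d = X.shape[1]

# Ridge closed form: theta(λ) = (X^T X + λ I)^{-1} X^T y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The penalty shrinks the coefficients toward zero.
print(np.linalg.norm(theta_ridge) < np.linalg.norm(theta_ols))  # → True
```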
What is a kernel function?
A function κ(x_i, x_j) = φ(x_i)^T φ(x_j), where φ is a feature map.
What is the “kernel trick”?
The ability to compute κ(x_i, x_j) directly, without ever explicitly computing the feature maps φ(x_i) and φ(x_j).
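The trick can be verified concretely for the homogeneous degree-2 polynomial kernel κ(x, z) = (x^T z)², whose explicit feature map on 2-D inputs is φ(x) = (x₁², x₂², √2·x₁x₂); the vectors below are made-up examples.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel on 2-D inputs.
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])

explicit = phi(x) @ phi(z)   # compute the features, then their inner product
trick = (x @ z) ** 2         # kernel trick: same value, no φ needed
print(explicit, trick)       # → 121.0 121.0
```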
Name three example kernel functions
Linear kernel, polynomial kernel, and radial basis function (RBF) kernel.
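The three kernels can be sketched directly; the hyperparameter values (offset c, degree, bandwidth γ) are assumed defaults, not prescribed ones.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, c=1.0, degree=3):
    # (x·z + c)^degree
    return (x @ z + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # exp(-γ ||x - z||²)
    return np.exp(-gamma * np.sum((x - z) ** 2))

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(linear_kernel(x, z))      # → 0.0
print(polynomial_kernel(x, z))  # → 1.0
print(rbf_kernel(x, z))         # → exp(-1) ≈ 0.3679
```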
What are hyperparameters in kernel regression?
The parameters of the kernel function and, in kernel ridge regression, the regularization strength λ.
What is a feature map in kernel regression?
A feature map in kernel regression transforms the input data into a higher-dimensional space, where linear relationships are easier to find.
What is the basic idea behind random features?
The basic idea behind random features in kernel regression is to approximate the kernel function using a finite set of random projections to reduce computational complexity.
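One concrete instance is random Fourier features (Rahimi & Recht) for the RBF kernel: sample frequencies from the kernel's spectral density and use cosine features whose inner product approximates κ. A minimal sketch, with assumed values for the feature count D and bandwidth γ:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 2000          # input dimension, number of random features (assumed)
gamma = 0.5             # RBF bandwidth: κ(x, z) = exp(-γ ||x - z||²)

# For the RBF kernel the spectral density is Gaussian: w ~ N(0, 2γ I).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def z_map(x):
    # Finite-dimensional random feature map whose inner product approximates κ.
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
approx = z_map(x) @ z_map(y)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
print(approx, exact)  # the approximation error shrinks like O(1/√D)
```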
What is the main limitation of kernel methods for large datasets?
The kernel matrix grows quadratically with the number of samples n, requiring O(n²) memory and typically O(n³) time to solve.
What is kernel regression best suited for?
Moderately sized datasets with high-dimensional data points.
What is the difference between parameters and hyperparameters in a model?
Parameters control the likelihood function, while hyperparameters parametrize the prior distribution in a Bayesian setting.
What is the Bayesian interpretation of the ridge regression penalty?
The penalty λ||θ||^2 corresponds to a zero-mean Gaussian prior on θ; the ridge solution is then the MAP estimate.
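Spelling this out: with a Gaussian likelihood y | θ ~ N(Xθ, σ²I) and a prior θ ~ N(0, τ²I), the negative log posterior is, up to additive constants,

```latex
-\log p(\theta \mid y)
= \frac{1}{2\sigma^2}\,\lVert X\theta - y\rVert^2
+ \frac{1}{2\tau^2}\,\lVert \theta\rVert^2
+ \text{const},
```

so minimizing it is equivalent to the ridge objective ||Xθ − y||² + λ||θ||² with λ = σ²/τ².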