Regression Flashcards

1
Q

What is the basic model for linear regression?

A

Y = f(X) + ε, where f is a linear function modeling E[Y|X], and ε is a noise term.
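A minimal sketch (not from the card) of what this model looks like when simulated with NumPy; the "true" parameters and noise scale below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                 # inputs
theta_true = np.array([1.0, -2.0, 0.5])     # hypothetical "true" parameters
eps = rng.normal(scale=0.1, size=n)         # noise term
y = X @ theta_true + eps                    # E[Y|X] = X @ theta_true is linear in X
```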

2
Q

In Bayesian framework, how are parameters typically estimated?

A

Using the posterior distribution, often with the maximum a posteriori (MAP) estimate.

3
Q

What is the ordinary least squares (OLS) estimate?

A

θ̂ = arg min_θ Σ_i (y_i - f_θ(x_i))^2, the θ that minimizes the squared error between predictions and observations; for linear regression, f_θ(x_i) = x_i^T θ.

4
Q

How is the OLS solution calculated when X^T X has full rank?

A

θ̂ = (X^T X)^-1 X^T y
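A minimal sketch of this closed form in NumPy, assuming X^T X has full rank; the synthetic X and y are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# (X^T X)^{-1} X^T y, computed by solving the normal equations rather than
# forming the inverse explicitly (numerically safer, same result).
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```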

5
Q

What is ridge regression and how does it differ from OLS?

A

Ridge regression adds a penalty term: θ̂(λ) = arg min_θ ||Xθ - y||^2 + λ||θ||^2, where λ is the regularization strength.
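A minimal sketch of the corresponding closed-form minimizer, θ̂(λ) = (X^T X + λI)^-1 X^T y (a standard fact not spelled out on the card); data and λ are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

lam = 0.1                                   # hypothetical regularization strength
p = X.shape[1]
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```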

6
Q

What is a kernel function?

A

A function κ(x_i, x_j) = φ(x_i)^T φ(x_j), where φ is a feature map.
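A minimal numerical sketch: for the homogeneous quadratic kernel κ(x, z) = (x^T z)^2 on 2D inputs, an explicit feature map is φ(x) = (x1^2, √2·x1·x2, x2^2). The vectors below are made up; the check simply confirms κ(x, z) = φ(x)^T φ(z):

```python
import numpy as np

def phi(x):
    # explicit feature map for the quadratic kernel (x^T z)^2 in 2D
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((x @ z) ** 2)        # kernel evaluated directly
print(phi(x) @ phi(z))     # same value via the explicit feature map
```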

7
Q

What is the “kernel trick”?

A

The ability to compute κ(x_i, x_j) directly, without explicitly computing the feature map φ, which may be very high- or even infinite-dimensional.

8
Q

Name three example kernel functions

A

Linear kernel, polynomial kernel, and radial basis function (RBF) kernel.
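A minimal sketch of the three kernels as NumPy functions; the degree, offset and bandwidth values are placeholders, not prescribed by the card:

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, degree=3, c=1.0):
    return (x @ z + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))
```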

9
Q

What are hyperparameters in kernel regression?

A

The parameters of the kernel function (e.g. the RBF bandwidth) and, in kernel ridge regression, the regularization strength λ.

10
Q

What is a feature map in kernel regression?

A

A feature map in kernel regression transforms the input data into a (typically higher-dimensional) feature space in which linear relationships are easier to find.

11
Q

What is the basic idea behind random features?

A

The basic idea behind random features in kernel regression is to approximate the kernel function using a finite set of random projections to reduce computational complexity.
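A minimal sketch of one common instance, random Fourier features for the RBF kernel exp(-γ||x - z||^2); the dimensions, γ and test points are made up, and for large D the two printed values should be close:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 5000, 0.1                  # input dim, number of random features, bandwidth

W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))   # random projection directions
b = rng.uniform(0, 2 * np.pi, size=D)                    # random phases

def random_features(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, z = rng.normal(size=d), rng.normal(size=d)
print(np.exp(-gamma * np.sum((x - z) ** 2)))    # exact RBF kernel value
print(random_features(x) @ random_features(z))  # Monte Carlo approximation
```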

12
Q

What is the main limitation of kernel methods for large datasets?

A

The kernel matrix grows quadratically with the number of samples.

13
Q

What is kernel regression best suited for?

A

High-dimensional data points in moderately sized datasets.

14
Q

What is the difference between parameters and hyperparameters in a model?

A

Parameters control the likelihood function, while hyperparameters parametrize the prior distribution in a Bayesian setting.

15
Q

What is the Bayesian interpretation of the ridge regression penalty?

A

The penalty λ ||θ||^2 corresponds to a zero-mean Gaussian prior on θ; the ridge estimate is then the MAP estimate under that prior.

16
Q

In what case does ridge regression converge to the minimum ℓ2-norm OLS solution?

A

As λ approaches 0.

17
Q

What is polynomial regression?

A

A form of regression where Y = θ_1 + θ_2X + θ_3X^2 + θ_4X^3 + … + ε
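A minimal sketch of polynomial regression as linear regression on the feature map φ(x) = (1, x, x^2, x^3); the synthetic data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=50)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.1, size=50)

Phi = np.vander(x, N=4, increasing=True)           # columns: 1, x, x^2, x^3
theta_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]  # ordinary least squares on the features
```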

18
Q

What is the general form of regression in feature space?

A

Y = φ(X)θ + ε, where φ is a feature map.

19
Q

What is the kernel matrix K?

A

K = φ(X)φ(X)^T, or K_ij = κ(x_i, x_j)
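A minimal sketch of building K for an RBF kernel from pairwise squared distances; the data and γ are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
gamma = 0.5

sq_norms = np.sum(X**2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
K = np.exp(-gamma * sq_dists)      # 20 x 20, symmetric positive semi-definite
```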

20
Q

How is prediction made in kernel regression?

A

ŷ = Σ_i κ(x_i, x_new) η̂_i, where the x_i are the training points and η̂ are the estimated (dual) coefficients. Prediction therefore requires the full training set.
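A minimal sketch of this prediction step for kernel ridge regression. The dual coefficients are computed with the standard closed form η̂ = (K + λI)^-1 y, which the card does not spell out; the data, γ and λ are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
gamma, lam = 0.5, 0.1

def rbf(a, b):
    return np.exp(-gamma * np.sum((a - b) ** 2))

K = np.array([[rbf(xi, xj) for xj in X] for xi in X])
eta_hat = np.linalg.solve(K + lam * np.eye(len(X)), y)

x_new = rng.normal(size=4)
y_pred = sum(rbf(xi, x_new) * ei for xi, ei in zip(X, eta_hat))  # uses all training points
```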

21
Q

What is the main advantage of random feature approximation?

A

It allows kernel methods to be applied to large datasets by reducing computational complexity.

22
Q

In what case should one consider using linear regression in feature space instead of kernel regression?

A

When the number of features is small and the data is sparse in feature space.

23
Q

If p > n, what happens to the OLS estimate?

A

The OLS problem has infinitely many solutions; we take the θ with minimal length (the minimum ℓ2-norm solution, obtained via a constrained/Lagrangian formulation).
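A minimal sketch of the p > n case in NumPy: lstsq (and, equivalently, the Moore-Penrose pseudoinverse) returns the minimum ℓ2-norm least-squares solution; the sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                          # more features than samples
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

theta_min_norm = np.linalg.lstsq(X, y, rcond=None)[0]
theta_pinv = np.linalg.pinv(X) @ y     # same minimum-norm solution
print(np.allclose(theta_min_norm, theta_pinv))
```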

24
Q

What is Moore-Penrose pseudo inverse?

A

The Moore-Penrose pseudo inverse is a generalization of the matrix inverse that can be applied to non-square or singular matrices to solve linear least squares problems.

25
Q

Benefits of feature maps?

A

Allows linear models to capture non-linear relationships
Can significantly improve model performance on complex datasets
Keeps the simplicity and interpretability of linear models

26
Q

Drawbacks of feature maps?

A

Can lead to overfitting if too many features are created
May increase computational complexity
Requires careful selection of appropriate feature maps