Regression Flashcards

1
Q

What is the basic model for linear regression?

A

Y = f(X) + ε, where f is a linear function modeling E[Y|X], and ε is a noise term.
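A minimal sketch (not from the card) of what this model looks like when simulated with NumPy; the "true" parameters and noise scale below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                 # inputs
theta_true = np.array([1.0, -2.0, 0.5])     # hypothetical "true" parameters
eps = rng.normal(scale=0.1, size=n)         # noise term
y = X @ theta_true + eps                    # E[Y|X] = X @ theta_true is linear in X
```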

2
Q

In Bayesian framework, how are parameters typically estimated?

A

Using the posterior distribution, often with the maximum a posteriori (MAP) estimate.

3
Q

What is the ordinary least squares (OLS) estimate?

A

θ̂ = arg min_θ Σ_i (y_i - f_θ(x_i))^2, the θ that minimizes the squared error between predictions and observations; for linear regression, f_θ(x_i) = x_i^T θ.

4
Q

How is the OLS solution calculated when X^T X has full rank?

A

θ̂ = (X^T X)^-1 X^T y
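A minimal sketch of this closed form in NumPy, assuming X^T X has full rank; the synthetic X and y are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# (X^T X)^{-1} X^T y, computed by solving the normal equations rather than
# forming the inverse explicitly (numerically safer, same result).
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```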

5
Q

What is ridge regression and how does it differ from OLS?

A

Ridge regression adds a penalty term: θ̂(λ) = arg min_θ ||Xθ - y||^2 + λ||θ||^2, where λ is the regularization strength.
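A minimal sketch of the corresponding closed-form minimizer, θ̂(λ) = (X^T X + λI)^-1 X^T y (a standard fact not spelled out on the card); data and λ are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

lam = 0.1                                   # hypothetical regularization strength
p = X.shape[1]
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```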

6
Q

What is a kernel function?

A

A function κ(x_i, x_j) = φ(x_i)^T φ(x_j), where φ is a feature map.
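A minimal numerical sketch: for the homogeneous quadratic kernel κ(x, z) = (x^T z)^2 on 2D inputs, an explicit feature map is φ(x) = (x1^2, √2·x1·x2, x2^2). The vectors below are made up; the check simply confirms κ(x, z) = φ(x)^T φ(z):

```python
import numpy as np

def phi(x):
    # explicit feature map for the quadratic kernel (x^T z)^2 in 2D
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((x @ z) ** 2)        # kernel evaluated directly
print(phi(x) @ phi(z))     # same value via the explicit feature map
```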

7
Q

What is the “kernel trick”?

A

The ability to compute κ(x_i, x_j) directly, without explicitly computing the feature map φ, which may be very high- or even infinite-dimensional.

8
Q

Name three example kernel functions

A

Linear kernel, polynomial kernel, and radial basis function (RBF) kernel.
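A minimal sketch of the three kernels as NumPy functions; the degree, offset and bandwidth values are placeholders, not prescribed by the card:

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, degree=3, c=1.0):
    return (x @ z + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))
```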

9
Q

What are hyperparameters in kernel regression?

A

The parameters of the kernel function (e.g. the RBF bandwidth) and, in kernel ridge regression, the regularization strength λ.

10
Q

What is a feature map in kernel regression?

A

A feature map in kernel regression transforms the input data into a (typically higher-dimensional) feature space in which linear relationships are easier to find.

11
Q

What is the basic idea behind random features?

A

The basic idea behind random features in kernel regression is to approximate the kernel function using a finite set of random projections to reduce computational complexity.
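A minimal sketch of one common instance, random Fourier features for the RBF kernel exp(-γ||x - z||^2); the dimensions, γ and test points are made up, and for large D the two printed values should be close:

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 5000, 0.1                  # input dim, number of random features, bandwidth

W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))   # random projection directions
b = rng.uniform(0, 2 * np.pi, size=D)                    # random phases

def random_features(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, z = rng.normal(size=d), rng.normal(size=d)
print(np.exp(-gamma * np.sum((x - z) ** 2)))    # exact RBF kernel value
print(random_features(x) @ random_features(z))  # Monte Carlo approximation
```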

12
Q

What is the main limitation of kernel methods for large datasets?

A

The kernel matrix grows quadratically with the number of samples.

13
Q

What is kernel regression best suited for?

A

High-dimensional data points in moderately sized datasets.

14
Q

What is the difference between parameters and hyperparameters in a model?

A

Parameters control the likelihood function, while hyperparameters parametrize the prior distribution in a Bayesian setting.

15
Q

What is the Bayesian interpretation of the ridge regression penalty?

A

The penalty λ ||θ||^2 corresponds to a zero-mean Gaussian prior on θ; the ridge estimate is then the MAP estimate under that prior.

16
Q

In what case does ridge regression converge to the minimum ℓ2-norm OLS solution?

A

As λ approaches 0.

17
Q

What is polynomial regression?

A

A form of regression where Y = θ_1 + θ_2X + θ_3X^2 + θ_4X^3 + … + ε
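A minimal sketch of polynomial regression as linear regression on the feature map φ(x) = (1, x, x^2, x^3); the synthetic data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=50)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.1, size=50)

Phi = np.vander(x, N=4, increasing=True)           # columns: 1, x, x^2, x^3
theta_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]  # ordinary least squares on the features
```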

18
Q

What is the general form of regression in feature space?

A

Y = φ(X)θ + ε, where φ is a feature map.

19
Q

What is the kernel matrix K?

A

K = φ(X)φ(X)^T, or K_ij = κ(x_i, x_j)
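A minimal sketch of building K for an RBF kernel from pairwise squared distances; the data and γ are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
gamma = 0.5

sq_norms = np.sum(X**2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
K = np.exp(-gamma * sq_dists)      # 20 x 20, symmetric positive semi-definite
```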

20
Q

How is prediction made in kernel regression?

A

ŷ = Σ_i κ(x_i, x_new) η̂_i, where the x_i are the training points and η̂ are the estimated (dual) coefficients. Prediction therefore requires the full training set.
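A minimal sketch of this prediction step for kernel ridge regression. The dual coefficients are computed with the standard closed form η̂ = (K + λI)^-1 y, which the card does not spell out; the data, γ and λ are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
gamma, lam = 0.5, 0.1

def rbf(a, b):
    return np.exp(-gamma * np.sum((a - b) ** 2))

K = np.array([[rbf(xi, xj) for xj in X] for xi in X])
eta_hat = np.linalg.solve(K + lam * np.eye(len(X)), y)

x_new = rng.normal(size=4)
y_pred = sum(rbf(xi, x_new) * ei for xi, ei in zip(X, eta_hat))  # uses all training points
```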

21
Q

What is the main advantage of random feature approximation?

A

It allows kernel methods to be applied to large datasets by reducing computational complexity.

22
Q

In what case should one consider using linear regression in feature space instead of kernel regression?

A

When the number of features is small and the data is sparse in feature space.

23
Q

If p > n, what happens to the OLS estimate?

A

The OLS problem has infinitely many solutions; we take the θ with minimal length (the minimum ℓ2-norm solution, obtained via a constrained/Lagrangian formulation).
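A minimal sketch of the p > n case in NumPy: lstsq (and, equivalently, the Moore-Penrose pseudoinverse) returns the minimum ℓ2-norm least-squares solution; the sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                          # more features than samples
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

theta_min_norm = np.linalg.lstsq(X, y, rcond=None)[0]
theta_pinv = np.linalg.pinv(X) @ y     # same minimum-norm solution
print(np.allclose(theta_min_norm, theta_pinv))
```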

24
Q

What is Moore-Penrose pseudo inverse?

A

The Moore-Penrose pseudo inverse is a generalization of the matrix inverse that can be applied to non-square or singular matrices to solve linear least squares problems.

25
Q

Benefits of feature maps?

A

Allows linear models to capture non-linear relationships
Can significantly improve model performance on complex datasets
Keeps the simplicity and interpretability of linear models

26
Q

Drawbacks of feature maps?

A

Can lead to overfitting if too many features are created
May increase computational complexity
Requires careful selection of appropriate feature maps