Lab 1 Flashcards

Notes extracted from the lab material

1
Q

Why do we add a column of 1s to the X matrix when computing the Least Squares Estimate?

A

Adding a column of 1s to the design matrix X (see the sketch after this list):
- Incorporates the bias term into the matrix formulation
- Allows the model to learn a non-zero intercept
- Simplifies the matrix operations by combining all parameters (bias and weights) into a single vector (w’)
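
A minimal NumPy sketch of this idea on a 1-D toy dataset (the data and the names X_prime, w_prime are illustrative, not from the lab):

```python
import numpy as np

# Toy data: y = 2x + 3 plus noise (illustrative, not from the lab)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X[:, 0] + 3 + rng.normal(0, 0.5, size=50)

# Prepend a column of 1s so the intercept is absorbed into the weight vector w'
X_prime = np.hstack([np.ones((X.shape[0], 1)), X])

# Least squares estimate: w' = (X'^T X')^{-1} X'^T y
w_prime = np.linalg.solve(X_prime.T @ X_prime, X_prime.T @ y)

print(w_prime)  # w_prime[0] ≈ intercept (3), w_prime[1] ≈ slope (2)
```

Without the column of 1s, the fitted line would be forced through the origin; with it, the intercept is just another entry of w'.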

2
Q

What are the positives of using Least Squares Estimation?

A

Simplicity - Straightforward to understand and implement

Optimality under Certain Conditions - Provides the best linear unbiased estimator, as long as the data satisfies the assumptions of linear regression (e.g. a linear relationship, Gaussian errors, no multicollinearity)

Efficiency - Computationally efficient for small to moderately sized problems

3
Q

What are the negatives of Least Squares Estimation?

A

Sensitive to Outliers - Squaring the residuals amplifies the effect of large errors

Poor Performance in High Dimensions - In high dimensions, X^T X can become nearly singular, leading to numerical instability or overfitting (see the sketch below)

No Feature Selection - Uses all available features without prioritising or penalising irrelevant ones, which can degrade performance
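
A small sketch of the near-singularity point (illustrative data, not from the lab): with two almost perfectly collinear features, X^T X is badly conditioned, and an SVD-based solver such as np.linalg.lstsq is the usual workaround instead of inverting X^T X directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-8 * rng.normal(size=n)   # almost perfectly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 3 + 2 * x1 + rng.normal(0, 0.1, size=n)

# X^T X is nearly singular: its condition number is huge, so explicitly
# inverting it amplifies rounding error
print(np.linalg.cond(X.T @ X))

# np.linalg.lstsq solves the least squares problem via SVD, which is
# more stable than forming and inverting X^T X
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)
```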

4
Q

What are the positives of Gradient Descent?

A

Scalability for large datasets - Can handle very large datasets efficiently, since stochastic and mini-batch variants don’t require loading the entire dataset into memory.

Flexibility - Works with a wide variety of loss functions and models, including linear regression, logistic regression and deep learning.

Easy to implement - Relatively simple to code and integrate into most ML pipelines (see the sketch below)
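
A minimal batch gradient descent sketch for linear regression (an illustrative implementation, not the lab's code; it assumes X already contains a bias column of 1s):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Batch gradient descent on the mean squared error of a linear model."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        residual = X @ w - y             # predictions minus targets
        grad = (2 / n) * X.T @ residual  # gradient of the MSE w.r.t. w
        w -= lr * grad                   # step against the gradient
    return w

# Usage on a tiny toy problem (illustrative data)
rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 1, size=(200, 1))
X = np.hstack([np.ones((200, 1)), X_raw])
y = 3 + 2 * X_raw[:, 0] + rng.normal(0, 0.1, size=200)
print(gradient_descent(X, y, lr=0.5, n_iters=2000))  # ≈ [3, 2]
```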

5
Q

What are the negatives of Gradient Descent?

A

Sensitive to Feature Scaling - Features with vastly different scales can cause Gradient Descent to converge slowly or behave unpredictably. Feature normalisation is often required (see the sketch below)

Can get stuck in Local Minima - For non-convex loss functions, gradient descent can converge to local minima or saddle points

Lack of interpretability - It optimises the loss function, but doesn’t inherently provide insight into the importance of features or the quality of the model
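
A quick sketch of the feature-scaling fix (the feature names and values are illustrative, not from the lab): z-score standardise each column before running gradient descent, so one learning rate suits every direction of the loss surface.

```python
import numpy as np

# Two features on very different scales (illustrative values)
rng = np.random.default_rng(0)
income = rng.uniform(20_000, 120_000, size=500)   # scale ~1e5
age = rng.uniform(18, 80, size=500)               # scale ~1e1
X = np.column_stack([income, age])

# Z-score standardisation: zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X.std(axis=0))      # wildly different scales before
print(X_std.std(axis=0))  # both ≈ 1 after
```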
