Regression and Gradient Descent Flashcards

1
Q

What is an error surface, and how does it relate to a model's best fit?

A

It is a surface composed of the sum of squared errors (SSE) for every possible combination of weights.
The best-fit model is the model corresponding to the lowest point on the error surface.
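
To make this concrete, here is a hedged sketch (toy data, not from the card) that evaluates the SSE over a grid of candidate weights for a one-weight model and picks the lowest point on the resulting one-dimensional error surface:

```python
# Sketch: the SSE "error surface" for a one-weight model y = w * x,
# evaluated over a grid of candidate weights (toy data).
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]  # roughly y = 2x

def sse(w):
    """Sum of squared errors for candidate weight w."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

# One-dimensional error surface: SSE for each candidate weight.
candidates = [w / 10 for w in range(0, 41)]   # 0.0 .. 4.0
surface = {w: sse(w) for w in candidates}

# The best-fit model corresponds to the lowest point on the surface.
best_w = min(surface, key=surface.get)
```

With more weights the same idea holds, but the surface becomes a higher-dimensional bowl that is searched rather than enumerated.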

2
Q

What is the learning rate and does it have an optimal value?

A

It determines the size of the adjustment made to each weight at each step of gradient descent. There is no single optimal value.
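
As an illustration (toy function and rates, not from the card), here is a hedged sketch of how different learning rates behave when minimizing f(w) = w² with gradient descent:

```python
# Sketch: effect of the learning rate on gradient descent for f(w) = w**2,
# whose gradient is 2*w. The rates below are illustrative, not prescriptive.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w   # weight update: step against the gradient
    return w

small = descend(0.01)   # converges, but slowly, toward the minimum at 0
good  = descend(0.1)    # converges quickly
large = descend(1.1)    # overshoots and diverges
```

A small rate inches toward the minimum, a moderate rate reaches it quickly, and an overly large rate overshoots further on every step, which is why the rate is tuned per problem.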

3
Q

How do you determine the importance of each descriptive feature in a linear regression model? Give an example.

A

A statistical significance test, e.g. a t-test.

4
Q

What is the null hypothesis of the t-test?

A

The feature does not have a significant impact on the model.

5
Q

If a decision boundary is discontinuous, can we calculate the gradient of the error surface? Justify.

A

No. The error surface is not differentiable at the discontinuity, so it is impossible to calculate the gradient there.

6
Q

What must we do before training a logistic regression model?

A

Map the binary target feature levels to 0 or 1

7
Q

For logistic regression models, what is recommended for the descriptive features during preprocessing?

A

That they are normalized
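
For instance, range (min-max) normalization can be sketched as follows; the values are toy data, and this is one common normalization choice rather than the only one:

```python
# Sketch: min-max normalization of a descriptive feature to [0, 1]
# before training a logistic regression model (toy values).
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

feature = [10.0, 20.0, 30.0, 40.0]
scaled = normalize(feature)
```

Normalizing keeps features on comparable scales, so no single feature dominates the weight updates during gradient descent.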

8
Q

Explain the differences between Batch Gradient Descent and Stochastic Gradient Descent

A

In BGD, the model parameters are updated in one go, based on the average gradient over the entire training dataset. In SGD, an update occurs for each training example (or mini-batch).
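
The contrast can be sketched for a one-weight model with squared error; the data and learning rate below are illustrative assumptions:

```python
# Sketch contrasting BGD and SGD updates for a one-weight model y = w * x
# with squared error (toy data).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # exactly y = 2x
lr = 0.05

def grad(w, x, y):
    """Gradient of (y - w*x)**2 with respect to w."""
    return -2 * x * (y - w * x)

# Batch GD: one update per epoch, from the average gradient of the whole set.
def bgd_step(w):
    g = sum(grad(w, x, y) for x, y in data) / len(data)
    return w - lr * g

# Stochastic GD: one update per training example.
def sgd_epoch(w):
    for x, y in data:
        w -= lr * grad(w, x, y)
    return w

w_b = w_s = 0.0
for _ in range(50):
    w_b = bgd_step(w_b)
    w_s = sgd_epoch(w_s)
```

Both variants converge to the same weight on this clean toy dataset; the difference is how often the weight is updated per pass over the data.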

9
Q

Which gradient descent variant is preferred for large datasets, and why?

A

SGD is preferred over BGD.

  • Although BGD usually converges to a more accurate minimum, it is extremely computationally expensive.
  • SGD converges faster and requires less memory. However, its updates can be noisy, and it may converge to a local minimum rather than the global minimum.

10
Q

What does R² represent in linear regression, and what is its formula?

A

Also known as the coefficient of determination, it represents the proportion of the variance in the dependent variable that is explained by the independent variables in the model.
R² = 1 - SSE/SST (where SSE is the sum of squared errors and SST is the total sum of squares)
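
As a quick worked sketch, R² can be computed directly from its formula; the actual/predicted values below are toy data:

```python
# Sketch: R² = 1 - SSE/SST for toy actual and predicted values.
actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.1, 7.2, 8.9]

mean_y = sum(actual) / len(actual)
sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual error
sst = sum((y - mean_y) ** 2 for y in actual)                # total variation
r2 = 1 - sse / sst
```

An R² near 1 means the model explains most of the variance; an R² near 0 means it does little better than predicting the mean.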

11
Q

What is gradient descent?

A

An algorithm that makes small steps along a function, in the direction opposite the gradient, to find a local minimum.
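
A minimal sketch of the idea, assuming a toy function f(x) = (x - 3)² whose gradient is known in closed form:

```python
# Sketch of gradient descent: repeatedly take small steps opposite the
# gradient until reaching (near) a local minimum. f(x) = (x - 3)**2 is
# an illustrative toy function with its minimum at x = 3.
def gradient(x):
    return 2 * (x - 3)   # derivative of (x - 3)**2

x = 0.0                  # starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * gradient(x)   # one small step downhill
# x is now very close to the minimum at 3
```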

12
Q

What do the weights used by linear regression models indicate?

A

The effect of each descriptive feature on the predictions returned by the model. Both the sign and the magnitude of each weight provide information.

13
Q

How do you conduct a t-test for a feature in a linear regression model?

A
  1. State the null hypothesis.
  2. Obtain the standard error of the overall model and of the descriptive feature.
  3. Compute the t-statistic and look up the associated p-value.
  4. If the p-value is less than the significance level, reject the null hypothesis.

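
Steps 2-3 can be sketched for a simple one-feature model; the data, and the closed-form slope and standard-error formulas used here, are illustrative assumptions for the single-feature case:

```python
# Sketch of computing the t-statistic for the slope of a simple
# one-feature linear regression (toy data). The p-value would then
# come from a t-distribution with n - 2 degrees of freedom.
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.2, 5.9, 8.1, 9.9]   # roughly y = 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx  # slope estimate
b = my - w * mx                                             # intercept
sse = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
se_w = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of slope
t_stat = w / se_w   # compared against the t-distribution for the p-value
```

A large t-statistic (weight many standard errors from zero) yields a small p-value, leading us to reject the null hypothesis that the feature has no significant impact.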