Regression and Gradient Descent Flashcards

1
Q

What is an error surface, and how does it relate to a model's best fit?

A

It is a surface composed of the sum of squared errors (SSE) for every possible combination of weights.
The best-fit model is the model corresponding to the lowest point on the error surface.
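
To make this concrete, here is a hedged sketch (toy data, not from the card) that evaluates the SSE over a grid of candidate weights for a one-weight model and picks the lowest point on the resulting one-dimensional error surface:

```python
# Sketch: the SSE "error surface" for a one-weight model y = w * x,
# evaluated over a grid of candidate weights (toy data).
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]  # roughly y = 2x

def sse(w):
    """Sum of squared errors for candidate weight w."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

# One-dimensional error surface: SSE for each candidate weight.
candidates = [w / 10 for w in range(0, 41)]   # 0.0 .. 4.0
surface = {w: sse(w) for w in candidates}

# The best-fit model corresponds to the lowest point on the surface.
best_w = min(surface, key=surface.get)
```

With more weights the same idea holds, but the surface becomes a higher-dimensional bowl that is searched rather than enumerated.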

2
Q

What is the learning rate and does it have an optimal value?

A

It determines the size of the adjustment made to each weight at each step of gradient descent. There is no single optimal value.
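
As an illustration (toy function and rates, not from the card), here is a hedged sketch of how different learning rates behave when minimizing f(w) = w² with gradient descent:

```python
# Sketch: effect of the learning rate on gradient descent for f(w) = w**2,
# whose gradient is 2*w. The rates below are illustrative, not prescriptive.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w   # weight update: step against the gradient
    return w

small = descend(0.01)   # converges, but slowly, toward the minimum at 0
good  = descend(0.1)    # converges quickly
large = descend(1.1)    # overshoots and diverges
```

A small rate inches toward the minimum, a moderate rate reaches it quickly, and an overly large rate overshoots further on every step, which is why the rate is tuned per problem.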

3
Q

How do you determine the importance of each descriptive feature in a linear regression model? Give an example.

A

A statistical significance test, e.g. a t-test.

4
Q

What is the null hypothesis of the t-test?

A

The feature does not have a significant impact on the model.

5
Q

If a decision boundary is discontinuous, can we calculate the gradient of the error surface? Justify.

A

No. The error surface is not differentiable at the discontinuity, so it is impossible to calculate the gradient there.

6
Q

What must we do before training a logistic regression model?

A

Map the binary target feature levels to 0 or 1

7
Q

For logistic regression models, what is recommended for the descriptive features during preprocessing?

A

That they are normalized
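
For instance, range (min-max) normalization can be sketched as follows; the values are toy data, and this is one common normalization choice rather than the only one:

```python
# Sketch: min-max normalization of a descriptive feature to [0, 1]
# before training a logistic regression model (toy values).
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

feature = [10.0, 20.0, 30.0, 40.0]
scaled = normalize(feature)
```

Normalizing keeps features on comparable scales, so no single feature dominates the weight updates during gradient descent.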

8
Q

Explain the differences between Batch Gradient Descent and Stochastic Gradient Descent

A

In BGD, the model parameters are updated in one go, based on the average gradient over the entire training dataset. In SGD, an update occurs for each training example (or mini-batch).
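
The contrast can be sketched for a one-weight model with squared error; the data and learning rate below are illustrative assumptions:

```python
# Sketch contrasting BGD and SGD updates for a one-weight model y = w * x
# with squared error (toy data).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # exactly y = 2x
lr = 0.05

def grad(w, x, y):
    """Gradient of (y - w*x)**2 with respect to w."""
    return -2 * x * (y - w * x)

# Batch GD: one update per epoch, from the average gradient of the whole set.
def bgd_step(w):
    g = sum(grad(w, x, y) for x, y in data) / len(data)
    return w - lr * g

# Stochastic GD: one update per training example.
def sgd_epoch(w):
    for x, y in data:
        w -= lr * grad(w, x, y)
    return w

w_b = w_s = 0.0
for _ in range(50):
    w_b = bgd_step(w_b)
    w_s = sgd_epoch(w_s)
```

Both variants converge to the same weight on this clean toy dataset; the difference is how often the weight is updated per pass over the data.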

9
Q

Which gradient descent variant is preferred for large datasets, and why?

A

SGD is preferred over BGD.

  • Although BGD usually converges to a more accurate minimum, it is extremely computationally expensive.
  • SGD converges faster and requires less memory. However, its updates can be noisy, and it may converge to a local minimum rather than the global minimum.

10
Q

What does R² represent in linear regression, and what is its formula?

A

Also known as the coefficient of determination, it represents the proportion of the variance in the dependent variable that is explained by the independent variables in the model.
R² = 1 - SSE/SST (where SSE is the sum of squared errors and SST is the total sum of squares)
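
As a quick worked sketch, R² can be computed directly from its formula; the actual/predicted values below are toy data:

```python
# Sketch: R² = 1 - SSE/SST for toy actual and predicted values.
actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.1, 7.2, 8.9]

mean_y = sum(actual) / len(actual)
sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual error
sst = sum((y - mean_y) ** 2 for y in actual)                # total variation
r2 = 1 - sse / sst
```

An R² near 1 means the model explains most of the variance; an R² near 0 means it does little better than predicting the mean.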

11
Q

What is gradient descent?

A

An algorithm that makes small steps along a function, in the direction opposite the gradient, to find a local minimum.
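
A minimal sketch of the idea, assuming a toy function f(x) = (x - 3)² whose gradient is known in closed form:

```python
# Sketch of gradient descent: repeatedly take small steps opposite the
# gradient until reaching (near) a local minimum. f(x) = (x - 3)**2 is
# an illustrative toy function with its minimum at x = 3.
def gradient(x):
    return 2 * (x - 3)   # derivative of (x - 3)**2

x = 0.0                  # starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * gradient(x)   # one small step downhill
# x is now very close to the minimum at 3
```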

12
Q

What do the weights used by linear regression models indicate?

A

The effect of each descriptive feature on the predictions returned by the model. Both the sign and the magnitude of each weight provide information.

13
Q

How do you conduct a t-test for a feature in a linear regression model?

A
  1. State the null hypothesis.
  2. Obtain the standard error of the overall model and of the descriptive feature.
  3. Compute the t-statistic and look up the associated p-value.
  4. If the p-value is less than the significance level, reject the null hypothesis.

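
Steps 2-3 can be sketched for a simple one-feature model; the data, and the closed-form slope and standard-error formulas used here, are illustrative assumptions for the single-feature case:

```python
# Sketch of computing the t-statistic for the slope of a simple
# one-feature linear regression (toy data). The p-value would then
# come from a t-distribution with n - 2 degrees of freedom.
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.2, 5.9, 8.1, 9.9]   # roughly y = 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx  # slope estimate
b = my - w * mx                                             # intercept
sse = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
se_w = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of slope
t_stat = w / se_w   # compared against the t-distribution for the p-value
```

A large t-statistic (weight many standard errors from zero) yields a small p-value, leading us to reject the null hypothesis that the feature has no significant impact.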