Regression and Gradient Descent Flashcards
What is an error surface and its relation to a model’s best fit
It is a surface comprised of sum of squared error values (SSE) for every possible combination of weights.
The best fit model is the model corresponding to the lowest point on the error surface
What is the learning rate and does it have an optimal value?
It determines the size of the adjustment made to each weight at each step in gradient descent, there is not an optimal value.
How to determine the importance of each descriptive feature in a linear regression model? Give an example
a statistical significance test. E.g. t-test
What is the null hypothesis of the t-test
the feature does not have a significant impact on the model
If a decision boundary is discontinuous, can we calculate the gradient of the error surface? Justify.
No, as it is not differentiable, so it is impossible to calculate the gradient
What must we do before training a logistic regression model
Map the binary target feature levels to 0 or 1
For logistic regression models what is recommended for the descriptive features during preprocessing
That they are normalized
Explain the differences between Batch Gradient Descent and Stochastic Gradient Descent
in BGD, the model parameters are updated in one go, based on the average gradient of the entire training dataset. In SGD, updates occur for each training example or mini-batch.
Which gradient descent is preferred for large datasets and why.
SGD is preferred over BGD.
- Although BGD usually converges to a more accurate minimum, it is computationally expensive (extremely)
- SGD converges faster and requires less memory. However, updates can be noisy, and it may converge to a local minimum rather than the global minimum
What does R2 represent in linear regression, and give the formula
Also known as the coefficient of determination, it represents the proportion of the variance in the dependent variable that is explained by the independent variables in the model
R2 = 1 - SSE/SST
What is gradient descent
An algorithm that makes small steps along a function to find a local minimum
What do the weights used by linear regression models indicate
the effect of each descriptive feature on the predictions returned by the model. Both the sign and the magnitude of the weight provide information.
How to do conduct a t-test for a feature in a linear regression model
- State the null hypothesis
- Get the standard error of the overall model and the descriptive feature
- Compute the t-statistic and look-up associated p-value
- If the p-value is less than the significance level, we reject the null hypothesis