Q Flashcards
How do you derive a0 and a1 for a linear regression model that is least-squares optimal?
1. Write the sum of squared errors (SSE) for the model
2. Take the partial derivatives of the SSE with respect to a0 and a1
3. Set both derivatives equal to zero and solve the resulting simultaneous equations
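A minimal Python sketch of these steps for the model y = a0 + a1 x. Setting both partial derivatives of the SSE to zero gives two linear equations (the normal equations), which are solved here in closed form. The function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept a0 and slope a1 for y = a0 + a1 * x."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # dSSE/da0 = 0 and dSSE/da1 = 0 give the simultaneous equations:
    #   n*a0  + a1*sx  = sy
    #   a0*sx + a1*sxx = sxy
    det = n * sxx - sx * sx
    a1 = (n * sxy - sx * sy) / det
    a0 = (sy - a1 * sx) / n
    return a0, a1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]   # exactly y = 1 + 2x
a0, a1 = fit_line(x, y)
print(a0, a1)              # 1.0 2.0
```

At the solution the residuals sum to zero and are uncorrelated with x, which is exactly what the two zero-derivative conditions state.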
What does the R^2 statistic indicate?
The proportion of the variation in the response that is explained by the model. If it is close to 1, the model captures the variation in the data well.
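A small Python sketch of R^2 = 1 - SS_res / SS_tot, assuming the fitted values y_hat are already available; the function name `r_squared` and the numbers are illustrative.

```python
def r_squared(y, y_hat):
    """Fraction of the variation of y about its mean that is explained by y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, [1.1, 1.9, 3.2, 3.8]))  # about 0.98: close fit
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than predicting the mean
```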
What is meant by regularization?
Regularization is the application of constraints to the magnitude of the estimated model parameters in order to simplify the model
What is regularization's relevance to modelling?
It reduces model variance and helps with predictor selection
It also reduces numerical issues caused by ill-conditioned regressor matrices, e.g. when predictors are highly correlated
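A sketch of the numerical point, using L2 (ridge) regularization as the example: with two almost identical predictors, X^T X is badly conditioned, and adding lambda * I makes it well conditioned. The data, lambda = 1.0, and penalizing the intercept column (done here only for brevity; many formulations leave it unpenalized) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-4 * rng.normal(size=n)          # nearly a copy of x1 (collinear predictors)
X = np.column_stack([np.ones(n), x1, x2])    # regressor matrix with intercept column
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.1, size=n)

XtX = X.T @ X
lam = 1.0
XtX_ridge = XtX + lam * np.eye(X.shape[1])   # ridge-regularized normal-equation matrix

print("cond(X^T X)         =", np.linalg.cond(XtX))        # very large
print("cond(X^T X + lam I) =", np.linalg.cond(XtX_ridge))  # much smaller

a_ridge = np.linalg.solve(XtX_ridge, X.T @ y)  # stable ridge estimates
print(a_ridge)
```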
What does LASSO stand for?
Least Absolute Shrinkage and Selection Operator
What does Lasso do to the cost function J?
It adds a penalty term to J that is proportional to the sum of the absolute values of the model coefficients, weighted by the tuning parameter lambda
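A minimal sketch of the Lasso cost J = SSE + lambda * sum(|a_j|) in Python, assuming (as is common) that the intercept a0 is not penalized; `lasso_cost` is a hypothetical helper name.

```python
import numpy as np

def lasso_cost(a0, a, X, y, lam):
    """Sum of squared errors plus lam times the sum of |a_j| (intercept a0 unpenalized)."""
    residuals = y - (a0 + X @ a)
    sse = np.sum(residuals ** 2)
    l1_penalty = lam * np.sum(np.abs(a))   # linear in the absolute coefficient values
    return sse + l1_penalty

X = np.eye(2)
y = np.array([1.0, 2.0])
print(lasso_cost(0.0, np.array([1.0, 1.0]), X, y, lam=0.5))  # SSE 1.0 + penalty 1.0 = 2.0
```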
What determines the parameter estimates under regularization?
They are given by the point at which a contour of the cost function J is tangent to a contour of the parameter-dependent constraint (penalty) term
What are the steps for applying Lasso through cross-validation?
1 - Divide the n observations into K groups (folds) of equal size
2 - Specify a range of Lasso weights lambda (lambda 1 … lambda m) and set k = 0
3 - Let k = k+1 and lambda = lambda k; train K different models with Lasso, each using all data groups except one, which is held out
4 - Evaluate each model on its held-out group and average the K errors to obtain CVMSE(lambda k)
5 - If k < m, return to 3; otherwise find k* such that CVMSE is minimum
6 - Let lambda = lambda k* and apply Lasso to all n observations to determine the final model parameter estimates
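The procedure above can be sketched in Python with NumPy. The Lasso fit here uses coordinate descent with soft-thresholding, a standard solver chosen for this sketch rather than one named in these notes. The helper names, the lambda grid, K = 5 and the synthetic data are all assumptions; the comments map the code to the numbered steps.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z towards zero by t; values within [-t, t] become exactly 0
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for J = sum((y - a0 - X a)^2) + lam * sum(|a_j|), a0 unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    a = np.zeros(X.shape[1])
    col_sq = (Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Partial residual: remove predictor j's current contribution
            r_j = yc - Xc @ a + Xc[:, j] * a[j]
            a[j] = soft_threshold(Xc[:, j] @ r_j, lam / 2.0) / col_sq[j]
    return y_mean - x_mean @ a, a

def cv_lasso(X, y, lambdas, K=5, seed=0):
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)      # step 1: K groups
    cv_mse = []
    for lam in lambdas:                                # steps 2, 3, 5: loop over lambda 1 ... lambda m
        fold_mse = []
        for f in range(K):                             # step 3: K models, each leaving out group f
            test_idx = folds[f]
            train_idx = np.concatenate([folds[i] for i in range(K) if i != f])
            a0, a = lasso_fit(X[train_idx], y[train_idx], lam)
            pred = a0 + X[test_idx] @ a
            fold_mse.append(np.mean((y[test_idx] - pred) ** 2))
        cv_mse.append(np.mean(fold_mse))               # step 4: average performance
    k_star = int(np.argmin(cv_mse))                    # step 5: minimum CVMSE
    a0, a = lasso_fit(X, y, lambdas[k_star])           # step 6: refit on all n observations
    return lambdas[k_star], a0, a, cv_mse

# Synthetic data (illustrative): only predictors 0 and 3 affect y
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2.0 + 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=100)
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best_lam, a0, a, cv_mse = cv_lasso(X, y, lambdas)
print(best_lam, np.round(a, 3))
```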
What is the advantage of Lasso over L2 regularization?
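- The optimal set of parameters after regularization is the tangent point of a J cost contour with the constraint contour
- The Lasso constraint region has corners on the coordinate axes, so this tangent point often lies on an axis, where some parameters are exactly zero -> induces sparse solutions
- Some predictors can therefore be excluded, reducing model size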
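A sketch comparing Lasso and L2 (ridge) regularization on the same synthetic data, where only two of six predictors matter. Lasso is fitted by coordinate descent (an assumed solver choice) and ridge by its closed form; the data and lambda = 40 are illustrative. Lasso typically sets the irrelevant coefficients exactly to zero, whereas ridge only shrinks them.

```python
import numpy as np

def soft_threshold(z, t):
    # Values within [-t, t] are mapped to exactly 0: the source of sparsity
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for sum((y - a0 - X a)^2) + lam * sum(|a_j|), a0 unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    a = np.zeros(X.shape[1])
    col_sq = (Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = yc - Xc @ a + Xc[:, j] * a[j]
            a[j] = soft_threshold(Xc[:, j] @ r_j, lam / 2.0) / col_sq[j]
    return y_mean - x_mean @ a, a

def ridge_fit(X, y, lam):
    """Closed form for sum((y - a0 - X a)^2) + lam * sum(a_j^2), a0 unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    a = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ a, a

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=100)  # only predictors 0 and 2 matter

_, a_lasso = lasso_fit(X, y, lam=40.0)
_, a_ridge = ridge_fit(X, y, lam=40.0)
print("Lasso:", np.round(a_lasso, 3))  # irrelevant predictors typically exactly 0
print("Ridge:", np.round(a_ridge, 3))  # shrunk, but not exactly 0
```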
What is the optimal set of parameters after regularization?
The tangent point of a J cost contour with the constraint contour
What is meant by Odds?
The ratio of the probability that an event occurs to the probability that it does not occur, i.e. p/(1 - p)
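A tiny Python illustration of odds = p/(1 - p); the probabilities are arbitrary examples.

```python
def odds(p):
    """Odds of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)

print(odds(0.5))   # 1.0: the event is as likely to occur as not
print(odds(0.8))   # about 4: four times as likely to occur as not
print(odds(0.2))   # about 0.25
```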
What is the definition of a logistic model?
A model in which the posterior probability y of class membership is given by a logistic sigmoid function of a linear combination of the predictors
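A minimal sketch of a one-predictor logistic model in Python. The coefficients a0 = -1 and a1 = 2 are illustrative, not fitted to any data. It also shows the link to odds: the log-odds of the sigmoid output recovers the linear term a0 + a1 x.

```python
import math

def logistic(z):
    """Logistic sigmoid, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def posterior(x, a0, a1):
    """Hypothetical model: probability of class membership given predictor x."""
    return logistic(a0 + a1 * x)

def log_odds(p):
    return math.log(p / (1.0 - p))

a0, a1 = -1.0, 2.0   # illustrative coefficients
for x in [-1.0, 0.5, 2.0]:
    p = posterior(x, a0, a1)
    print(x, round(p, 4), round(log_odds(p), 4))  # log-odds equals a0 + a1*x
```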
What is a support vector machine and how does it work?
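- An SVM is an extension of the maximal margin classifier
- The classifier maximises the margin, i.e. the distance between the separating hyperplane and the closest data points (the support vectors)
- The SVM enlarges the feature space with non-linear features (in practice via kernels), so a linear boundary in the enlarged space gives a non-linear classification of the original data set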
How does an SVM work?
- Works by defining a separating hyperplane in the data space from the small number of data points closest to it (the support vectors), and maximising the margin between the hyperplane and those points
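To make the margin idea concrete, here is a purely illustrative brute-force maximal margin classifier for separable 2-D data: it tries many hyperplane directions, keeps the one with the widest margin, and reports the support vectors. Real SVM solvers instead solve a quadratic programme; the angle grid, function names and toy points are assumptions of this sketch.

```python
import math

def max_margin_2d(points, labels, n_angles=360):
    """Search unit normals w on an angle grid; return (w, b, margin, support vectors)."""
    best = None
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        w = (math.cos(theta), math.sin(theta))
        proj = [w[0] * x + w[1] * y for x, y in points]
        pos = [p for p, lab in zip(proj, labels) if lab == 1]
        neg = [p for p, lab in zip(proj, labels) if lab == -1]
        gap = min(pos) - max(neg)
        if gap <= 0:
            continue                          # this direction does not separate the classes
        margin = gap / 2
        if best is None or margin > best[0]:
            b = -(min(pos) + max(neg)) / 2    # hyperplane halfway between the two classes
            best = (margin, w, b)
    if best is None:
        raise ValueError("no separating direction found on the grid")
    margin, w, b = best
    # Support vectors: points whose distance to the hyperplane equals the margin
    support = [pt for pt in points
               if abs(abs(w[0] * pt[0] + w[1] * pt[1] + b) - margin) < 1e-9]
    return w, b, margin, support

def classify(w, b, pt):
    return 1 if w[0] * pt[0] + w[1] * pt[1] + b >= 0 else -1

points = [(1.0, 0.0), (2.0, 1.0), (2.0, -1.0), (-1.0, 0.0), (-2.0, 1.0), (-2.0, -1.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b, margin, support = max_margin_2d(points, labels)
print(w, b, margin, support)  # boundary x = 0, margin 1, support vectors (1, 0) and (-1, 0)
```

Only the two support vectors fix the boundary; moving any other point (without crossing the margin) leaves the solution unchanged.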
What is the main disadvantage of a support vector classifier?
- It is a hard classifier: it separates the classes without returning any information on the level of confidence (e.g. a class probability) for each data point.