Q Flashcards
How do you derive a0 and a1 for a linear regression model that is least squares optimal?
1. Identify the sum of squared errors equation for the model
2. Take partial derivatives with respect to a0 and a1
3. Set both equal to zero, and solve the simultaneous equations
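The steps above lead to a closed-form result, sketched here in Python. This assumes the model z ≈ a0 + a1*x; the function name fit_line and the sample data are illustrative and not part of the original cards.

```python
def fit_line(x, z):
    """Least-squares estimates (a0, a1) for the model z ~ a0 + a1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    z_mean = sum(z) / n
    # Solving dJ/da0 = 0 and dJ/da1 = 0 simultaneously gives these two formulas.
    s_xz = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    a1 = s_xz / s_xx
    a0 = z_mean - a1 * x_mean
    return a0, a1


# Illustrative data generated from z = 1 + 2x with no noise.
a0, a1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this noise-free data the fit recovers the intercept 1 and slope 2.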
What does the R^2 statistic indicate?
The proportion of the variation in the data that the model explains; if it is close to 1, the model captures the variation in the data well
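A short Python sketch of how R^2 is commonly computed, as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the sample values are illustrative assumptions.

```python
def r_squared(z, z_hat):
    """R^2 = 1 - SS_res / SS_tot for observations z and model predictions z_hat."""
    z_mean = sum(z) / len(z)
    ss_res = sum((zi - zh) ** 2 for zi, zh in zip(z, z_hat))
    ss_tot = sum((zi - z_mean) ** 2 for zi in z)
    return 1.0 - ss_res / ss_tot


# A model that reproduces the data exactly, and one that only predicts the mean.
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

The exact fit scores 1, while a model that only predicts the mean scores 0.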
What is meant by regularization?
Regularization is the application of constraints to the amplitude of estimated model parameters in order to simplify the model
What is regularization's relevance to modelling?
It reduces the model variance and helps with predictor selection
It also reduces numerical issues when constructing the regressor matrices
What does LASSO stand for?
Least Absolute Shrinkage and Selection Operator
What does Lasso do to the cost function J?
Introduces a penalty term into the cost function J that is linear in the absolute values of the model coefficients, weighted by the tuning parameter lambda
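A minimal Python sketch of such a cost function, assuming J is the sum of squared errors and that every coefficient is penalised (in practice the intercept is often left out of the penalty). The name lasso_cost and the numbers are illustrative.

```python
def lasso_cost(X, z, a, lam):
    """Sum of squared errors plus lam times the sum of |a_j|."""
    residuals = [
        zi - sum(xij * aj for xij, aj in zip(row, a))
        for row, zi in zip(X, z)
    ]
    sse = sum(r * r for r in residuals)
    penalty = lam * sum(abs(aj) for aj in a)  # the added Lasso term
    return sse + penalty


# Illustrative values: squared error 16 plus penalty 0.5 * (1 + 2).
cost = lasso_cost([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [1.0, -2.0], 0.5)
```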
What are the parameters in regularization governed by?
the set of parameters is identified by the point at which contours of J and the additional parameter-dependent contours are tangential
What are the steps for applying Lasso through cross-validation?
1 - Divide n observations into K equal groups
2 - Specify a range of Lasso weights lambda (lambda 1… lambda m)
3 - Let k=k+1 and lambda=lambda k; train K different models with Lasso, each using all of the data groups except one, which is held out for testing
4 - Find the average performance (CVMSE) of the K models on their held-out groups
5 - If k<m return to 3, otherwise find k* such that CVMSE is minimum
6 - Let lambda=lambda k* and apply lasso to n observations to determine final model parameter estimates.
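The loop above can be sketched in Python. To stay short and dependency-free, this assumes a single predictor with no intercept, for which the Lasso solution has a closed form (soft-thresholding); the function names, the interleaved fold assignment, and the data are all illustrative assumptions, not the course's implementation.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly zero inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def fit_lasso_1d(x, z, lam):
    """Minimise sum((z - a*x)^2) + lam*|a| over a single coefficient a."""
    rho = sum(xi * zi for xi, zi in zip(x, z))
    s_xx = sum(xi * xi for xi in x)
    return soft_threshold(rho, lam / 2.0) / s_xx


def cv_select_lambda(x, z, lambdas, K):
    n = len(x)
    folds = [list(range(i, n, K)) for i in range(K)]  # step 1: K groups
    cv_mse = []
    for lam in lambdas:  # steps 2, 3 and 5: loop over the lambda grid
        fold_mse = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            a = fit_lasso_1d([x[i] for i in train], [z[i] for i in train], lam)
            err = sum((z[i] - a * x[i]) ** 2 for i in held_out) / len(held_out)
            fold_mse.append(err)
        cv_mse.append(sum(fold_mse) / K)  # step 4: average performance
    best = min(range(len(lambdas)), key=lambda i: cv_mse[i])
    lam_star = lambdas[best]
    a_final = fit_lasso_1d(x, z, lam_star)  # step 6: refit on all n observations
    return lam_star, a_final, cv_mse


# Illustrative data: z = 2x plus a small alternating disturbance.
x = [float(i) for i in range(1, 9)]
z = [2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
lambdas = [0.0, 1.0, 10.0, 100.0]
lam_star, a_final, cv_mse = cv_select_lambda(x, z, lambdas, K=4)
```

With several predictors the closed-form fit would be replaced by an iterative Lasso solver, but the cross-validation loop keeps the same shape.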
What is the advantage of Lasso over L2 regularization?
- The optimal set of parameters after regularization is the tangent point of the J cost contour with the constraint contours
- With the Lasso constraint this point tends to fall where some parameters are exactly zero -> induces sparse solutions
- Some predictors can be excluded, reducing model size.
What is the optimal set of parameters after regularization?
the tangent point of J cost contour with constraint contours
What is meant by Odds?
The ratio of the probability of an event occurring to the probability that the event won't occur
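As a quick numerical illustration in Python, odds = p / (1 - p). The function name odds is an assumption for this sketch.

```python
def odds(p):
    """Odds of an event with probability p, defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


even = odds(0.5)     # event and non-event equally likely
likely = odds(0.75)  # event three times as likely as non-event
```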
What is the definition of logistic model?
A model that expresses the posterior probability y of class membership as a logistic sigmoid function of a linear combination of the inputs.
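A minimal Python sketch of this idea, assuming the sigmoid is applied to a linear score (bias plus weighted inputs). The names sigmoid and predict_proba and the weights are illustrative, and the code is not hardened against overflow for very large negative scores.

```python
import math


def sigmoid(s):
    """Logistic sigmoid: maps any real score s to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def predict_proba(x, weights, bias):
    """Posterior probability of class 1 under a logistic model."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(score)


# Illustrative weights; this input gives a score of exactly 0.
p = predict_proba([1.0, 2.0], [0.5, -0.25], 0.0)
# The log of the odds of p recovers the linear score.
log_odds = math.log(p / (1.0 - p))
```

A score of zero corresponds to a probability of 0.5, which is why the decision boundary of a thresholded logistic classifier at T=0.5 is where the linear score vanishes.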
What is a support vector machine and how does it work?
- SVM is an extension of the maximal margin classifier
- The classifier maximises the margin between the separating hyperplane and the support vectors
- SVM expands the feature space through non-linear features and returns a classification of the original data set.
How does an SVM work?
- Works by defining a separating hyperplane in the data space through a small number of data points that are closest to the hyperplane (support vectors), and maximising the margin to them
What is the main disadvantage of a support vector classifier?
- Hard classifier - separates classes without returning any info on the level of confidence for each data point.
What is meant by ARX?
AutoRegressive model with eXogenous inputs
What are ARX and ARMAX models?
Linear models used in time series analysis to model dynamical systems through samples of their present and past inputs/outputs.
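A tiny Python sketch of a first-order ARX model, assuming the noise-free form y[k] = a*y[k-1] + b*u[k-1]. Sign conventions for ARX coefficients vary between textbooks, and the name simulate_arx and the parameter values are illustrative.

```python
def simulate_arx(a, b, u, y0=0.0):
    """Simulate y[k] = a*y[k-1] + b*u[k-1] without the noise term."""
    y = [y0]
    for k in range(1, len(u)):
        # The current output depends on the past output and the past input.
        y.append(a * y[k - 1] + b * u[k - 1])
    return y


# Response to a constant (step) input, starting from rest.
y = simulate_arx(a=0.5, b=1.0, u=[1.0, 1.0, 1.0, 1.0])
```

An ARMAX model would add a moving-average combination of past noise samples to the same recursion.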
What is meant by ARMAX?
AutoRegressive Moving Average model with eXogenous inputs.
What is a confusion matrix?
A table comparing the true classes with the classes predicted by a classifier model, giving the counts of true/false positives and true/false negatives
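A short Python sketch of the four counts for a binary classifier, assuming labels are coded as 0 and 1. The function name confusion_counts and the labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN and FN for binary labels coded as 0/1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```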
What is meant by ROC and how is it constructed?
- Receiver Operating Characteristic
- A diagram formed by plotting the sensitivity (TPR) against 1-specificity (FPR), as the classifier threshold value changes from T=0 to T=1.
- Each point on the curve defines the value of TPR and FPR obtained for a certain threshold of the classifier
- Can assess accuracy over all thresholds
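A minimal Python sketch of the construction, assuming the classifier outputs a score in [0, 1] and predicts class 1 when the score is at least T. The name roc_points, the labels, and the scores are illustrative.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) pair per threshold T."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = []
    for T in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if s >= T and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= T and t == 0)
        # FPR is 1 - specificity, TPR is the sensitivity.
        points.append((fp / neg, tp / pos))
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores, [0.0, 0.5, 1.0])
```

At T=0 everything is classed positive, giving the top-right corner (1, 1); at T=1 nothing is, giving the origin.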
What is the AUC and what is it used for?
Area under Curve
quantifies performance of a classifier
perfect classifier has AUC of 1
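One way to compute the AUC without drawing the curve is sketched below in Python. It uses the equivalent ranking interpretation, which is swapped in here in place of integrating the ROC curve: the AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting half. The name auc_pairwise and the data are illustrative.

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


imperfect = auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
perfect = auc_pairwise([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```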
What elements of the data should be considered before building a black box model?
- Model will represent the data using basis functions
- Provides the best representation, but the data itself must be of good quality
- Density of the data shapes the relative weights of the basis function components
- Low density areas will be poorly represented and overfit
How does cross validation work simply?
Rotate training and testing samples through the data
test on one part of the data and train on the rest
What are the two most important factors to employ to prevent bias in AI models?
Pre-processing of data
Explainability of method
What does PCA application do to a data set?
PCA projects the feature set onto a new set of orthogonal axes, ordered so that each successive axis captures the maximum remaining variability
What is the purpose of PCA?
Provides a measure of importance of each input and allows the reduction of the dimensionality of a large feature space
What is the geometrical relationship between principal components?
Each PC is orthogonal to all the others
What do the eigenvectors of a data set's covariance matrix represent?
Principal components
What do the eigenvalues of a data set's covariance matrix represent?
The variance contained in the corresponding eigenvector direction
How do you identify the first principal component of a data set?
The eigenvector with the largest eigenvalue is the first PC
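The eigenvector and eigenvalue cards can be illustrated with a small Python sketch, assuming two-dimensional data so that the eigen-decomposition of the 2x2 sample covariance matrix has a closed form. The name pca_2d and the data points are illustrative.

```python
import math


def pca_2d(points):
    """Return ((var1, pc1), (var2, pc2)) for 2-D points, largest variance first."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (sxx + syy) / 2.0
    half = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = mid + half, mid - half
    # Eigenvector belonging to the largest eigenvalue (the first PC).
    if sxy != 0.0:
        v = (sxy, lam1 - sxx)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    pc1 = (v[0] / norm, v[1] / norm)
    pc2 = (-pc1[1], pc1[0])  # orthogonal to pc1
    return (lam1, pc1), (lam2, pc2)


# Points lying exactly on the line y = x: all variance is along (1, 1).
(var1, pc1), (var2, pc2) = pca_2d([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])
```

Here the second eigenvalue is zero, so the data could be reduced to one dimension without losing any variance.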
How would you incorporate non-linearity into the decision boundary of a two-input logistic classifier?
Rewrite the basis functions to include non-linear terms (e.g. x1^2, x1 x2, x2^2), which changes the number of weights required.
What is meant by inner product?
dot product
How do you derive the least squares solution?
Write the cost as J = (z - Xa)^T (z - Xa) and expand this like a standard quadratic
Take the gradient with respect to the m-vector a:
nabla (z^T z - 2 a^T X^T z + a^T X^T X a)
= 0 - 2 X^T z + 2 X^T X a
Equate to zero and rearrange:
a = (X^T X)^-1 X^T z
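A dependency-free Python sketch of the final formula, solving the normal equations (X^T X) a = X^T z by Gaussian elimination rather than forming the inverse explicitly. It assumes X^T X is invertible; the names solve_linear and least_squares and the data are illustrative.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, z):
    """Return a satisfying (X^T X) a = X^T z."""
    m = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xtz = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(m)]
    return solve_linear(XtX, Xtz)


# Design matrix with a column of ones (intercept) and one input column.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
z = [1.0, 3.0, 5.0, 7.0]
a = least_squares(X, z)
```

The data follow z = 1 + 2x exactly, so the solution is approximately a = [1, 2].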
You introduce quadratic non-linearity to a model, what is the new column dimension of the design matrix?
(d+n)! / (d! n!), where n=2 is the polynomial order and d=3 is the number of inputs, giving 10 columns
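The count can be checked with a small Python sketch that enumerates every monomial up to total order n in d inputs, including the constant term. The names n_poly_columns and poly_features and the input values are illustrative assumptions.

```python
from itertools import combinations_with_replacement
from math import comb


def n_poly_columns(d, n):
    """(d+n)! / (d! n!): number of monomials of total order <= n in d inputs."""
    return comb(d + n, n)


def poly_features(x, n):
    """One design-matrix row: all monomials of the entries of x up to order n."""
    feats = []
    for degree in range(n + 1):
        # Each multiset of input indices defines one monomial of this degree.
        for idx in combinations_with_replacement(range(len(x)), degree):
            term = 1.0
            for i in idx:
                term *= x[i]
            feats.append(term)
    return feats


# Three inputs with quadratic terms: 1 constant + 3 linear + 6 quadratic columns.
row = poly_features([2.0, 3.0, 5.0], 2)
```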