Week 2 Flashcards
How do we minimise the error on a regression?
Minimise the mean squared error (MSE) by optimising B0 and B1, the y-intercept and the slope.
What is an overdetermined linear system?
More equations than unknowns.
Describe the Linear Least Squares method
Write the sum of the squared errors over all the (x, y) data points as a function of the parameters.
Take the partial derivative with respect to each parameter, set each one to 0 and solve the resulting simultaneous equations.
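A minimal sketch of where those steps lead for a straight line, assuming the model y = B0 + B1*x. Setting both partial derivatives of the summed squared error to 0 and solving gives B1 = Sxy / Sxx and B0 = mean(y) - B1 * mean(x). The function name fit_line and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Solving dS/db0 = 0 and dS/db1 = 0 together gives these two formulas.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Illustrative data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
```

With data that sits exactly on a line, the fit should recover that line's intercept and slope.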
Explained variation vs unexplained variation
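Explained variation is the difference between the fitted value on your graph at x and the overall mean of Y.
Unexplained variation is the difference between the fitted value on your graph at x and the true value of Y at x (the residual).
Total variation is the sum of the two, i.e. the difference between the true value of Y and the mean of Y. Each is usually squared and summed over all the data points.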
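A small worked sketch of the decomposition, using made-up data and a straight-line least-squares fit. The variable names are illustrative. Note the sums of squares only add up exactly like this when the line is the least-squares fit with an intercept.

```python
# Made-up data for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Least-squares line y = b0 + b1 * x
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
b0 = y_mean - b1 * x_mean
preds = [b0 + b1 * x for x in xs]

# Explained: fitted value vs mean of Y
explained = sum((p - y_mean) ** 2 for p in preds)
# Unexplained: true value vs fitted value (residuals)
unexplained = sum((y - p) ** 2 for y, p in zip(ys, preds))
# Total: true value vs mean of Y
total = sum((y - y_mean) ** 2 for y in ys)
```

For this data the three sums come out at 3.6, 2.4 and 6.0 (up to floating-point rounding), so explained + unexplained = total.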
We can use our models for prediction. What else can we use them for?
Interpretation to gain insights from the data.
e.g.
Gather x and y, then train the model by finding the parameters lambda that give the best prediction.
Focus on lambda rather than the predicted value yp to generate insights
(understand what makes cars safer for example)
How do we enhance the linear regression model?
Use polynomial terms to fit the curvature of the data (they help to find variables that explain the variation in the data better)
Why is polynomial regression still linear?
Because the outcome is still a linear combination of the features.
A non-linear relationship between one feature and another (for example x and x^2) doesn’t make the algorithm non-linear; the model is still linear in its parameters.
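A minimal sketch of this idea, assuming NumPy is available. The powers of x are just extra columns in the feature matrix, and the fit is an ordinary linear least-squares solve. The data and the quadratic used to generate it are made up for illustration.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# Illustrative noiseless quadratic: y = 1 + 0.5x + 2x^2
y = 1.0 + 0.5 * x + 2.0 * x**2

# The features are 1, x and x^2. The model is X @ beta, which is
# linear in beta even though x^2 is a non-linear function of x.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here beta should come back close to the coefficients used to build y, which shows that nothing beyond linear least squares was needed.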
How do we choose the order of polynomial models?
Using the Bayesian Information Criterion (BIC)
BIC = n ln(SSe) - n ln(n) + p ln(n)
p is number of parameters
n is number of observations
ln is the natural log
SSe is the sum of squared errors.
Choose the order that minimises BIC. The p ln(n) term penalises extra parameters, so this balances error against generalisability.
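A rough sketch of order selection with BIC, assuming NumPy. It uses the equivalent form BIC = n ln(SSe/n) + p ln(n), which is the same as n ln(SSe) - n ln(n) + p ln(n). The helper name bic, the synthetic data and the choice to count the intercept in p are all assumptions for illustration.

```python
import numpy as np


def bic(x, y, order):
    """BIC = n*ln(SSe/n) + p*ln(n) for a polynomial fit of the given order."""
    n = len(x)
    coeffs = np.polyfit(x, y, order)
    residuals = y - np.polyval(coeffs, x)
    sse = float(np.sum(residuals**2))
    p = order + 1  # assumption: the intercept counts as a parameter
    return n * np.log(sse / n) + p * np.log(n)


# Synthetic data: a quadratic plus noise (seeded so runs are repeatable)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=1.0, size=x.size)

# Score orders 1 to 5 and keep the one with the lowest BIC
scores = {order: bic(x, y, order) for order in range(1, 6)}
best_order = min(scores, key=scores.get)
```

The straight-line fit should score clearly worse than the quadratic here, because its SSe is much larger. Higher orders reduce SSe only slightly, so the p ln(n) penalty tends to count against them.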
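What is an interaction term?
A feature made by combining two variables (typically multiplying them), so the model can capture an effect that appears when both happen at the same time (like smoking and having lung cancer)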
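A minimal sketch of an interaction term as a product column, assuming NumPy. The two 0/1 indicators and the outcome values are hypothetical: each factor alone adds 1 to the outcome, and having both adds a further 3.

```python
import numpy as np

# Hypothetical 0/1 indicators covering all four combinations
x1 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
x2 = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0])

# Made-up outcome with an extra effect only when both are present
y = 2.0 + 1.0 * x1 + 1.0 * x2 + 3.0 * x1 * x2

# The interaction term is the extra column x1 * x2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The last entry of beta is the interaction coefficient. Without the x1 * x2 column the model could only add the two separate effects together.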
How do we sample appropriately when doing cross-validation?
Use stratified sampling rather than simple random sampling.
What's stratified sampling?
Stratified sampling is a sampling technique where the samples are selected in the same proportion as
they appear in the population.
For example, if the population of interest has 60% CEOs and 40% CTOs, then we divide the
population into 2 groups and draw 60% of the sample from the CEO group and 40% from the CTO group.
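A small sketch of the CEO/CTO example using only the standard library. The function name stratified_sample, its parameters and the population of 100 people are made up for illustration. It draws the same fraction from each group, so the sample keeps the population's proportions.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(items, key, fraction, seed=0):
    """Draw the same fraction from every group defined by key(item)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60% CEOs, 40% CTOs
population = ["CEO"] * 60 + ["CTO"] * 40
sample = stratified_sample(population, key=lambda role: role, fraction=0.1)
counts = Counter(sample)
```

Here counts holds 6 CEOs and 4 CTOs, the same 60/40 split as the population. A simple random sample of 10 could easily drift away from that split.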