Stats Year 2 - Man Flashcards
SSR
The sum of the squares of the distances from each predicted value (on the regression line) to y bar
SSR + SSE
The total sum of squares (SST): the total variation of y about y bar
Regression line will always pass through (…), so if you know that point and the gradient, you know the line
x bar, y bar
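A quick check of this fact in R, using made-up numbers purely for illustration:

    # the fitted line passes through (x bar, y bar): predicting at mean(x) gives mean(y)
    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 4, 5, 4, 6)
    fit <- lm(y ~ x)
    predict(fit, newdata = data.frame(x = mean(x)))
    mean(y)   # same value as the prediction above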
SSE
Sum of squares of errors
sum of (actual y - predicted y)² over all data points
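A small R sketch with made-up data showing how SSE, SSR and SST are computed, and that SSR + SSE = SST:

    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
    fit <- lm(y ~ x)
    SSE <- sum((y - fitted(fit))^2)        # actual y - predicted y
    SSR <- sum((fitted(fit) - mean(y))^2)  # predicted y - y bar
    SST <- sum((y - mean(y))^2)            # total variation in y
    all.equal(SSR + SSE, SST)              # TRUE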
Linear regression measures
The relationship between a dependent and an independent variable (it describes association; on its own it does not prove causation)
Regression analysis fits … line to a … plot
straight, scatter
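A minimal R sketch (simulated data) of fitting a straight line to a scatter plot:

    set.seed(1)
    x <- rnorm(50)
    y <- 2 * x + rnorm(50)
    plot(x, y)          # scatter plot
    abline(lm(y ~ x))   # add the fitted straight line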
Multiple linear regression is superior to classical (simple) linear regression because:
- It allows us to perform all the calculations in one go
- It determines the contribution of each independent variable
- The result is a model that can predict the DV using two or more IVs
- The model is usually good or adequate without all IVs
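A sketch of a multiple linear regression with two IVs in R (simulated data, illustrative names); summary() shows the contribution of each IV:

    set.seed(1)
    x1 <- rnorm(100)
    x2 <- rnorm(100)
    y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(100)
    fit <- lm(y ~ x1 + x2)
    summary(fit)   # one coefficient and p value per IV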
Limitation of multiple linear regression
Process is iterative, so each step depends on the previous
How to run multiple linear regression?
- Fit an initial model containing all the IVs using lm()
- Remove the variable with the highest p value
- Repeat until all variables have a p value less than 0.05
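A sketch of this manual backward elimination in R, using simulated data where x3 is deliberately unrelated to y:

    set.seed(2)
    dat <- data.frame(x1 = rnorm(80), x2 = rnorm(80), x3 = rnorm(80))
    dat$y <- 3 + 1.5 * dat$x1 + 0.8 * dat$x2 + rnorm(80)
    fit <- lm(y ~ x1 + x2 + x3, data = dat)   # initial model with all IVs
    summary(fit)                              # x3 should have the highest p value
    fit <- update(fit, . ~ . - x3)            # remove it and refit
    summary(fit)                              # repeat until all p values < 0.05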
ANOVA test of successive models in MLR
Tells us whether there is a significant difference in performance between successive models.
A non-significant result (p above 0.05) is good: the simpler model performs about as well, so the removed variable can stay out.
A significant result (p below 0.05) means the models are significantly different, so the removed variable mattered.
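A sketch of this comparison in R with anova(), on simulated data where x2 is pure noise:

    set.seed(3)
    dat <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
    dat$y <- 2 * dat$x1 + rnorm(60)
    full    <- lm(y ~ x1 + x2, data = dat)
    reduced <- lm(y ~ x1, data = dat)
    anova(reduced, full)   # a large p value here means the simpler model is adequate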
AIC
An assessment of how well the model fits given the number of parameters (IVs) it uses.
Want to minimise the AIC
Stepwise selection stops when removing any remaining variable would increase the AIC
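R's step() function automates this selection by AIC; a sketch with simulated data, where x3 carries no real information:

    set.seed(4)
    d <- data.frame(x1 = rnorm(70), x2 = rnorm(70), x3 = rnorm(70))
    d$y <- d$x1 - d$x2 + rnorm(70)
    full <- lm(y ~ x1 + x2 + x3, data = d)
    best <- step(full, direction = "backward")   # stops when dropping any variable would raise the AIC
    AIC(best)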
AIC equation
AIC = 2k - 2 ln(L)
where k is the number of parameters in the model and L is the maximised value of the likelihood function
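A quick check of the formula in R against the built-in AIC() function, for a simple lm fit on simulated data:

    set.seed(5)
    x <- rnorm(40)
    y <- x + rnorm(40)
    fit <- lm(y ~ x)
    ll <- logLik(fit)            # maximised log-likelihood, ln(L)
    k  <- attr(ll, "df")         # number of parameters (intercept, slope, residual sd)
    2 * k - 2 * as.numeric(ll)   # same value as AIC(fit)
    AIC(fit)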
Using AIC to compare models
A difference in AIC (ΔAIC) of less than 2 means the models are similarly good
ΔAIC of 4-7 means the lower-AIC model is probably better
ΔAIC greater than 10 means there is strong evidence that the lower-AIC model is better
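A sketch of this in R: fit two candidate models on simulated data and compare their AIC values using the thresholds above:

    set.seed(6)
    d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
    d$y <- 2 * d$x1 + rnorm(50)
    m1 <- lm(y ~ x1, data = d)
    m2 <- lm(y ~ x1 + x2, data = d)
    AIC(m1, m2)   # compare the two AIC values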
Principal Component Analysis
A tool for exploring the structure of multivariate data.
PCA as a Data Reduction Technique
Allows us to reduce the number of variables to a manageable number of new variables, or components, without sacrificing too much information
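A sketch of PCA as data reduction in R, using the built-in USArrests data set:

    pca <- prcomp(USArrests, scale. = TRUE)   # standardise the variables first
    summary(pca)         # proportion of variance explained by each component
    head(pca$x[, 1:2])   # scores on the first two components: the reduced data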