17 Linear Regression Flashcards
Limitations of the correlation coefficient
It only measures the strength of the linear association between two variables; it doesn’t help us make predictions
What does Ei represent?
The error term
Assumptions of the simple linear regression model
Xi are fixed (non-random)
Ei and Yi are random variables
Residuals of the regression
Êi = yi - ŷi
They measure the vertical distance between each actual value yi and the fitted value ŷi on the regression line
What is the intercept?
B0
What is the slope?
B1
OLS
Ordinary Least Squares. It fits the line through the data that minimises the sum of squared residuals
What is the estimate of B1 equal to?
Cov(x,y)/var(x)
What is the estimate of B0 equal to?
(mean of y) - [cov(x,y)/var(x)] × (mean of x)
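A minimal sketch of the two estimates above in plain Python. The function name ols_fit and the five data points are made up for illustration; the sample covariance and variance both use the n - 1 divisor, which cancels in the ratio.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS estimates of the intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of x and y, and sample variance of x
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x          # slope estimate: cov(x,y)/var(x)
    b0 = mean_y - b1 * mean_x    # intercept estimate: mean of y - b1 * mean of x
    return b0, b1


# Illustrative data only (not from the flashcards)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_fit(x, y)
print(b0, b1)
```

For this toy data the slope comes out at about 0.6 and the intercept at about 2.2.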
Is the OLS estimator biased?
No, because the expected values of the estimators of B0 and B1 are B0 and B1 themselves
Which estimators are BLUE, with the smallest variance among all linear unbiased estimators?
OLS estimators
BLUE = best linear unbiased estimator
What happens to the variance of the estimators of B0 and B1 as the sample size n tends to infinity?
It tends to zero; together with unbiasedness, this shows they are consistent estimators
What is coefficient of determination denoted as?
R^2
What is the coefficient of determination?
It measures the proportion of the variation in the dependent variable that is explained by the fitted regression
Total sum of squares (TSS)
The total squared variation of the yi values about their mean
TSS= sum of (yi-mean)^2
Explained sum of squares (ESS)
The total squared variation of the fitted values ŷi about their mean
ESS= sum of (ŷi-mean)^2
Residual sum of squares (RSS)
The total squared difference between the yi values and the fitted ŷi values
RSS= sum of (yi-ŷi)^2
What is the TSS made up of?
TSS=ESS+RSS
How can we work out the coefficient of determination?
R^2=ESS/TSS
R^2=1-RSS/TSS
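The decomposition TSS = ESS + RSS and both formulas for R^2 can be checked numerically. This is a sketch in plain Python; the helper name sums_of_squares and the data are hypothetical, and the identity relies on the fitted line including an intercept.

```python
def sums_of_squares(x, y):
    """Fit a simple OLS line and return (TSS, ESS, RSS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    # With an intercept in the model, the mean of the fitted values equals the mean of y
    tss = sum((yi - mean_y) ** 2 for yi in y)
    ess = sum((fi - mean_y) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return tss, ess, rss


# Illustrative data only (not from the flashcards)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
tss, ess, rss = sums_of_squares(x, y)
r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
print(tss, ess, rss, r2_from_ess, r2_from_rss)
```

Here TSS is about 6, ESS about 3.6 and RSS about 2.4, so both routes give R^2 of about 0.6.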
What values can R^2 take?
0<=R^2<=1
What does it mean if ESS and R^2 are large?
The model is a good fit
When is the degrees of freedom n-2?
When we are estimating two parameters (B0 and B1), as in simple linear regression
As the sample size decreases, what happens to the standard error and the test statistic?
The standard error increases
The test statistic decreases in absolute value
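A sketch of how the standard error of the slope and its t statistic are computed, to make the n - 2 degrees of freedom concrete. The formula se(b1) = sqrt((RSS/(n-2)) / sum of (xi - mean of x)^2) is the standard textbook one and is an addition here, not something stated on these cards; the function name and data are hypothetical, and the test is of the null hypothesis B1 = 0.

```python
import math


def slope_standard_error(x, y):
    """Return (b1, se_b1, t_stat) for a simple OLS fit, testing H0: B1 = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Two parameters are estimated, so the error variance uses n - 2 degrees of freedom
    s2 = rss / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    t_stat = b1 / se_b1
    return b1, se_b1, t_stat


# Illustrative data only (not from the flashcards)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, se_b1, t_stat = slope_standard_error(x, y)
print(b1, se_b1, t_stat)
```

Both n - 2 and the sum of (xi - mean of x)^2 sit in the denominator, which is why fewer observations typically mean a larger standard error and a smaller test statistic.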
When should we take inference from a hypothesis test?
As a rule of thumb, when n>25; with smaller samples the power of the test is low, so it is very hard to reject the null hypothesis
When is multiple linear regression used?
When one explanatory variable is insufficient to explain the variation of the dependent variable
What will the degrees of freedom be when estimating k+1 parameters (an intercept and k slopes)?
n-(k+1) = n-k-1