Multiple regression Flashcards
what are the assumptions of regression?
- residuals normally distributed (mean of 0 and variance sigma squared, i.e., SD of sigma)
- homoskedasticity (variance of the residuals remains constant no matter the value of X)
- residuals not correlated with each other (independence)
how exactly do we get the regression line? What method is used?
The line is fitted using the method of least squares, which chooses the intercept and slope that minimise the sum of the squared residuals.
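A minimal sketch of a least-squares fit with one predictor, using made-up toy data (the helper names `least_squares` and `sse` are illustrative, not from any library). It uses the closed-form solution for simple regression, then checks that nudging either coefficient away from it increases the sum of squared residuals.

```python
# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals (one predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-products / sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60
# Moving either coefficient away from the least-squares values gives a larger SSE
print(sse(x, y, b0, b1) < sse(x, y, b0 + 0.1, b1))  # True
print(sse(x, y, b0, b1) < sse(x, y, b0, b1 + 0.1))  # True
```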
what are we estimating with regression?
the population parameters, i.e., the intercept (B0) and slope (B1), estimated from the sample data
why center a variable?
To make the intercept more meaningful to interpret. E.g., imagine centering age at 46, the mean age of the sample. The intercept then tells us the predicted DV for a 46-year-old with all other predictors set to 0.
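A small sketch of what centering does, assuming made-up ages whose mean is 46 and a single predictor for simplicity (the helper `fit` is illustrative). The slope is unchanged; only the intercept moves, from the predicted score at age 0 to the predicted score at age 46.

```python
# Hypothetical ages and outcome scores, for illustration only (mean age = 46)
age = [30.0, 40.0, 46.0, 52.0, 62.0]
score = [10.0, 14.0, 15.0, 17.0, 21.0]


def fit(x, y):
    """Least-squares intercept and slope for one predictor."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


mean_age = sum(age) / len(age)
age_centered = [a - mean_age for a in age]

b0_raw, b1_raw = fit(age, score)          # intercept = predicted score at age 0
b0_cen, b1_cen = fit(age_centered, score)  # intercept = predicted score at age 46

print(f"raw:      b0 = {b0_raw:.3f}, b1 = {b1_raw:.4f}")
print(f"centered: b0 = {b0_cen:.3f}, b1 = {b1_cen:.4f}")  # b0 = 15.400, same b1
```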
if you standardise a variable, e.g., age, how does this change the interpretation of the regression line? What does the slope now reflect?
After standardising the predictor, the slope is the expected change in the outcome for a one-SD increase in that predictor.
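A quick check under stated assumptions (made-up data, sample SD via Python's `statistics.stdev`; the helper `slope` is illustrative): regressing the outcome on standardised age gives the raw slope multiplied by SD(age), i.e., the change per one-SD step in age.

```python
import statistics

# Hypothetical toy data, for illustration only
age = [30.0, 40.0, 46.0, 52.0, 62.0]
score = [10.0, 14.0, 15.0, 17.0, 21.0]


def slope(x, y):
    """Least-squares slope for one predictor."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sum((a - x_bar) ** 2 for a in x)


sd_age = statistics.stdev(age)  # sample SD of the predictor
age_z = [(a - statistics.mean(age)) / sd_age for a in age]

b1_raw = slope(age, score)  # change in score per 1 year of age
b1_z = slope(age_z, score)  # change in score per 1 SD of age
print(abs(b1_z - b1_raw * sd_age) < 1e-9)  # True: slope is rescaled by SD(age)
```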
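if I standardise both the predictor and the outcome variable, what does the coefficient for the X variable reflect?
Pearson's correlation coefficient (r) between X and Y. This holds in simple regression (one predictor); with several predictors the standardised coefficients are generally not equal to the simple correlations.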
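A sketch of the simple-regression case with made-up data (helper names `zscores`, `slope` and `pearson_r` are illustrative): after z-scoring both variables with their sample SDs, the fitted slope matches Pearson's r.

```python
import math
import statistics

# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def zscores(v):
    """Standardise a list using its mean and sample SD."""
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(a - m) / s for a in v]


def slope(x, y):
    """Least-squares slope for one predictor."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy / sum((a - x_bar) ** 2 for a in x)


def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


beta = slope(zscores(x), zscores(y))
print(f"{beta:.4f} {pearson_r(x, y):.4f}")  # 0.7746 0.7746
```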
in simple regression, which parts of the equation capture the explained and unexplained variance?
Systematic part (explained): B0 + B1X
Random part (unexplained): residual (e)
describe the form of a regression equation
Response = systematic part + random part
in simple regression how do we capture the residual variability?
sigma squared (σ²), the variance of the residuals
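A minimal sketch of how σ² is usually estimated in simple regression, assuming made-up data: the residual sum of squares divided by n − 2, since two parameters (B0 and B1) were estimated from the same data.

```python
# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar

residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
# Divide by n - 2 because two parameters (b0, b1) were estimated from the data
sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)
print(f"{sigma2_hat:.2f}")  # 0.80
```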
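to estimate the residual variability using sigma squared, what assumption needs to be made?
that the residuals are normally distributed with a mean of 0 and a constant variance sigma squared (homoskedasticity), so a single value describes the residual variability at every value of X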
What are the explained variance in Y, the unexplained variance in Y and the total variance?
- Explained variance = the variance in Y that is explained by X
- Unexplained variance = the residual variance, i.e., the variance in Y not explained by X
- Total variance = the sum of the explained and unexplained variance
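In sums-of-squares form (a standard identity for least-squares fits that include an intercept), with fitted values ŷᵢ = B0 + B1xᵢ, the split reads:

```latex
\underbrace{\sum_i (y_i - \bar{y})^2}_{\text{total}}
  = \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{explained}}
  + \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\text{unexplained (residual)}}
```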
what is R-squared? What is the formula to calculate it?
- The proportion of variance in Y explained by X
- Formula: R-squared = explained variance / total variance
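A worked sketch with made-up data, computing R-squared from sums of squares; the second line shows the equivalent form 1 − (unexplained / total).

```python
# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * a for a in x]

ss_total = sum((b - y_bar) ** 2 for b in y)                 # total
ss_explained = sum((f - y_bar) ** 2 for f in fitted)        # explained by X
ss_residual = sum((b - f) ** 2 for b, f in zip(y, fitted))  # unexplained

r_squared = ss_explained / ss_total
print(f"{r_squared:.2f}")                   # 0.60
print(f"{1 - ss_residual / ss_total:.2f}")  # 0.60, the same value
```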
in simple regression, what is R-squared the same as?
The square of Pearson's correlation coefficient (r²) between X and Y
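A quick numerical check on made-up data: with one predictor, R-squared from the regression equals the squared Pearson correlation.

```python
import math

# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)  # Pearson's correlation
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ss_residual = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
r_squared = 1 - ss_residual / syy  # R-squared from the regression

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
# r = 0.7746, r^2 = 0.6000, R^2 = 0.6000
```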
what affects the standard error (SE) of a slope?
- sample size
- amount of variability in X
- amount of variance in Y unexplained by X (residual variance)
Specifically, the SE DECREASES with a larger sample and with more variability in X, and INCREASES with more residual variance.
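In simple regression the slope's SE is usually written as SE(B1) = sqrt(σ² / Σ(x − x̄)²), and Σ(x − x̄)² grows with both the sample size and the spread of X. The sketch below uses hypothetical inputs and, as a simplification, holds σ² fixed rather than re-estimating it for each dataset.

```python
import math


def se_slope(sigma2_hat, x):
    """SE of the slope in simple regression: sqrt(sigma^2 / sum of squared deviations of x)."""
    x_bar = sum(x) / len(x)
    sxx = sum((a - x_bar) ** 2 for a in x)
    return math.sqrt(sigma2_hat / sxx)


# Hypothetical inputs, for illustration only
x_small = [1.0, 2.0, 3.0, 4.0, 5.0]
x_spread = [2.0, 4.0, 6.0, 8.0, 10.0]  # same n, twice the spread
x_bigger_n = x_small * 4               # four times as many observations

base = se_slope(1.0, x_small)
print(se_slope(1.0, x_spread) < base)    # True: more variability in X -> smaller SE
print(se_slope(1.0, x_bigger_n) < base)  # True: larger sample -> smaller SE
print(se_slope(4.0, x_small) > base)     # True: more residual variance -> larger SE
```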
How do we use the SE to calculate the 95% confidence interval for the slope estimate (B1)?
Let's say the SE is 0.001:
CI = B1 +/- (1.96 x 0.001) = B1 +/- 0.00196
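The same arithmetic as a small function. The slope of 0.05 is a hypothetical value (the card only gives the SE), and 1.96 is the large-sample normal value; in small samples a t critical value for the residual degrees of freedom would replace it.

```python
def ci_95(estimate, se, crit=1.96):
    """95% CI as estimate +/- crit * SE (crit = 1.96 is the large-sample normal value)."""
    half_width = crit * se
    return estimate - half_width, estimate + half_width


# Hypothetical slope of 0.05, with the SE of 0.001 from the card
lower, upper = ci_95(0.05, 0.001)
print(f"({lower:.5f}, {upper:.5f})")  # (0.04804, 0.05196)
```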
how do we use the SE to calculate the test statistic (aka the z- or t-ratio)?
- SE of sex is 0.025
- coefficient for sex is -0.156
test statistic = slope / SE
so the z-ratio is -0.156 / 0.025 = -6.24
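The card's arithmetic in Python, with a two-sided p-value added under the assumption that the ratio is compared with a standard normal distribution (using `math.erfc`, since P(|Z| > z) = erfc(z / √2)).

```python
import math

# Values from the card: coefficient for sex and its SE
coef = -0.156
se = 0.025

z = coef / se
print(f"z = {z:.2f}")  # z = -6.24

# Two-sided p-value, assuming a standard normal reference distribution
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"p = {p_two_sided:.2e}")
```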
we have 2 groups, treatment and control. We want to test whether there is a difference in the mean outcome of these groups.
I could use ANOVA or regression. Which is better?
Regression, as it allows you to control for the effects of other predictors. With only the group variable in the model, the two approaches give the same test.
multiple regression with a categorical predictor: measuring hunger scores in different countries
Germany, UK, France
UK is the reference country.
HUNGER = B0 + B1*Germany + B2*France + e (Germany and France are 0/1 dummy variables)
what exactly do the slopes B1 and B2 reflect?
- B1 = the difference in means, Germany vs UK (Germany mean minus UK mean)
- B2 = the difference in means, France vs UK (France mean minus UK mean)
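A sketch with made-up hunger scores showing that, with UK as the reference category, the least-squares coefficients reproduce the group means: B0 is the UK mean, B1 and B2 are the Germany and France differences from it. The helper `solve` is an illustrative Gauss-Jordan solver for the normal equations, not a library function.

```python
# Hypothetical hunger scores, for illustration only
scores = {
    "UK": [5.0, 6.0, 7.0],
    "Germany": [8.0, 9.0, 10.0],
    "France": [4.0, 5.0, 6.0],
}

# Dummy coding with UK as the reference: design columns are [1, Germany, France]
X, y = [], []
for country, values in scores.items():
    for v in values:
        X.append([1.0, float(country == "Germany"), float(country == "France")])
        y.append(v)


def solve(a, b):
    """Solve a small linear system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


# Least squares via the normal equations (X'X) beta = X'y
k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
b0, b1, b2 = solve(xtx, xty)

mean = {c: sum(v) / len(v) for c, v in scores.items()}
print(f"B0 = {b0:.2f} (UK mean {mean['UK']:.2f})")
print(f"B1 = {b1:.2f} (Germany - UK = {mean['Germany'] - mean['UK']:.2f})")
print(f"B2 = {b2:.2f} (France - UK = {mean['France'] - mean['UK']:.2f})")
```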
how can we express the null hypothesis for the difference in hunger scores for Germany vs UK? Then again for France vs UK?
- H0: B1 = 0
- H0: B2 = 0