Midterm 3 Flashcards
In OLS regression, total variation or deviation follows the logic of what test?
-test of significance called analysis of variance (ANOVA)
Total variation in bivariate regression represents what?
-the total sum of squares SST
What indicates the explained variation in a bivariate regression?
- SSR sum of squares regression
- the amount of variation in Y accounted for by X
- amount of total variation that is explained by regression equation
What is the SSR also called?
-model sum of squares SSM
What represents the amount of variance left over in Y that the bivariate regression didn’t account for?
- sum of squared errors (SSE)
- sometimes called the residual sum of squares (RSS); some texts abbreviate this as SSR, which clashes with the regression sum of squares
What is the most important use of SST, SSE and SSR?
- calculation of the coefficient of determination
- AKA square of Pearson’s r (r^2)
What does r-squared tell us?
-the proportion of the total variation in Y that is attributable to X
What type of relationship do SSR and SSE hold with each other?
- an inverse relationship, because SST = SSR + SSE
- for a fixed SST, as one sum increases the other decreases
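A minimal sketch of the SST = SSR + SSE split and of r-squared as SSR / SST. The data and the helper names (`fit_line`, `sums_of_squares`) are made up for illustration and are not from the course material.

```python
def fit_line(x, y):
    """Return intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the bivariate regression of y on x."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    predicted = [a + b * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
    ssr = sum((pi - mean_y) ** 2 for pi in predicted)            # explained variation
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))    # unexplained variation
    return sst, ssr, sse


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(x, y)
r_squared = ssr / sst  # coefficient of determination
print(sst, ssr, sse, r_squared)
```

Because SST is fixed by the Y values alone, any gain in SSR has to come out of SSE, which is why a stronger linear relationship means more explained and less unexplained variation.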
If there is a stronger linear relationship between X and Y, what will happen to the explained and unexplained variation?
- greater explained variation
- lesser unexplained variation
What would an r-squared value of 1 mean?
- X explains 100% of the variation in Y
- we could predict Y from X without error
When X and Y are not linearly related, what happens to the explained variation and r-squared?
- both are zero
- X explains none of the variation in Y
What do you need to calculate to say whether a linear relationship is really strong?
-r-squared
What does it mean if the correlation coefficient is +1?
-there is a perfect positive relationship between the two variables
What does it mean if the correlation coefficient is -1?
-there is a perfect negative relationship between X and Y
What does it mean if the correlation coefficient is 0?
-no linear relationship between these two variables
How would you express a correlation coefficient of 0.65?
-A one standard deviation increase in X is associated with a 0.65 standard deviation increase in Y, on average
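One way to check the standard-deviation reading of r: if both variables are converted to z-scores, the least-squares slope of the standardized Y on the standardized X should equal Pearson's r. This is a sketch on made-up data; the helper names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    m, sd = mean(values), sample_sd(values)
    return [(v - m) / sd for v in values]


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
standardized_slope = slope(z_scores(x), z_scores(y))
print(r, standardized_slope)  # the two values should agree
```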
Does the magnitude of a linear slope have anything to do with scatter?
- NO
- it’s possible to have a very steep line with scatter or a very shallow line with no scatter
What is the slope coefficient?
-b
What do r and b have in common?
- the same numerator
- thus, testing the hypothesis that r=0 is the same as testing if b=0
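Written out with the usual textbook definitions (the notation here is an assumption, since the cards do not give the formulas), the shared numerator is the sum of cross-products of the deviations:

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
\qquad
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_i (x_i - \bar{x})^2}
\qquad\Rightarrow\qquad
b = r \, \frac{s_y}{s_x}
```

Since the standard deviations are positive, b is zero exactly when r is zero.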
Why must we test to see if the relationship between the variables exists in the population from which the sample was drawn?
- since the data for a bivariate regression is based on a random sample
- called testing for significance
How do we test for significance?
-test Pearson’s r, since testing whether r = 0 is equivalent to testing whether the slope b = 0
What assumptions are made to test for significance in a bivariate relationship?
- Assume that both variables are normal in distribution (bivariate normal distributions)
- Assume the relationship between the variables is roughly linear
- Homoscedastic relationship
What is a homoscedastic relationship?
-The Y scores are evenly spread above and below the regression line for the entire length of the line
How do you determine if it is appropriate to proceed with the assumptions around the test of significance?
-look for homoscedasticity
What are bivariate normal distributions?
-both variables are normally distributed
In hypothesis testing, what does it mean if you fail to reject the null?
- the observed Pearson’s r could have occurred by chance alone
- we cannot conclude that the two variables are related in the population
What is hypothesis testing based on?
-sampling distribution of means
What is the sampling distribution of means?
- describes the variation in the values of the mean over a series of samples
- based on the central limit theorem
How large do samples have to be to reach a normal distribution?
-greater than or equal to 30
What happens with a larger sample size in hypothesis testing?
-better approximation to the normal distribution and a more effective estimation of the population mean
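A small simulation sketch of the sampling distribution of means. The skewed population, the seed, and the sample sizes (5 versus the rule-of-thumb 30) are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A deliberately skewed (non-normal) made-up population.
population = [random.expovariate(1.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_means(population, n, draws):
    """Mean of each of `draws` random samples of size n."""
    return [statistics.fmean(random.sample(population, n)) for _ in range(draws)]


means_small = sample_means(population, 5, 2000)
means_large = sample_means(population, 30, 2000)

# Spread of the sample means (an empirical standard error).
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(population_mean, statistics.fmean(means_large))
print(spread_small, spread_large)
```

The sample means cluster around the population mean, and the cluster tightens as the sample size grows, which is what makes the mean from a larger sample a better estimate.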
What can be understood about b in hypothesis testing?
- it can be interpreted as a mean
- thus the sample slope b serves as an estimate of the population regression slope (beta)
What does b estimate?
- beta, the population slope
- though the b from any one sample will not always equal beta
What is critical about b for hypothesis testing?
-that b is normally distributed is critical for hypothesis testing of OLS regression
Why can we use z to determine b and beta?
-since b is normally distributed in the population of samples
Why can we drop the beta in the formula for t?
-since beta is presumed to equal 0 under the null hypothesis
What do the residuals indicate?
- how far the predicted value based on b is from each actual case
- suggest other factors besides X are influencing Y
What does a larger standard deviation of X do to the standard deviation (standard error) of b?
- smaller SD of b
- the slope is estimated better when the predictor’s values are widely spread
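A sketch of the t = b / Sb calculation, assuming the standard bivariate formula Sb = sqrt(SSE / (n - 2)) / sqrt(sum of squared X deviations). The cards do not spell this formula out, so treat it as an assumption; the data are made up.

```python
import math


def slope_t_test(x, y):
    """Return (b, Sb, t) for the bivariate regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (n - 2))  # spread of the residuals
    sb = residual_sd / math.sqrt(sxx)       # standard error of the slope
    t = b / sb                              # beta is 0 under the null, so it drops out
    return b, sb, t


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, sb, t = slope_t_test(x, y)

# Same Y values, but X spread twice as wide: Sb shrinks.
x_wide = [2 * xi for xi in x]
b_wide, sb_wide, t_wide = slope_t_test(x_wide, y)

print(b, sb, t)
print(b_wide, sb_wide, t_wide)
```

The sum of squared X deviations sits in the denominator of Sb, so a wider spread of X values gives a smaller standard error for the slope.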
What are the three most commonly used levels of significance in quantitative research?
- p<0.05 *
- p<0.01 **
- p<0.001 ***
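The star convention above can be sketched as a small lookup. The function name `significance_stars` is hypothetical.

```python
def significance_stars(p_value):
    """Map a p-value to the conventional star label used in regression tables."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, significance_stars(p))
```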
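When SPSS produces the coefficients table for a bivariate relationship, which values correspond to b, a and Sb?
- b is in the Unstandardized Coefficients B column, in the row for the X variable
- a is in the Unstandardized Coefficients B column, in the (Constant) row
- Sb is in the Std. Error column, in the row for the X variable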
What are antecedent variables?
- Z affects X independently and Y independently
- no effect between X and Y
What are redundant variables?
- Z and X affect each other but only Z affects Y
- Z and X are simultaneous
What is the least squares multiple equation for two independent variables?
-Y = a + b1X1 + b2X2
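What are b1 and b2?
-b1 is the partial slope of the linear relationship between the first independent variable and Y
-b2 is the partial slope of the linear relationship between the second independent variable and Y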
What is the purpose of multiple regression?
- to examine the independent relationship between each predictor (IV) and an outcome (DV, Y) in a set of predictors
- holds all other variables constant
- statistical control
What is statistical control?
-we cannot eliminate the effect of other variables on our Y so we use statistics to control
Is multiple regression as good as an experiment?
- No
- assumes that the relationship between variables can be described by a linear equation
- makes errors as small as possible
What is wrong with multiple regression?
-we cannot measure every variable that affects our dependent variable
What is the purpose of a in a regression equation?
-anchor for the regression
How realistic is a multiple regression model?
-all models are poor depictions of reality
What is e in the full multivariate regression equation?
- it indicates all the other influences besides all X’s in the model
- changes for every case
What can b be thought of as in the multivariate regression model/equation?
- each b is a weight
- expresses how much of Y each X contributes with a 1 unit increase in X
- each b indicates the independent effect of each X
What is covariance?
-measure of how two variables vary together
What value shows r in a SPSS correlation matrix?
-find the two variables you are interested in and look at where they intersect
What does it mean to look at the independent effect?
-removing the other variables’ effects on it
How do we look at the independent effect of two independent variables with correlation?
-run both of them in the regression model
How do we find the full regression equation in SPSS?
- a is the unstandardized B in the (Constant) row
- b1 is the unstandardized B in the X1 row
- b2 is the unstandardized B in the X2 row
Describe the regression equation Y=1.897 + 0.339Xage + 0.521Xmemory + e
- a one unit increase in age is related to a 0.339 unit increase in Y, controlling for memory
- if age and short term memory were both zero, we would predict a reading ability of 1.897
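A sketch of how the fitted equation turns inputs into a prediction (the error term e is left out, since it is unknown for a new case). The function name and the example inputs are hypothetical; the coefficients are the ones in the card.

```python
def predict_reading(age, memory, a=1.897, b_age=0.339, b_memory=0.521):
    """Predicted reading ability from Y = a + b_age*age + b_memory*memory."""
    return a + b_age * age + b_memory * memory


# Intercept: the prediction when both predictors are zero.
at_zero = predict_reading(0, 0)

# Partial slope for age: raise age by one unit while holding memory fixed.
# The inputs 10, 11 and 5 are arbitrary illustration values.
age_effect = predict_reading(11, 5) - predict_reading(10, 5)

# Partial slope for memory: raise memory by one unit while holding age fixed.
memory_effect = predict_reading(10, 6) - predict_reading(10, 5)

print(at_zero, age_effect, memory_effect)
```

Holding the other input fixed while changing one is the arithmetic meaning of "controlling for" in the interpretation above.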
What is the multiple coefficient of determination?
- R^2
- needed because r^2 doesn’t work for multiple regression, since the predictors overlap
What is R^2?
- the square of the correlation between the observed and predicted values from the multiple regression
- the proportion of variance in the dependent variable accounted for by the predictors in the regression
What would it mean if we had an R-square value of 0.702?
-X1 and X2 together account for 70.2% of the variance in Y
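A sketch showing that the two descriptions of R^2 agree: 1 - SSE/SST matches the squared correlation between observed and predicted Y. The two-predictor solver below uses the textbook normal equations; the data and all helper names are made up for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Least-squares a, b1, b2 for y = a + b1*x1 + b2*x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero when x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2


# Made-up data for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 1, 5, 9]
y = [4, 5, 8, 7, 11, 15]

a, b1, b2 = fit_two_predictors(x1, x2, y)
predicted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]

sst = cross(y, y)
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
r_squared = 1 - sse / sst  # proportion of variance accounted for

# Multiple R: correlation between observed and predicted values.
multiple_r = cross(y, predicted) / math.sqrt(cross(y, y) * cross(predicted, predicted))

print(r_squared, multiple_r ** 2)  # the two values should agree
```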
Why can we not just compare partial slopes?
-different units
What do we do to convert partial slopes into a comparable form?
-look at standardized coefficients
What are standardized partial slopes called?
-beta weights
How to interpret beta-weight values?
-the higher the beta-weight value the stronger the relationship regardless of + or -
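A sketch of the usual conversion from a partial slope to a beta weight, beta = b * (SD of X / SD of Y). The cards do not state this formula, so it is an assumption here, and the standard deviations in the example are invented.

```python
def beta_weight(b, sd_x, sd_y):
    """Standardized partial slope: the unstandardized slope rescaled to SD units."""
    return b * sd_x / sd_y


# Slopes from the reading-ability card; the standard deviations are hypothetical.
beta_age = beta_weight(0.339, sd_x=2.0, sd_y=1.5)
beta_memory = beta_weight(0.521, sd_x=1.0, sd_y=1.5)

# Compare by absolute value, ignoring the sign.
stronger = "age" if abs(beta_age) > abs(beta_memory) else "memory"
print(beta_age, beta_memory, stronger)
```

With these invented standard deviations, age has the larger beta weight even though its unstandardized slope is smaller, which is why partial slopes in different units cannot be compared directly.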
In bivariate regression what type of strength do we observe with standardized coefficients?
-absolute
In multiple regression can we use standardized slopes to determine absolute strength?
- No
- Relative strength only
Is beta-weight equal to r?
-no, not in multiple regression (in bivariate regression the standardized slope does equal r)
What does multiple regression do for spurious relationships?
-it is used to rule out spurious relationships among variables
What are the three types of spurious relationships?
- antecedent
- redundant
- suppression
What is suppression?
- opposite of redundancy
- when the relationship between two variables gets stronger when you control for a third variable
How can we use stepwise regression to show spurious relationships?
- the unstandardized coefficients (b) will change values in each model (go down)
- or the R square will change value in each model
How do you test for significance in multiple regression?
-use t equation of b/Sb
What forms does multicollinearity come in?
-extreme and near extreme
What is extreme multicollinearity?
- at least two of the X variables in a regression equation are perfectly related by a linear function
- correlation between X1 and X2 is 1 or -1
What is near-extreme multicollinearity?
- there are strong, although not perfect, linear relationships among the X’s
- correlation between X1 and X2 will be close to 1 or -1
How do you find near-extreme multicollinearity?
- regress each independent variable on all the other independent variables and look for a high R-square
- if any of these are above 0.6 this is concerning
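A sketch of the auxiliary-regression check for the simplest case of two predictors, where regressing X1 on X2 gives an R-square equal to their squared correlation. With more predictors, each auxiliary regression would be a multiple regression instead. The data are made up; the 0.6 threshold is the one from the card.

```python
def r_squared_simple(x, y):
    """R-square from regressing y on the single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


# Made-up predictors that move almost in lockstep.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 5, 8, 11, 12]

aux_r_squared = r_squared_simple(x2, x1)  # regress X1 on X2
concerning = aux_r_squared > 0.6          # threshold taken from the card

print(aux_r_squared, concerning)
```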
Why is multicollinearity a problem?
- it results in larger standard errors for the affected coefficients
- making it harder to find statistically significant coefficients (t)
What differs between the standard error for bivariate and multivariate regressions?
-correction factor for the covariance between the two predictors
What does greater covariance between two predictors result in?
-less reliable estimates because it inflates Sb
What is VIF?
-the variance inflation factor; it captures the degree to which an independent variable is collinear with the other predictors
How would you interpret a VIF of 9?
-the standard error of that coefficient is multiplied by a factor of 3 (the square root of 9)
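A sketch of where the factor of 3 comes from, assuming the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. The function names are hypothetical.

```python
import math


def vif_from_r_squared(aux_r_squared):
    """Variance inflation factor from an auxiliary-regression R-square."""
    if not 0 <= aux_r_squared < 1:
        raise ValueError("auxiliary R-square must be in [0, 1)")
    return 1 / (1 - aux_r_squared)


def se_inflation(vif):
    """VIF inflates the variance, so the standard error grows by its square root."""
    return math.sqrt(vif)


vif = vif_from_r_squared(8 / 9)  # an auxiliary R-square of about 0.889
factor = se_inflation(vif)

print(vif, factor)
```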
What variable will have a large VIF?
-independent variable that is highly correlated with other predictors in the model
What is the cut off for VIF?
-6