Midterm 3 Flashcards
In OLS regression, total variation or deviation follows the logic of what test?
-test of significance called analysis of variance (ANOVA)
Total variation in bivariate regression represents what?
-the total sum of squares SST
What indicates the explained variation in a bivariate regression?
- SSR sum of squares regression
- the amount of variation in Y accounted for by X
- amount of total variation that is explained by regression equation
What is the SSR also called?
-model sum of squares SSM
What represents the amount of variance left over in Y that the bivariate regression didn’t account for?
- sum of squared errors (SSE)
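- Sometimes called the residual sum of squares (RSS); some texts abbreviate it SSR, which clashes with the SSR used above for the regression sum of squares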
What is the most important use of SST, SSE and SSR?
- calculation of the coefficient of determination
- AKA square of Pearson’s r (r^2)
What does r-squared tell us?
-the proportion of the total variation in Y that is attributable to X
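The cards above on SST, SSR, SSE and r-squared can be checked with a small worked example. This is a minimal sketch in plain Python; the data and the helper name `sums_of_squares` are made up for illustration and are not from the course.

```python
def sums_of_squares(x, y):
    """Fit a bivariate OLS line and return (SST, SSR, SSE)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope b and intercept a of the least-squares line
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    y_hat = [a + b * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    ssr = sum((yh - mean_y) ** 2 for yh in y_hat)          # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse

# Hypothetical data, chosen only to keep the arithmetic simple
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(x, y)
r_squared = ssr / sst  # equivalently 1 - sse / sst
print(sst, ssr, sse, r_squared)
```

For this data SST is 6, SSR is about 3.6 and SSE is about 2.4, so X accounts for roughly 60% of the variation in Y.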
What type of relationship do SSR and SSE hold with each other?
- an inverse (trade-off) relationship, because SST = SSR + SSE
- for a given SST, as one sum increases the other decreases
If there is a stronger linear relationship between X and Y, what will happen to the explained and unexplained variation?
- greater explained variation
- lesser unexplained variation
What would an r-squared value of 1 mean?
- X explains 100% of the variation in Y
- we could predict Y from X without error
When X and Y are not linearly related, what happens to the explained variation and r-squared?
- both are zero
- X explains none of the variation in Y
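A quick illustration of the zero case: Y can depend entirely on X and still have an r-squared of zero if the dependence is not linear. The sketch below uses made-up data where Y is exactly X squared; the helper name `r_squared` is an assumption for illustration.

```python
def r_squared(x, y):
    """Return SSR / SST for the bivariate OLS line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sum((a + b * xi - mean_y) ** 2 for xi in x)
    return ssr / sst

# Hypothetical data: a perfect U-shaped (non-linear) relationship
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

curved_r2 = r_squared(x, y)
print(curved_r2)
```

The fitted line is flat here, so the explained variation is zero even though Y is fully determined by X. An r-squared of zero rules out a linear relationship, not every relationship.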
What do you need to calculate to say whether a linear relationship is really strong?
-r-squared
What does it mean if the correlation coefficient is +1?
-there is a perfect positive relationship between the two variables
What does it mean if the correlation coefficient is -1?
-there is a perfect negative relationship between X and Y
What does it mean if the correlation coefficient is 0?
-no linear relationship between these two variables
How would you express a correlation coefficient of 0.65?
-A one standard deviation increase in X is associated with a 0.65 standard deviation increase in Y, on average
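The standard-deviation reading of r can be verified by converting both variables to z-scores and fitting the line again: the slope on the standardized variables equals r. A minimal sketch with made-up data; all helper names are illustrative.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def slope(x, y):
    """OLS slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Convert each variable to z-scores, then regress z_y on z_x
zx = [(xi - mean(x)) / sd(x) for xi in x]
zy = [(yi - mean(y)) / sd(y) for yi in y]

standardized_slope = slope(zx, zy)
r = pearson_r(x, y)
print(standardized_slope, r)
```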
Does the magnitude of a linear slope have anything to do with scatter?
- NO
- it’s possible to have a very steep line with a lot of scatter or a very shallow line with no scatter
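To see that slope and scatter are separate things, compare a steep but noisy line with a shallow but perfect one. The data below are invented for illustration, and `slope_and_r2` is a hypothetical helper.

```python
def slope_and_r2(x, y):
    """Return the OLS slope b and r-squared for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return b, r2

x = [1, 2, 3, 4, 5]

# Steep line (about 10 units of Y per unit of X) with large scatter
y_steep_noisy = [-5, 35, 15, 55, 35]
# Shallow line (0.1 units of Y per unit of X) with no scatter at all
y_shallow_exact = [0.1 * xi for xi in x]

steep_b, steep_r2 = slope_and_r2(x, y_steep_noisy)
shallow_b, shallow_r2 = slope_and_r2(x, y_shallow_exact)
print(steep_b, steep_r2)
print(shallow_b, shallow_r2)
```

The steep line has the larger slope but the smaller r-squared, which is why b alone says nothing about the strength of the relationship.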
What is the slope coefficient?
-b
What do r and b have in common?
- the same numerator
- thus, testing the hypothesis that r=0 is the same as testing if b=0
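The shared numerator is the sum of cross-products of the deviations of X and Y. A small sketch with made-up data shows both formulas side by side; the variable names are illustrative only.

```python
from math import sqrt

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Shared numerator: sum of cross-products of deviations
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b = sxy / sxx              # slope: numerator over variation in X
r = sxy / sqrt(sxx * syy)  # Pearson's r: same numerator, different denominator

# Because the denominators are always positive, b and r share a sign
# and are zero together; b is r rescaled by the ratio of spreads.
b_from_r = r * sqrt(syy / sxx)
print(b, r, b_from_r)
```

Since both denominators are positive, the numerator alone decides whether r and b are zero, which is why the two hypothesis tests coincide.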
Why must we test to see if the relationship between the variables exists in the population from which the sample was drawn?
- since the data for a bivariate regression is based on a random sample
- called testing for significance
How do we test for significance?
-test Pearson’s r, since testing whether r=0 is equivalent to testing whether the slope b=0
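One common way to carry out this test, assumed here rather than taken from the cards, is the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below also computes t from the slope and its standard error to show that the two routes agree. The data are invented, and the critical value is copied from a standard t table.

```python
from math import sqrt

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Route 1: t statistic computed from Pearson's r
r = sxy / sqrt(sxx * syy)
t_from_r = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Route 2: t statistic computed from the slope and its standard error
b = sxy / sxx
a = mean_y - b * mean_x
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = sqrt(sse / (n - 2) / sxx)
t_from_b = b / se_b

# Two-tailed critical t for alpha = 0.05 with df = n - 2 = 3 (from a t table)
T_CRITICAL = 3.182
reject_null = abs(t_from_r) > T_CRITICAL
print(t_from_r, t_from_b, reject_null)
```

Both routes give the same t, and with only five cases it falls short of the critical value, so the null hypothesis of no relationship is not rejected for this sample.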
What assumptions are made to test for significance in a bivariate relationship?
- Assume that both variables are normal in distribution (bivariate normal distributions)
- Assume the relationship between the variables is roughly linear
- Homoscedastic relationship
What is a homoscedastic relationship?
-The Y scores are evenly spread above and below the regression line for the entire length of the line
How do you determine if it is appropriate to proceed with the assumptions around the test of significance?
-look for homoscedasticity
What are bivariate normal distributions?
-both variables are normally distributed
In hypothesis testing, what does it mean if you fail to reject the null?
- the observed Pearson’s r could have occurred by chance alone
- we cannot conclude that the two variables are related in the population
What is hypothesis testing based on?
-sampling distribution of means
What is the sampling distribution of means?
- describes the variation in the values of the mean over a series of samples
- based on the central limit theorem
How large do samples have to be for the sampling distribution of the mean to be approximately normal?
-as a rule of thumb, greater than or equal to 30
What happens with a larger sample size in hypothesis testing?
-better approximation to the normal distribution and a more effective estimation of the population mean
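The effect of sample size on the sampling distribution of means can be simulated. This sketch assumes a uniform population on the interval 0 to 1 (a deliberately non-normal choice) and a fixed random seed; the function name and the number of draws are illustrative.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is repeatable

def sample_means(n, draws=2000):
    """Draw `draws` samples of size n from a uniform(0, 1) population
    and return the mean of each sample."""
    return [mean([random.random() for _ in range(n)]) for _ in range(draws)]

means_n5 = sample_means(5)
means_n30 = sample_means(30)

# Centre and spread of the two simulated sampling distributions
centre_n5, spread_n5 = mean(means_n5), stdev(means_n5)
centre_n30, spread_n30 = mean(means_n30), stdev(means_n30)
print(centre_n5, spread_n5)
print(centre_n30, spread_n30)
```

Both sets of sample means centre near the population mean of 0.5, but the means from samples of 30 cluster more tightly around it, which is the sense in which larger samples estimate the population mean more effectively.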
What can be understood about b in hypothesis testing?
- it can be interpreted as a mean
- thus, like a sample mean, b estimates a population value: the population regression slope
What does b produce?
- an estimate of beta, the population slope
- not exactly equal to beta in every sample, though, because of sampling error
What is critical about b for hypothesis testing?
-that the sampling distribution of b is normal; this is critical for hypothesis testing in OLS regression