Linear Regression 2 Flashcards
The purpose of significance testing in linear regression.
To assess whether the regression line explains a sufficient amount of variance in the data.
How do you calculate significance (F-test)?
- Input the sums of squares for the regression and the error
- Input the degrees of freedom for each
- Divide each sum of squares by its degrees of freedom to get the mean squares (MS)
- Divide the MS of the regression by the MS of the error to get the observed F-value
- Compare the observed F-value to the critical F-value (see the sketch below)
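A minimal sketch of these steps in Python; the sums of squares, degrees of freedom, and sample size are made-up example values, not taken from any particular dataset:

```python
from scipy import stats

# Hypothetical values for a simple linear regression with n = 20
ss_regression = 150.0   # sum of squares explained by the regression
ss_error = 300.0        # sum of squares of the residuals (error)
df_regression = 1       # one predictor
df_error = 20 - 2       # n - 2 for simple linear regression

# Divide each sum of squares by its degrees of freedom to get the mean squares
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Divide the MS of the regression by the MS of the error to get the F-value
f_observed = ms_regression / ms_error

# Compare the observed F-value to the critical F-value at alpha = .05
f_critical = stats.f.ppf(0.95, df_regression, df_error)
print(f_observed, f_critical, f_observed > f_critical)
```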
What does the ‘R’ value in the model summary table represent?
The correlation coefficient (Pearson’s r)
What does the ‘R square’ value in the model summary table represent?
The proportion of variance in the data explained by the regression.
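For example, with scipy (the data values here are purely illustrative):

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]          # predictor (illustrative data)
y = [2, 4, 5, 4, 6, 7]          # outcome (illustrative data)

r, p = stats.pearsonr(x, y)     # 'R' in the model summary: Pearson's r
r_squared = r ** 2              # 'R square': proportion of variance explained
print(r, r_squared)
```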
In the ‘coefficients’ table, what does the (constant) value represent?
The intercept of the regression line
In the ‘coefficients’ table, what does the [variable name] value represent?
The slope of the regression line
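A quick illustration of both coefficients with scipy.stats.linregress (data values are made up):

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]   # predictor (illustrative data)
y = [2, 4, 5, 4, 6, 7]   # outcome (illustrative data)

result = stats.linregress(x, y)
print(result.intercept)  # the (constant) value: where the line crosses the y-axis
print(result.slope)      # the [variable name] value: change in y per one-unit change in x
```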
What does the Standard Error of the Estimate reflect?
How far the data will typically fall from the regression line (the spread of observations around the line).
What is a normal distribution?
When we assume that the data is distributed normally around the regression line, meaning
1. The mean is at the centre, on the regression line
2. The spread (SD) is the same on both sides of the regression line
3. Observations further from the regression line are less likely
What does the 68-95-99.7 Rule tell us?
How the data will be distributed within a normal distribution, in terms of standard deviations from the mean (see the check below)
* 68% of the data falls within 1 standard deviation of the mean
* 95% of the data falls within 2 standard deviations of the mean
* 99.7% of the data falls within 3 standard deviations of the mean
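These percentages can be checked directly from the normal distribution itself; a small sketch, not tied to any particular dataset:

```python
from scipy import stats

for k in (1, 2, 3):
    # proportion of a normal distribution within k standard deviations of the mean
    proportion = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(k, round(proportion * 100, 1))   # roughly 68.3, 95.4, 99.7
```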
How do you calculate the Standard Error of the Estimate?
- Calculate the sum of squares error (SSE)
- Divide the SSE by the sample size minus 2 (n - 2)
- Take the square root
If you have the ANOVA table, you can simply take the square root of the mean square error.
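A minimal sketch of this calculation; the observed and predicted values are illustrative, and `predicted` would normally come from the fitted regression line:

```python
import numpy as np

y = np.array([2.0, 4.0, 5.0, 4.0, 6.0, 7.0])           # observed outcomes (illustrative)
predicted = np.array([2.5, 3.4, 4.3, 5.2, 6.1, 7.0])   # values predicted by the regression line

sse = np.sum((y - predicted) ** 2)   # sum of squares error
n = len(y)
see = np.sqrt(sse / (n - 2))         # divide by n - 2, then take the square root
print(see)
```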
What are the four primary assumptions for simple linear regression?
- Linearity
- Normality
- Homogeneity of variance (homoscedasticity)
- Independence
What is the Linearity Assumption?
We assume the relationship between the variables is linear.
If the data is nonlinear, the regression line will systematically over- or under-estimate the y-value as x changes.
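One common way to check this assumption is a residuals-versus-fitted plot; a curved pattern in the residuals suggests nonlinearity. A sketch with made-up, deliberately nonlinear data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = x ** 2 + np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4])  # clearly nonlinear

fit = stats.linregress(x, y)
fitted = fit.intercept + fit.slope * x
residuals = y - fitted

plt.scatter(fitted, residuals)   # a curved band of points signals nonlinearity
plt.axhline(0)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```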
What is the normality assumption?
The assumption that the data will be distributed normally around the regression line
- mean in the centre
- even SD on both sides
* If the data is not normal, we cannot predict how far the data is likely to fall from the regression line (i.e., the 68-95-99.7 rule no longer applies)
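One way to check this assumption is a Shapiro-Wilk test on the residuals; a sketch, with illustrative residual values rather than output from a real model:

```python
from scipy import stats

# illustrative residuals from a fitted regression
residuals = [-0.5, 0.6, 0.7, -1.2, 0.1, 0.9, -0.8, 0.2]

w, p = stats.shapiro(residuals)
print(p)   # p < .05 would suggest the residuals depart from normality
```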
What is the Homogeneity of Variance assumption?
The assumption that the error variance is the same at all points along the regression line
* If the error variance is not equal, we cannot consistently predict how far the data will fall from the regression line
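One simple check is whether the spread of the residuals grows or shrinks with the fitted values, for example by correlating the absolute residuals with the fitted values; this is a rough sketch with illustrative numbers, not a formal test such as Breusch-Pagan:

```python
from scipy import stats

fitted = [2.1, 3.0, 3.9, 4.8, 5.7, 6.6, 7.5, 8.4]          # illustrative fitted values
residuals = [0.1, -0.2, 0.3, -0.5, 0.7, -0.9, 1.1, -1.4]   # spread grows with fitted values

abs_residuals = [abs(r) for r in residuals]
r, p = stats.pearsonr(fitted, abs_residuals)
print(r, p)   # a strong positive correlation hints at heteroscedasticity
```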
What is the Independence assumption?
The assumption that the observations in the data are non-overlapping (each observation is independent)
* For between-subjects designs, each data point should come from a separate subject
* For within-subjects designs, the data for each condition is represented only once
* If you do not have an independent sample, your regression is biased towards the duplicated data