Lecture 2 - Continuous outcomes Flashcards
Analysis of Variance (ANOVA)
What is it used for?
Used to compare the means of three or more groups/samples. The means are used to determine possible differences between groups.
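To illustrate how ANOVA uses group means, here is a minimal sketch of the F statistic for a one-way ANOVA, computed by hand. The three groups and the function name `one_way_anova_f` are made up for illustration and do not come from the lecture; the p-value step is omitted because it needs the F distribution.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups (hypothetical helper)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Between-group variation: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: how far each observation lies from its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Made-up data: three groups with means 5, 7 and 10
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ more than the spread within the groups would suggest.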
Analysis of Variance (ANOVA)
What assumptions need to be met?
- Independent variable = categorical
- Dependent variable = scale level and normally distributed
- Variances (SD^2) in all groups are equal
Analysis of Variance (ANOVA)
* What solutions are there if the outcome on scale level is not normally distributed?
* Which test can be used to determine equal variances in all groups?
- Either transform the outcome or use the non-parametric Kruskal-Wallis test
- Test of Homogeneity of Variances (e.g. Levene's test)
When three group means are compared using ANOVA and the resulting p-value is <0.001, you can conclude that the group means differ significantly. However, ANOVA only indicates that there is a significant difference somewhere between the groups; you cannot conclude which groups differ from each other.
How can you determine where the significant difference is located?
By doing a post-hoc procedure (i.e. using ANOVA to establish a significant p-value and using post-hoc to establish where the significant difference is located).
* Performing three independent sample t-tests
Why is a Bonferroni correction needed when doing a post-hoc procedure to determine where the significant difference is located after an ANOVA?
A Bonferroni correction is needed to address the inflated Type I error rate that arises when conducting multiple statistical tests. When you perform three independent-samples t-tests, each at a significance level of 0.05, the overall probability of making at least one Type I error across all three tests can be as high as approximately 3 x 0.05 = 0.15. This means there is up to a 15% chance of at least one Type I error (i.e. rejecting H0 while H0 is true). The Bonferroni correction compensates by dividing the significance level by the number of tests (0.05 / 3 ≈ 0.017 per test).
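The arithmetic behind the correction can be sketched as follows. The variable names are illustrative, and the "exact" figure assumes the three tests are statistically independent, which pairwise comparisons on the same three groups are not, so treat it as an approximation.

```python
alpha = 0.05   # significance level per test
n_tests = 3    # three pairwise t-tests for three groups

# Simple upper bound on the chance of at least one Type I error
upper_bound = n_tests * alpha

# Chance of at least one Type I error if the tests were independent
fwer_independent = 1 - (1 - alpha) ** n_tests

# Bonferroni: test each comparison at alpha / number of tests
alpha_bonferroni = alpha / n_tests
fwer_bonferroni = 1 - (1 - alpha_bonferroni) ** n_tests

print(f"Upper bound without correction: {upper_bound:.3f}")
print(f"Independent tests, no correction: {fwer_independent:.4f}")
print(f"Per-test alpha after Bonferroni: {alpha_bonferroni:.4f}")
print(f"Independent tests, with Bonferroni: {fwer_bonferroni:.4f}")
```

With the corrected per-test level, the overall error rate stays just below the intended 0.05.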
Linear regression - continuous covariate
What is it used for?
Used to predict outcome Y with the use of the independent variable X, via a linear relationship.
Linear regression - continuous covariate
What assumptions need to be met?
- Linear relationship between dependent and independent variables.
- Independent variable discrete or continuous.
- Dependent variable continuous.
- Dependent variable has equal variances for each value of the independent variable.
The formula that belongs to a linear regression with a continuous covariate is: y = b0 + b1 * x
* What is b0?
* What is b1?
- b0 = mean y when x=0
- b1 = mean difference in y when x increases by 1
So if you want to know whether a dependent variable is (significantly) associated with the independent variable, a linear regression model can be made.
The model has the following output:
* b0 = -2.266
* p < 0.001
* b1 = 0.031
Considering the linear regression formula, calculate y when x = 100 and when x = 101.
- x = 100: y = -2.266 + 0.031 * 100 = 0.834
- x = 101: y = -2.266 + 0.031 * 101 = 0.865
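The two calculations above can be checked with a short sketch. The coefficients are the ones given in the model output; the function name `predict` is only illustrative.

```python
b0 = -2.266  # intercept: mean y when x = 0
b1 = 0.031   # slope: mean difference in y when x increases by 1

def predict(x):
    """Predicted y for a given x under the fitted linear model."""
    return b0 + b1 * x

y_100 = predict(100)
y_101 = predict(101)

print(f"x = 100: y = {y_100:.3f}")
print(f"x = 101: y = {y_101:.3f}")
print(f"difference = {y_101 - y_100:.3f}")
```

The difference between the two predictions equals b1, which is exactly what the slope means.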
Dummy regression
What assumptions need to be met?
- Dependent variable is quantitative (e.g. 13.04).
- Residuals surrounding the regression line are roughly normally distributed
- Linear relationship between independent and dependent variables.
- Variances of residuals are equal for all X values.
You are analysing whether there is a difference in salary between people with different educational backgrounds (low, moderate, high). For this, 2 dummies are created:
* Dummy1: Moderate education coded as 1 + other (low and high education) coded as 0.
* Dummy2: High education coded as 1 + other (low and moderate education) coded as 0.
Name the overall regression formula and use this formula to calculate the mean salary for each education level.
Overall regression formula: salary (y) = B0 + B1 x Dummy1 + B2 x Dummy2.
* Low education: salary (y) = B0 + B1 x 0 + B2 x 0 = B0
* Moderate education: salary (y) = B0 + B1 x 1 + B2 x 0 = B0 + B1
* High education: salary (y) = B0 + B1 x 0 + B2 x 1 = B0 + B2
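A minimal sketch of the dummy coding is shown below. The coefficient values are hypothetical, since the card gives no estimates, and the names `DUMMY_CODES` and `mean_salary` are only illustrative.

```python
# Hypothetical coefficients, chosen only to make the arithmetic visible
B0 = 2000.0  # mean salary of the reference group (low education)
B1 = 500.0   # mean difference: moderate vs. low education
B2 = 1200.0  # mean difference: high vs. low education

# Dummy coding from the card: (Dummy1, Dummy2) per education level
DUMMY_CODES = {
    "low": (0, 0),
    "moderate": (1, 0),
    "high": (0, 1),
}

def mean_salary(level):
    """Predicted mean salary for an education level under the dummy regression."""
    dummy1, dummy2 = DUMMY_CODES[level]
    return B0 + B1 * dummy1 + B2 * dummy2

for level in DUMMY_CODES:
    print(f"{level}: {mean_salary(level):.0f}")
```

Low education is the reference category because both dummies are 0 for it, so B1 and B2 are each read as a difference from the low-education mean.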