Wk 12: Modelling Association Flashcards
If we carry out 1 hypothesis test with a significance threshold of 5%, the probability of making a type 1 error is ____ %.
5
If we carry out 10 hypothesis tests with a significance threshold of 5%, the probability of making at least one type 1 error is ____ %. Why?
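>5 (about 40% if the 10 tests are independent, since 1 - 0.95^10 ≈ 0.40)
- Each test carries its own 5% chance of a type 1 error, so across 10 tests the chance of at least one false positive accumulates. To keep it down when doing multiple comparisons, we divide the significance level by the number of tests, here 5%/10 = 0.5% per test (Bonferroni correction), so the overall (family-wise) significance level stays at or below 5%. Alternatively, we use Tukey's HSD for pairwise comparisons of group means.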
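The arithmetic behind this card can be checked with a short sketch. It assumes the tests are independent, which is what makes 1 - (1 - alpha)^m exact; the Bonferroni threshold keeps the family-wise rate at or below alpha either way. The function names are hypothetical helpers, not from any library.

```python
# Family-wise type 1 error rate for m tests, each run at significance level alpha.
# Assumption: the tests are independent (needed for the exact formula below).


def familywise_error_rate(alpha, m):
    """Probability of at least one type 1 error across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / m


if __name__ == "__main__":
    m = 10
    print(f"uncorrected family-wise rate: {familywise_error_rate(0.05, m):.3f}")
    per_test = bonferroni_alpha(0.05, m)
    print(f"Bonferroni per-test alpha: {per_test}")
    print(f"corrected family-wise rate: {familywise_error_rate(per_test, m):.3f}")
```

With m = 10 this gives about 0.40 without correction and just under 0.05 after the Bonferroni correction.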
If we carry out 1 hypothesis test with a significance threshold of 5%, the probability of making a type 2 error is _____. Why?
(1 - power)
- To increase power (and so reduce the type 2 error rate), we can increase the sample size, increase the effect size, reduce measurement error and reduce methodological variability.
- An experiment with at least 80% power is generally considered worthwhile.
- A power analysis needs a survey of the literature or a small pilot study to estimate a clinically relevant effect size, the typical variability, etc., so that the required sample size can be worked out before the real experiment.
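A power analysis turns the effect size and variability from the pilot study or literature into a sample size. The sketch below assumes a two-sided comparison of two group means and uses the normal approximation n = 2(z_{1-alpha/2} + z_{power})^2 / d^2, where d is Cohen's d (difference in means divided by the common SD). A t-based calculation, as dedicated power software would do, gives a slightly larger n. The effect size of 0.5 is a hypothetical input.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group (normal approximation, two-sided test).

    effect_size is Cohen's d = (difference in means) / (common SD).
    """
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_power = _Z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(effect_size * sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    d = 0.5  # hypothetical effect size, e.g. estimated from a pilot study
    n = n_per_group(d)
    print(f"n per group: {n}, approximate power: {approx_power(d, n):.3f}")
```

Halving the effect size roughly quadruples the required sample size, which is why a realistic, clinically relevant effect size matters so much.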
What are 4 characteristics?
- All are drawn from normal distributions, but because the samples are small (n = 20), they can look skewed.
- 1st box plot: Normal.
- 2nd box plot: Median is far from the centre of the box. Whiskers suggest negative skew.
- 3rd box plot: Median position suggests positive skew. Whiskers suggest negative skew.
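How much small normal samples can look skewed in a box plot can be explored by simulation. The sketch below draws three samples of n = 20 from a normal distribution and reports where the median sits inside the box (0.5 = centred). The quartile rule (`method="inclusive"`) is an assumption; statistics packages use slightly different quartile conventions, so box edges may not match exactly.

```python
import random
import statistics


def box_summary(data):
    """Five-number summary (min, Q1, median, Q3, max) as drawn by a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def median_position(data):
    """Where the median sits inside the box: 0.5 means centred.

    Below 0.5 (median close to Q1) looks like positive skew;
    above 0.5 (median close to Q3) looks like negative skew.
    """
    _, q1, median, q3, _ = box_summary(data)
    return (median - q1) / (q3 - q1)


if __name__ == "__main__":
    random.seed(12)
    for i in range(3):
        sample = [random.gauss(0, 1) for _ in range(20)]
        print(f"sample {i + 1}: median position {median_position(sample):.2f}")
```

Re-running with different seeds shows median positions drifting well away from 0.5 even though every sample comes from the same normal distribution.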
What are 3 characteristics?
- 1st box plot: Not normal. Positively skewed.
- 2nd box plot: Normal.
- 3rd box plot: Each observation from a normal distribution has only about a 1% chance (roughly 0.7%) of appearing as a box-plot outlier, so seeing 1 outlier is acceptable.
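The "about 1%" figure can be derived from the normal distribution itself. The sketch assumes the usual box-plot rule that a point is an outlier if it lies more than 1.5 × IQR beyond Q1 or Q3, and uses the population quartiles of the normal distribution rather than sample quartiles, so it is a simplification.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def outlier_probability(k=1.5):
    """P(one normal observation lies beyond Q1 - k*IQR or Q3 + k*IQR)."""
    q1, q3 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
    upper_fence = q3 + k * (q3 - q1)
    return 2 * (1 - Z.cdf(upper_fence))  # both tails, by symmetry


def prob_at_least_one(n, p):
    """Chance that n independent observations include at least one outlier."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    p = outlier_probability()
    print(f"per observation: {p:.4f}")
    print(f"at least one outlier in n = 20: {prob_at_least_one(20, p):.2f}")
```

Each observation has roughly a 0.7% chance, so in a sample of 20 there is roughly a 13% chance of at least one outlier: a single outlier is not unusual.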
What are 4 characteristics?
- n = 100
- 1st box plot: 3-4 outliers is more than a normal distribution would be expected to produce, even in a sample this large (n = 100), so it deviates seriously from normality.
- 2nd box plot: 1 outlier is acceptable. Normal.
- 3rd box plot: Normal.
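Why 3-4 outliers in n = 100 is suspicious, while 1 is fine, can be checked with the binomial distribution. The sketch assumes independent observations from a normal distribution and the 1.5 × IQR outlier rule with population quartiles (a simplification of how a sample box plot is drawn).

```python
from math import comb
from statistics import NormalDist

Z = NormalDist()
q1, q3 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
# Per-observation chance of landing beyond the 1.5 * IQR fences under normality
P_OUTLIER = 2 * (1 - Z.cdf(q3 + 1.5 * (q3 - q1)))


def prob_at_least(k, n, p=P_OUTLIER):
    """Binomial P(X >= k) for the number of outliers X among n observations."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    print(f"expected outliers in n = 100: {100 * P_OUTLIER:.2f}")
    print(f"P(at least 1 outlier): {prob_at_least(1, 100):.2f}")
    print(f"P(at least 3 outliers): {prob_at_least(3, 100):.3f}")
```

Under normality about half of samples of 100 show at least one outlier, but only roughly 3% show three or more.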
What is a simple linear regression?
- The mean response is given by a straight line with constant normal variability about that line.
- Least-squares line: A line that minimises the sum of the squared residuals.
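The least-squares line has a closed form: b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The sketch below implements it with toy data (for illustration only) and includes a helper for the sum of squared residuals, so you can confirm that nudging the slope away from b1 makes the sum larger.

```python
from statistics import fmean


def least_squares(x, y):
    """Return (b0, b1) for the line y = b0 + b1*x minimising the sum of squared residuals."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of (observed - fitted)^2 for a given line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]  # toy data for illustration only
    y = [2, 4, 5, 4, 5]
    b0, b1 = least_squares(x, y)
    print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
    print(f"SSR = {sum_squared_residuals(x, y, b0, b1):.2f}")
```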
What are 4 assumptions for linear regression?
- Independent observations
- Linear association
- Normal variability
- Skew matters less for large samples, because the sampling distribution of the estimates approaches a normal distribution (central limit theorem).
- However, skew still undermines predictions made by the linear model. A log transformation of the data can reduce the skew.
- Normal variability can be checked by examining the residuals on a scatter plot, a P-P plot and a histogram.
- Equal variances
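The effect of a log transformation on skew can be seen numerically. This sketch uses the moment coefficient of skewness (positive means a longer right tail) on toy, strictly positive data; a log transform only applies to positive values, and it will not remove skew this neatly in real data.

```python
import math
from statistics import fmean


def skewness(data):
    """Moment coefficient of skewness: > 0 means a longer right (positive) tail."""
    m = fmean(data)
    m2 = fmean([(v - m) ** 2 for v in data])
    m3 = fmean([(v - m) ** 3 for v in data])
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Toy, strictly positive, right-skewed data (each value doubles the last)
    raw = [1, 2, 4, 8, 16, 32, 64]
    logged = [math.log(v) for v in raw]
    print(f"skewness before: {skewness(raw):.2f}")
    print(f"skewness after log: {skewness(logged):.2f}")
```

Because these toy values grow by a constant factor, their logs are evenly spaced and the skewness drops to zero.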
What are regression models that use a straight line to estimate the mean response based on a predictor? What are 4 characteristics?
Simple linear regression: y = b0 + b1x
- y is the mean response, b0 is the intercept and b1 is the slope
- x is the predictor
- Null hypothesis: The slope b1 is 0 (flat line).
- The p-value is the probability of getting a b1 value at least as far from 0 as the observed one by chance, if there were no relationship between the response and the predictor.
- p < 0.05 is evidence against the null hypothesis, suggesting there is a relationship between the response and the predictor.
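The meaning of this p-value can be made concrete with a permutation test, a swapped-in technique rather than the t-test that regression software normally reports, though both test the same null hypothesis of zero slope. Shuffling y breaks any link with x, so the shuffled slopes show what b1 looks like "by chance" when there is no relationship. The number of shuffles and the seed are arbitrary choices.

```python
import random
from statistics import fmean


def slope(x, y):
    """Least-squares slope b1."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value for H0: slope = 0, estimated by shuffling y.

    Counts how often a shuffled slope is at least as far from 0 as the observed b1.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 counts the observed arrangement itself
```

A small p-value means shuffled data almost never produce a slope as steep as the real one.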
What are 3 characteristics?
- Null hypothesis: there is no linear relationship between height and mass (slope of 0, flat line).
- Mass and height show positive association (line going up).
- Residual is the vertical distance from each dot to the line.
- Dots above the line have positive residuals; dots below the line have negative residuals.
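A residual is observed minus fitted, which is why its sign tells you which side of the line a dot is on. The heights, masses and line coefficients below are hypothetical numbers for illustration, not the data behind this card.

```python
def residuals(x, y, b0, b1):
    """Residual = observed y minus fitted y on the line y = b0 + b1*x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    # Hypothetical heights (m), masses (kg) and fitted line, for illustration only
    height = [1.60, 1.70, 1.80]
    mass = [62.0, 65.0, 80.0]
    b0, b1 = -60.0, 75.0
    for h, m, r in zip(height, mass, residuals(height, mass, b0, b1)):
        side = "above" if r > 0 else "below" if r < 0 else "on"
        print(f"height {h}: residual {r:+.1f} ({side} the line)")
```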
What are 2 characteristics?
Checking residuals on scatter plot (simple linear regression)
- A constant band of dots (neither converging nor diverging) suggests constant variability about the line.
- In this plot, the left side has less variability and the right side has more, so the variability is not constant (a fan shape).
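The fan shape described here can be summarised with a rough numerical check: compare the spread of the residuals in the lower and upper halves of the predictor. This is a hypothetical heuristic that mirrors reading the scatter plot, not a formal test of equal variances, and the residuals in the example are toy values.

```python
from statistics import stdev


def spread_by_half(x, resid):
    """SD of the residuals for the lower and upper halves of the predictor x.

    Similar SDs fit a constant band; a much larger SD on one side
    matches a fan (diverging) shape.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    left = [resid[i] for i in order[:half]]
    right = [resid[i] for i in order[half:]]
    return stdev(left), stdev(right)


if __name__ == "__main__":
    # Toy residuals whose size grows with x, mimicking a fan shape
    x = list(range(1, 11))
    resid = [(-1) ** i * 0.2 * xi for i, xi in enumerate(x)]
    left_sd, right_sd = spread_by_half(x, resid)
    print(f"left SD {left_sd:.2f}, right SD {right_sd:.2f}")
```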
What are 3 characteristics?
Checking residuals on P-P plot (simple linear regression)
- If the residuals are normally distributed, the dots should fall on the diagonal line.
- This plot shows deviations in the middle, so the residuals are not normally distributed.
- Deviations at the top or bottom matter less than deviations in the middle.
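A P-P plot compares the observed cumulative probability of each sorted residual with the cumulative probability a normal distribution would give it. The sketch below computes those points; standardising with the sample mean and SD and the (i - 0.5)/n plotting position are assumptions, since statistics packages differ in these details.

```python
from statistics import NormalDist, fmean, stdev


def pp_points(resid):
    """(observed cumulative probability, expected normal cumulative probability) pairs."""
    n = len(resid)
    m, s = fmean(resid), stdev(resid)
    z = NormalDist()
    ordered = sorted(resid)
    return [((i - 0.5) / n, z.cdf((r - m) / s)) for i, r in enumerate(ordered, start=1)]


def max_deviation(points):
    """Largest vertical distance of a point from the diagonal."""
    return max(abs(obs - exp) for obs, exp in points)


if __name__ == "__main__":
    # Toy residuals: exact normal quantiles versus a right-skewed set
    normal_like = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
    skewed = [1, 1, 1, 1, 1, 1, 2, 3, 5, 20]
    print(f"normal-like: {max_deviation(pp_points(normal_like)):.3f}")
    print(f"skewed: {max_deviation(pp_points(skewed)):.3f}")
```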
What are 2 characteristics?
Checking residuals on histogram (simple linear regression)
- A normal distribution should look like a symmetric bell curve.
- This histogram shows a positively skewed distribution, so the residuals are not normal.
What are 2 characteristics?
Interpretations of Tukey HSD
- p = 0.674 means there is no significant difference between “last” and “uncertain”.
- p < 0.001 for all the other pairs means each of those pairs of groups differs significantly.
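Behind Tukey's HSD is a "honestly significant difference": the smallest gap between two group means that counts as significant once all pairwise comparisons are accounted for. The sketch uses the Tukey-Kramer form, which reduces to q·sqrt(MSE/n) for equal group sizes. The critical value q_crit comes from the studentized range distribution (a table or software); the numbers in the example are hypothetical placeholders, not real table values.

```python
from math import sqrt


def tukey_hsd(q_crit, mse, n_i, n_j):
    """Minimum difference in two group means that Tukey's test calls significant.

    q_crit: studentized range critical value for (number of groups, error df, alpha).
    mse:    within-groups mean square error from the ANOVA table.
    Unequal n_i, n_j give the Tukey-Kramer form.
    """
    return q_crit * sqrt(mse / 2 * (1 / n_i + 1 / n_j))


def differs(mean_i, mean_j, q_crit, mse, n_i, n_j):
    """True if the two group means differ by more than the HSD."""
    return abs(mean_i - mean_j) > tukey_hsd(q_crit, mse, n_i, n_j)


if __name__ == "__main__":
    # Hypothetical numbers for illustration; q_crit is a placeholder, not a table value
    print(tukey_hsd(q_crit=3.5, mse=4.0, n_i=16, n_j=16))
```

In practice, software reports an adjusted p-value for each pair instead, as in the table on this card.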
What are 3 characteristics?
Interpretation of linear regression tables
Model summary table
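- R is the correlation between the observed and predicted values; for simple linear regression it equals the absolute value of the Pearson correlation coefficient (r).
- R square is the proportion (%) of the variability in the response explained by the model.
- For simple linear regression, R square is the square of the Pearson correlation coefficient (r).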
- R square (0.556) means the linear regression model explains 55.6% of the variability.
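The link between r and R square can be verified directly: R square = 1 - SS_residual / SS_total for the least-squares line, and for simple linear regression it equals r². The toy data in the test are for illustration only.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """1 - SS_residual / SS_total for the least-squares line."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sum((a - x_bar) ** 2 for a in x)
    b0 = y_bar - b1 * x_bar
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_bar) ** 2 for b in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]  # toy data for illustration only
    y = [2, 4, 5, 4, 5]
    r = pearson_r(x, y)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R square = {r_squared(x, y):.3f}")
```

An R square of 0.6 here would be read the same way as on this card: the model explains 60% of the variability in y.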