Data Analysis Workshop 2 Flashcards
What is a t-test?
List the assumptions of a t-test.
- A t-test determines if the means of two datasets are significantly different from each other.
Assumptions include:
1 - The data is normally distributed.
2 - The populations being compared have the same variance (however the t-test is quite robust to unequal variance).
3 - The samples are independent from one another.
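As an illustration of what a t-test computes, the following is a minimal sketch of the equal-variance (Student's) two-sample t statistic, using only Python's standard library. The function name `pooled_t_statistic` and the sample values are invented for this example. Converting t into a p value would additionally need a t distribution table or a statistics package.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming both populations share the same variance."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    # The pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical measurements from two independent groups.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
t, df = pooled_t_statistic(group_a, group_b)
print(round(t, 2), df)  # t is about -1.22 with 4 degrees of freedom
```

A larger absolute value of t means the difference between the means is large relative to the spread within the groups.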
What is parametric data?
Parametric data is data that is normally distributed.
What is a one-way analysis of variance (ANOVA) test?
Why use an ANOVA test rather than just many individual t-tests?
What are the assumptions of an ANOVA test?
- An ANOVA test determines if the means of three or more groups are significantly different from each other.
- If an ANOVA test returns a statistically significant result, this indicates that at least two group means are significantly different from each other.
- An ANOVA test cannot tell you which specific groups are significantly different from each other, therefore a post hoc test is required.
- An ANOVA test is used before post hoc tests (e.g. a t-test) because running one statistical test rather than many individual tests keeps the overall chance of a false positive (type I error) lower.
- The assumptions of an ANOVA test are the same as for a t-test.
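The two ideas above can be sketched in a few lines of standard-library Python: how the chance of at least one false positive grows with the number of tests, and how the one-way ANOVA F statistic compares variation between groups to variation within groups. The function names and data are hypothetical, and the error-rate formula assumes the individual tests are independent.

```python
import statistics


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Three groups need three pairwise t-tests; at alpha = 0.05 each,
# the chance of at least one false positive is about 0.143.
fwer = familywise_error_rate(0.05, 3)

# Hypothetical data for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f, df_between, df_within = one_way_anova_f(groups)
print(round(fwer, 3), f, df_between, df_within)
```

As with the t statistic, the F value still has to be compared against an F distribution with the two degrees of freedom to obtain a p value.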
What is a Bonferroni correction?
When does this not need to be done?
- A Bonferroni correction lowers the p value threshold for significance (normally 0.05), typically by dividing it by the number of tests performed, to account for the increased chance of a false positive that results from running many statistical tests.
- If multiple tests are done with hypothetical data, a Bonferroni correction does not need to be done.
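The arithmetic of a Bonferroni correction is simple enough to show directly. This is a small sketch in which the function names and the p values are made up for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p values remain significant after the correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p values from four separate tests.
p_values = [0.004, 0.020, 0.030, 0.600]
threshold = bonferroni_threshold(0.05, len(p_values))  # 0.05 / 4 = 0.0125
flags = bonferroni_significant(p_values)
print(threshold, flags)
```

With four tests, only the first p value stays below the corrected threshold, even though three of the four are below the uncorrected 0.05.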
How can variance homogeneity be tested?
Homogeneity of variance can be tested using a Levene’s test.
List 3 ways of measuring the association between two datasets that are nonparametric.
Association between nonparametric datasets can be measured by:
1 - Using a nonparametric test, such as a Kruskal-Wallis H test.
2 - Transforming the data to achieve normal distribution.
3 - Using a parametric test that tolerates nonparametric data well, such as a t-test or ANOVA test.
List 2 ways of measuring the association between two datasets that do not have homogeneous variance.
Association between datasets with heterogeneous variance can be measured by:
1 - Using a nonparametric test, such as a Kruskal-Wallis H test.
2 - Using a Welch or Brown and Forsythe test.
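To show how a Welch-type test avoids the equal-variance assumption, here is a sketch of the two-group Welch t statistic with Welch-Satterthwaite degrees of freedom. The card may be referring to the Welch and Brown-Forsythe variants of ANOVA; the two-group case is used here only because it is the simplest instance of the same idea, and that choice is an assumption. The function name and data are hypothetical.

```python
import math
import statistics


def welch_t_statistic(a, b):
    """Return (t, approximate degrees of freedom) without pooling variances."""
    n_a, n_b = len(a), len(b)
    # Each group's variance contributes separately to the standard error.
    term_a = statistics.variance(a) / n_a
    term_b = statistics.variance(b) / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(term_a + term_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (term_a + term_b) ** 2 / (
        term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical groups with clearly different spreads.
narrow = [10.0, 10.1, 9.9, 10.2, 9.8]
wide = [8.0, 12.5, 9.0, 13.0, 11.5]
t_welch, df_welch = welch_t_statistic(narrow, wide)
print(t_welch, df_welch)
```

When the two variances differ, the approximate degrees of freedom fall below the pooled value of n1 + n2 - 2, which makes the test more conservative.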
How can association be measured between datasets that are not independent from one another?
If the data lack independence, nothing can be done to determine association, and the study must be redesigned.
What is a correlation analysis used for?
What is a regression analysis used for?
What is a linear regression analysis used for?
- Correlation and regression analyses are used to measure association between two continuous variables.
- The difference between the two is that a correlation analysis only measures the strength and direction of the relationship, whereas a regression analysis fits a curve and gives you a formula that describes the relationship.
- A linear regression models the relationship between two variables by fitting a linear equation to the data.
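The contrast between the analyses above can be sketched as follows: a Pearson correlation coefficient returns a single number for the strength of the association, while a least-squares linear regression returns the slope and intercept of the fitted line. The function names and data are invented for this example.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
r = pearson_r(x, y)
slope, intercept = linear_fit(x, y)
print(r, slope, intercept)
```

For these points the correlation is 1 (a perfect positive association) and the regression recovers the formula y = 2x + 1.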
What is a receiver operating characteristic (ROC) curve?
- An ROC curve is used to show graphically the relationship between clinical sensitivity and specificity for every possible cut-off for a test or a combination of tests.
- The curve is used to choose the most appropriate cut-off for a test, i.e. the one that gives the highest true positive rate together with the lowest false positive rate.
- The area under an ROC curve is a measure of the usefulness of a test in general.
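As a rough sketch of how an ROC curve is built, the code below computes the true positive rate (sensitivity) and false positive rate (1 - specificity) at every possible cut-off, then estimates the area under the curve with the trapezoid rule. The function names, test scores, and disease labels are hypothetical, and a result at or above the cut-off is assumed to count as a positive test.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs,
    one per cut-off, starting from the point (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the cut-off step by step; results >= cut-off count as positive.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a list of (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test results; label 1 = disease present, 0 = absent.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points, auc)  # the area is 0.75 for this data
```

An area of 1.0 would mean the test separates the two groups perfectly, while 0.5 corresponds to a test that performs no better than chance.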