Data Analysis Workshop 2 Flashcards

1
Q

What is a t-test?

List the assumptions of a t-test.

A
  • A t-test determines if the means of two datasets are significantly different from each other.

Assumptions include:

1 - The data is normally distributed.

2 - The populations being compared have the same variance (however the t-test is quite robust to unequal variance).

3 - The samples are independent from one another.
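
As a rough illustration of what the test computes, the sketch below works out the t statistic for two independent samples under the equal-variance assumption listed above. The function name student_t and the sample values are hypothetical, and turning t into a p value would need a t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for an independent two-sample t-test
    that assumes equal population variances (pooled variance)."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, dof


# Hypothetical measurements, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.1, 4.4, 3.9, 4.6, 4.0]
t_stat, dof = student_t(group_a, group_b)
```

The larger the absolute value of t for a given number of degrees of freedom, the stronger the evidence that the two means differ.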

2
Q

What is parametric data?

A

Parametric data is data that is normally distributed.

3
Q

What is a one-way analysis of variance (ANOVA) test?

Why use an ANOVA test rather than just many individual t-tests?

What are the assumptions of an ANOVA test?

A
  • An ANOVA test determines if the means of three or more groups are significantly different from each other.
  • If an ANOVA test returns a statistically significant result, this indicates that at least two group means are significantly different from each other.
  • An ANOVA test cannot tell you which specific groups are significantly different from each other, therefore a post hoc test is required.
  • An ANOVA test is used before post hoc tests (e.g. a t-test) because running one statistical test rather than many individual tests keeps the overall chance of a false positive (type I error) lower.
  • The assumptions of an ANOVA test are the same as for a t-test.
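
A minimal sketch of the F statistic behind a one-way ANOVA, assuming each group is given as a plain list of numbers. The name one_way_anova_f is hypothetical, and a p value would additionally require the F distribution, which is not included.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data for three groups.
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
```

F compares the variation between group means with the variation inside the groups, so a large F suggests that at least one mean differs from the others.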
4
Q

What is a Bonferroni correction?

When does this not need to be done?

A
  • A Bonferroni correction lowers the p value threshold for significance (normally 0.05), typically by dividing it by the number of tests, to account for the increased risk of false positives that results from running many statistical tests.
  • If multiple tests are done with hypothetical data, a Bonferroni correction does not need to be done.
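
The arithmetic behind the correction can be sketched as follows, assuming the tests are independent of one another. The function names and the choice of 6 tests are illustrative assumptions.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05
n_tests = 6  # e.g. every pairwise comparison between 4 groups

# Without correction, the overall false positive risk climbs well above 0.05.
uncorrected = familywise_error_rate(alpha, n_tests)

# With the corrected per-test threshold, the overall risk stays below 0.05.
threshold = bonferroni_threshold(alpha, n_tests)
corrected = familywise_error_rate(threshold, n_tests)
```

With these numbers the uncorrected overall risk is roughly 0.26, while the corrected threshold of about 0.0083 per test brings it back under 0.05.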
5
Q

How can variance homogeneity be tested?

A

Homogeneity of variance can be tested using a Levene’s test.
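
One way to picture Levene's test is as a one-way ANOVA run on how far each value lies from its own group mean. The sketch below computes only that statistic, under the assumption that the mean-centred form of the test is wanted (a median-centred variant also exists). The name levene_w is hypothetical.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W: a one-way ANOVA F statistic computed on the absolute
    deviations of each value from its own group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    all_dev = [z for d in deviations for z in d]
    grand = mean(all_dev)
    k, n = len(groups), len(all_dev)
    # Mean square between groups of deviations.
    between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations) / (k - 1)
    # Mean square within groups of deviations.
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d) / (n - k)
    return between / within


# Hypothetical groups: same spread, then clearly different spread.
w_equal_spread = levene_w([1, 2, 3], [11, 12, 13])
w_unequal_spread = levene_w([1, 2, 3], [0, 10, 20])
```

A W close to zero is consistent with homogeneous variances, while a large W points towards unequal variances.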

6
Q

List 3 ways of measuring the association between two datasets that are nonparametric.

A

Association between nonparametric datasets can be measured by:

1 - Using a nonparametric test, such as a Kruskal-Wallis H test.

2 - Transforming the data to achieve normal distribution.

3 - Using a parametric test that tolerates nonparametric data well, such as a t-test or ANOVA test.

7
Q

List 2 ways of measuring the association between two datasets that do not have homogeneous variance.

A

Association between datasets with heterogeneous variance can be measured by:

1 - Using a nonparametric test, such as a Kruskal-Wallis H test.

2 - Using a Welch or Brown and Forsythe test.
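
To show how a Welch test avoids the equal-variance assumption, here is a minimal sketch of Welch's t statistic for two samples, with the Welch-Satterthwaite approximation for the degrees of freedom. The function name welch_t and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean; the variances are not pooled.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, dof


# Hypothetical samples with visibly different spread.
t_stat, dof = welch_t([10.1, 9.8, 10.3, 10.0], [8.0, 12.5, 6.1, 11.9, 9.4])
```

When the two samples have equal sizes and equal variances, the result matches the ordinary pooled t statistic; otherwise the degrees of freedom shrink to reflect the mismatch.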

8
Q

How can association be measured between datasets that are not independent from one another?

A

If there is a lack of independence in the data, nothing can be done to determine association, and the study must be redesigned.

9
Q

What is a correlation analysis used for?

What is a regression analysis used for?

What is a linear regression analysis used for?

A
  • Correlation and regression analyses are used to measure association between two continuous variables.
  • The difference between the two is that a regression analysis fits a curve and gives you a formula that describes the relationship.
  • A linear regression models the relationship between two variables by fitting a linear equation to the data.
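
The difference between the two analyses can be sketched in a few lines: a correlation gives a single number for the strength of the association, while a linear regression gives the equation of a line. The sketch assumes Pearson's correlation coefficient and an ordinary least-squares fit. The names pearson_r and linear_fit and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
r = pearson_r(xs, ys)
slope, intercept = linear_fit(xs, ys)
```

Here r describes how tightly the points follow a straight line, and the slope and intercept are the formula that the regression provides.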
10
Q

What is a receiver operating characteristic (ROC) curve?

A
  • An ROC curve is used to show graphically the relationship between clinical sensitivity and specificity for every possible cut-off for a test or a combination of tests.
  • The curve is used to choose the most appropriate cut-off for a test, which has the highest true positive rate together with the lowest false positive rate.
  • The area under an ROC curve is a measure of the usefulness of a test in general.
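
A minimal sketch of how the curve and its area can be built, assuming each case has a numeric test score and a true label of 1 (disease present) or 0 (disease absent), and that a case is called positive when its score is at or above the cut-off. The function names and the data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per cut-off.
    The true positive rate is the sensitivity; the false positive rate is
    1 minus the specificity."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut-off above every score: nothing is called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical test scores and true disease status.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
```

An area of 1 would mean the test separates the two groups perfectly, while an area of 0.5 would mean it does no better than chance.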