Biostatistics Flashcards

1
Q

What are the properties of a normal distribution?

A
  • Mean = Median = Mode
  • Symmetric about the mean, so the lower and upper quartiles are equally distant from it
  • About 68% (roughly 2/3) of data lie within 1 SD of the mean. About 95% of data lie within 2 SD (more precisely, 1.96 SD).
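
The "within 1 SD" and "within 2 SD" figures can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `coverage` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def coverage(k: float) -> float:
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

within_1_sd = coverage(1.0)     # about 0.683
within_2_sd = coverage(2.0)     # about 0.954
within_196_sd = coverage(1.96)  # about 0.950

print(f"{within_1_sd:.3f} {within_2_sd:.3f} {within_196_sd:.3f}")
```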
2
Q

What is a range?

A

Difference between the highest and lowest value.

3
Q

What is sample variance?

A

Variance = Σ(x - mean)² / (n - 1), where x is each data value and n is the sample size.

4
Q

What is standard deviation (SD)?

A

SD = √Variance
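
As a small worked example, the sample variance (dividing by n - 1) and the SD can be computed by hand and compared with the standard library. The data values below are hypothetical.

```python
import math
import statistics

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # hypothetical measurements

n = len(data)
mean = sum(data) / n
# Sum of squared deviations from the mean, divided by n - 1 (sample variance).
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)

# The standard library uses the same n - 1 definition.
print(variance, statistics.variance(data))
print(sd, statistics.stdev(data))
```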

5
Q

What types of data are there?

A
  1. Continuous
  2. Ordinal
  3. Nominal
6
Q

What is continuous data?

A

Data that can take any of an infinite number of values on a continuous scale, with no set intervals (e.g. height, weight).

7
Q

What is discrete data?

A

Data that can take only distinct, countable values (e.g. number of hospital visits).

8
Q

What is ordinal data?

A
  • Variables can be ranked on a finite scale (e.g. 1-10).
  • Considered non-parametric.

9
Q

What is nominal data?

A
  • Qualitative
  • Non-numerical and categorical (e.g. hair colour, sex…)
  • Non-parametric
10
Q

What are the features of a box & whisker plot?

A
  • Central line = Median
  • Mean, if shown, is usually marked with a cross
  • Box spans the interquartile range (lower quartile to upper quartile)
  • Outliers are indicated as points beyond the whiskers
11
Q

What is the coefficient of variation?

A

Coefficient of variation = SD/mean (often expressed as a percentage)
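
A minimal sketch, taking the coefficient of variation as the standard deviation divided by the mean (the usual definition). The data values are hypothetical.

```python
import statistics

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # hypothetical measurements

# Sample SD relative to the mean: a unitless measure of spread.
cv = statistics.stdev(data) / statistics.mean(data)
cv_percent = 100 * cv

print(f"CV = {cv:.3f} ({cv_percent:.1f}%)")
```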

12
Q

What is the standard error of the mean (SEM)?

A
  • The standard deviation of the sample means about the population mean.
  • SEM = σ/√n
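
In practice σ is rarely known, so the SEM is commonly estimated from the sample SD instead. A short sketch under that assumption, with hypothetical data:

```python
import math
import statistics

data = [4.0, 8.0, 6.0, 5.0, 7.0]  # hypothetical sample

n = len(data)
# Assumption: the population SD is unknown, so the sample SD stands in for sigma.
sem = statistics.stdev(data) / math.sqrt(n)

print(f"SEM = {sem:.4f}")
```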
13
Q

What is a confidence interval?

A

A 95% confidence interval is a range calculated from the sample such that, if the study were repeated many times, about 95% of the intervals calculated this way would contain the true mean.
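
The "95% of the time" idea can be illustrated by simulation: draw many samples, build an interval from each, and count how often the interval contains the true mean. This is only a sketch; the population values, the sample size and the use of a known σ are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(0)                 # fixed seed so the run is repeatable
true_mean, sigma, n = 50.0, 10.0, 25   # hypothetical population and sample size
z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
sem = sigma / n ** 0.5                 # standard error with sigma assumed known

trials = 2000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    m = mean(sample)
    low, high = m - z * sem, m + z * sem
    if low <= true_mean <= high:
        hits += 1

coverage = hits / trials   # fraction of intervals containing the true mean
print(f"coverage = {coverage:.3f}")
```

The printed fraction should be close to 0.95, although the exact value depends on the seed and the number of trials.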

14
Q

What is a null hypothesis (H0)?

A

Statement claiming that there is no difference between 2 populations.

15
Q

What is an alternative hypothesis (H1)?

A

Statement claiming that there is a difference between 2 populations.

16
Q

What types of alternative hypotheses are there?

A
  • Mean of A ≠ Mean of B
  • Mean of A > Mean of B
  • Mean of A < Mean of B
17
Q

What is a test statistic?

A

A numerical value calculated from a set of data and used to decide whether to reject or fail to reject the null hypothesis.

18
Q

What is a type I error?

A
  • When a false positive is obtained.
  • H0 is rejected when it is actually true.

19
Q

What is the probability of obtaining a type I error (α)?

A
  • α = Significance level of a test
  • For a test with 5% significance, the chance of getting a type I error is 5% when H0 is true.

20
Q

What is a type II error?

A
  • When a false negative is obtained.
  • H0 is not rejected when it is actually false.

21
Q

What is the probability of obtaining a type II error (β)?

A
  • β is dependent on the sample size (and also on the size of the true difference, the variability of the data and α).
22
Q

What is the power of a study?

A
  • The probability that H0 is correctly rejected when it is false.
  • Power = 1 - β.
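
To show how power (and therefore β) changes with sample size, here is a minimal sketch for a two-sided one-sample z-test with known σ. The function name `z_test_power` and all input values are hypothetical; a real study would normally use dedicated power-analysis software.

```python
from statistics import NormalDist

def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test to detect a true mean shift `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value, about 1.96
    shift = effect * n ** 0.5 / sigma            # true shift in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

power_small = z_test_power(effect=5.0, sigma=10.0, n=10)
power_large = z_test_power(effect=5.0, sigma=10.0, n=50)
beta_large = 1 - power_large                     # type II error probability

print(f"n=10: power={power_small:.2f}; n=50: power={power_large:.2f}")
```

With no true difference (`effect=0`) the function returns α, which matches the definition of the type I error rate.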

23
Q

What is the significance of power?

A
  • If a statistically non-significant result was obtained and the power was large, then the result is probably correct.
  • If a statistically non-significant result was obtained and the power was small, then there are 2 possibilities:
    1. There wasn’t a significant difference.
    2. There was a significant difference but it was not detected.
24
Q

What is multiple testing?

A

When multiple statistical tests are being applied to multiple sets of data (relating to the same parameters) simultaneously.

25
Q

What is the problem with multiple testing?

A

The more tests are carried out, the more likely it is that at least one will turn out to be significant just by chance (a false positive).

26
Q

How do we correct for multiple testing?

A
  • Use the Bonferroni correction, whereby the new significance level = Original significance level/number of tests.
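
A short numerical sketch of why the correction is needed and what it does. It assumes the tests are independent; the number of tests and the p-values are hypothetical.

```python
alpha = 0.05
num_tests = 10

# Chance of at least one false positive across all tests (assuming independence).
fwer_uncorrected = 1 - (1 - alpha) ** num_tests          # about 0.40

# Bonferroni: divide the significance level by the number of tests.
alpha_bonferroni = alpha / num_tests                     # 0.005
fwer_corrected = 1 - (1 - alpha_bonferroni) ** num_tests # about 0.049

p_values = [0.001, 0.004, 0.03, 0.2]                     # hypothetical results
significant = [p < alpha_bonferroni for p in p_values]

print(f"{fwer_uncorrected:.3f} {fwer_corrected:.3f} {significant}")
```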
27
Q

What is an unpaired t-test used for?

A

Used to compare the means of 2 independent groups of measurements.

28
Q

What are the assumptions required for an unpaired t-test?

A
  • Independence
  • Normally distributed
  • Variance is the same in both groups
29
Q

What is the name given to 2 sets of data with the same variance?

A

Homoscedastic

30
Q

How can homoscedasticity be determined?

A

Using Levene’s test (or an F-test for equality of variances)

31
Q

How are the degrees of freedom calculated for a t-test?

A

Degrees of freedom = Total sample size (n1 + n2) - 2 for an unpaired t-test. For a paired t-test, degrees of freedom = number of pairs - 1.
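
As an illustration for the unpaired case, the sketch below computes the pooled-variance t statistic and its degrees of freedom (total sample size minus 2). The two groups are hypothetical data, and the p-value lookup is left out.

```python
import math
import statistics

group_a = [5.1, 4.9, 5.6, 5.8, 6.0]   # hypothetical measurements, group A
group_b = [4.1, 4.5, 4.0, 4.8, 4.6]   # hypothetical measurements, group B

n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2                     # degrees of freedom for the unpaired test

# Pooled variance: assumes both groups share the same variance.
pooled_var = ((n_a - 1) * statistics.variance(group_a)
              + (n_b - 1) * statistics.variance(group_b)) / df
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff

print(f"t = {t_stat:.3f}, df = {df}")
```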

32
Q

What is a paired t-test used for?

A

Used to compare the means of 2 sets of paired measurements, e.g. results from the same subjects before and after a specific intervention.

33
Q

What are the types of pairing?

A
  • Self-pairing
  • Natural pairing (e.g. siblings)
  • Matched pairs (sex, size, weight…)
34
Q

What are the assumptions required for a paired t-test?

A
  • Unbiased selection
  • The differences between pairs are normally distributed
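
A minimal sketch of the paired calculation: the test works on the within-pair differences, so it reduces to a one-sample t statistic on those differences. The before/after values are hypothetical.

```python
import math
import statistics

before = [140, 152, 138, 145, 160]   # hypothetical readings before intervention
after = [135, 150, 130, 140, 158]    # same subjects after intervention

diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
df = n - 1                            # degrees of freedom = number of pairs - 1

# t = mean difference divided by the standard error of the differences.
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

print(f"t = {t_stat:.2f}, df = {df}")
```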

35
Q

How is variance analysed?

A
  1. There are many sets of data corresponding to different groups.
  2. The mean and the variation (sum of squared deviations from the group mean) of each set of data are calculated individually, and the variation is summed across groups (A).
  3. Data from each set is merged into one big set of data.
  4. The variation of this merged set is calculated relative to the overall mean (B).
  5. If A ≈ B, the group means are practically equal to each other. If B is much larger than A, at least one group mean differs.
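
The comparison of within-group variation (A) with total variation (B) can be sketched as a sum-of-squares calculation, which is how a one-way ANOVA F statistic is usually built. The group data are hypothetical.

```python
from statistics import mean

groups = {                      # hypothetical measurements per group
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [9.0, 10.0, 11.0],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# (A) variation within each group, summed across groups.
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups.values())
# (B) variation of the merged data about the overall mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
# The remainder is the variation explained by differences between group means.
ss_between = ss_total - ss_within

k, n = len(groups), len(all_values)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"A = {ss_within:.1f}, B = {ss_total:.1f}, F = {f_stat:.1f}")
```

When A is close to B, `ss_between` is near zero and F is small, which is consistent with equal group means.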
36
Q

What are one-way ANOVA tests used for?

A

Used to test whether there are any significant differences between the means of multiple groups of data.

37
Q

What are two-way ANOVA tests used for?

A

Used to test whether there are any significant differences between the means of multiple groups of data with multiple input variables.

38
Q

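What are Wilcoxon signed-rank tests used for?

A

Used to determine whether there is a significant difference between a pair of related sets of ranked data. (The Wilcoxon rank sum test, by contrast, is for independent groups and is equivalent to the Mann-Whitney U test.)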

39
Q

What are Mann-Whitney U tests used for?

A

Used to determine whether there is a significant difference between 2 sets of independent data that are ranked or not normally distributed. It compares ranks (distributions or medians) rather than means.
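
A rough sketch of how the U statistic is obtained from ranks. The helper names `average_ranks` and `mann_whitney_u` are made up for illustration, and the significance lookup (tables or a normal approximation) is omitted.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Smaller of the two U statistics for independent samples a and b."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))   # no overlap between the groups
```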

40
Q

What are Kruskal-Wallis one-way ANOVA tests used for?

A

Extension of Mann-Whitney U test. Used to determine whether there is a significant difference between multiple sets of independent data that are ranked or not normally distributed. It compares ranks rather than means.

41
Q

What are Friedman two-way ANOVA tests used for?

A

Used to determine whether there is a significant difference between multiple sets of related data that are ranked or not normally distributed. It compares ranks rather than means.

42
Q

What is Spearman’s rank correlation coefficient used for?

A

To determine whether there is a monotonic association (statistical dependence) between 2 variables using their ranks, e.g. for ordinal or non-normally distributed data.
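
A minimal sketch of the rank-based calculation, using the classic formula 1 - 6Σd²/(n(n² - 1)). It assumes there are no tied values; the helper names are hypothetical.

```python
def simple_ranks(values):
    """Rank values from 1 upward. Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rank correlation for paired data without ties."""
    n = len(x)
    # d is the difference between the two ranks of each pair.
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(simple_ranks(x), simple_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# A perfectly monotonic but non-linear relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```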

43
Q

What are the advantages of odds ratios compared to relative risk?

A
  • Odds ratios can be determined for case-control (retrospective) studies
  • Covariate adjustments can be made easily (e.g. odds ratios adjusted using logistic regression)
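
For reference, both measures can be computed from a 2x2 table. The counts below are hypothetical, and the relative risk line is only meaningful when the data come from a cohort-type design.

```python
# Hypothetical 2x2 table of counts.
exposed_cases, exposed_noncases = 30, 70
unexposed_cases, unexposed_noncases = 10, 90

# Odds ratio: odds of disease in the exposed divided by odds in the unexposed.
odds_ratio = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Relative risk: risk of disease in the exposed divided by risk in the unexposed.
risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
relative_risk = risk_exposed / risk_unexposed

print(f"OR = {odds_ratio:.2f}, RR = {relative_risk:.2f}")
```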
44
Q

What can statisticians help accomplish?

A
  • Determine an appropriate sample size
  • Help with study design
  • Conduct analysis
  • Get paper published