Statistics, epidemiology and study design Flashcards

1
Q

What formula should be used to calculate the combined sensitivity or specificity of multiple diagnostic tests?

A

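Combined probability (specificity or sensitivity), assuming the tests are independent:

= 1 - (1 - spec/sens 1) x (1 - spec/sens 2)

This gives the combined sensitivity when a positive on either test counts as positive (parallel testing), and the combined specificity when both tests must be positive to count as positive (serial testing).

E.g. A test has a specificity of 60%. Another test has a specificity of 80%. If both tests are performed and both must be positive for a positive result, the combined specificity is:

= 1 - (1 - 0.6) x (1 - 0.8)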
= 1 - (0.4 x 0.2)
= 1 - 0.08
= 0.92
= 92%
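
A minimal Python sketch of the same calculation. The function name `combined_probability` is illustrative, and the formula assumes the two tests err independently of one another.

```python
def combined_probability(p1: float, p2: float) -> float:
    """Return 1 - (1 - p1)(1 - p2) for two independent tests."""
    return 1 - (1 - p1) * (1 - p2)


# Worked example from the card: specificities of 60% and 80%.
print(round(combined_probability(0.6, 0.8), 2))  # 0.92
```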

2
Q

What is the geometric mean? It’s a measure of central tendency.

A

A special type of mean that is less affected by very large or very small values than the arithmetic mean. It’s calculated by multiplying the n values together, then taking the nth root of the result.

E.g. geometric mean of 2, 3 and 10 (n = 3)

= cube root of (10 x 2 x 3)
= 3.91

Compare with the arithmetic mean:

(10 + 2 + 3) / 3

= 5 (pulled much further towards the 10)
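
The comparison above can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative.

```python
import math
import statistics

values = [2, 3, 10]

# Geometric mean: nth root of the product of the n values.
geometric = math.prod(values) ** (1 / len(values))

# Arithmetic mean: sum of the values divided by n.
arithmetic = statistics.mean(values)

print(round(geometric, 2))  # 3.91
print(arithmetic)           # 5
```

`statistics.geometric_mean(values)` gives the same geometric mean directly.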

3
Q

What’s the arithmetic mean?

A

It’s the ordinary mean: (x1 + x2 + … + xn) / n (where n = number of values)

4
Q

What is the mode?

A

The value that occurs most often in a set of numbers. It’s a measure of central tendency.

5
Q

What are non-parametric tests used for?

A

Statistical tests to be used when data is not normally distributed.

6
Q

What has more statistical power, parametric or non-parametric testing, for discovering significant effects?

A

Parametric tests on normally distributed data.

7
Q

What might lead to data not being normally distributed and therefore requiring non-parametric statistical tests?

A

When the data is ranked (aka ordinal data), when central tendency is better represented by the median (many or large outliers), when the sample size is too small, or when the outliers cannot be removed.

8
Q

What are some examples of parametric statistical tests?

A

T-test and ANOVA

9
Q

What are you comparing with a T-test? How does this compare to ANOVA?

A

You are comparing the means of 2 normally distributed data sets to look for a meaningful difference between the two. If you want to compare the means of more than 2 groups, you need to do an ANOVA.

10
Q

What’s a paired t-test used for?

A

When each value in one dataset corresponds meaningfully to a value in the other dataset. E.g. when dataset 1 is a measurement before a drug is given, and dataset 2 is a measurement after the drug was given to the same set of people.

11
Q

What is a 1-tailed t-test used for?

A

When you need to know if the mean you are interested in is higher (or lower) than the mean you are comparing it to. A 2-tailed t-test will only tell you whether or not mean 1 is significantly different from mean 2.

12
Q

What are some examples of non-parametric statistical tests?

A

Chi-square test, Kruskal-Wallis test (ANOVA equivalent for non-parametric data), and the Mann-Whitney test (the equivalent to the unpaired t-test).

13
Q

What are some examples of non-parametric statistical tests?

A

Chi-square test, Kruskal-Wallis test (ANOVA equivalent for non-parametric data), and the Mann-Whitney test.

14
Q

What is the purpose of comparative statistical tests?

A

Comparing the same measure of central tendency (e.g. mean/median/mode) from 2 or more data sets to establish whether or not there is a significant difference between the two groups with respect to 1 or more outcomes.

15
Q

When two data sets are normally distributed, and parametric testing is indicated to establish if there is a significantly different measure of central tendency present or not, what measure of central tendency should be compared?

A

Mean

16
Q

What are the data types that are important before choosing a statistical test?

A

Quantitative data (which may be continuous or discrete) is made up of numbers that can be added, subtracted and divided. Categorical data falls into discrete buckets: it can be nominal (e.g. yes, no, maybe), ordinal (ranked, e.g. unlikely, likely, very likely), or interval (ranked and with fixed intervals, e.g. scores 1-10, 11-20, 21-30).

17
Q

Which comparative statistical test can be used for non-normally distributed data that is categorical to compare two datasets?

A

Chi-square test of independence, Spearman’s r

18
Q

What’s the name of the test typically used in Table 1 to determine whether the study populations in two arms are heterogeneous?

A

Chi-square goodness-of-fit test (or Cochran’s Q)

19
Q

Which comparative statistical test can be used for non-normally distributed data that is categorical to compare 3 or more datasets?

A

Kruskal-Wallis (the non-parametric version of the ANOVA).

20
Q

If you wanted to compare 3 parametric datasets with regards to 2 or more outcome variables, which statistical test would you use?

A

MANOVA (multivariate ANOVA; allows multiple outcomes to be tested)

21
Q

If you wanted to compare two non-parametric quantitative data sets that are unpaired, how would you do this?

A

Wilcoxon rank-sum test (also known as the Mann-Whitney U test). For paired data, use the Wilcoxon signed-rank test.

22
Q

What is the formula for number needed to treat?

A

The inverse of the absolute risk reduction.

So 1 / (risk 1 - risk 2)
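
A small Python sketch of this formula, using exact fractions to avoid rounding surprises. The function name and the trial figures are hypothetical, chosen only for illustration.

```python
import math
from fractions import Fraction


def number_needed_to_treat(control_risk: Fraction, treated_risk: Fraction) -> Fraction:
    """Return 1 / ARR, where ARR = control risk - treated risk."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("treatment does not reduce risk, so NNT is undefined here")
    return 1 / arr


# Hypothetical trial: 20/100 events on control, 15/100 on treatment.
nnt = number_needed_to_treat(Fraction(20, 100), Fraction(15, 100))
print(nnt)             # 20
print(math.ceil(nnt))  # 20 (NNT is conventionally rounded up to a whole person)
```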

23
Q

What are odds?

A

The number of times something happened compared to the number of times something didn’t happen to members of a population expressed as a ratio.

E.g. There was a population of 100 people who went for a walk. 10 fell over. What are the odds of falling over when walking in this population?

10 fell over
90 did not fall over

Odds of falling over

=10:90
=1:9
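
The walking example can be sketched in Python to show how odds differ from risk for the same counts. Variable names are illustrative.

```python
from fractions import Fraction

fell_over = 10
did_not_fall = 90

# Odds: events compared with non-events.
odds = Fraction(fell_over, did_not_fall)

# Risk: events compared with everyone in the population.
risk = Fraction(fell_over, fell_over + did_not_fall)

print(odds)  # 1/9
print(risk)  # 1/10
```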

24
Q

What is risk, and how does it differ from relative risk?

A

Risk is the probability that an event happens to a member of a population, expressed as a percentage or a decimal. It is calculated by dividing the number of people to whom the event occurred by the total number in the population. Relative risk is the ratio of the risk in one group to the risk in another group.

E.g. There was a population of 100 people who went for a walk. 10 fell over. What is the risk of falling over when walking in this population?

Risk:

10 fell over
100 went walking (population)

= 10/100
= 1/10

25
Q

What is absolute risk reduction?

A

The risk of something occurring in one data set minus the risk of that thing occurring in another data set.

E.g.
In population A, the risk of falling was 1/10.
In population B, the risk of falling was 4/10.

The absolute risk reduction achieved by being in population A is:

4/10 - 1/10
= 3/10.
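
As a rough check, the same two risks can be compared in Python both as a difference (as in this card) and as a ratio (the relative risk). Variable names are illustrative.

```python
from fractions import Fraction

risk_a = Fraction(1, 10)  # risk of falling in population A
risk_b = Fraction(4, 10)  # risk of falling in population B

# Difference between the two risks.
risk_difference = risk_b - risk_a

# Ratio of the two risks (relative risk of A compared with B).
relative_risk = risk_a / risk_b

print(risk_difference)  # 3/10
print(relative_risk)    # 1/4
```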

26
Q

What is the inverse of absolute risk reduction?

A

Number needed to treat.

27
Q

What type of bias does randomisation reduce?

A

Selection bias (i.e. any known or unknown biases applied to placing participants into different arms of a trial).

28
Q

What is ascertainment bias?

A

This is the bias that arises when investigators know which intervention a participant is receiving, which threatens to distort how the results are interpreted.

29
Q

What study design model best reduces ascertainment bias?

A

Blinding the investigators (single blinding) and the participants (double blinding)

30
Q

What is bias in handling dropouts?

A

This bias distorts study outcomes when cohort characteristics change due to participant dropouts.

31
Q

What is recall bias?

A

This is bias that occurs due to inaccurate recollection of events. It impacts retrospective studies.

32
Q

How is recall bias best mitigated with study design?

A

By creating a prospective study.

33
Q

How is sensitivity calculated?

A

True positive number/(true positive number + false negative number)

34
Q

Calculate sensitivity of the new test using the attached dataset.

A

True positive number/(true positive number + false negative number)

6/(6+0) = 1.

Sensitivity = 100%

35
Q

How is specificity calculated?

A

True negatives/(true negatives + false positives)

36
Q

Calculate the specificity of the new test using the attached dataset.

A

True negative/(true negative + false positive)

20 / (20 +9) = 20/29

= ~69%

37
Q

How do you calculate the positive predictive value of a test, assuming that the population prevalence of the disease is the same as the disease prevalence amongst the specimens used to evaluate the test?

A

True positives/(true positives + false positives)

38
Q

How do you calculate the negative predictive value of a test, assuming that the population prevalence of the disease is the same as the disease prevalence amongst the specimens used to evaluate the test?

A

True negatives/(true negatives + false negatives)

39
Q

Calculate the positive predictive value of the new test using the dataset attached.

A

PPV = True positive/(true positives + false positives)

6/(6+9)

=6/15
=2/5
=4/10
=40%

40
Q

Calculate the negative predictive value of the new test using the dataset attached.

A

NPV = True negatives/(true negatives + false negatives)

20 /(20 +0)

=1
=100%
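
The four calculations in cards 33 to 40 can be sketched together in Python. The attached dataset is not reproduced here, so the counts below (6 true positives, 0 false negatives, 20 true negatives, 9 false positives) are inferred from the worked answers; the class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Counts inferred from the worked answers in the cards.
counts = ConfusionCounts(tp=6, fp=9, fn=0, tn=20)

print(counts.sensitivity())            # 1.0
print(round(counts.specificity(), 2))  # 0.69
print(counts.ppv())                    # 0.4
print(counts.npv())                    # 1.0
```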

41
Q

How do you calculate the false negative rate?

A

1 - sensitivity

42
Q

How do you calculate the false positive rate?

A

1 - specificity
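
A brief Python sketch showing that the false negative and false positive rates computed directly from counts match 1 - sensitivity and 1 - specificity. The counts are the ones inferred from the worked answers in cards 34 and 36 (an assumption, since the dataset itself is not shown).

```python
from fractions import Fraction

tp, fn, tn, fp = 6, 0, 20, 9  # counts inferred from the earlier worked answers

sensitivity = Fraction(tp, tp + fn)
specificity = Fraction(tn, tn + fp)

# Rates computed directly from the counts.
false_negative_rate = Fraction(fn, fn + tp)
false_positive_rate = Fraction(fp, fp + tn)

print(false_negative_rate == 1 - sensitivity)  # True
print(false_positive_rate == 1 - specificity)  # True
print(false_positive_rate)                     # 9/29
```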

43
Q

What is Spearman’s rank (r) correlation test used for?

A

It’s a non-parametric statistical test that examines how strongly two variables are related to one another, based on their ranks. E.g. how class attendance relates to exam results.
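
A minimal Python sketch of Spearman’s rank correlation using the shortcut formula 1 - 6 Σd² / (n(n² - 1)), which is valid only when there are no tied values. The attendance and exam figures are made up for illustration, and the helper names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (no ties)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up data: classes attended and exam score for five students.
attendance = [10, 20, 30, 40, 50]
exam_score = [35, 50, 45, 70, 80]

print(round(spearman_rho(attendance, exam_score), 2))  # 0.9
```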

44
Q

What is the interquartile range?

A

The range covered by the middle 50% of data points. It’s the 75th percentile minus the 25th percentile.
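
A short Python sketch of the calculation using the standard library. The data are made up, and note that different quartile conventions can give slightly different answers; this uses the default method of `statistics.quantiles`.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # made-up values

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 3.0 9.0 6.0
```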

45
Q

What percentage of values fall within 1 standard deviation from the mean in a normal distribution?

A

68%

46
Q

What percentage of values fall within 2 standard deviations from the mean in a normal distribution?

A

95%
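
The 68% and 95% figures are rounded values and can be checked against the standard normal distribution in Python. The helper name `coverage` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def coverage(k: float) -> float:
    """Probability of falling within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(coverage(1), 4))     # 0.6827
print(round(coverage(2), 4))     # 0.9545
print(round(coverage(1.96), 4))  # 0.95
```

Strictly, 2 standard deviations cover about 95.4% of values; exactly 95% corresponds to about 1.96 standard deviations.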

47
Q

How is the variance calculated?

A

Standard deviation squared
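
A quick Python check of this relationship on made-up data, using the population versions of both statistics from the standard library.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

sd = statistics.pstdev(data)           # population standard deviation
variance = statistics.pvariance(data)  # population variance

print(sd)                              # 2.0
print(math.isclose(sd ** 2, variance)) # True
```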

48
Q

What is a cross-sectional study?

A

A subsection of a cohort is studied at a particular point in time. The exposure and outcome are measured at the same time. E.g. number of kids on a playground that get sunburnt (outcome) and whether or not they are wearing a hat (exposure).

49
Q

What is a case-control study?

A

Observational study that compares two otherwise equivalent groups who have had differing outcomes. E.g. 40-year-old males with testicular cancer vs 40-year-old males without testicular cancer. It then looks back at what might have contributed to the outcome.

50
Q

What is publication bias?

A

Over-representation in the published literature of studies with positive or significant effects. This is a problem for systematic reviews.

51
Q

What is the best method to evaluate for publication bias when performing a systematic review?

A

A funnel plot

52
Q

What is a funnel plot?

A

A graphical representation of the size of trials plotted against the effect size they report. As the size of the trials increases, their results converge around the true underlying effect. A symmetric inverted funnel shape indicates that publication bias is unlikely.

53
Q

What tests are used to measure heterogeneity across studies examined in a meta-analysis?

A

Chi-squared test (Cochran’s Q) and the I-squared statistic. A low p-value on the chi-squared test indicates significant heterogeneity. I-squared is reported as a percentage, with 0% indicating no heterogeneity and values approaching 100% indicating very heterogeneous studies.
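
As a sketch of how the two measures relate, I-squared is commonly derived from the chi-squared heterogeneity statistic (Q) and its degrees of freedom (number of studies minus 1) as (Q - df) / Q, floored at 0%. The function name and the example figures below are hypothetical.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Return I-squared as a percentage from Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are reported as 0% (no observed heterogeneity).
    return max(0.0, (q - df) / q) * 100


# Hypothetical meta-analysis of 11 studies.
print(i_squared(20.0, 11))  # 50.0
print(i_squared(8.0, 11))   # 0.0
```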

54
Q

What is case fatality?

A

The proportion of people within a population affected by a disease that die from that disease.

55
Q

What is attack rate?

A

A measure of the frequency of morbidity, or speed of spread, of a disease in an at-risk population. Used in disease outbreaks. Calculated as number ill / number at risk of becoming ill.

56
Q

What is the population attributable risk?

A

The proportion of the incidence of disease in a population that is attributable to a particular risk of interest. It’s often used to establish the value of a public health measure to remove a risk by calculating the associated reduction in overall disease prevalence that would result.

57
Q

What are the subtypes of information bias?

A

Differential and non-differential misclassification bias

58
Q

What is differential misclassification bias?

A

Random or systematic inaccuracy of measurement of a variable during a study. Labelled ‘differential’ misclassification bias when the variable with an error is affected by the status of another variable.

59
Q

What is a non-differential misclassification bias?

A

Random or systematic inaccuracy of measurement of a variable during a study. Labelled ‘non-differential’ misclassification bias when the variable with an error is independent of other variables measured in the study.

60
Q

What’s a crossover study?

A

A prospective study in which participants receive all the treatments (e.g. placebo, treatment 1, treatment 2 and so on) in some defined sequence. Crossover studies help to reveal confounding factors by allowing the same individuals to serve in both the control arm and the experimental arm at some point in the study. So, if a particular group has an unknown confounding factor, it will become evident when the crossover occurs.

61
Q

What requires fewer participants to be statistically powered: a crossover study or a non-crossover study?

A

Crossover study. They are very statistically efficient and require fewer participants to obtain meaningful power.