Statistics Flashcards

1
Q

p-value

A

Probability of observed result or one more extreme occurring when the null hypothesis is true.

Loosely, the probability of getting results at least this extreme by chance alone. When p is less than the alpha level, the results are statistically significant.
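
To make "observed result or one more extreme" concrete, here is a minimal sketch that computes an exact two-sided p-value for a made-up coin-flip experiment (15 heads in 20 flips, with H0 being that the coin is fair). The function name and the numbers are illustrative assumptions, not part of the original card.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for H0: success probability = 0.5.

    Adds up the probability of every outcome that is at least as far
    from the expected count (trials / 2) as the observed one.
    """
    expected = trials / 2
    observed_distance = abs(successes - expected)
    total = 0
    for k in range(trials + 1):
        if abs(k - expected) >= observed_distance:
            total += comb(trials, k)
    return total / 2 ** trials

p = binomial_two_sided_p(15, 20)
print(round(p, 4))  # 0.0414, which is below an alpha of 0.05
```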

2
Q

p<0.05

A

Statistically significant at an alpha level of 0.05; can reject the null hypothesis

3
Q

95% confidence interval

A

A range, calculated from the sample, that is expected to contain the population mean. If you did that small study a hundred times and calculated an interval each time, about 95 of those intervals would contain the population mean. (NOT: there is a 95% chance that the population mean lies within this one particular interval.)
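
The "repeat the study many times" reading can be checked with a small simulation. This is only a sketch under simplifying assumptions: normally distributed data, a known standard deviation (so a z-based interval is used), and made-up values for the mean, spread, and sample size.

```python
import random
from statistics import NormalDist, mean

def ci_coverage(true_mean=50.0, sigma=10.0, n=30, repeats=2000, seed=1):
    """Fraction of 95% confidence intervals that contain the true mean.

    Assumes sigma is known, so each interval is
    sample mean +/- z * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = mean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / repeats

print(ci_coverage())  # close to 0.95
```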

4
Q

Sample point estimate mean (vs. population mean)

A

5
Q

Relative risk

A

Risk of developing disease in the exposed group compared to the risk of developing disease in the unexposed group

6
Q

How do you calculate relative risk?

A

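RR = [A/(A+B)] / [C/(C+D)]

(risk of disease among all exposed vs risk of disease among all not exposed, where A = exposed with disease, B = exposed without disease, C = not exposed with disease, D = not exposed without disease)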
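
A minimal sketch of the calculation, assuming the usual 2x2 layout (A and B are the exposed group with and without disease, C and D the unexposed group with and without disease). The counts are hypothetical.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Relative risk from a 2x2 table.

    a: exposed, disease      b: exposed, no disease
    c: unexposed, disease    d: unexposed, no disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical counts: 30 of 100 exposed and 20 of 100 unexposed get the disease.
print(round(relative_risk(30, 70, 20, 80), 2))  # 1.5
```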

7
Q

What types of studies can RR be used in?

A

Prospective studies (cohort studies, RCTs)

8
Q

Odds ratio

A

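Ratio of the odds of something happening in one group to the odds of it happening in another group (the odds themselves are the probability of something happening vs the probability of it not happening)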

9
Q

Which studies is odds ratio used in?

A

Case-control studies

10
Q

How do you calculate odds ratio?

A

OR = (A/C) / (B/D) = AD/BC

(the odds of exposure in those who got the disease vs the odds of exposure in those who did not get the disease)
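
A minimal sketch of the calculation, assuming the usual 2x2 layout (A = exposed with disease, B = exposed without disease, C = unexposed with disease, D = unexposed without disease). The counts are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 table.

    a: exposed, disease      b: exposed, no disease
    c: unexposed, disease    d: unexposed, no disease

    (a/c) / (b/d) simplifies to the cross-product a*d / (b*c).
    """
    return (a * d) / (b * c)

# Hypothetical case-control counts.
print(round(odds_ratio(30, 70, 20, 80), 2))  # 1.71
```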

11
Q

If a disease is really rare, the odds ratio and relative risk actually end up being quite similar. True or false?

A

True - when the disease is rare, the odds ratio approximates the relative risk. However, they are not the same thing, and they drift further apart as the disease becomes more common.
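
A quick numerical check of this, using hypothetical counts and the usual 2x2 layout (A = exposed with disease, B = exposed without, C = unexposed with disease, D = unexposed without). Both tables are built so that the RR is 2.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio."""
    return (a * d) / (b * c)

rare = (2, 998, 1, 999)    # disease in 0.2% vs 0.1% of each group
common = (60, 40, 30, 70)  # disease in 60% vs 30% of each group

print(round(relative_risk(*rare), 3), round(odds_ratio(*rare), 3))      # 2.0 2.002
print(round(relative_risk(*common), 3), round(odds_ratio(*common), 3))  # 2.0 3.5
```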

12
Q

Hazard ratio

A

Broadly equivalent to relative risk (RR), but it takes time into account, whereas risk doesn't. Useful when the risk is not constant with respect to time: it uses data from different time points, where the risk might be changing over a period of time. Hazard ratio is usually used in the context of survival, but in statistics survival does not have to mean life or death; the event could be, for example, whether or not a patient got a disease.

13
Q

Relative risk 1.45 in plain language

A

45% more likely to have outcome X

14
Q

E.g. one group drank coffee, the other group didn’t drink coffee, outcome is tachycardia, RR is 1.45. How would you explain this?

A

Patients in the group that drank coffee were 45% more likely to have tachycardia, i.e. the probability of having tachycardia is 45% higher in the group that drank coffee. ("1.45 times more likely to have tachycardia" is too complex. If the RR were 5.1, you could say "5 times more likely to have tachycardia": swap from the percentage to the number.)

15
Q

Case control, looking at exposure to risk factors in patients that had oral cancer. Looking at risk factor chewing tobacco, OR is 1.6. Explain this in plain language?

A

In those who had oral cancer, the odds of having chewed tobacco were 1.6 times higher than in those who did not have oral cancer.

16
Q

An OR or RR of 1 means…

A

there’s no difference

17
Q

Odds ratio 1.6 in plain language

A

Odds of exposure to factor X is 1.6 times higher

18
Q

E.g. RCT, comparing recurrence of cancer following use of a new chemotherapy drug, HR 0.79, explain this in plain language

A

Those who received the chemotherapy drug were, at any point during this study, 21% less likely to have cancer recurrence. Similar to RR but taking into account time in the study. Hazard ratio of 1 means there is no difference.

19
Q

Hazard ratio 0.79 in plain language

A

At any particular point, group A is 21% less likely to have outcome X

20
Q

Incidence

A

Number of new cases of a disease within a specific period of time

21
Q

Prevalence

A

Number of cases of disease at a given time

22
Q

Absolute risk reduction (ARR)

A

Incidence [control group] - incidence [treatment group]

23
Q

Number needed to treat (NNT)

A

1/ARR -> tells you how many people need to be treated with that intervention in order to prevent one outcome occurring
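
A minimal sketch of the arithmetic, with hypothetical trial counts. Rounding the result up to a whole person is a common convention rather than something stated on the card, and exact fractions are used only to avoid floating-point surprises when rounding.

```python
from fractions import Fraction
from math import ceil

def arr_and_nnt(control_events, control_total, treated_events, treated_total):
    """Return (ARR, NNT) from event counts in each arm.

    ARR is the incidence in the control arm minus the incidence in the
    treated arm; NNT is 1 / ARR, rounded up to a whole person.
    """
    control_incidence = Fraction(control_events, control_total)
    treated_incidence = Fraction(treated_events, treated_total)
    arr = control_incidence - treated_incidence
    if arr <= 0:
        raise ValueError("no risk reduction, so NNT is not defined")
    return float(arr), ceil(1 / arr)

# Hypothetical trial: 20/100 events on control, 15/100 on treatment.
print(arr_and_nnt(20, 100, 15, 100))  # (0.05, 20)
```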

24
Q

Why is NNT useful?…

A

25
Q

What is relative risk reduction (RRR)?

A

ARR / incidence [control group] as %

26
Q

Type I error

A

Reject H0 (statistically significant results) when H0 is actually true - false positive.

Due to bias, confounding, data dredging

27
Q

Type II error

A

False negative, wrongful acceptance of the null hypothesis. Its probability is beta = 1 - power.

Due to the sample size being too small or measurement variance being too large.

28
Q

Beta

A

Probability of making a type II error (by convention, a beta of 0.2 or less, i.e. a power of 0.8 or more, is usually considered acceptable)

29
Q

Power

A

Ability to pick up difference when a difference exists. ‘Ability to reject a false H0’. Probability of not making a type II error.

30
Q

How can we increase power?

A

Increase sample size, increase effect size, increase measurement precision

31
Q

Per-protocol analysis

A

Where you include patients in analysis within the study only if they’ve finished doing the study protocol properly

32
Q

Advantages of per protocol analysis

A

Accurate representation of the effect of the intervention because you have only included the people who have properly done the intervention.

33
Q

Disadvantages of per protocol analysis

A

Susceptible to attrition bias and exclusion bias

34
Q

Intention-to-treat analysis

A

Where you analyse all patients in the group they were originally allocated to, whether or not they completed (or even received) the intervention

35
Q

Advantages of ITT analysis

A

More representative of results in clinical practice, because in practice patients do not always follow instructions/protocols

36
Q

Disadvantages of ITT

A

Not getting a true, accurate estimate of how well the drug actually does in optimal conditions

37
Q

Null hypothesis

A

The assumption that any difference between experimental groups is due to chance

38
Q

Evidence-based medicine

A

The conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients

39
Q

Hazard rate

A

The probability of an endpoint in a time interval divided by the duration of the time interval

40
Q

Confounder

A

A confounder is associated with both the exposure and the outcome (a triangular relationship) but is not on the causal pathway. It can make it appear as if there is a direct relationship between the exposure and outcome (positive confounder) or might mask an association that would have been present (negative confounder)

41
Q

Methods for dealing with missing data (in ITT analysis)

A
  • Worst-case scenario
  • Hot deck imputation: fill in missing values from similar subjects with complete records
  • Last observation carried forward
42
Q

Absolute risk

A

Incidence rate of the outcome = outcome in either control or experimental arm/total number of participants in arm

43
Q

Relative risk reduction

A

(Risk in control group - risk in experimental group) / risk in control group

= (CER-EER)/CER
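
A minimal sketch of the formula, reading CER as the control event rate and EER as the experimental event rate. The rates used are hypothetical.

```python
def relative_risk_reduction(cer: float, eer: float) -> float:
    """RRR = (CER - EER) / CER, returned as a proportion."""
    if cer <= 0:
        raise ValueError("control event rate must be greater than zero")
    return (cer - eer) / cer

# Hypothetical rates: 20% of controls and 15% of treated patients have the event.
rrr = relative_risk_reduction(0.20, 0.15)
print(f"{rrr:.0%}")  # 25%
```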

44
Q

Interpreting the standard deviation of data

A

The narrower the standard deviation, the less important it is to have a large sample size

45
Q

Parametric, paired, 2 groups

A

Paired t-test

46
Q

Parametric, paired, > 2 groups

A

Repeated-measures ANOVA

47
Q

Parametric, unpaired, 2 groups

A

Independent t-test

48
Q

Parametric, unpaired, > 2 groups

A

One way ANOVA

49
Q

Non-parametric, paired, 2 groups

A

Wilcoxon signed rank

50
Q

Non-parametric, paired, > 2 groups

A

Friedman test

51
Q

Non-parametric, unpaired, 2 groups

A

Mann-Whitney U test

52
Q

Non-parametric, unpaired, > 2 groups

A

Kruskal Wallis test
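
The eight test-selection cards can be summarised as a lookup table. This is only a sketch: the names `TEST_TABLE` and `choose_test` are made up for illustration, and for the parametric, paired, more-than-two-groups case it uses repeated-measures ANOVA, which is the usual choice for paired data.

```python
# Keys are (parametric, paired, more_than_two_groups).
TEST_TABLE = {
    (True, True, False): "Paired t-test",
    (True, True, True): "Repeated-measures ANOVA",
    (True, False, False): "Independent t-test",
    (True, False, True): "One-way ANOVA",
    (False, True, False): "Wilcoxon signed rank",
    (False, True, True): "Friedman test",
    (False, False, False): "Mann-Whitney U test",
    (False, False, True): "Kruskal-Wallis test",
}

def choose_test(parametric: bool, paired: bool, groups: int) -> str:
    """Look up the test for a given data type, design, and number of groups."""
    if groups < 2:
        raise ValueError("need at least two groups to compare")
    return TEST_TABLE[(parametric, paired, groups > 2)]

print(choose_test(parametric=False, paired=False, groups=3))  # Kruskal-Wallis test
```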

53
Q

Parametric data is

A

data that can be assumed to follow a normal distribution. When data sets are large enough, parametric statistical tests can often be employed even if the data are not normally distributed. Parametric tests are generally considered to have greater statistical power.

54
Q

Non-parametric data is

A

data for which a normal distribution cannot be assumed, for example data that is ordinal, ranked, or has outliers that cannot be removed.

55
Q

Time to event analysis: based on Kaplan-Meier curve. Can use:

A

Cox proportional hazards, log-rank or Wilcoxon two-sample test. Cox model is the most used.

56
Q

Kaplan-Meier curves

A

These are commonly used to describe survival and compare it between groups. They provide an intuitive graphical representation. They are mainly descriptive. They do not control for covariates and cannot accommodate time-dependent variables.
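
For a sense of what sits behind the curve, here is a minimal sketch of the product-limit (Kaplan-Meier) estimate in plain Python. The follow-up times are hypothetical, and subjects censored at the same time as an event are treated as still at risk for that event, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each subject
    events: True if the event was observed, False if the subject was censored
    Returns a list of (time, survival probability), one per event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        leaving = 0   # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical data: five subjects, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
print([(t, round(s, 2)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```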

57
Q

Retrospective subgroup analysis

A

Data dredging means that some associations will crop up due to chance. Dredging: “cherry-picking of promising findings leading to a spurious excess of statistically significant results in published or unpublished literature”.

58
Q

Kaplan-Meier Survival Plot

A

A plot that is utilized over the entire study period (no defined timepoint), which tries to account for censored data.

59
Q

How to compare if two Kaplan-Meier curves are different?

A

Log-rank test

60
Q

How to do power calculations

A

Power is the ability to discern a certain difference if that difference exists. You usually pick a clinically meaningful difference. You need a population mean and standard deviation AND:
▪ The standard deviation of the test group
▪ The clinically meaningful difference of the test group
▪ Then you can calculate the size of the sample you need for certain power
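
As an illustration of the last step, the sketch below uses the standard normal-approximation formula for comparing two group means. It assumes equal group sizes, a common standard deviation, and a two-sided alpha; the function name and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sd: float, difference: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided comparison of two means.

    n = 2 * (z_(1 - alpha/2) + z_power)^2 * sd^2 / difference^2,
    rounded up to a whole participant.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / difference ** 2
    return ceil(n)

# Hypothetical inputs: SD of 10 units, clinically meaningful difference of 5 units.
print(sample_size_per_group(sd=10, difference=5))  # 63
```

Halving the clinically meaningful difference roughly quadruples the required sample size, because the difference enters the formula squared.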