Statistics Flashcards

1
Q

Formula for standard error of the mean (SEM)?

A

SEM = SD / square root of (n)

SD = standard deviation
n = sample size

SEM gets smaller as sample size (n) increases
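
A minimal Python sketch of the SEM formula, assuming the sample standard deviation (n - 1 denominator) is the SD intended. The function name and the measurements are made up for illustration.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Return SD / sqrt(n), using the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical measurements (not from the flashcards)
small_sample = [4.0, 6.0, 8.0, 10.0]
large_sample = small_sample * 4  # same spread of values, four times the n

sem_small = standard_error_of_mean(small_sample)
sem_large = standard_error_of_mean(large_sample)
print(f"n={len(small_sample)}: SEM={sem_small:.3f}")
print(f"n={len(large_sample)}: SEM={sem_large:.3f}")
```

The second sample has a similar spread but four times as many observations, so its SEM is smaller.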

2
Q

Definition of power of a study?

A

Power = 1 - the probability of type II error
The probability that a statistically significant difference will be detected when one truly exists
Probability of (correctly) rejecting the null hypothesis when it is false OR
Probability of confirming the alternative hypothesis when the alternative hypothesis is true
Power can be increased by increasing the sample size

3
Q

Examples of observational studies

A

Cohort study
Case-control study
Cross-sectional study
Case series

4
Q

Studies ordered by the level of evidence they provide (highest first)

A
Systematic reviews
RCTs
Cohort studies
Case-control studies
Cross-sectional studies
Case series
5
Q

Prospective cohort study

A

Sample recruited from a population in the present, relevant predictors are measured, and the cohort is followed over time to measure outcomes
Usual outcome measure is relative risk

Pro: more control over what is measured and how; can measure confounders
Con: expensive; wait until outcome occurs; rare outcome = need more participants

6
Q

Retrospective cohort study

A

Cohort assembled after an outcome has occurred using stored data

Pro: cheaper, faster
Con: data quality limited

7
Q

Case-control study

A

Start off with people with the disease and ask about exposure
Usual outcome measure is odds ratio

Pro: efficient for rare diseases and outbreaks
Con: hard to find matched controls; recall bias; confounding

8
Q

Cross-sectional study

A

Random sample of a population at a single point in time.

Descriptive: prevalence of a disease or exposure

Analytic: examine the relationship between different things e.g. obesity and arthritis

Can provide evidence of association but not about causality (hard to determine what came first)

9
Q

Best study design for an intervention question?

A

Best primary study: RCT

Highest level of evidence: systematic review of RCTs

10
Q

Best study design for question of harm or prognosis?

A

Prospective cohort study
Individual prospective cohort study
Retrospective cohort study
Case-control study

11
Q

Best study type for questions of diagnostic test accuracy

A

Cross-sectional analytic study where both tests (the index test and the reference standard) are performed on all study participants

12
Q

Best primary study type for prevalence of disease?

A

Cross-sectional descriptive study

Burden of disease

13
Q

Best primary study type for incidence of disease?

A

Cohort study

Specified period of time; looks at cause of disease

14
Q

Relative risk

A

The risk of something occurring relative to the chance of it occurring under different circumstances

= (incidence in exposed)/(incidence in unexposed)
i.e. use division

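RR <1: risk is lower in the exposed group (treatment is beneficial if the outcome is adverse)
RR >1: risk is higher in the exposed group (treatment is harmful if the outcome is adverse)
RR = 1: no difference in risk (treatment has no effect)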

Used in RCTs and cohort studies - need to know incidence

15
Q

Absolute Risk Reduction

A

= (incidence of disease in exposed) - (incidence of disease in unexposed)
i.e. use subtraction

Note whether the exposure has increased or decreased the risk (the sign of the difference)

16
Q

Number needed to treat

A

Number of people that need to be treated in order to prevent one negative outcome

NNT = 1 / (risk difference)
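
The relative risk, risk difference and NNT formulas from the last three cards can be sketched together in Python. The function name and the trial counts are hypothetical, and NNT is taken from the absolute value of the risk difference.

```python
def risk_measures(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Return (relative risk, risk difference, NNT) from raw counts."""
    risk_exposed = events_exposed / n_exposed        # incidence in exposed
    risk_unexposed = events_unexposed / n_unexposed  # incidence in unexposed
    relative_risk = risk_exposed / risk_unexposed    # division
    risk_difference = risk_exposed - risk_unexposed  # subtraction
    # NNT is undefined when there is no difference in risk
    nnt = 1 / abs(risk_difference) if risk_difference != 0 else float("inf")
    return relative_risk, risk_difference, nnt


# Hypothetical trial: 10/100 events on treatment, 20/100 events on control
rr, rd, nnt = risk_measures(10, 100, 20, 100)
print(f"RR={rr:.2f}, risk difference={rd:+.2f}, NNT={nnt:.0f}")
```

Here the risk difference is negative because the treated group has the lower risk, which is the sign check the Absolute Risk Reduction card asks for.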

17
Q

Odds Ratio

A

= (odds of exposure to the risk factor of interest in the cases) / (odds of exposure to the risk factor of interest in controls)
Used in case-control studies

OR 0.6 = the odds of the outcome are 40% lower in the exposed group than in the control group

OR 1.5 = odds increased by 50%

OR approximates relative risk only when the outcome is rare
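
A short Python sketch of the odds ratio from a 2x2 case-control table. The counts are invented and the function name is an assumption for illustration.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    odds_in_cases = cases_exposed / cases_unexposed
    odds_in_controls = controls_exposed / controls_unexposed
    return odds_in_cases / odds_in_controls


# Hypothetical study: 100 cases (30 exposed), 100 controls (15 exposed)
or_estimate = odds_ratio(30, 70, 15, 85)
print(f"OR={or_estimate:.2f}")
```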

18
Q

P value

A

Probability of obtaining results at least as extreme as those observed if there were no actual effect (i.e. by chance alone)

If p<0.05, the probability of getting such results by chance alone is less than 5% (i.e. statistically significant)

19
Q

Confidence intervals

A

Provides us with a range within which we would expect the true effect to lie

Wide CI = poor precision
Narrow CI = good precision

If using RR: a CI that includes 1 means the result is not statistically significant

20
Q

Random error

A

Chance
Gives results either side of the true answer with the mean of all results being close to the true answer

Narrow confidence interval = less random error

21
Q

Systematic error

A

Bias

Differ in one direction from the truth

22
Q

Internal validity

A

How likely it is that the results are correct for the sample of participants being studied.

Selection bias impacts the internal validity of a study

23
Q

External validity

A

How likely it is that the results will hold true for other settings
= generalisability of the study

24
Q

State 2 principles of a confounder

A
  1. has to be associated with both the risk factor of interest and the outcome
  2. must not lie on the causal pathway between the risk factor and the outcome (i.e. is not an intervening variable)

Biases the results

25
Q

Effect modification

A

Where the risk factor or intervention acts differently in one group compared to another

E.g. the increase in melanoma risk from UV exposure differs by skin type

26
Q

Loss to follow up

A

Losses before randomisation: affect the generalisability of our study

Losses after randomisation: relate to risk of bias

27
Q

Intention to treat analysis

A

Means that we analyse people in the groups that they were originally randomised to, regardless of what actually happens during the study

Pro: preserves the effect of randomisation
Con: can dilute the observed treatment effect, reducing power

28
Q

Composite endpoints

A

Rather than looking at several outcomes separately a study will combine several outcomes into the one composite measure that is used as the outcome

Why are they used?

  • smaller sample size required to show effect
  • allows assessment of ‘net’ effect of intervention

Why does it matter?

  • Outcomes of high clinical importance can be grouped with those of minor importance
  • Overestimate benefit of intervention
29
Q

What is a funnel plot?

A

Special graph produced to assess likelihood of publication bias; must have >10 studies
Point estimate of the effect (e.g. RR or OR) plotted against a measure of the study’s size or precision
True value down centre
- smaller studies = larger scatter
- larger studies = closer to the true value

30
Q

Sensitivity

A

Proportion of those WITH the disease who have a positive test (i.e. true positive rate)

Sensitivity = TP / (TP + FN)

SnNout
When a highly sensitive test (Sn)
Is Negative (N)
the disease is ruled out (out)

If you want to avoid false negatives choose a test with high sensitivity (negative result in a sensitive test = confident patient doesn’t have disease)

31
Q

Specificity

A

The proportion of those WITHOUT the disease who have a negative test (i.e. true negative rate)

Specificity = TN / (TN + FP)

SpPin
When a highly specific test (Sp)
Is Positive (P)
the disease is ruled in (in)

If you want to avoid false positives choose a test with high specificity

32
Q

Positive predictive value

A

Probability of disease in those who test positive

= (TP) / (TP + FP)

Higher prevalence = higher PPV, lower NPV
Lower prevalence = lower PPV, higher NPV

33
Q

Negative predictive value

A

Probability of no disease in those who test negative

= TN / (TN + FN)

Higher prevalence = higher PPV, lower NPV
Lower prevalence = lower PPV, higher NPV

NPV / PPV depend upon the prevalence of the characteristic in a given population
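
The four formulas on the sensitivity, specificity, PPV and NPV cards can be sketched in Python as follows. The 2x2 counts, the function names and the two prevalence values are hypothetical; the second function uses Bayes' rule to show how PPV moves with prevalence for a fixed test.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the four standard accuracy measures from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV for a given prevalence, holding sensitivity and specificity fixed."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical sample of 1000 people with 10% prevalence
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same test applied where the disease is common versus rare
ppv_common = ppv_from_prevalence(0.90, 0.95, prevalence=0.10)
ppv_rare = ppv_from_prevalence(0.90, 0.95, prevalence=0.01)
print(f"PPV at 10% prevalence: {ppv_common:.3f}")
print(f"PPV at 1% prevalence: {ppv_rare:.3f}")
```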

34
Q

Positive likelihood ratio

A

= (probability of a +ve test in those with the disease) / (probability of a +ve test in those without the disease)

i.e. sensitivity / (1 - specificity)

Larger PLR = greater likelihood of disease
PLR > 10 will be useful in ruling in disease
PLR = 1 indicates a useless test

35
Q

Negative likelihood ratio

A

= (probability of a -ve test in those with the disease) / (probability of a -ve test in those without the disease)

i.e. (1-sensitivity) / specificity

Smaller NLR = lower likelihood of disease
NLR <0.1 will be useful in ruling out disease
NLR = 1 indicates a useless test
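
A brief Python sketch of both likelihood ratios, assuming a hypothetical test with 90% sensitivity and 95% specificity. The function names are illustrative only.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """PLR = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """NLR = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


# Hypothetical test characteristics
plr = positive_likelihood_ratio(0.90, 0.95)
nlr = negative_likelihood_ratio(0.90, 0.95)
print(f"PLR={plr:.1f}, NLR={nlr:.3f}")
```

With these assumed values the PLR is above 10, so a positive result would help rule disease in, while the NLR sits just above 0.1.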

36
Q

Bias in screening

A

Lead time bias - apparent longer survival in screen detected cases as identified at earlier point in disease

Length time bias - slowly progressive disease more likely to be picked up by screening

37
Q

Level of evidence

A

Ia - evidence from meta-analysis of RCTs
Ib - evidence from at least one RCT
IIa - evidence from at least one well designed controlled trial that is not randomised
IIb - evidence from at least one other well designed quasi-experimental study
III - evidence from case, correlation and comparative studies
IV - evidence from a panel of experts

Grade A - based on evidence from at least 1 RCT
Grade B - based on evidence from non-RCT
Grade C - based on evidence from a panel of experts

38
Q

Post test probability

A

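Pre test probability = prevalence
Pre test odds = (pre-test probability)/(1 - pre-test probability)

Post test odds = (pre-test odds) x (likelihood ratio)
Post test odds after a +ve test = (pre-test odds) x PLR

Post test probability = (post-test odds)/(post-test odds + 1)

Prevalence x LR is only a rough approximation of the post test probability, and only when the result is small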
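
The odds-based conversion can be sketched in Python. The pre-test probability and likelihood ratios below are hypothetical, and the function name is an assumption for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1)


# Hypothetical: 10% prevalence, PLR of 18, NLR of 0.105
p_after_positive = post_test_probability(0.10, 18)
p_after_negative = post_test_probability(0.10, 0.105)
print(f"After a +ve test: {p_after_positive:.3f}")
print(f"After a -ve test: {p_after_negative:.3f}")

# Multiplying the probability itself by the LR can exceed 1, which is
# why the calculation has to go through odds.
print(f"prevalence x PLR = {0.10 * 18:.1f}")
```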

39
Q

Best estimate of prevalence?

A

Prevalence = incidence x duration

E.g. a disease has an annual incidence of 15 cases per 100,000. Mean survival after diagnosis is 5 years.
Prevalence = (15 per 100,000) x 5 = 75 per 100,000

40
Q

Type 1 error

A

Rejecting the null hypothesis when it is in fact true OR
Accepting the alternative hypothesis when it is in fact false (i.e. a false positive result)

Significance level (alpha) = probability of a type 1 error when the null hypothesis is true

41
Q

p value

A

Probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true
Loosely: the probability of finding a difference by chance when there is not one
Significance is conventionally set at p < 0.05

42
Q

Type 2 Error

A

Probability of a type 2 error (beta) = 1 - power
Failing to reject the null hypothesis when it is false
Observing no difference when there is one
A false negative result
Rejecting an alternative hypothesis when it is true

43
Q

Power

A

= 1 - probability of Type 2 error
Likelihood of finding an effect when it is present
i.e. likelihood of avoiding false negatives

44
Q

Main modifiers of power

A
  1. Size of effect
    - more difficult to detect small effects
  2. Sample size
    - larger sample size = easier to detect effect
  3. Desired significance
    - e.g. a threshold of p<0.001 will declare fewer results significant than p<0.05
  4. Standard deviation
    - more variable data = harder to detect effect
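
The four modifiers can be illustrated with a rough Python sketch. It assumes a two-sided comparison of two equal-sized groups using a normal approximation (not an exact t-test calculation), and every number and name in it is hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size, sd, n_per_group, alpha=0.05):
    """Approximate power to detect a mean difference between two groups.

    Normal approximation only; the negligible chance of rejecting in the
    wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)                 # significance threshold
    standard_error = sd * (2 / n_per_group) ** 0.5    # SE of the difference
    return z.cdf(effect_size / standard_error - z_crit)


baseline = approx_power(effect_size=5, sd=10, n_per_group=64)
print(f"baseline:        {baseline:.2f}")
print(f"smaller effect:  {approx_power(2.5, 10, 64):.2f}")
print(f"larger sample:   {approx_power(5, 10, 128):.2f}")
print(f"stricter alpha:  {approx_power(5, 10, 64, alpha=0.001):.2f}")
print(f"larger SD:       {approx_power(5, 20, 64):.2f}")
```

Each printed line changes one input relative to the baseline, so the direction of its effect on power can be read off directly.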
45
Q

Multivariate analysis

A

Used to assess and adjust for confounding by other factors

46
Q

ROC curve

A

Y axis: true positive (sensitivity)
X axis: false positive (1-specificity)

Test with good performance swoops into the top left corner
Test close to a diagonal line is no better than chance at discriminating between those with and those without the disease

47
Q

Student’s T-test

A

Parametric (normally distributed)

Paired or unpaired

48
Q

Pearson’s product moment correlation coefficient

A

Parametric

Correlation of 2 variables

49
Q

Mann-Whitney U Test

A

Non-parametric

Unpaired data

50
Q

Wilcoxon signed rank test

A

Non-parametric

Compares 2 sets of observations on a single sample

51
Q

Chi squared test

A

Non-parametric

Used to compare proportions or percentages

52
Q

Spearman, Kendall Rank

A

Non-parametric

Correlation

53
Q

Paired vs unpaired data

A

Paired data: obtained from a single group of patients e.g. measurement before and after an intervention
Unpaired data: 2 different groups of patients e.g. comparing response to different interventions in 2 groups

54
Q

Hazard Ratio

A

Similar to relative risk but used when risk is not constant in time.
Typically used when analysing survival over time
Reduction in risk of death or progression
HR of 0.84 = 16% reduction in risk