Statistics Flashcards

1
Q

What is probability?

A

the number with outcome/the total number (scale 0 to 1)
Probability (p) = #favourable outcomes/ #all poss outcomes
= #cases/ total pop

2
Q

What is the rate of change of probability over time?

A

Rate of change of probability over time:
• Instantaneous rate = hazard rate
o Instantaneous probability of disease within next small interval of time, given that subject is still at risk at time t
• Average rate = incidence rate
o Expresses risk of disease development per unit in time relative to size of pop at risk

3
Q

What is conditional probability?

A

= p(A|B) = P(A and B) / P(B)
= probability event A will happen given that some other event B has already occurred
p(A|B) is not the same as p(B|A)
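
A minimal sketch of the calculation, assuming the standard definition P(A|B) = P(A and B) / P(B). The counts (100 people, 20 with the disease, 30 testing positive, 15 both) and the function name are hypothetical, chosen only to show that p(A|B) and p(B|A) differ.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_b):
    """Return P(A|B) = P(A and B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b


# Hypothetical population of 100 people (illustrative numbers only):
# A = has the disease, B = tests positive.
p_a = Fraction(20, 100)
p_b = Fraction(30, 100)
p_a_and_b = Fraction(15, 100)

p_a_given_b = conditional_probability(p_a_and_b, p_b)  # 15/30
p_b_given_a = conditional_probability(p_a_and_b, p_a)  # 15/20
print(p_a_given_b, p_b_given_a)  # prints: 1/2 3/4
```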

4
Q

What is sensitivity and what is the equation?

A

Sensitivity = testing +ve given that you have the disease

= True Positives / (True Positives + False Negatives)

5
Q

What is specificity and what is the equation?

A

Specificity = testing -ve given that you don’t have the disease
= True Negatives / (False Positives + True Negatives)

6
Q

What is the positive predictive value and what is the equation?

A

+ve predictive value = having the disease given that you test +ve
= TP / (TP + FP)

7
Q

What is the negative predictive value and what is the equation?

A

-ve predictive value = not having the disease given that you test -ve
= TN / (FN + TN)
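
The four diagnostic-test measures (sensitivity, specificity, +ve and -ve predictive value) can be sketched together, using true/false positive and negative counts. The function name and the counts below are hypothetical and purely illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the four measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # +ve test given disease
        "specificity": tn / (tn + fp),  # -ve test given no disease
        "ppv": tp / (tp + fp),          # disease given +ve test
        "npv": tn / (tn + fn),          # no disease given -ve test
    }


# Hypothetical screening results for 1000 people (illustrative only):
# 100 have the disease (90 test +ve), 900 do not (30 test +ve).
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note how PPV (0.75 here) can be much lower than sensitivity (0.9) when the disease is uncommon in the tested population.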

8
Q

What is a percentage?

A

(proportion) *100 (scale 0 to 100)

9
Q

What is a rate?

A

the number of times something happens per a quantifier (x per 100 people) (scale 0 to infinity)

10
Q

What are odds?

A

the number with the outcome/the number without the outcome (scale 0 to infinity)
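
A short sketch of how odds relate to probability, assuming the usual conversions odds = p / (1 - p) and p = odds / (1 + odds). The function names and the 20-out-of-100 example are hypothetical.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p); defined for 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def probability_from_odds(odds):
    """Probability = odds / (1 + odds); defined for odds >= 0."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Hypothetical example: 20 of 100 people have the outcome.
with_outcome, without_outcome = 20, 80
odds = with_outcome / without_outcome                            # 0.25
probability = with_outcome / (with_outcome + without_outcome)    # 0.2
print(odds, probability, odds_from_probability(probability))
```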

11
Q

How do you work out a risk ratio? Also, what is the number needed to treat (NNT)?

A

Divide one probability/percentage by the other
Whichever group on top is the focus
When we divide two numbers together, there are 3 potential outcomes:
Risk in group A is larger than in group B (RR > 1)
Risk in group A is same as group B (RR = 1)
Risk in group A is smaller than in group B (RR < 1)

We interpret RR in relation to 1 as we are trying to show difference

RR = 1 – no association, no effect
RR > 1 - +ve association, risk factor
RR < 1 - -ve association, protective factor

Indicates potential benefit/ risk of a clinical intervention

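RD = risk difference (risk in treated group − risk in control group)
NNT = 1/|RD| ‘number of people needed to treat to prevent 1 outcome’
NNH = 1/|RD| ‘number of people needed to treat for 1 additional person to be harmed’

If RD is positive (risk is higher in the treated group), intervention is harmful rather than protective – use NNH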
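
A minimal sketch of these calculations, assuming RD means the risk difference (risk in the focus group minus risk in the comparison group). The function name and trial counts are hypothetical; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def risk_measures(events_a, total_a, events_b, total_b):
    """Group A is the focus (e.g. treated); group B is the comparison."""
    risk_a = Fraction(events_a, total_a)
    risk_b = Fraction(events_b, total_b)
    rr = risk_a / risk_b  # risk ratio, interpreted relative to 1
    rd = risk_a - risk_b  # risk difference, interpreted relative to 0
    # 1/|RD|: read as NNT when RD < 0 (protective), NNH when RD > 0 (harmful).
    number_needed = None if rd == 0 else 1 / abs(rd)
    return rr, rd, number_needed


# Hypothetical trial: 10/100 events with treatment, 15/100 with control.
rr, rd, number_needed = risk_measures(10, 100, 15, 100)
label = "NNH" if rd > 0 else "NNT"
print(f"RR = {float(rr):.2f}, RD = {float(rd):.2f}, {label} = {float(number_needed):.0f}")
# prints: RR = 0.67, RD = -0.05, NNT = 20
```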

12
Q

How do you work out an odds ratio?

A

Divide the odds in one group by the other

Whichever group on top is focus

Odds in group A are larger than in group B (OR > 1)
Odds in group A are same as group B (OR = 1)
Odds in group A are smaller than in group B (OR < 1)

13
Q

What is prevalence probability, what are its problems and why is it useful?

A

Prevalence probability = probability of having disease at given time point (right now)
Problems:
• Length-time bias: conditions with longer duration more likely to be captured
• Difficult to make meaningful comparisons of risks
Usefulness:
• Good for measuring disease burden on a pop to assess healthcare needs + allocate resources
• In etiologic research where outcomes do not have follow up data/ onset difficult to define

14
Q

What is incidence proportion and what are the problems?

A

Incidence proportion (incidence probability) = probability of getting disease during specified time period (new cases in next 5 yrs)
Cohort (follow up) study
Problems:
Assumes fixed + complete follow up (competing risks – deaths to other causes, or losses to follow up)
Ignores time to event
Use incidence rates instead: #new cases during time period/ Σperson-time
Σperson-time = (size of pop at start) x (average duration of follow-up)
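
The incidence rate calculation can be sketched as below. The follow-up times and case count are hypothetical; the point is that each person contributes only the time they were actually followed, so incomplete follow-up is handled naturally.

```python
def incidence_rate(new_cases, person_time):
    """New cases divided by total person-time at risk."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time


# Hypothetical cohort of 6 people: years of follow-up for each person
# (some leave early through disease onset, death or loss to follow-up).
follow_up_years = [5, 5, 2, 3.5, 5, 1]
new_cases = 2

person_years = sum(follow_up_years)  # 21.5 person-years
average_follow_up = person_years / len(follow_up_years)
rate = incidence_rate(new_cases, person_years)

# Same total as (size of pop at start) x (average duration of follow-up).
assert abs(len(follow_up_years) * average_follow_up - person_years) < 1e-9
print(f"{rate * 1000:.1f} new cases per 1000 person-years")
```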

15
Q

What are descriptive objectives?

A

Measure frequency of disease in a population
Estimate disease burden (mortality, financial cost etc)
Measure response
Estimate frequency of known/ suspected risk factor

16
Q

What are analytic objectives?

A

Specify risk factors + causes
Quantify the effect of exposures or treatments
Assess the relative effectiveness of interventions
‘how + why’

17
Q

What is an experimental study?

A

recording of outcome following planned intervention in the care of patients
• Always analytical

18
Q

What is an observational study?

A

recording of outcome without intervening in the care of the patient in any way, other than what is routine clinical care
• Can be analytical if have comparison group or descriptive if not

19
Q

What is a confounding variable?

A

distortion that should be prevented or controlled

20
Q

What is effect modification?

A

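interaction: the effect of an exposure differs across levels of another variable, which is useful information (eg a new drug works more slowly in the elderly, so they are ill for longer; age = effect modifier)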

21
Q

What is the difference between association and causation?

A

Association refers to statistical link between exposure + disease
• May not reflect cause-and-effect relationship

22
Q

What is a cohort study? What are the advantages and disadvantages?

A
  • Prospective (exposure → outcome)
  • Take sample of population at risk of outcome, some of sample exposed then follow up and look at incidence of outcome
  • Expensive + time consuming, not suitable for rare disease (sample too small) or for diseases with long latent periods
23
Q

What is a cross-sectional study? What are the advantages and disadvantages?

A
  • No direction of inquiry ‘what is happening’ at current moment
  • Take sample of population, look at exposure + cases together
  • Not suitable for rare disease/ outcomes, very limited potential to establish disease aetiology due to confounding, selection bias (cases with diseases with long duration + more survivors more easily included)
24
Q

What is a case-control study? What are the advantages and disadvantages?

A
  • Retrospective (outcome → exposure)
  • Disease based sampling: 2 sample groups (cases with disease + controls without), backward follow up to look at exposure
  • Only single outcome/ disease can be studied, no valid estimates of risk of disease, limited potential to establish causal effects due to confounding, selection bias + measurement bias
25
Q

What is a randomised controlled trial? What are the advantages and disadvantages?

A
  • Prospective
  • Sample at risk population, give half treatment, other half = control, after follow up period look at incidence of outcome
  • Treatment group allocation must be random to prevent confounding problems
  • Ideally would have a double-blind trial – neither patient nor physician aware of treatment – prevents bias (differential follow-up or care otherwise)
  • Most convincing evidence of cause-and-effect relationships but expensive, complicated by non-compliance, not always ethical
26
Q

What is a 2x2 contingency table?

A

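For any type of analytical study, data can be summarised in a 2×2 table with exposed and not exposed on the left and diseased and not diseased on the top.
The formulas below imply these cell labels: a = exposed + diseased, b = not exposed + diseased, c = exposed + not diseased, d = not exposed + not diseased.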

RD = a/(a+c)- b/(b+d)
RR = (a × (b+d))/(b × (a+c))
OR = (a × d)/(b × c)
For case-control studies, only OR is meaningful
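
A minimal sketch of the three formulas. It assumes the cell labelling the formulas imply (a = exposed and diseased, b = not exposed and diseased, c = exposed and not diseased, d = not exposed and not diseased); the function name and counts are hypothetical.

```python
def two_by_two_measures(a, b, c, d):
    """Return (RD, RR, OR) for a 2x2 table.

    Assumed labelling: a = exposed + diseased, b = not exposed + diseased,
    c = exposed + not diseased, d = not exposed + not diseased.
    """
    rd = a / (a + c) - b / (b + d)       # risk difference
    rr = (a * (b + d)) / (b * (a + c))   # risk ratio
    odds_ratio = (a * d) / (b * c)       # odds ratio
    return rd, rr, odds_ratio


# Hypothetical cohort: 100 exposed (30 diseased), 100 not exposed (10 diseased).
rd, rr, odds_ratio = two_by_two_measures(a=30, b=10, c=70, d=90)
print(f"RD = {rd:.2f}, RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

In a case-control study the numbers of cases and controls are fixed by the investigator, so a/(a+c) is not a risk, which is why only the OR is meaningful there.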

27
Q

What is non-probability sampling?

A

Non-probability samples:
• Convenience (ease of access), snowball (friend of friend), purposive (you choose who)
• Probability of being chosen is unknown
• Cheaper + easier to implement but unable to generalise, high potential for bias

28
Q

What is simple random sampling (SRS)?

A
  • Each member of population has equal probability of being selected
  • Use random mechanism (random number table)
  • Need complete list of population – time-consuming, does not always achieve best representation - by bad luck sample not evenly spread across all sections of population
29
Q

What is systematic random sampling?

A
  • Members of population selected at equal intervals

  • Need complete list of population, less precise than SRS but easy to carry out

30
Q

What is stratified random sampling?

A
  • Population partitioned into strata and sample selected from within each stratum (by SRS) – should be low within-strata variability
  • Requires sampling frame, prior info about population being sampled, costly but increases representativeness and more statistically precise
31
Q

What is cluster sampling?

A
  • Population partitioned into clusters and sample selected (by SRS) – all people in selected clusters are included in sample, should be low between-groups variability
  • Don’t need complete list (just list of clusters), cheap but decreases statistical precision
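
The four probability sampling schemes (simple random, systematic, stratified, cluster) can be sketched with the standard library `random` module. The population, strata and clusters below are hypothetical, and the function names are illustrative rather than taken from any library.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Random start, then every k-th member (k = population size // n)."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """SRS within every stratum, so each stratum is represented."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """SRS of whole clusters; everyone in a chosen cluster is included."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical list of 100 patient IDs
strata = {"under 50": population[:60], "50 and over": population[60:]}
clusters = {f"clinic {i}": population[i * 10:(i + 1) * 10] for i in range(10)}

print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
```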
32
Q

What is confidence?

A

Inferential statistics uses sample data to draw inferences about population represented

2 approaches: confidence intervals (estimate range of plausible values for population parameter) + tests of significance (assess degree of uncertainty in evidence using a p-value)

33
Q

What is variability?

A

Chance evidence is due to variability
• Different samples produce different estimates of effects
• Sampling error
2 types of variability:
• Within sample: measured using standard deviation (learn equation)
• Between samples: measured using standard error (learn equation)
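
A short sketch of the two equations, assuming the usual definitions: sample SD = sqrt(Σ(x − mean)² / (n − 1)) and SE of the mean = SD / sqrt(n). The measurements are hypothetical.

```python
import math
import statistics

# Hypothetical sample of 5 measurements (illustrative only).
values = [4.1, 5.0, 5.6, 4.8, 5.5]
n = len(values)
mean = sum(values) / n

# Within-sample variability: standard deviation (n - 1 in the denominator).
sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Between-sample variability of the mean: standard error.
se = sd / math.sqrt(n)

# statistics.stdev uses the same n - 1 formula, so it should agree.
assert abs(sd - statistics.stdev(values)) < 1e-9
print(f"mean = {mean:.2f}, SD = {sd:.3f}, SE = {se:.3f}")
```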

34
Q

What is normal distribution?

A

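Symmetrical, bell-shaped distribution described by its mean + SD
‘Normal reference range’ in clinical practice: mean ± 2 SDs (covers approx 95% of values if the variable is normally distributed)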
Normal distribution not normally encountered – few biological variables conform, most medically important variables are not symmetrical

35
Q

What is the central limit theorem?

A

If we repeatedly sampled a population + calculated the sample means, the sampling distribution of those means would be approx normal (for reasonably large samples), even if the population itself is not normally distributed
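
A small simulation sketch of this idea. The skewed (exponential) population, the sample size of 50 and the 2000 repeats are all arbitrary, hypothetical choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly skewed population (exponential, mean about 1).
population = [rng.expovariate(1.0) for _ in range(10_000)]

# Repeatedly draw samples of size 50 and record each sample mean.
sample_size = 50
sample_means = [
    statistics.mean(rng.sample(population, sample_size))
    for _ in range(2_000)
]

pop_mean = statistics.mean(population)
pop_sd = statistics.stdev(population)

# The sample means centre on the population mean, and their spread is
# close to the standard error, pop_sd / sqrt(sample_size).
print(f"population mean       = {pop_mean:.3f}")
print(f"mean of sample means  = {statistics.mean(sample_means):.3f}")
print(f"SD of sample means    = {statistics.stdev(sample_means):.3f}")
print(f"pop SD / sqrt(n)      = {pop_sd / sample_size ** 0.5:.3f}")
```

Plotting a histogram of `sample_means` would show a roughly symmetrical bell shape even though the population is skewed.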

36
Q

What is standard error?

A

SE is low when sample is large + variability in data is low

Lower SE leads to lower uncertainty of mean + better precision of mean

37
Q

What is a confidence interval?

A

More confidence –> wider interval –> less precision
95% CI: if the study were repeated many times, about 95% of the intervals produced would contain the true population parameter, ie for every 100 studies performed, we expect about 5 to produce an interval that does not contain the true population value

CI for difference: if interval excludes zero, effect is statistically significant
CI for ratio: if interval excludes 1, effect is statistically significant
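
A minimal sketch of a 95% CI for a mean difference, assuming the large-sample normal approximation (estimate ± 1.96 × SE). For a sample as small as this hypothetical one, a t-based multiplier would normally be used instead; the data and function name are illustrative only.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Approximate CI for the mean: mean ± z * SE (normal approximation)."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * se, mean + z * se


# Hypothetical paired differences (e.g. before minus after treatment).
differences = [2.1, 1.4, 3.0, 0.8, 2.5, 1.9, 2.2, 1.1]
lower, upper = mean_confidence_interval(differences)

# For a difference, the null value is 0: significant if the CI excludes it.
significant = not (lower <= 0 <= upper)
print(f"95% CI: {lower:.2f} to {upper:.2f}, significant: {significant}")
```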

38
Q

What is a null hypothesis?

A

statement being tested – statement of no difference, no association, no effect

39
Q

What is a p-value?

A

P-value is probability of an effect as large/ larger than observed effect in sample, assuming null hypothesis is true
• Small p-value: sample results unlikely when null hypothesis is true – data contradicts, reject null hypothesis – statistically significant effect
o <0.05 is often accepted as statistically significant
• Large p-value: sample results likely when null hypothesis is true – cannot reject null hypothesis – inconclusive

Statistical significance does not necessarily mean that effect is clinically significant – needs to be big enough to make worthwhile difference
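
As a rough illustration, a two-sided p-value can be sketched for the simple case where the test statistic (effect / SE) is approximately standard normal under the null hypothesis. That normality assumption, the function name and the example numbers are all hypothetical.

```python
import math


def two_sided_p_value(effect, standard_error):
    """P(|Z| >= |z|) for z = effect / SE, with Z standard normal under H0."""
    z = effect / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical results: same SE, different observed effects.
for effect in (1.0, 3.92, 5.0):
    p = two_sided_p_value(effect, standard_error=2.0)
    verdict = "reject H0" if p < 0.05 else "cannot reject H0"
    print(f"effect = {effect}: p = {p:.3f} ({verdict})")
```

An effect of 1.96 × SE sits right at the conventional 0.05 threshold.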

40
Q

What is regression analysis?

A

Multivariable regression analysis = valuable tool for diagnostic, prognostic + aetiognostic research problems:
• Diagnosis: if disease is present
• Prognosis: future course of patient’s current standing + how this depends on choice of intervention
• Aetiology: factors that cause disease

Applications of Regression:
• Developing model for risk prediction of a clinical outcome
• Isolating effect of single variable on clinical outcome
• Identifying multiple important predictors of a clinical outcome and how they jointly affect outcome
• Covariate adjustment to improve efficiency in RCTs

Crude (unadjusted) estimate does not take into account effect of confounding variables
Adjusted estimate accounts for confounding variables – generated using multivariable regression analysis
• May still be residual confounding

Regression relates 2 kinds of variables:
•	Outcome (or response) variable 
o	Eg BP, 90 day mortality 
•	Explanatory variables (or predictors) 
o	Eg age, comorbid conditions
41
Q

Describe 4 types of regression models?

A

Model: Linear Regression
Outcome: Continuous
What is modelled?: Mean
Measure of effect: Mean Difference

Model: Logistic Regression
Outcome: Binary
What is modelled?: Log (odds)
Measure of effect: Odds Ratio

Model: Poisson Regression
Outcome: Count (number of events, usually per unit of person-time)
What is modelled?: Log (incidence rate)
Measure of effect: Incidence rate ratio (IRR)

Model: Cox Regression
Outcome: Time to event
What is modelled?: Log (hazard rate)
Measure of effect: Hazard ratio (HR)

42
Q

What is simple linear regression?

A

Simple as only one ‘X’ variable (how does outcome Y change with X)
Look for correlation – linear relationship
/ = positive, \ = negative, — = no relationship
Mean y = a + bx
Mean y = outcome variable
a = intercept
b = slope (∆y/∆x)
x = predictor variable
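
A minimal sketch of estimating a and b by ordinary least squares, assuming the usual formulas b = Σ(x − mean x)(y − mean y) / Σ(x − mean x)² and a = mean y − b × mean x. The data are hypothetical and lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares intercept (a) and slope (b) for mean y = a + b x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in mean y per unit change in x
    a = mean_y - b * mean_x  # intercept: mean y when x = 0
    return a, b


# Hypothetical data generated from y = 1 + 2x (no noise).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(f"mean y = {a} + {b} x")  # prints: mean y = 1.0 + 2.0 x
```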

43
Q

What is multi-variable/multiple linear regression?

A

Mean y = a + b1X1 + b2X2 + b3X3 ….

P-values included in the regression output table show which predictors have no statistically significant effect on the outcome (birth weight in the example below)
• P > 0.05 = not significant

Standard errors can be used to calculate confidence interval for the b coefficients:
• b ± 1.96 SE
• b coefficients quantify effect of each predictor on the mean outcome, adjusting for all other predictors
o eg study looking at mean birth weight values: b = 9 for mother’s height: increase of 1 cm in mother’s height is expected to produce an average increase in birth weight of 9g
o -ve coefficient means a decrease
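
The b ± 1.96 SE interval can be sketched as follows. The coefficient b = 9 comes from the birth weight example above; the standard error of 2 is a hypothetical value, since none is given.

```python
def coefficient_ci(b, se, z=1.96):
    """Approximate 95% CI for a regression coefficient: b ± z * SE."""
    return b - z * se, b + z * se


# b = 9 g per cm of mother's height (from the example);
# se = 2 is a made-up value for illustration only.
lower, upper = coefficient_ci(b=9.0, se=2.0)

# A CI that excludes 0 corresponds to p < 0.05 for that predictor.
includes_zero = lower <= 0 <= upper
print(f"95% CI: {lower:.2f} to {upper:.2f} g per cm, includes 0: {includes_zero}")
```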

44
Q

What is chance?

A

= random error (imprecision) that produces different observations for replicate experiments or repeated samples
Controlling:
• Prevention: sufficiently large sample size
• Detection + evaluation: confidence intervals, p-values

45
Q

What is bias?

A

= any systematic error in the design/ conduct of research resulting in an estimated effect which is different from the truth, 2 types:
• Selection bias: participants (inc factors affecting recruitment/ retention of subjects)
• Information bias: information obtained from participants (eg due to improperly calibrated measuring device, recall or interviewer bias)
Controlling:
• Before study: appropriate selection of study participants + correct data measurements + definitions
• After study: assess how well this was done

Good precision: SE small, all results close together
Good accuracy: bias low, all results in expected range

46
Q

What is confounding?

A

= distortion in measure of effect as other variables (that will have an effect) are not controlled for
Confounding pulls observed (crude) effect away from the true effect – can over- or under-estimate the truth
Controlling
• Before study: randomise (comparable groups), restrict (eliminates confounders – ie only use one gender in study) + matching (control group that resembles case group with respect to confounders)
• After study: stratified analysis, regression modelling, adjusted effect measures
Criteria for confounding:
• Confounder must be associated with the outcome
• Confounder must be associated with the exposure
• Confounder cannot be an intervening variable between exposure + outcome
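
A stratified-analysis sketch showing how a confounder can distort the crude effect. All counts are hypothetical and constructed so that the confounder is associated with both exposure and outcome while the exposure itself has no effect within either stratum.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical counts per stratum of the confounder:
# (cases in exposed, exposed total, cases in unexposed, unexposed total)
strata = {
    "confounder present": (40, 80, 10, 20),  # high risk, mostly exposed
    "confounder absent": (2, 20, 8, 80),     # low risk, mostly unexposed
}

stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# Crude analysis: ignore the confounder and pool the strata.
crude_counts = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*crude_counts)

print(stratum_rr)                  # RR = 1.0 within each stratum
print(f"crude RR = {crude_rr:.2f}")  # prints: crude RR = 2.33
```

Here the crude RR overestimates the truth; with differently arranged counts it could equally underestimate it.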

Effect modifier: the exposure has different effects across its strata (statistically significant differences) – this is a real feature of the relationship rather than a distortion, so it is useful to report

47
Q

What are the criteria for causal associations?

A

Strength of association
• How far from null value measure of effect is
Temporal sequence
• Did exposure precede outcome?
Consistency of effect
• Has the effect been seen by others? Is study reproducible in different settings?
Dose-response relation
• Does increased exposure result in greater effect?
Biological plausibility
• Does association make biological sense?
Experimental evidence
• Has an RCT been done?