Statistics Flashcards

1
Q

What is probability?

A

The number with the outcome / the total number (scale 0 to 1)
Probability (p) = # favourable outcomes / # all possible outcomes
= # cases / total population

2
Q

What is the rate of change of probability over time?

A

Rate of change of probability over time:
• Instantaneous rate = hazard rate
o Instantaneous probability of disease within next small interval of time, given that subject is still at risk at time t
• Average rate = incidence rate
o Expresses risk of disease development per unit in time relative to size of pop at risk

3
Q

What is conditional probability?

A

= p(A|B) = P(A and B) / P(B)
= probability event A will happen given that some other event B has already occurred
p(A|B) is not the same as p(B|A)
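A minimal Python sketch of the conditional probability formula, showing that p(A|B) and p(B|A) differ. The counts are hypothetical and chosen only for illustration, and the function name is an assumption of this example.

```python
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """Return P(A|B) = P(A and B) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b


# Hypothetical population (illustrative counts only)
total = 1000
smokers = 200              # event B
people_with_cough = 160    # event A
smokers_with_cough = 80    # event A and B

p_both = smokers_with_cough / total

# P(cough | smoker): condition on being a smoker
p_cough_given_smoker = conditional_probability(p_both, smokers / total)

# P(smoker | cough): condition on having a cough
p_smoker_given_cough = conditional_probability(p_both, people_with_cough / total)

print(round(p_cough_given_smoker, 3), round(p_smoker_given_cough, 3))
```

The two results differ because the denominators differ: one divides by the probability of being a smoker, the other by the probability of having a cough.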

4
Q

What is sensitivity and what is the equation?

A

Sensitivity = probability of testing +ve given that you have the disease

= True Positives/ (True Positives + False Negatives)

5
Q

What is specificity and what is the equation?

A

Specificity = probability of testing -ve given that you don’t have the disease
= True Negatives/ (False Positives + True Negatives)

6
Q

What is the positive predictive value and what is the equation?

A

+ve predictive value = probability of having the disease given a +ve test = TP/ (TP + FP)

7
Q

What is the negative predictive value and what is the equation?

A

-ve predictive value = probability of not having the disease given a -ve test = TN/ (FN + TN)
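The four diagnostic test measures (sensitivity, specificity, +ve and -ve predictive value) can be computed together from the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). This is a sketch with hypothetical screening counts; the function name is an assumption of this example.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four standard measures from the cells of a test-vs-disease table."""
    return {
        "sensitivity": tp / (tp + fn),  # +ve test given disease
        "specificity": tn / (fp + tn),  # -ve test given no disease
        "ppv": tp / (tp + fp),          # disease given +ve test
        "npv": tn / (fn + tn),          # no disease given -ve test
    }


# Hypothetical results for 1000 people screened (illustrative only)
metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=860)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the test is sensitive and specific, yet the +ve predictive value is much lower, because false positives are numerous relative to true positives when the disease is uncommon.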

8
Q

What is a percentage?

A

(proportion) *100 (scale 0 to 100)

9
Q

What is a rate?

A

the number of times something happens per unit of a chosen denominator (eg x per 100 people) (scale 0 to infinity)

10
Q

What are odds?

A

the number with the outcome/the number without the outcome (scale 0 to infinity)
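A short Python sketch of how odds relate to probability, using hypothetical counts. The function names are assumptions of this example.

```python
def probability_to_odds(p: float) -> float:
    """Odds = p / (1 - p); defined for 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_probability(odds: float) -> float:
    """Probability = odds / (1 + odds); defined for odds >= 0."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Hypothetical group: 20 people with the outcome, 80 without
with_outcome, without_outcome = 20, 80

probability = with_outcome / (with_outcome + without_outcome)  # scale 0 to 1
odds = with_outcome / without_outcome                          # scale 0 to infinity

print(probability, odds, probability_to_odds(probability))
```

Here the probability is 0.2 but the odds are 0.25; the two measures are close only when the outcome is rare.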

11
Q

How do you work out a risk ratio? Also, what is the number needed to treat (NNT) or harm (NNH)?

A

Divide one probability/percentage by the other
Whichever group is on top is the focus
When we divide one number by the other, there are 3 potential outcomes:
Risk in group A is larger than in group B (RR > 1)
Risk in group A is the same as in group B (RR = 1)
Risk in group A is smaller than in group B (RR < 1)

We interpret RR in relation to 1 as we are trying to show difference

RR = 1 – no association, no effect
RR > 1 - +ve association, risk factor
RR < 1 - -ve association, protective factor

Indicates potential benefit/ risk of a clinical intervention

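RD (risk difference) = risk in intervention group - risk in comparison group
NNT = 1/ |RD| ‘number of people needed to treat to prevent 1 outcome’
NNH = 1/ |RD| ‘number of people needed to treat to cause 1 harmful outcome’

If RD is positive, intervention is harmful rather than protective, so use NNH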
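A minimal sketch of the risk ratio, risk difference and number needed to treat or harm. It assumes RD is defined as risk in the intervention group minus risk in the comparison group, so a positive RD means harm; the risks used are hypothetical and the function name is an assumption of this example.

```python
import math


def risk_measures(risk_intervention: float, risk_comparison: float) -> dict:
    """Return RR, RD and NNT or NNH for two group risks (each between 0 and 1)."""
    rd = risk_intervention - risk_comparison   # assumed sign convention
    rr = risk_intervention / risk_comparison
    number_needed = math.inf if rd == 0 else 1 / abs(rd)
    label = "NNH" if rd > 0 else "NNT"         # positive RD: intervention is harmful
    return {"RR": rr, "RD": rd, label: number_needed}


# Hypothetical trial: 10% risk with treatment, 15% risk with control
result = risk_measures(risk_intervention=0.10, risk_comparison=0.15)

print({key: round(value, 3) for key, value in result.items()})
```

In this example RR is below 1 and RD is negative, so the treatment is protective and about 20 people would need to be treated to prevent one outcome.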

12
Q

How do you work out an odds ratio?

A

Divide the odds in one group by the odds in the other

Whichever group is on top is the focus

Odds in group A are larger than in group B (OR > 1)
Odds in group A are the same as in group B (OR = 1)
Odds in group A are smaller than in group B (OR < 1)

13
Q

What is prevalence probability, what are its problems and why is it useful?

A

Prevalence probability = probability of having disease at given time point (right now)
Problems:
• Length-time bias: conditions with longer duration more likely to be captured
• Difficult to make meaningful comparisons of risks
Usefulness:
• Good for measuring disease burden on a pop to assess healthcare needs + allocate resources
• In etiologic research where outcomes do not have follow up data/ onset difficult to define

14
Q

What is incidence proportion and what are the problems?

A

Incidence proportion (incidence probability) = probability of getting disease during specified time period (new cases in next 5 yrs)
Cohort (follow up) study
Problems:
• Assumes fixed + complete follow up (ignores competing risks such as deaths from other causes, and losses to follow up)
• Ignores time to event
Use incidence rates instead: # new cases during time period/ Σperson-time
Σperson-time = (size of pop at start) x (average duration of follow-up)
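A small Python sketch of an incidence rate using person-time. The follow-up durations and case count are hypothetical, and the variable names are assumptions of this example.

```python
# Hypothetical cohort: years of follow-up contributed by each of 8 people
# (follow-up ends at disease onset, death, loss to follow-up or study end)
follow_up_years = [5.0, 5.0, 2.5, 4.0, 1.0, 5.0, 3.5, 5.0]
new_cases = 2

# Sum of person-time, computed directly
person_years = sum(follow_up_years)

# The same total as (size of population at start) x (average duration of follow-up)
average_follow_up = person_years / len(follow_up_years)
person_years_check = len(follow_up_years) * average_follow_up

incidence_rate = new_cases / person_years
rate_per_1000 = incidence_rate * 1000

print(f"{person_years} person-years, {rate_per_1000:.1f} cases per 1000 person-years")
```

Unlike the incidence proportion, this calculation lets people with incomplete follow-up contribute only the time they were actually observed.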

15
Q

What are descriptive objectives?

A

Measure frequency of disease in a population
Estimate disease burden (mortality, financial cost etc)
Measure response
Estimate frequency of known/ suspected risk factor

16
Q

What are analytic objectives?

A

Specify risk factors + causes
Quantify the effect of exposures or treatments
Assess the relative effectiveness of interventions
‘how + why’

17
Q

What is an experimental study?

A

recording of outcome following planned intervention in the care of patients
• Always analytical

18
Q

What is an observational study?

A

recording of outcome without intervening in the care of the patient in any way, other than what is routine clinical care
• Can be analytical if have comparison group or descriptive if not

19
Q

What is a confounding variable?

A

A variable that distorts the measured effect of the exposure on the outcome; this distortion should be prevented or controlled

20
Q

What is effect modification?

A

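An interaction: the effect of the exposure differs across levels of another variable. This is useful information rather than a distortion (eg a new drug acts more slowly in the elderly so they are ill for longer; age = effect modifier)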

21
Q

What is the difference between association and causation?

A

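Association refers to statistical link between exposure + disease
• May not reflect cause-and-effect relationship
Causation means the exposure actually produces the disease; an association alone does not establish this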

22
Q

What is a cohort study? What are the advantages and disadvantages?

A
  • Prospective (exposure → outcome)
  • Take sample of population at risk of outcome, some of sample exposed then follow up and look at incidence of outcome
  • Expensive + time consuming, not suitable for rare disease (sample too small) or for diseases with long latent periods
23
Q

What is a cross-sectional study? What are the advantages and disadvantages?

A
  • No direction of inquiry ‘what is happening’ at current moment
  • Take sample of population, look at exposure + cases together
  • Not suitable for rare disease/ outcomes, very limited potential to establish disease aetiology due to confounding, selection bias (cases with diseases with long duration + more survivors more easily included)
24
Q

What is a case-control study? What are the advantages and disadvantages?

A
  • Retrospective (outcome → exposure)
  • Disease based sampling: 2 sample groups (cases with disease + controls without), backward follow up to look at exposure
  • Only single outcome/ disease can be studied, no valid estimates of risk of disease, limited potential to establish causal effects due to confounding, selection bias + measurement bias
25
What is a randomised controlled trial? What are the advantages and disadvantages?
• Prospective
• Sample the at-risk population, give half the treatment and use the other half as the control; after the follow-up period, look at the incidence of the outcome
• Treatment group allocation must be random to prevent confounding problems
• Ideally would have a double-blind trial (neither patient nor physician aware of treatment), which prevents bias such as differential follow-up or care
• Most convincing evidence of cause-and-effect relationships, but expensive, complicated by non-compliance and not always ethical
26
What is a 2x2 contingency table?
For any type of analytical study, data can be summarised in a 2x2 table with exposed and not exposed on the left and diseased and not diseased on the top.
Cell labels assumed by the formulas: a = exposed + diseased, b = not exposed + diseased, c = exposed + not diseased, d = not exposed + not diseased
RD = a/(a+c) - b/(b+d)
RR = (a × (b+d))/(b × (a+c))
OR = (a × d)/(b × c)
For case-control studies, only OR is meaningful
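The RD, RR and OR formulas on this card can be sketched in Python. The cell labelling is an assumption inferred from the formulas themselves: a = exposed and diseased, b = not exposed and diseased, c = exposed and not diseased, d = not exposed and not diseased. The counts are hypothetical.

```python
def two_by_two(a: int, b: int, c: int, d: int) -> dict:
    """Effect measures from a 2x2 table, using the card's formulas.

    Assumed labelling: a = exposed diseased, b = unexposed diseased,
    c = exposed not diseased, d = unexposed not diseased.
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return {
        "RD": risk_exposed - risk_unexposed,
        "RR": (a * (b + d)) / (b * (a + c)),
        "OR": (a * d) / (b * c),
    }


# Hypothetical cohort: 100 exposed (30 diseased), 100 unexposed (10 diseased)
result = two_by_two(a=30, b=10, c=70, d=90)

print({name: round(value, 3) for name, value in result.items()})
```

The OR is further from 1 than the RR in this example because the outcome is common; the two are close only for rare outcomes.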
27
What is non-probability sampling?
Non-probability samples:
• Convenience (ease of access), snowball (friend of friend), purposive (you choose who)
• Probability of being chosen is unknown
• Cheaper + easier to implement, but results cannot be generalised and there is high potential for bias
28
What is simple random sampling (SRS)?
• Each member of population has equal probability of being selected
• Use random mechanism (random number table)
• Need complete list of population, which is time-consuming; does not always achieve best representation, as by bad luck the sample may not be evenly spread across all sections of population
29
What is systematic random sampling?
• Members of population selected at equal intervals
• Need complete list of population, less precise than SRS but easy to carry out
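Simple random sampling and systematic sampling can be contrasted with a short Python sketch. The population is a hypothetical list of 100 ID numbers, the function names are assumptions of this example, and the systematic version uses a random start within the first interval, which is one common way to do it.

```python
import random


def simple_random_sample(population: list, n: int, seed=None) -> list:
    """Every member has the same chance of selection (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population: list, n: int, seed=None) -> list:
    """Pick a random start, then take members at equal intervals."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


# Hypothetical complete list of the population (IDs 1 to 100)
population = list(range(1, 101))

srs = simple_random_sample(population, 10, seed=1)
systematic = systematic_sample(population, 10, seed=1)

print(sorted(srs))
print(systematic)
```

Both functions need the complete list of the population, which matches the requirement noted on the cards.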
30
What is stratified random sampling?
• Population partitioned into strata and sample selected from within each stratum (by SRS); there should be low within-strata variability
• Requires a sampling frame and prior info about the population being sampled; costly, but increases representativeness and is more statistically precise
31
What is cluster sampling?
• Population partitioned into clusters and a sample of clusters selected (by SRS); all people in selected clusters are included in sample, and there should be low between-groups variability
• Don’t need complete list (just list of clusters); cheap, but decreases statistical precision
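The difference between stratified and cluster sampling can be sketched in Python: stratified sampling draws some people from every stratum, while cluster sampling draws whole clusters and keeps everyone in them. The strata, clusters and function names below are hypothetical.

```python
import random


def stratified_sample(strata: dict, n_per_stratum: int, seed=None) -> dict:
    """Draw a simple random sample from within every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


def cluster_sample(clusters: dict, n_clusters: int, seed=None) -> list:
    """Randomly select whole clusters and include all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


# Hypothetical strata (age bands) and clusters (GP practices)
strata = {
    "under_40": [f"U{i}" for i in range(50)],
    "40_to_64": [f"M{i}" for i in range(30)],
    "65_plus": [f"O{i}" for i in range(20)],
}
clusters = {f"practice_{k}": [f"P{k}_{i}" for i in range(5)] for k in range(6)}

by_stratum = stratified_sample(strata, n_per_stratum=4, seed=2)
from_clusters = cluster_sample(clusters, n_clusters=2, seed=2)

print({name: len(sample) for name, sample in by_stratum.items()})
print(len(from_clusters))
```

Only the list of clusters is needed to pick the clusters, whereas the stratified version needs the membership of every stratum up front.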
32
What is confidence?
Inferential statistics uses sample data to draw inferences about the population represented
2 approaches:
• Confidence intervals (estimate range of plausible values for population parameter)
• Tests of significance (assess degree of uncertainty in evidence using a p-value)
33
What is variability?
Chance evidence is due to variability
• Different samples produce different estimates of effects
• Sampling error
2 types of variability:
• Within sample: measured using standard deviation (learn equation)
• Between samples: measured using standard error (learn equation)
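The two equations referred to on this card are, in their usual form, SD = √(Σ(x - mean)² / (n - 1)) and SE = SD / √n. A short Python sketch with hypothetical measurements, computing the SD both by hand and with the standard library as a check:

```python
import math
import statistics

# Hypothetical sample of 8 measurements (illustrative only)
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean = sum(values) / n

# Standard deviation: spread of individual values within the sample
sd_by_hand = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
sd = statistics.stdev(values)  # same sample SD from the standard library

# Standard error: expected spread of the sample mean between samples
se = sd / math.sqrt(n)

print(f"mean={mean:.2f}, SD={sd:.3f}, SE={se:.3f}")
```

The SE shrinks as n grows even if the SD stays the same, which is why larger samples give more precise estimates of the mean.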
34
What is normal distribution?
‘Normal reference range’ in clinical practice: mean ± 2 SDs
The normal distribution is not commonly encountered: few biological variables conform, and most medically important variables are not symmetrical
35
What is the central limit theorem?
If we repeatedly sampled a population + calculated the sample means, the sampling distribution of those means would be approx normal (for sufficiently large samples), even if the population itself is not normally distributed
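A small simulation can illustrate the central limit theorem. This sketch assumes a skewed (exponential) population with mean 1 and SD 1, chosen only for illustration; the function name, seed and sample sizes are arbitrary choices of this example.

```python
import random
import statistics


def sample_means(n_samples: int, sample_size: int, seed: int = 0) -> list:
    """Repeatedly sample a skewed population and record each sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means


means = sample_means(n_samples=2000, sample_size=50)

# The sample means cluster around the population mean (1.0),
# with spread close to SD / sqrt(n) = 1 / sqrt(50), about 0.14
centre = statistics.mean(means)
spread = statistics.stdev(means)

print(f"mean of sample means={centre:.3f}, SD of sample means={spread:.3f}")
```

The spread of the sample means is the standard error, and a histogram of `means` would look roughly bell-shaped even though the individual values are strongly skewed.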
36
What is standard error?
• SE is low when sample is large + variability in data is low
• Lower SE leads to lower uncertainty of mean + better precision of mean
37
What is a confidence interval?
More confidence --> wider interval --> less precision
95% CI: for every 100 studies performed, we expect 5 to produce an interval that does not contain the true population value, ie 95% of the time the CI would include the true population parameter
CI for difference: if interval excludes zero, effect is statistically significant
CI for ratio: if interval excludes 1, effect is statistically significant
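A minimal sketch of a 95% confidence interval and the "excludes the null value" check. It assumes the usual normal approximation (estimate ± 1.96 × SE); the estimates, standard error and function names are hypothetical.

```python
def confidence_interval_95(estimate: float, standard_error: float) -> tuple:
    """Approximate 95% CI: estimate +/- 1.96 x SE (normal approximation assumed)."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin


def excludes(interval: tuple, null_value: float) -> bool:
    """True if the null value lies outside the interval (statistically significant)."""
    lower, upper = interval
    return not (lower <= null_value <= upper)


# Hypothetical difference in means: 5.0 with SE 2.0; null value for a difference is 0
difference_ci = confidence_interval_95(5.0, 2.0)
difference_significant = excludes(difference_ci, 0)

# Hypothetical reported CI for a risk ratio; null value for a ratio is 1
ratio_ci = (0.8, 1.5)
ratio_significant = excludes(ratio_ci, 1)

print(difference_ci, difference_significant)
print(ratio_ci, ratio_significant)
```

The difference is significant because its interval lies entirely above zero, while the ratio is not because its interval contains 1.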
38
What is a null hypothesis?
statement being tested – statement of no difference, no association, no effect
39
What is a p-value?
P-value is probability of an effect as large/ larger than observed effect in sample, assuming null hypothesis is true
• Small p-value: sample results unlikely when null hypothesis is true; data contradict it, so reject null hypothesis (statistically significant effect)
o <0.05 is often accepted as statistically significant
• Large p-value: sample results likely when null hypothesis is true; cannot reject null hypothesis (inconclusive)
Statistical significance does not necessarily mean that effect is clinically significant: it needs to be big enough to make a worthwhile difference
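One way to compute a p-value is a two-sided z-test, sketched here. This assumes the estimate is approximately normally distributed around the null value with a known standard error; other tests would be used in other settings, and the numbers and function name are hypothetical.

```python
import math


def two_sided_p_value(estimate: float, standard_error: float, null_value: float = 0.0) -> float:
    """P(effect at least this far from the null, in either direction | null is true)."""
    z = (estimate - null_value) / standard_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical difference in means of 5.0 with SE 2.0 (z = 2.5)
p_small = two_sided_p_value(5.0, 2.0)

# Hypothetical difference of 1.0 with the same SE (z = 0.5)
p_large = two_sided_p_value(1.0, 2.0)

print(f"p={p_small:.4f} -> reject null at 0.05: {p_small < 0.05}")
print(f"p={p_large:.4f} -> reject null at 0.05: {p_large < 0.05}")
```

The first result would be called statistically significant at the 0.05 level; whether a difference of that size matters clinically is a separate question.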
40
What is regression analysis?
Multivariable regression analysis = valuable tool for diagnostic, prognostic + aetiognostic research problems:
• Diagnosis: if disease is present
• Prognosis: future course of patient’s current standing + how this depends on choice of intervention
• Aetiology: factors that cause disease
Applications of regression:
• Developing model for risk prediction of a clinical outcome
• Isolating effect of single variable on clinical outcome
• Identifying multiple important predictors of a clinical outcome and how they jointly affect outcome
• Covariate adjustment to improve efficiency in RCTs
Crude (unadjusted) effect does not take into account effect of confounding variables
Adjusted effect accounts for confounding variables; generated using multivariable regression analysis
• May still be residual confounding
Regression relates 2 kinds of variables:
• Outcome (or response) variable
o Eg BP, 90 day mortality
• Explanatory variables (or predictors)
o Eg age, comorbid conditions
41
Describe 4 types of regression models.
• Linear regression: outcome = continuous; what is modelled = mean; measure of effect = mean difference
• Logistic regression: outcome = binary; what is modelled = log(odds); measure of effect = odds ratio (OR)
• Poisson regression: outcome = count data (number of events); what is modelled = log(incidence rate); measure of effect = incidence rate ratio (IRR)
• Cox regression: outcome = time to event; what is modelled = log(hazard rate); measure of effect = hazard ratio (HR)
42
What is simple linear regression?
Simple as only one ‘X’ variable (how does outcome Y change with X)
Look for correlation (linear relationship): / = positive, \ = negative, --- = no relationship
Mean y = a + bx
• Mean y = outcome variable
• a = intercept
• b = slope (∆y/∆x)
• x = predictor variable
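The intercept a and slope b in mean y = a + bx are usually estimated by least squares. A minimal sketch using only the standard library, with hypothetical data chosen to lie exactly on a line; the function name is an assumption of this example.

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Least-squares estimates of intercept a and slope b for mean y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in mean y per unit change in x
    a = mean_y - b * mean_x  # intercept: mean y when x = 0
    return a, b


# Hypothetical predictor and outcome values (illustrative only)
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
predicted_at_6 = a + b * 6

print(f"a={a:.2f}, b={b:.2f}, predicted mean y at x=6: {predicted_at_6:.2f}")
```

A positive b corresponds to the "/" pattern on the card and a negative b to the "\" pattern.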
43
What is multi-variable/multiple linear regression?
Mean y = a + b1X1 + b2X2 + b3X3 …
P-values included in table to show which predictors have no statistically significant effect on the outcome (birth weight in the example below)
• P > 0.05 = not significant
Standard errors can be used to calculate confidence interval for the b coefficients:
• b ± 1.96 SE
• b coefficients quantify effect of each predictor on the mean outcome, adjusting for all other predictors
o eg study looking at mean birth weight values: b = 9 for mother’s height: increase of 1 cm in mother’s height is expected to produce an average increase in birth weight of 9g
o -ve coefficient means a decrease
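A sketch of how fitted coefficients are read in a multiple linear regression. Only b = 9 for mother's height comes from the card; the intercept, the second predictor, its coefficient and the standard error are hypothetical values invented for illustration, as are the function names.

```python
def predicted_mean(intercept: float, coefficients: dict, values: dict) -> float:
    """Mean y = a + b1*X1 + b2*X2 + ... for one set of predictor values."""
    return intercept + sum(b * values[name] for name, b in coefficients.items())


def coefficient_ci(b: float, se: float) -> tuple:
    """Approximate 95% CI for a coefficient: b +/- 1.96 SE."""
    return b - 1.96 * se, b + 1.96 * se


# b = 9 g per cm is from the card; everything else is hypothetical
intercept = -2000.0
coefficients = {"mother_height_cm": 9.0, "gestation_weeks": 100.0}

baby_a = predicted_mean(intercept, coefficients, {"mother_height_cm": 165, "gestation_weeks": 40})
baby_b = predicted_mean(intercept, coefficients, {"mother_height_cm": 166, "gestation_weeks": 40})

# Holding gestation fixed, 1 cm more height changes mean birth weight by b grams
height_effect = baby_b - baby_a

height_ci = coefficient_ci(9.0, se=2.0)  # hypothetical SE

print(baby_a, height_effect, height_ci)
```

Because the hypothetical interval for the height coefficient excludes zero, that predictor would be reported as statistically significant.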
44
What is chance?
= random error (imprecision) that produces different observations for replicate experiments or repeated samples
Controlling:
• Prevention: sufficiently large sample size
• Detection + evaluation: confidence intervals, p-values
45
What is bias?
= any systematic error in the design/ conduct of research resulting in an estimated effect which is different from the truth
2 types:
• Selection bias: participants (inc factors affecting recruitment/ retention of subjects)
• Information bias: information obtained from participants (eg due to improperly calibrated measuring device, recall or interviewer bias)
Controlling:
• Before study: appropriate selection of study participants + correct data measurements + definitions
• After study: assess how well this was done
Good precision: SE small, all results close together
Good accuracy: bias low, all results in expected range
46
What is confounding?
= distortion in measure of effect because other variables (that have an effect) are not controlled for
Confounding pulls observed (crude) effect away from the true effect; can over- or under-estimate the truth
Controlling:
• Before study: randomise (comparable groups), restrict (eliminates confounders, ie only use one gender in study) + match (control group that resembles case group with respect to confounders)
• After study: stratified analysis, regression modelling, adjusted effect measures
Criteria for confounding:
• Confounder must be associated with the outcome
• Confounder must be associated with the exposure
• Confounder cannot be an intervening variable between exposure + outcome
Effect modifier has different effects across strata (statistically significant differences); it belongs to nature so is useful
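A stratified analysis can be sketched in Python to show confounding in action. The counts are hypothetical and constructed so that age is associated with both the exposure and the outcome; the function and variable names are assumptions of this example.

```python
def risk_ratio(cases_exposed: int, n_exposed: int, cases_unexposed: int, n_unexposed: int) -> float:
    """RR = risk in exposed / risk in unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical cohort split by a suspected confounder (age group)
strata = {
    "older": {"cases_exposed": 80, "n_exposed": 400, "cases_unexposed": 20, "n_unexposed": 100},
    "younger": {"cases_exposed": 5, "n_exposed": 100, "cases_unexposed": 20, "n_unexposed": 400},
}

# Stratum-specific effects: exposure compared within each age group
stratum_rr = {name: risk_ratio(**counts) for name, counts in strata.items()}

# Crude effect: ignore age and pool everyone together
keys = ("cases_exposed", "n_exposed", "cases_unexposed", "n_unexposed")
crude_counts = {key: sum(counts[key] for counts in strata.values()) for key in keys}
crude_rr = risk_ratio(**crude_counts)

print(f"crude RR={crude_rr:.3f}, stratum RRs={stratum_rr}")
```

Within each age group the exposure makes no difference, yet the crude RR suggests a doubling of risk, because older people are both more often exposed and at higher risk. If the stratum-specific RRs had differed from each other, that would instead point to effect modification.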
47
What are the criteria for causal associations?
Strength of association
• How far from null value measure of effect is
Temporal sequence
• Did exposure precede outcome?
Consistency of effect
• Has the effect been seen by others? Is study reproducible in different settings?
Dose-response relation
• Does increased exposure result in greater effect?
Biological plausibility
• Does association make biological sense?
Experimental evidence
• Has an RCT been done?