STATISTICS Flashcards

1
Q

What type of distribution occurs if there is a single mode? What if there is more than one mode?

A

Single mode distribution: unimodal
> 1 mode distribution: bimodal/multi-modal

2
Q

What is normal distribution?

A

Gaussian distribution

  1. Mean = Median = Mode
  2. Symmetrical bell-shaped curve
  3. Characterised by mean and standard deviation –> flatter, wider curves have a greater standard deviation.
3
Q

What is the standard normal distribution?

A

Also called Z distribution

Special case of normal distribution where mean is zero and SD is 1.

4
Q

What is negative skew?
What is the order of mean median and mode?

A

Asymmetrical distribution with a long tail on the left (more negative) side –> Draw a line down from the peak and notice the data stretch further out on the left side

Order: Mode, Median and Mean (from right to left, ie mean < median < mode)

5
Q

What is positive skew?
What is the order of mean median and mode?

A

Asymmetrical distribution with a long tail on the right (more positive) side –> Draw a line down from the peak and notice the data stretch further out on the right/more positive side

Order: Mean, Median and Mode (from right to left, ie mean > median > mode)

6
Q

Which number is a better representative of central tendency in a skewed distribution?

A

Median.

7
Q

What is binomial distribution?

A

Discrete probability distribution of the number of successes in a fixed number of trials (unlike the normal distribution, which is continuous):

  1. Only TWO mutually exclusive outcomes per trial, such as success/failure or boy/girl.
  2. Trials are independent and the probability of the outcome is the same for all trials.
8
Q

What is Poisson distribution?

A

Discrete probability distribution that models the number of events occurring within a fixed interval of time or space - usually rare events. E.g. number of deaths in a town from a particular disease per day.

9
Q

What is the difference between T distribution and chi-squared distribution?

A

T distribution - symmetrical distribution that becomes more like the normal distribution as sample size increases. Used in hypothesis testing to decide whether or not to reject the null hypothesis.

Characterised by degrees of freedom. USED IN ANALYSING NUMERICAL DATA.

Chi-squared distribution - right-skewed distribution that becomes more symmetrical and approaches normality as degrees of freedom increase.

USED IN ANALYSING CATEGORICAL DATA.

10
Q

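What is the central limit theorem?

A

The distribution of sample means approaches a normal distribution as sample size increases, whatever the shape of the underlying distribution. In practice: how many data values are needed before the normal distribution can be used for statistical analysis –> usually at 10 (many textbooks quote 30 for sample means).

Binomial distribution: When the expected numbers of both outcomes are greater than 5 (eg. successes > 5 and failures > 5)

Poisson distribution: When the number of events exceeds 10.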

11
Q

How do you calculate

  1. Variance
  2. Coefficient of variation
  3. Standard Deviation
A
  1. Variance = SD^2
  2. COV = SD/mean
  3. SD = square root of variance

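Variance - measures spread of observations around the mean (in squared units)

COV - relative variability, allows comparison between populations with different means or units

SD - measures spread of observations around the mean, in the same units as the data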
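
A minimal sketch of the three calculations above, using Python's standard `statistics` module. The blood pressure values are hypothetical, and the sample (n - 1) form of the variance is assumed.

```python
import statistics

# Hypothetical systolic blood pressures (mmHg); illustrative values only.
observations = [110, 120, 125, 130, 140]

mean = statistics.mean(observations)
variance = statistics.variance(observations)  # sample variance (n - 1 denominator)
sd = statistics.stdev(observations)           # square root of the sample variance
cov = sd / mean                               # coefficient of variation (unitless)

print(f"mean={mean}, variance={variance}, SD={sd:.2f}, COV={cov:.3f}")
```

Because COV has no units, it can be compared between datasets measured on different scales, which SD alone cannot.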

12
Q

What % of observations in a normal distribution occur within:
1 SD
2 SD
3 SD

A

1 SD = 68%
2 SD = 95%
3 SD = 99.7%

13
Q

What is the Z score (Standard score)? How is it calculated?

A

Compares one specific value with the rest of the distribution: how many standard deviations is that value from the mean

Positive Z score - above average
Negative Z score - below average
0 Z score - exactly the average

Calculated by:

Z = (the number - the mean) / SD

Example:
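
A minimal sketch of the formula above. The exam scores (mean 70, SD 10) and the function name `z_score` are hypothetical, chosen only for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical exam scores with mean 70 and SD 10.
z_high = z_score(85, mean=70, sd=10)  # above average, so positive
z_low = z_score(60, mean=70, sd=10)   # below average, so negative
z_avg = z_score(70, mean=70, sd=10)   # exactly average, so zero

print(z_high, z_low, z_avg)
```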

14
Q

How do you calculate standard error of the mean?

What is it? How do you interpret it? Difference between SEM and SD?

A

SEM = SD / square root (sample size)

SEM is always smaller than SD (for any sample size greater than 1).

SEM measures PRECISION ie. how close the sample’s average is likely to be to the true average of the whole population - how confident are we that the sample’s average represents the real-world average.

SD purely measures how much individuals in the group vary.
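
A minimal sketch of the SEM formula, assuming the sample SD is used. The measurements and the helper name `sem` are hypothetical.

```python
import math
import statistics

def sem(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [4, 8, 6, 5, 7, 6, 9, 3, 6]  # hypothetical measurements, n = 9

sd = statistics.stdev(sample)
standard_error = sem(sample)

print(f"SD={sd:.3f}, SEM={standard_error:.3f}")
```

With n = 9 the SEM is one third of the SD, which shows why a larger sample gives a more precise estimate of the mean even when individual variability (SD) stays the same.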

15
Q

What is the confidence interval? How do you calculate:
1. 90% CI
2. 95% CI
3. 99% CI

A

CI = range of values within which the true population value is expected to lie with a stated probability.

90% CI = Mean +/- 1.64 SEM
95% CI = Mean +/- 1.96 SEM
99% CI = Mean +/- 2.58 SEM.
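
A minimal sketch of the three intervals, using the multipliers listed above. The mean, SD and sample size are hypothetical, and `confidence_interval` is an illustrative helper name.

```python
import math

# Multipliers from the card above (normal approximation).
Z_MULTIPLIERS = {90: 1.64, 95: 1.96, 99: 2.58}

def confidence_interval(mean, sd, n, level=95):
    """Return (lower, upper) as mean +/- multiplier x SEM."""
    sem = sd / math.sqrt(n)
    margin = Z_MULTIPLIERS[level] * sem
    return mean - margin, mean + margin

# Hypothetical sample: mean 120, SD 15, n = 100, so SEM = 1.5.
ci_90 = confidence_interval(120, 15, 100, level=90)
ci_95 = confidence_interval(120, 15, 100, level=95)
ci_99 = confidence_interval(120, 15, 100, level=99)

print(ci_90, ci_95, ci_99)
```

The interval widens as the confidence level rises: more certainty about containing the true value costs precision.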

16
Q

What is a type I error?
How do you check probability of making a type I error?

A

FALSE POSITIVE ie incorrectly concluding that there is a difference ie

  1. incorrect rejection of a true null hypothesis (false positive)
  2. incorrectly accepting the alternative hypothesis

The probability of a type I error is the significance level (alpha), usually set at 0.05 (approximately 2 SD). A result is called significant when the p value is below this level (p < 0.05).

17
Q

What is a type 2 error? How do you check probability of making Type 2 error?

A

FALSE NEGATIVE

Ie incorrectly concluding there is no difference

  1. Incorrectly accepting a false null hypothesis
  2. Incorrectly rejecting a true alternative hypothesis

Check probability of type 2 error (beta):
Check POWER of the study (beta = 1 - power)

18
Q

What is power of the study? How do you calculate power?

A

Power of study - probability of not committing a type 2 error -

  1. probability of rejecting the null hypothesis when it is truly false.
  2. probability of rejecting the null hypothesis when the alternative hypothesis is true.
  3. Ability to detect a significant difference when a real difference is present.

Probability of TYPE 2 error (beta) = 1 - power

Power = 1 - probability of type 2 error (beta)

19
Q

What factors can increase the power of a study? (4)

A
  1. Larger sample size
  2. Reduced variability of observations/smaller population variance
  3. Higher significance level (larger alpha, eg 0.05 instead of 0.01)
  4. Greater effect of interest/size of difference tested (ie you expect a drug to lower BP by 10mmHg instead of 1mmHg).
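
The four factors above can be illustrated with a rough power calculation. This is only a sketch, assuming a two-sided comparison of two equal-sized groups with a normal approximation; the function name `approx_power` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se_diff = sd * math.sqrt(2 / n_per_group)  # standard error of the difference in means
    return std_normal.cdf(abs(effect) / se_diff - z_crit)

baseline = approx_power(effect=5, sd=15, n_per_group=50)

larger_sample = approx_power(effect=5, sd=15, n_per_group=200)
less_variability = approx_power(effect=5, sd=10, n_per_group=50)
higher_alpha = approx_power(effect=5, sd=15, n_per_group=50, alpha=0.10)
larger_effect = approx_power(effect=10, sd=15, n_per_group=50)

for label, power in [("baseline", baseline), ("larger sample", larger_sample),
                     ("less variability", less_variability),
                     ("higher alpha", higher_alpha), ("larger effect", larger_effect)]:
    print(f"{label}: power = {power:.2f}")
```

Each change raises power relative to the baseline, matching the list above.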
20
Q

What is prevalence? How is it calculated? (3)

When is prevalence useful and how can you decrease disease prevalence?

A

Proportion of people with a disease at a given point in time - measured in cross-sectional studies.

Prevalence = total cases / population
Prevalence = (TP + FN) / (TP + TN + FP + FN)
Prevalence = incidence x duration of disease (approximation for a stable population)

Prevalence is useful for measuring the burden of chronic diseases.

Secondary prevention efforts decrease disease prevalence.

21
Q

What is incidence? How do you calculate it?

When is it useful to measure incidence? How to reduce incidence?

A

Number of new cases of the disease in a population over a defined period of time.

Incidence = new cases during time X / population at risk.

Primary prevention results in reduced incidence.

22
Q

What is the positive predictive value?

How is it calculated?

A

Measures a test: Proportion of people with a positive test who truly have the disease.

PPV = True positives / all people with a positive test result.

PPV = TP / (TP + FP)

23
Q

What is negative predictive value? How is it calculated

A

Measures a test: Proportion of people with a negative test who truly do not have the disease.

NPV = TN / (TN + FN)
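
A minimal sketch of both predictive values computed from 2x2 table counts. The counts and function names are hypothetical.

```python
def positive_predictive_value(tp, fp):
    """Proportion of positive tests that are true positives."""
    return tp / (tp + fp)

def negative_predictive_value(tn, fn):
    """Proportion of negative tests that are true negatives."""
    return tn / (tn + fn)

# Hypothetical screening results for 1000 people.
tp, fp, tn, fn = 90, 30, 870, 10

ppv = positive_predictive_value(tp, fp)  # 90 / 120
npv = negative_predictive_value(tn, fn)  # 870 / 880

print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```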

24
Q

What does prevalence impact? What does prevalence not impact?

A

Prevalence impacts positive predictive value and negative predictive value.

High prevalence = High PPV, Low NPV
Low prevalence = Low PPV, High NPV

Prevalence does not impact
1. Sensitivity
2. Specificity
3. Likelihood ratios
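
A minimal sketch of why prevalence moves PPV and NPV while sensitivity and specificity stay fixed. The 90%/90% test and the two prevalence values are hypothetical; `predictive_values` is an illustrative name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Same test (sensitivity and specificity held at 90%) in two populations.
ppv_low, npv_low = predictive_values(0.90, 0.90, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.90, 0.90, prevalence=0.50)

print(f"Prevalence 1%:  PPV={ppv_low:.3f}, NPV={npv_low:.3f}")
print(f"Prevalence 50%: PPV={ppv_high:.3f}, NPV={npv_high:.3f}")
```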

25
Q

What is the likelihood ratio? What is it used for?

A

Likelihood ratios help doctors understand how much a test result changes the chance a person has or doesn’t have the disease.

It converts pre-test probability to post-test probability.

If LR would not change treatment, do not order the test.

LR is a clinical application of Bayes theorem
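
A minimal sketch of the pre-test to post-test conversion: probability is turned into odds, multiplied by the LR, and turned back into probability. The 20% pre-test probability and the LR values are hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert pre-test probability to post-test probability using a likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

pre_test = 0.20  # hypothetical pre-test probability

after_positive = post_test_probability(pre_test, 12)    # strong positive result
after_negative = post_test_probability(pre_test, 0.05)  # strong negative result
unchanged = post_test_probability(pre_test, 1)          # LR of 1 carries no information

print(f"{after_positive:.3f}, {after_negative:.3f}, {unchanged:.3f}")
```

An LR of 1 leaves the probability where it started, which is the situation in which ordering the test would not change management.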

26
Q

What is a positive LR (LR+) and how do you calculate it? What is the ideal number?

A

Tells us how much more likely a positive test result is in someone with the disease than in someone without it.

Higher LR+ = Better test.

Example: If a test has LR+ of 12, a positive test result multiplies the pre-test odds of disease by 12.

Ideal: LR+ > 10.

Calculation
LR+ = sensitivity / (1 - specificity)

27
Q

What is a negative LR (LR-) and how do you calculate it? What is the ideal number?

A

Tells us how much less likely a negative test result is in someone with the disease than in someone without it.

Lower LR- = Better test.

Example: If a test has LR- of 0.05, a negative test result multiplies the pre-test odds of disease by 0.05, ie reduces the odds by 95%.

Ideal: LR- < 0.1.

Calculation:
LR- = (1 - sensitivity) / specificity
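
A minimal sketch of both likelihood ratios computed from sensitivity and specificity. The test characteristics (90% sensitive, 95% specific) are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a diagnostic test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Hypothetical test: sensitivity 90%, specificity 95%.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)

print(f"LR+={lr_pos:.1f}, LR-={lr_neg:.3f}")
```

Note the brackets: the subtraction is done before the division in both formulas.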

28
Q

What is the difference between likelihood ratios and PPV/NPV?

A

**Likelihood ratios tell you shift in probability.**
LR+ = If you get + test, your probability shifts up by X.
LR- = If you get - test, your probability shifts down by X.

**Predictive values tell you current probability.**
PPV = If you get + test, your probability of having disease is Y.
NPV = If you get - test, your probability of not having disease is Y.

29
Q

How do you measure accuracy of a test?

A

(TP + TN) / All screened people

(TP + TN) / (TP + TN + FP + FN)

30
Q

What is number needed to treat? And how do you calculate it? (2)

A

Means how many people need to be treated to prevent one case.

  1. Calculation: Inverse of the incidence rate (general population)

If incidence = 16/1000 people
NNT = 1000/16 = 62.5, rounded up to 63.

Need to treat 63 people in the general population to prevent one case.

  2. Calculation: 1 / absolute risk reduction
    Number of treated people required to prevent one adverse outcome.
    Eg.
    60% of patients responded to treatment X
    15% of patients responded to placebo

NNT = 1 / (0.6-0.15) = 2.2, rounded up to 3
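
A minimal sketch of the second calculation, using the treatment X versus placebo figures from the example above. The function name is hypothetical, and rounding up to a whole patient is a common convention assumed here.

```python
import math

def number_needed_to_treat(rate_treated, rate_control):
    """NNT = 1 / absolute difference in outcome rates between the two groups."""
    absolute_difference = abs(rate_treated - rate_control)
    return 1 / absolute_difference

nnt = number_needed_to_treat(rate_treated=0.60, rate_control=0.15)
whole_patients = math.ceil(nnt)  # NNT is conventionally rounded up

print(f"NNT = {nnt:.1f}, ie treat {whole_patients} patients for one extra response")
```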

31
Q

Fill in table.

A
32
Q

What is sensitivity? How do you calculate sensitivity?

A

True positive rate = proportion of people with the disease who are correctly classified by screening test as positive.

True positives / all those with pathology

True positives / (True positives + False negatives)

33
Q

What is specificity? How do you calculate specificity?

A

True negative rate: The proportion of well people who are correctly classified by the screening test as negative.

True negatives / all those without pathology

True negatives / (True negatives + False positives)
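
A minimal sketch of sensitivity and specificity computed from 2x2 table counts. The counts and function names are hypothetical.

```python
def sensitivity(tp, fn):
    """True positive rate: diseased people correctly classified as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: well people correctly classified as negative."""
    return tn / (tn + fp)

# Hypothetical screening results: 100 people with disease, 900 without.
tp, fn = 90, 10
tn, fp = 870, 30

sens = sensitivity(tp, fn)  # 90 / 100
spec = specificity(tn, fp)  # 870 / 900

print(f"Sensitivity={sens:.3f}, Specificity={spec:.3f}")
```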

34
Q

What does it mean when a test has high specificity? What does it mean when a test has high sensitivity?

A

SP-IN
SN-OUT

High specificity, positive result rules IN the disease

High sensitivity, negative result rules OUT the disease.

35
Q

Remember this graph.

A

YOU ARE ONLY INTERESTED IN THE MIDDLE PORTION.

36
Q

What is absolute risk?

A

Number of cases of disease in exposed / number of individuals exposed.

37
Q

What is relative risk? Which studies is it used for? How do you calculate it?

A

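The risk of developing the disease if you have the exposure (risk factor), relative to the risk if you do not.

Used in COHORT studies.

RR = [a / (a + b)] / [c / (c + d)]
(a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease)

Remember calculation : 1 number / 2 numbers in NUMERATOR, 1 number / 2 numbers in DENOMINATOR.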
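
A minimal sketch of the relative risk calculation from a 2x2 cohort table, assuming the usual layout (a, b = exposed with/without disease; c, d = unexposed with/without disease). The counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = risk in exposed / risk in unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr = relative_risk(a=30, b=70, c=10, d=90)

print(f"RR = {rr:.1f}")
```

Each risk is one number over the sum of two numbers, which is the memory aid on the card.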

38
Q

How do you interpret the below
RR = relative risk

  1. RR = 1
  2. RR > 1
  3. RR < 1
  4. RR = 2
  5. RR = 0.6
A
  1. RR = 1 - no association
  2. RR > 1 - positive association (exposure increases disease risk)
  3. RR < 1 - negative association (exposure decreases disease risk - protective)
  4. RR = 2 - twice as likely to get disease
  5. RR = 0.6 - 40% less likely to get the disease (protective factor)
39
Q

What is the relative risk reduction? How is it calculated?

A

Alternative way of expressing RR : the proportion or % of baseline risk which was removed by a given intervention

RRR = (1 - RR) x 100%

eg.
The RR for effect of aspirin versus no aspirin on vascular death = 0.80
RRR = (1 - 0.80) x 100% = 20%
So aspirin reduced the risk of vascular death by 20%.

40
Q

What is the absolute risk reduction (ARR)? Which calculation is it used in?

A

Measures the amount of risk that can be attributed to a particular factor (or removed by an intervention).

ARR = incidence rate among exposed - incidence rate among non-exposed

ARR = a / (a + b) - c / (c + d)

NNT = 100 / ARR (ARR as a percentage), ie 1 / ARR as a proportion.
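
A minimal sketch of ARR and the NNT derived from it. The trial counts are hypothetical; here the first group (a, b) is the control arm and the second (c, d) the treated arm, and the absolute value is taken so the order of the groups does not matter.

```python
def absolute_risk_reduction(a, b, c, d):
    """Difference in event rates between two groups: a/(a+b) - c/(c+d), as a proportion."""
    return abs(a / (a + b) - c / (c + d))

# Hypothetical trial: 20 of 100 controls and 10 of 100 treated patients have the event.
arr = absolute_risk_reduction(a=20, b=80, c=10, d=90)
arr_percent = arr * 100

nnt_from_percent = 100 / arr_percent  # NNT = 100 / ARR when ARR is a percentage
nnt_from_proportion = 1 / arr         # same result when ARR is a proportion

print(f"ARR = {arr_percent:.0f}%, NNT = {nnt_from_percent:.0f}")
```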

41
Q

What is the odds ratio? How is it calculated (2)

A

For CASE-CONTROL STUDIES –> you know the outcome.

The odds that you were exposed if you have the disease, compared with the odds that you were exposed if you do not.

  1. Odds of exposure among cases / odds of exposure amongst controls

(A/B) / (C/D) = AD / BC - ONLY ONE NUMBER IN NUMERATOR AND DENOMINATOR.
Also calculated by:

  2. Ratio of two odds, where odds = probability / (1 - probability), eg. prevalence / (1 - prevalence)
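
A minimal sketch of the odds ratio from a 2x2 case-control table, assuming the usual layout (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls). The counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = a / c     # exposed cases : unexposed cases
    odds_controls = b / d  # exposed controls : unexposed controls
    return odds_cases / odds_controls

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
a, b, c, d = 40, 20, 60, 80

ratio = odds_ratio(a, b, c, d)
cross_product = (a * d) / (b * c)  # the AD / BC shortcut gives the same value

print(f"OR = {ratio:.2f}")
```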
42
Q

How would you interpret the Odds Ratio in these scenarios (OR)

OR = 1
OR > 1
OR < 1

A

OR = 1 : no association
OR > 1: Exposure is risk factor for disease
OR < 1 : Exposure is protective for disease

43
Q

What is the difference between odds ratio (OR) and relative risk (RR)

A

Relative Risk (superior) : Used in cohort studies
“You have the exposure? Here is your risk of getting disease”

Odds Ratio: Used in CASE-CONTROL studies
“You have the disease? Here are the odds you were exposed”

44
Q

What is the difference between one-tailed T test and two-tailed T test. What happens to the p value in both?

A

A one-tailed (-sided) test is only concerned with differences between observations in one direction (Ex. whether drug A is better than a placebo.)
The p value for a one-tailed test is generally half that for a two-sided test.

Two-tailed test
A two-tailed (-sided) test is concerned with differences between observations in either direction
Ex. two alternative treatments, A and B are compared, where either A or B may be better
The majority of clinical trials perform two-tailed tests

45
Q

What is the difference between mutually exclusive and non-mutually exclusive?

A

Mutually exclusive means that the occurrence of one event precludes the occurrence of the other (i.e., cannot both happen).
eg. If a coin lands heads, it cannot be tails; the two are mutually exclusive.

If two events are not mutually exclusive, the probability of one or the other is found by adding the two together and subtracting the probability of both: P(A or B) = P(A) + P(B) − P(A and B). If the events are independent, P(A and B) = P(A) × P(B).

For example, if the chance of having diabetes is 10%, and the chance of someone being obese is 30%, the chance of meeting someone who is obese or has diabetes (treating the two as independent) is 0.1 + 0.30 − (0.1 × 0.30) = 0.37 (or 37%).
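
A minimal sketch of the two addition rules, using the diabetes/obesity figures above. It assumes the two non-exclusive events are independent; the function names are hypothetical.

```python
def prob_either_exclusive(p_a, p_b):
    """P(A or B) when A and B cannot both happen."""
    return p_a + p_b

def prob_either_independent(p_a, p_b):
    """P(A or B) when A and B can both happen and are independent."""
    return p_a + p_b - p_a * p_b

heads_or_tails = prob_either_exclusive(0.5, 0.5)            # coin toss
diabetes_or_obese = prob_either_independent(0.10, 0.30)     # figures from the card

print(heads_or_tails, round(diabetes_or_obese, 2))
```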

46
Q

What is the difference between efficacy, effectiveness and efficiency?

A

Efficacy = measure of effect under ideal or laboratory conditions

Effectiveness = effect under real life condition
Efficacy does not imply effectiveness

Efficiency = relationship between costs and benefits
Effectiveness does not imply efficiency

47
Q

What is the difference between
1. Cost Minimisation analysis (CMA)
2. Cost Effectiveness analysis (CEA)
3. Cost Utility analysis (CUA)
4. Cost Benefit Analysis (CBA)

A
48
Q

What is the Receiver Operating Characteristic Curve (ROC)

A
49
Q

Name types of observational studies? (6)

A
  1. Surveys/Cross-sectional studies.
  2. Case series
  3. Case-control studies
  4. Cohort (prospective) studies
  5. Geographical ecological studies
  6. Qualitative studies
50
Q

Name types of experimental studies?

A
  1. RCTs
  2. Cluster RCTs
  3. Cross-over trials
  4. Factorial trials
51
Q

Name the hierarchy of evidence (7).

A
  1. Systematic review/meta-analysis
  2. RCTs
  3. Cohort studies
  4. Case-control studies
  5. Case report/Case series
  6. Expert Opinion
  7. Animal studies.
52
Q

What is the difference between case control study and cohort study? When are they more useful.

A

Case control: cannot assess incidence or prevalence, measures odds ratio. More useful in RARE outcomes and COMMON exposures (starting point is rare, what you are measuring is common)

Cohort: Can determine incidence and causal relationships, measures relative risk. More useful in COMMON outcomes and RARE exposures (starting point is rare, what you are measuring is common).

53
Q

What are the advantages and disadvantages of case control study?

A
54
Q

Temporality of different study designs.

A
55
Q

Difference table of cross sectional, case control and cohort.

Fill in the blanks.

A
56
Q

What is a crossover study? When is it done? (2)

When is it not done? (2)

A

Variation of RCT, determines causation.

Done:
1. Patients with chronic, stable disease
2. Drugs with short-term effects

Not done:
1. Self-limiting illnesses
2. Drugs with long-term effects
Drugs with long-lasting effects are difficult to study using this method.

57
Q

Flowchart of study

A
58
Q

What are the different types of selection bias?

A
  1. Sampling bias - trial patients differ from clinical patients
  2. Berkson bias - only sickest are enrolled
  3. Non-response bias - enrollment procedure different between two groups, allows subjects to decide whether or not to participate in the study
  4. Attrition bias - disproportionate drop out difference between groups
59
Q

What are the different types of researcher bias (2)

A
  1. Observer-expectancy bias - investigator’s prior knowledge influences data input
  2. Procedure bias - participants treated differently
60
Q

What are the participant causes of bias? (3) how do you reduce

A
  1. Recall bias - patients with disease are more likely to recall past exposures than those without disease; reduce by decreasing the time between diagnosis and the study.
  2. Reporting bias - patients under or over report their experiences
  3. Hawthorne effect - subjects change their behaviour upon learning they are being studied - reduced with blinding.
61
Q

What is confounding bias? (2)

A

Uncontrolled variable caused a difference to be seen when there is no difference - can be reduced with matching –> selecting controls that differ from study subjects ONLY in the variable under investigation

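Confounding bias vs effect modification - stratify by the third variable. If the association differs between the strata, it is effect modification; if the strata agree with each other but differ from the unstratified (crude) result, it is confounding.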

62
Q

What is the difference between lead time and length time bias?

A

Lead time: mistakenly believing a screening test increases chance of survival by catching disease earlier –> both assume increased survival

Length time: Slowly progressive diseases are caught more than rapidly progressive variations of disease –> assume increased survival

Lead time bias - diagnosed disease earlier.
Length time bias - diagnose more benign and fewer aggressive variations

63
Q

What are confounders?

How can it be reduced at design stage of study?

How can it be reduced at analysis stage of study?

A

Confounders - a third variable which biases the measure of association we calculate for a particular exposure/outcome pair.

Design stage: Randomisation in RCT, matching in case-control study.

Analysis stage: Stratification, Standardisation, multiple regression

64
Q

Which method is the best measure of dispersion in a normally distributed dataset?

Which method is best measure of dispersion in skewed dataset?

A
  1. Normal - SD
  2. Skewed - interquartile range (SD will over-estimate spread)
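
A minimal sketch of why SD over-estimates spread in skewed data while the interquartile range does not. The two datasets are hypothetical and differ only in one extreme value; quartiles use the default method of `statistics.quantiles`.

```python
import statistics

def interquartile_range(data):
    """Q3 - Q1 using the default (exclusive) quartile method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # same data with one extreme value

sd_symmetric = statistics.stdev(symmetric)
sd_skewed = statistics.stdev(skewed)
iqr_symmetric = interquartile_range(symmetric)
iqr_skewed = interquartile_range(skewed)

print(f"SD:  {sd_symmetric:.2f} vs {sd_skewed:.2f}")
print(f"IQR: {iqr_symmetric} vs {iqr_skewed}")
```

The single outlier inflates the SD roughly tenfold, while the IQR is unchanged because it depends only on the middle half of the data.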
65
Q

What’s the difference between cohort and case-control studies?

A
66
Q

What are the aims and number of subjects in the following:
Phase 0
Phase I
Phase II
Phase III
Phase IV

A
67
Q

“The median and mode are equal in a parametric dataset”. True or False?

A

True.

68
Q

What are the definitions of
1. Intra-rater reliability
2. Inter-rater reliability
3. Validity
4. Reproducibility

A
  1. Intra-rater reliability: degree of consistency when the same individual repeats the same measurement or observation of a result.
  2. Inter-rater reliability: degree of consistency between different researchers making a measurement or observing a result.
  3. Validity: the extent to which a test accurately measures what it aims to measure.
  4. Reproducibility: extent to which we would get the same results if we took the same data and reanalysed it using the same methods.

69
Q

What is the correlation co-efficient? What does it range from?

A

The correlation coefficient ranges from -1 to +1.

A positive correlation means that two variables move in the same direction.

A negative correlation means that two variables move in opposite directions. r=0 means there is no correlation.

The closer r is to 1 or -1, the tighter the correlation.

70
Q

What is the difference between paired t-test and unpaired t-test?

A

Paired t-test: Compare the same person’s results (before vs. after).

Example: Blood pressure before and after taking a drug.

“Tests the null hypothesis that the mean of a set of differences of paired observations is equal to zero”.

Unpaired t-test: Compare two different groups’ results.

Example: Blood pressure in drug group vs. placebo group.

“Tests the null hypothesis that two means from independent groups are equal”.
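
A minimal sketch of the two test statistics, written with the standard library only. The blood pressure values are hypothetical, the unpaired version assumes equal variances (pooled Student's t), and converting t to a p value is left out because it needs the t distribution.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for paired data: mean difference divided by its standard error."""
    diffs = [b - a for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def unpaired_t(group_a, group_b):
    """Student's t statistic for two independent groups (pooled, equal-variance form)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se

# Paired: the same five patients before and after a drug (mmHg).
before = [150, 160, 145, 155, 165]
after = [140, 150, 140, 150, 150]
t_paired = paired_t(before, after)

# Unpaired: five patients on the drug vs five different patients on placebo.
drug = [140, 150, 140, 150, 150]
placebo = [150, 160, 145, 155, 165]
t_unpaired = unpaired_t(drug, placebo)

print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

The same numbers give a larger t statistic when treated as paired, because pairing removes between-patient variability from the standard error.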

71
Q

What is the chi-squared test?

When would you use chi-squared test vs t-test vs ANOVA test vs Fisher’s Exact.

A

Tests the null hypothesis that there is no association between the factors that define a contingency table - used on frequency data and to test differences in proportions

0 = no difference between observed and expected frequencies
Larger values = greater differences, and the less likely the null hypothesis is true.
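
A minimal sketch of the chi-squared statistic for a contingency table: expected counts come from the row and column totals, and the statistic sums (observed - expected)² / expected. The tables are hypothetical, and no continuity correction or p value is computed.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 tables: rows = exposed/unexposed, columns = disease/no disease.
association = chi_squared([[30, 70], [10, 90]])
no_association = chi_squared([[20, 80], [20, 80]])

print(association, no_association)
```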

72
Q

What’s the difference between Wald’s test and McNemar’s test.

A

Wald’s test - used in regression models to see if a variable matters (eg smoking affects heart disease)

McNemar’s test - paired categorical data, especially when comparing before and after outcomes.
2 paired groups + 2 categories

73
Q

What is Pearson Correlation Coefficient?

A

Pearson’s correlation checks how two things are connected using a number called r - the strength of linear relationship between two continuous variables

Positive r means they go up together, negative r means one goes up as the other goes down, and 0 means no connection.

It’s great for exploring relationships in data; it shows correlation but does not indicate causation.

Usually demonstrated in scatter plots.
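
A minimal sketch of Pearson's r computed from its definition (covariance divided by the product of the spreads). The data and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [3, 5, 7, 9, 11])   # y rises with x in a straight line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls as x rises
r_flat = pearson_r(xs, [2, 1, 3, 1, 2])   # no linear trend

print(round(r_up, 3), round(r_down, 3), round(r_flat, 3))
```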

74
Q

Explain the differences between these non-parametric tests:

  1. Wilcoxon Signed Rank
  2. Wilcoxon Rank Sum
  3. Mann-Whitney U Test
  4. Spearman’s Rank.

What is their purpose, type of data used, equivalent parametric test, Advantages and Disadvantages

A

See summary table.

75
Q

Which clinical trial phase usually has the highest success rate?

A

Phase I

76
Q

What is the correlation coefficient?
What is its range?
What is it independent of?

A

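Correlation coefficient (r): strength and direction of the linear relationship between two variables (not SD/mean, which is the coefficient of variation)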
Range: -1 to +1
Independent of (1) Level of significance of study (2) Magnitude of observations

77
Q

Comparing prevalence between two groups: Which test is most ideal?

A

Chi squared test.

78
Q

What is the best way to reduce chance of Type 1 error after multiple testing?

A

Apply Bonferroni correction.
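
A minimal sketch of the Bonferroni correction: the significance level is divided by the number of tests, and each p value is compared with the stricter threshold. The p values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

p_values = [0.001, 0.02, 0.04, 0.30]  # hypothetical results of four tests

threshold = bonferroni_threshold(0.05, len(p_values))
significant = [p for p in p_values if p < threshold]

print(f"threshold = {threshold}, significant p values = {significant}")
```

Three of the four results would pass an uncorrected 0.05 cut-off, but only one survives the correction, which is how the method limits false positives across multiple tests.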

79
Q

For a condition that is easily treated, which is more crucial, sensitivity or specificity?

A

SENSITIVITY (higher) is more crucial than specificity

This is because missing a case is harmful even if it means more FALSE POSITIVES.

80
Q

Sensitivity vs Specificity table.

A
81
Q

What is attributable risk?
What is absolute risk?
What is relative risk?

A

Attributable risk = disease INCIDENCE in EXPOSED - disease INCIDENCE in NON-EXPOSED

Absolute risk = number of cases of disease in exposed/number of individuals exposed

Relative risk = disease incidence in exposed/disease incidence in non-exposed.

82
Q

What’s the difference between multiple linear regression and binary logistic regression?

A

Multiple independent predictors + one continuous dependent variable = multiple linear regression

Multiple independent predictors + binary dependent variable (i.e. yes/no) = Binary logistic regression