STATISTICS Flashcards
What type of distribution occurs if there is a single mode? What if there are more than 2 modes?
Single mode distribution: unimodal
Two modes: bimodal; more than two modes: multimodal
What is normal distribution?
Gaussian distribution
- Mean = Median = Mode
- Symmetrical, bell-shaped curve
- Characterised by its mean and standard deviation –> flatter, wider curves have a greater standard deviation.
What is the standard normal distribution?
Also called Z distribution
Special case of normal distribution where mean is zero and SD is 1.
What is negative skew?
What is the order of mean median and mode?
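Asymmetrical (non-normal) distribution with a long tail extending to the left (towards lower values) –> draw a line down from the peak and notice the long tail (and more of the data) lies on the left side
Order (from right to left): Mode, Median, Mean (ie mean < median < mode)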
What is positive skew?
What is the order of mean median and mode?
Asymmetrical (non-normal) distribution with a long tail extending to the right (towards higher values) –> draw a line down from the peak and notice the long tail (and more of the data) lies on the right/more positive side
Order (from right to left): Mean, Median, Mode (ie mode < median < mean)
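A quick check of this ordering with Python's standard `statistics` module, using a small made-up, positively skewed sample (the values are illustrative only):

```python
import statistics

# Hypothetical right-skewed sample: mostly small values plus one long-tail value
data = [1, 2, 2, 2, 3, 3, 4, 5, 12]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # pulled towards the long right tail

print(mode, median, round(mean, 2))  # 2 3 3.78
# Positive skew: mode < median < mean
```

Negating every value gives a negatively skewed sample with the order reversed (mean < median < mode).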
Which measure is a better representative of central tendency in a skewed distribution?
Median (less affected by extreme values in the tail).
What is binomial distribution?
Discrete probability distribution (unlike the continuous normal distribution) of the number of successes in a fixed number of trials:
- Each trial has only TWO mutually exclusive outcomes, such as success/failure or boy/girl; trials are independent.
- Probability of the event or outcome is the same for all trials.
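The binomial probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). A minimal sketch; the family-of-four scenario and P(girl) = 0.5 are assumptions for illustration:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative: exactly 2 girls in a family of 4 children, assuming P(girl) = 0.5
print(binomial_pmf(2, 4, 0.5))  # 0.375
```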
What is Poisson distribution?
Discrete probability distribution that models the number of events occurring within a fixed interval of time or space, usually rare events. E.g. number of deaths in a town from a particular disease per day.
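The Poisson probability of k events when the mean count per interval is λ is λ^k e^(-λ) / k!. A minimal sketch; the average of 2 deaths per day is an invented figure:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in an interval when the mean count per interval is lam)."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical: a town averages 2 deaths per day from a disease
p0 = poisson_pmf(0, 2)                                  # a day with no deaths
p5_or_more = 1 - sum(poisson_pmf(k, 2) for k in range(5))  # a day with 5 or more
print(round(p0, 3), round(p5_or_more, 3))  # 0.135 0.053
```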
What is the difference between T distribution and chi-squared distribution?
T distribution - symmetrical distribution that becomes more like the normal distribution as sample size increases. Used in hypothesis testing (eg t-tests) to decide whether to reject the null hypothesis.
Characterised by degrees of freedom. USED IN ANALYSING NUMERICAL DATA.
Chi-squared distribution - right-skewed distribution that becomes more symmetrical and approaches normality as degrees of freedom increase.
USED IN ANALYSING CATEGORICAL DATA.
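What is the central limit theorem?
Sample means tend towards a normal distribution as sample size increases, whatever the shape of the underlying distribution. Minimum number of data values before the normal distribution can be used for statistical analysis –> usually 10.
Binomial distribution: when both the expected number of successes (np) and failures (n(1 - p)) are greater than 5.
Poisson distribution: when the mean number of events exceeds 10.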
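A small simulation of the central limit theorem, assuming a strongly skewed exponential population with mean 1 and SD 1; the sample size (30), number of repeats and seed are arbitrary choices:

```python
import random
import statistics

random.seed(1)

def sample_mean(n: int) -> float:
    """Mean of n draws from a (skewed) exponential population with mean 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]

# The sample means cluster around the population mean (1.0), with spread close
# to SD / sqrt(n) = 1 / sqrt(30), about 0.18, in a roughly bell-shaped pattern
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```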
How do you calculate
- Variance
- Coefficient of variation
- Standard Deviation
- Variance = SD^2
- COV = SD/mean
- SD = square root of variance
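Variance - measures spread of observations around the mean (in squared units)
COV - relative variability (SD as a proportion of the mean); useful for comparing variability between populations
SD - measures spread of observations around the mean, in the same units as the data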
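The three formulas applied to a hypothetical set of blood pressure readings. This sketch uses the sample versions (n - 1 denominator) from Python's `statistics` module; `pstdev`/`pvariance` would give the population versions:

```python
import statistics

# Hypothetical systolic BP readings (mmHg)
readings = [118, 122, 125, 130, 135]

variance = statistics.variance(readings)  # sample variance = SD^2
sd = statistics.stdev(readings)           # SD = square root of variance
cov = sd / statistics.mean(readings)      # coefficient of variation (unitless)

print(round(variance, 1), round(sd, 2), f"{cov:.1%}")  # 44.5 6.67 5.3%
```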
What percentage of observations lie within:
1 SD
2 SD
3 SD
1 SD = 68%
2 SD = 95%
3 SD = 99.7%
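These percentages come from the normal distribution itself: the proportion within k SD of the mean is erf(k / √2). A short check using only the `math` module:

```python
from math import erf, sqrt

# For a normal distribution, P(|Z| < k) = erf(k / sqrt(2))
for k in (1, 2, 3):
    coverage = erf(k / sqrt(2))
    print(f"within {k} SD: {coverage:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

Exactly 95% corresponds to 1.96 SD, which is why 1.96 appears in the 95% confidence interval.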
What is the Z score (Standard score)? How is it calculated?
How many standard deviations a specific value lies from the mean.
Positive Z score - above average
Negative Z score - below average
0 Z score - exactly the average
Calculated by:
Z = (the number - the mean) / SD
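A worked example of the formula; the population mean of 120 mmHg and SD of 10 mmHg are hypothetical:

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical population: mean systolic BP 120 mmHg, SD 10 mmHg
print(z_score(135, 120, 10))  # 1.5  (above average)
print(z_score(110, 120, 10))  # -1.0 (below average)
```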
How do you calculate the standard error of the mean (SEM)?
What is it? How do you interpret it? What is the difference between SEM and SD?
SEM = SD / square root (sample size)
SEM is always smaller than SD (for samples of more than one observation).
SEM measures PRECISION, ie how close the sample mean is likely to be to the true mean of the whole population: how confident we can be that the sample mean represents the real-world mean.
SD purely measures how much the individuals in the group vary.
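A minimal sketch showing the key difference: with the SD held fixed (a hypothetical 10 units), the SEM shrinks as the sample grows:

```python
from math import sqrt

def sem(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / sqrt(n)

# Hypothetical: SD = 10 regardless of sample size
for n in (4, 25, 100):
    print(n, sem(10, n))
# 4 5.0
# 25 2.0
# 100 1.0
```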
What is the confidence interval? How do you calculate:
1. 90% CI
2. 95% CI
3. 99% CI
CI = range of values within which the true population value (eg the mean) is expected to lie, with a known probability (level of confidence).
90% CI = Mean +/- 1.64 SEM
95% CI = Mean +/- 1.96 SEM
99% CI = Mean +/- 2.58 SEM
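The three intervals computed with the multipliers from this card. The mean, SD and sample size are hypothetical, and the normal multipliers assume a reasonably large sample (small samples would use the t distribution instead):

```python
from math import sqrt

# z multipliers for each confidence level (from the card)
Z = {90: 1.64, 95: 1.96, 99: 2.58}

def confidence_interval(mean: float, sd: float, n: int, level: int = 95) -> tuple[float, float]:
    sem = sd / sqrt(n)          # standard error of the mean
    margin = Z[level] * sem
    return mean - margin, mean + margin

# Hypothetical: mean 120, SD 10, n = 25, so SEM = 2
for level in (90, 95, 99):
    low, high = confidence_interval(120, 10, 25, level)
    print(level, round(low, 2), round(high, 2))
# 90 116.72 123.28
# 95 116.08 123.92
# 99 114.84 125.16
```

Higher confidence levels give wider intervals for the same data.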
What is a type I error?
How do you check probability of making a type I error?
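FALSE POSITIVE, ie incorrectly concluding that there is a difference:
- incorrect rejection of a true null hypothesis (false positive)
- incorrectly accepting the alternative hypothesis
Probability of a type I error = the significance level (alpha), usually 0.05; a result is called significant when p < alpha (p < 0.05 corresponds to a result about 2 SD (1.96 SD) from the null value).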
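A small simulation of the type I error rate, assuming a one-sample z test with known SD = 1 on normally distributed data where the null hypothesis is actually true; sample size, number of trials and seed are arbitrary:

```python
import random
import statistics
from math import sqrt

random.seed(42)

Z_CRIT = 1.96        # two-tailed critical value for alpha = 0.05
n, trials = 25, 4000
false_positives = 0

for _ in range(trials):
    # Null hypothesis is TRUE: data really come from N(0, 1)
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = statistics.fmean(sample) / (1 / sqrt(n))  # z statistic with known SD = 1
    if abs(z) > Z_CRIT:
        false_positives += 1  # rejected a true null: a type I error

type1_rate = false_positives / trials
print(type1_rate)  # should be close to alpha = 0.05
```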
What is a type 2 error? How do you check probability of making Type 2 error?
FALSE NEGATIVE, ie incorrectly concluding there is no difference:
- Incorrectly accepting (failing to reject) a false null hypothesis
- Incorrectly rejecting the alternative hypothesis
Check probability of type 2 error (beta):
Check the POWER of the study (beta = 1 - power)
What is power of the study? How do you calculate power?
Power of study = probability of not committing a type 2 error:
- probability of rejecting the null hypothesis when it is truly false.
- probability of rejecting the null hypothesis when the alternative hypothesis is true.
- ability to detect a significant difference when a real difference is present.
Type 2 error (beta) = 1 - power
Power = 1 - type 2 error (beta)
What factors can increase the power of a study? (4)
- Larger sample size
- Reduced variability of observations/smaller population variance
- Higher significance level (eg alpha = 0.05 rather than 0.01), at the cost of more type I errors
- Greater effect of interest/size of difference tested (ie you expect a drug to lower BP by 10mmHg instead of 1mmHg).
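The four factors can be seen in an approximate power formula for a two-sided one-sample z test (known SD, normal approximation); real sample-size calculations often use t-based software, so treat this as a sketch. The 10 mmHg effect echoes the card; the SD of 20 mmHg and n = 20 are hypothetical:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def power_two_sided_z(delta: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample z test with known SD."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / (sd / n ** 0.5)  # true effect measured in SEM units
    # probability the test statistic falls beyond either critical value
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

base = power_two_sided_z(delta=10, sd=20, n=20)
print(round(base, 2))                                   # 0.61
print(round(power_two_sided_z(10, 20, 40), 2))          # larger sample
print(round(power_two_sided_z(10, 10, 20), 2))          # less variability
print(round(power_two_sided_z(10, 20, 20, 0.10), 2))    # higher alpha
print(round(power_two_sided_z(20, 20, 20), 2))          # bigger effect
```

Each of the last four lines prints a higher power than the baseline; type 2 error (beta) is 1 minus each value.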
What is prevalence? How is it calculated? (3)
When is prevalence useful and how can you decrease disease prevalence?
Proportion of people with a disease at a given point in time; measured in cross-sectional studies.
Prevalence = total cases / population
Prevalence = (TP + FN) / (TP + TN + FP + FN)
Prevalence = incidence x duration (for a stable, relatively rare disease)
Prevalence is useful for measuring chronic diseases.
Secondary prevention efforts decrease disease prevalence.
What is incidence? How do you calculate it?
When is it useful to measure incidence? How to reduce incidence?
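Number of new cases of the disease in a population over a defined period of time.
Incidence = new cases during a defined time period / population at risk during that period.
Incidence is useful for measuring acute diseases and for studying causes/risk factors (eg cohort studies).
Primary prevention results in reduced incidence.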
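Prevalence and incidence side by side for an invented town (all figures, including the 4-year average disease duration, are hypothetical), with a check of the prevalence ≈ incidence x duration approximation:

```python
# Hypothetical town (all figures invented for illustration)
population = 10_000
existing_cases = 200        # people with the disease on the survey day
new_cases_in_year = 50      # new diagnoses over one year
at_risk = population - existing_cases  # only people without the disease can become new cases

prevalence = existing_cases / population  # point prevalence
incidence = new_cases_in_year / at_risk   # new cases per person at risk per year

# Prevalence is roughly incidence x average duration (steady state, rare disease)
mean_duration_years = 4
approx_prevalence = incidence * mean_duration_years

print(f"{prevalence:.2%}", f"{incidence:.4f}", f"{approx_prevalence:.2%}")  # 2.00% 0.0051 2.04%
```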
What is the positive predictive value?
How is it calculated?
Measures a test: proportion of people with a positive test result who truly have the disease.
PPV = true positives / all people with a positive test result
PPV = TP / (TP + FP)
What is negative predictive value? How is it calculated?
Measures a test: proportion of people with a negative test result who truly do not have the disease.
NPV = TN / (TN + FN)
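Both formulas applied to a hypothetical 2x2 table (90 TP, 40 FP, 10 FN, 860 TN):

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    ppv = tp / (tp + fp)  # of all positive results, fraction truly diseased
    npv = tn / (tn + fn)  # of all negative results, fraction truly disease-free
    return ppv, npv

ppv, npv = predictive_values(tp=90, fp=40, fn=10, tn=860)
print(round(ppv, 3), round(npv, 3))  # 0.692 0.989
```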
What does prevalence impact? What does prevalence not impact?
Prevalence impacts positive predictive value and negative predictive value.
High prevalence = High PPV, Low NPV
Low prevalence = Low PPV, High NPV
Prevalence does not impact:
1. Sensitivity
2. Specificity
3. Likelihood ratios
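The effect of prevalence can be shown by holding sensitivity and specificity fixed (a hypothetical test with 90% for both) and changing only the prevalence:

```python
def ppv_npv_from_prevalence(sens: float, spec: float, prev: float) -> tuple[float, float]:
    tp = sens * prev              # diseased and test positive
    fn = (1 - sens) * prev        # diseased but test negative
    fp = (1 - spec) * (1 - prev)  # healthy but test positive
    tn = spec * (1 - prev)        # healthy and test negative
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.01, 0.30):
    ppv, npv = ppv_npv_from_prevalence(0.9, 0.9, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# prevalence 1%: PPV 8.3%, NPV 99.9%
# prevalence 30%: PPV 79.4%, NPV 95.5%
```

Sensitivity and specificity are inputs that never change here, while PPV rises and NPV falls as prevalence increases.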
What is the likelihood ratio? What is it used for?
Likelihood ratios help doctors understand how much a test result changes the chance a person has or doesn’t have the disease.
It converts pre-test probability to post-test probability.
If the test result (and the resulting post-test probability) would not change management, do not order the test.
LR is a clinical application of Bayes' theorem.
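Converting pre-test to post-test probability goes through odds: probability to odds, multiply by the LR, then back to probability. A minimal sketch; the 20% pre-test probability is hypothetical and the LRs of 12 and 0.05 are the example values used in the LR cards:

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    pre_odds = pre_test_prob / (1 - pre_test_prob)  # probability -> odds
    post_odds = pre_odds * lr                       # Bayes: multiply the odds by the LR
    return post_odds / (1 + post_odds)              # odds -> probability

print(round(post_test_probability(0.20, 12), 2))    # 0.75  after a positive result
print(round(post_test_probability(0.20, 0.05), 3))  # 0.012 after a negative result
```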
What is a positive LR (LR+) and how do you calculate it? What is the ideal number?
Tells us how much more likely a positive test result is in someone with the disease than in someone without it.
Higher LR+ = Better test.
Example: If a test has an LR+ of 12, a positive test result multiplies the odds of having the disease by 12.
Ideal: LR+ > 10.
Calculation
LR+ = sensitivity / (1 - specificity)
What is a negative LR (LR-) and how do you calculate it? What is the ideal number?
Tells us how likely a negative test result is in someone with the disease compared with someone without it.
Lower LR- = Better test.
Example: If a test has an LR- of 0.05, a negative test result reduces the odds of having the disease by 95%.
Ideal: LR- < 0.1.
Calculation:
LR- = (1 - sensitivity) / specificity
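Both ratios computed from a hypothetical test with 96% sensitivity and 92% specificity:

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    lr_pos = sensitivity / (1 - specificity)  # LR+
    lr_neg = (1 - sensitivity) / specificity  # LR-
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(0.96, 0.92)
print(round(lr_pos, 1), round(lr_neg, 3))  # 12.0 0.043
```

This test meets both ideals (LR+ > 10, LR- < 0.1).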
What is the difference between likelihood ratios and PPV/NPV?
Likelihood ratios tell you the shift in probability:
LR+ = If you get a + test, the odds of disease are multiplied by X (shift up).
LR- = If you get a - test, the odds of disease are multiplied by X (shift down).
Predictive values tell you the current probability:
PPV = If you get a + test, your probability of having the disease is Y.
NPV = If you get a - test, your probability of not having the disease is Y.
How do you measure accuracy of a test?
(TP + TN) / all people screened
(TP + TN) / (TP + TN + FP + FN)
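The accuracy formula applied to hypothetical screening results for 1,000 people:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of all people screened whom the test classifies correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=90, tn=860, fp=40, fn=10))  # 0.95
```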
What is number needed to treat? And how do you calculate it? (2)
Number of people who need to be treated to prevent one case.
- Calculation: inverse of the incidence rate (assumes treatment prevents every case, so the absolute risk reduction equals the incidence)
If incidence = 16/1000 people
NNT = 1000/16 = 62.5, rounded up to 63.
About 63 people from the general population need to be treated to prevent one case.
- Calculation: 1 / absolute risk reduction
Number of treated people required to prevent one adverse outcome.
Eg.
60% of patients responded to treatment X
15% of patients responded to placebo
NNT = 1 / (0.60 - 0.15) = 2.2, rounded up to 3
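Both NNT calculations as one helper, assuming the usual convention of rounding NNT up to a whole person. The argument names are illustrative; the second call models "treatment prevents every case" as a treated event rate of 0:

```python
import math

def nnt(rate_treated: float, rate_control: float) -> int:
    """Number needed to treat = 1 / absolute difference in event (or response) rates."""
    arr = abs(rate_treated - rate_control)  # absolute risk reduction (or benefit increase)
    # Round up to a whole person; round() first trims floating-point noise
    return math.ceil(round(1 / arr, 9))

print(nnt(0.60, 0.15))     # 3  (1 / 0.45 = 2.2, rounded up)
print(nnt(0, 16 / 1000))   # 63 (1000 / 16 = 62.5, rounded up)
```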
What is sensitivity? How do you calculate sensitivity?
True positive rate = proportion of people with the disease who are correctly classified as positive by the screening test.
True positives / all those with the disease
True positives / (true positives + false negatives)
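The sensitivity formula on a hypothetical group of 100 people with the disease, of whom the test detects 90:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of diseased people the test correctly flags as positive."""
    return tp / (tp + fn)

print(sensitivity(tp=90, fn=10))  # 0.9
```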