LEC 4 Statistics Flashcards
2 branches of statistics
- Descriptive statistics
  - describe the attributes of a group or population
- Inferential statistics
  - draw conclusions from a sample and make inferences about the entire population
3 types of variables
- Nominal
  - categories that are not ordered
  - categorical
  eg subject count
- Ordinal
  - categories that are ordered
  - no fixed interval
  eg scale & cancer stages
- Continuous
  - with real values that reflect order and relative magnitude
  - fixed interval
  - normal or non-normal distribution
  eg age
Describing Nominal data & graphical presentation
n (%)
n = frequency
% = proportion
Graphical presentation :
- pie chart
- bar chart (normal, clustered, stacked & segmented)
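A minimal sketch of the n (%) summary for nominal data. The blood-group values and the helper name `frequency_table` are made up for illustration and are not from the lecture.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (n, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical blood-group data, invented for illustration only.
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]
table = frequency_table(blood_groups)
for cat, (n, pct) in sorted(table.items()):
    print(f"{cat}: {n} ({pct:.1f}%)")
```

The same n (%) table is what a pie chart or bar chart would display.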
Describing Ordinal data & graphical presentation
eg Likert scale
n (%)
n = frequency
% = proportion
Graphical presentation :
- pie chart
- bar chart (normal, clustered, stacked & segmented)
OR
median (IQR)
Graphical presentation :
- box plot (box-and-whiskers plot)
Box plot whiskers
Whiskers : extend up to 1.5 x IQR below Q1 and above Q3
Mild outlier : 1.5-3 x IQR below Q1 or above Q3
Extreme outlier : >3 x IQR below Q1 or above Q3
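The whisker and outlier rules can be sketched as below. The data values and the helper name `classify` are hypothetical, and the "inclusive" quartile method is an assumption, since statistical packages compute quartiles in slightly different ways.

```python
import statistics

def classify(x, q1, q3):
    """Label x by how far it lies outside the Q1-Q3 box, in IQR units."""
    iqr = q3 - q1
    distance = max(q1 - x, x - q3, 0)  # 0 when x is inside the box
    if distance <= 1.5 * iqr:
        return "within whiskers"
    if distance <= 3 * iqr:
        return "mild outlier"
    return "extreme outlier"

# Hypothetical values, invented for illustration only.
data = [10, 12, 13, 14, 15, 16, 17, 18, 20, 30, 45]
# Quartile conventions differ between packages; "inclusive" is one choice.
q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
labels = {x: classify(x, q1, q3) for x in data}
for x in (20, 30, 45):
    print(x, labels[x])
```

Here Q1 = 13.5 and Q3 = 19, so IQR = 5.5; 30 lies between 1.5 and 3 x IQR above Q3 (mild) and 45 lies more than 3 x IQR above Q3 (extreme).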
Describing Continuous data & graphical presentation
- normal
- non-normal
Graphical presentation :
- histogram
- box plots
Normal distribution - mean +/- SD
Non-Normal distribution - median (IQR)
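A small sketch of the two reporting styles. The helper name `summarize`, the example values, and the choice to print the IQR as Q1-Q3 are assumptions made for illustration.

```python
import statistics

def summarize(data, normal):
    """Report mean +/- SD for normal data, median (Q1-Q3) otherwise."""
    if normal:
        return f"{statistics.mean(data):.1f} +/- {statistics.stdev(data):.1f}"
    # Quartile method is an assumption; packages differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return f"{median:.1f} ({q1:.1f}-{q3:.1f})"

# Hypothetical data, invented for illustration only.
roughly_symmetric = [20, 22, 24, 26, 28]
right_skewed = [1, 2, 2, 3, 40]

print(summarize(roughly_symmetric, normal=True))
print(summarize(right_skewed, normal=False))
```

In the skewed example the single value of 40 would pull the mean up to 9.6, which is why the median is the safer summary there.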
Types of distribution of Continuous data (5)
- Normal distribution
- Positively skewed (right)
- Negatively skewed (left)
- Bimodal distribution (two peaks; U shape when the peaks sit at the two extremes)
- Several peaks
Measure of central tendency
- mean
- median
Measure of variability
- SD
- IQR
How to ensure sample will lead to reliable and valid inferences
Random sampling
Types of inferential statistics (2)
- Parameter Estimation
- Hypothesis Testing
Population mean
Mean of all possible sample means (the centre of the sampling distribution of the mean)
Central Limit Theorem
For sufficiently large sample sizes, the sampling distribution of the mean is approximately normally distributed,
even if the underlying distribution of individual observations in the population is not normally distributed
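The theorem can be checked with a quick simulation. This is only a sketch: the exponential population, the sample size of 50, the 2000 repetitions, and the seed are all arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50
N_SAMPLES = 2000

# Population: exponential with rate 1 (strongly right-skewed, mean 1, SD 1).
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1 / SAMPLE_SIZE ** 0.5  # population SD / sqrt(n)

# For a normal distribution about 95% of values lie within 1.96 SD of the mean.
within = sum(abs(m - mean_of_means) <= 1.96 * sd_of_means for m in sample_means)
share_within = within / N_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (expected {expected_se:.3f})")
print(f"share within 1.96 SD: {share_within:.3f}")
```

The mean of the sample means should land close to the population mean, and their spread close to SD / sqrt(n), even though individual observations are far from normal.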
Factors affecting the width of CI (3)
- Confidence level
- Sample size
- Standard deviation
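How each factor changes the width can be sketched with a z-based interval for a mean. This assumes a known SD and a normal sampling distribution; the helper name `ci_width` and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Full width of a z-based CI for a mean: 2 * z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return 2 * z * sd / sqrt(n)

baseline = ci_width(sd=10, n=100)
print(f"baseline (SD 10, n 100, 95%): {baseline:.2f}")
print(f"higher confidence (99%):      {ci_width(10, 100, 0.99):.2f}")  # wider
print(f"larger sample (n 400):        {ci_width(10, 400):.2f}")        # narrower
print(f"larger SD (SD 20):            {ci_width(20, 100):.2f}")        # wider
```

Higher confidence level or larger SD widens the interval; larger sample size narrows it.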
p-value
- Probability that the observed result or a more extreme result would occur by chance alone, assuming Ho is true
- The smaller the p-value, the stronger the evidence against Ho
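As an illustration, a two-sided p-value for a z statistic can be computed as below. This assumes the test statistic is standard normal under Ho; the helper name `two_sided_p` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. assuming Ho is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 3.0):
    print(f"z = {z}: p = {two_sided_p(z):.4f}")
```

A larger |z| (a result further from what Ho predicts) gives a smaller p-value, with z = 1.96 sitting at roughly p = 0.05.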
Type I error
probability = alpha
- false positive
- reject Ho when Ho is true (no significant difference/no effect)
Type II error
probability = beta
- false negative
- failure to reject Ho when Ho is not true (there is significant difference/effect)
Statistical power
1-beta
Probability of correctly rejecting Ho when Ho is not true (there is significant difference/effect)
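A rough sketch of how alpha, beta, and power relate, using a two-sided one-sample z-test. The known-SD assumption, the helper name `power_two_sided_z`, and the effect sizes are illustrative and not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(effect, sd, n, alpha=0.05):
    """Probability of rejecting Ho when the true mean differs by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n) / sd  # true effect in standard-error units
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power = power_two_sided_z(effect=0.5, sd=1, n=32)
beta = 1 - power  # Type II error probability
print(f"power = {power:.3f}, beta = {beta:.3f}")

# With no true effect, the rejection rate is just alpha (Type I error).
print(f"rejection rate under Ho = {power_two_sided_z(0, 1, 32):.3f}")
```

Under these assumptions, power rises as the sample size or the true effect grows.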
Why is a CI more informative than a p-value? (2)
- Precision of the point estimate
  - width of CI
- Statistical significance
95% CI of difference
- If the 95% CI does not include 0, the difference is statistically significant
- p<0.05
eg mean diff = 0.45, 95% CI (0.3,0.6)
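The example can be checked with a short sketch. The approximate p-value assumes a symmetric, normal-based 95% CI; the helper names `excludes_zero` and `approx_p_from_ci` are hypothetical.

```python
from statistics import NormalDist

def excludes_zero(lower, upper):
    """True when 0 lies outside the interval (statistically significant)."""
    return not (lower <= 0 <= upper)

def approx_p_from_ci(estimate, lower, upper, confidence=0.95):
    """Back-calculate a two-sided p-value, assuming a normal-based CI."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + confidence / 2)
    se = (upper - lower) / (2 * z_crit)  # CI half-width = z_crit * SE
    z = estimate / se
    return 2 * (1 - std_normal.cdf(abs(z)))

significant = excludes_zero(0.3, 0.6)
p = approx_p_from_ci(0.45, 0.3, 0.6)
print(f"CI excludes 0: {significant}, approximate p = {p:.2g}")
```

For mean diff = 0.45 with 95% CI (0.3, 0.6), the interval excludes 0 and the back-calculated p-value is well below 0.05, so the two criteria agree.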