LEC 4 Statistics Flashcards
2 branches of statistics
- Descriptive statistics
- describe the attributes of a group or population
- Inferential statistics
- draw conclusions from a sample and make inferences about the entire population
3 types of variables
- Nominal
- categories that are not ordered
- categorical
eg subject count
- Ordinal
- categories that are ordered
- no fixed interval
eg scale & cancer stages
- Continuous
- with real values that reflect order and relative magnitude
- fixed interval
- normal or non-normal distribution
eg age
Describing Nominal data & graphical presentation
n(%) : n = frequency, % = proportion
Graphical presentation :
- pie chart
- bar chart (simple, clustered, stacked & segmented)
Describing Ordinal data & graphical presentation
eg Likert scale
n(%) : n = frequency, % = proportion
Graphical presentation :
- pie chart
- bar chart (simple, clustered, stacked & segmented)
OR
median (IQR)
Graphical presentation :
- box plot (box-and-whiskers plot)
Box plot whiskers
Whiskers extend to the most extreme data points lying within 1.5x IQR below Q1 and above Q3
Mild outlier : 1.5-3x IQR below Q1 or above Q3
Extreme outlier : >3x IQR below Q1 or above Q3
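The whisker and outlier rules can be checked numerically. This is a minimal sketch, assuming Python's `statistics.quantiles` with `method="inclusive"` as the quartile convention (other software may compute Q1/Q3 slightly differently); the function name `classify_outliers` is hypothetical.

```python
import statistics

def classify_outliers(data):
    """Apply the 1.5x / 3x IQR rules to find whisker ends and outliers."""
    # quantiles(n=4) returns [Q1, median, Q3]; "inclusive" is one common convention.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # mild-outlier cut-offs
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr      # extreme-outlier cut-offs
    mild, extreme = [], []
    for x in data:
        if x < outer_low or x > outer_high:
            extreme.append(x)
        elif x < inner_low or x > inner_high:
            mild.append(x)
    # Whiskers stop at the most extreme points still within 1.5x IQR of the box.
    inside = [x for x in data if inner_low <= x <= inner_high]
    return {"Q1": q1, "Q3": q3, "IQR": iqr,
            "whiskers": (min(inside), max(inside)),
            "mild": mild, "extreme": extreme}

print(classify_outliers([2, 3, 4, 5, 6, 7, 8, 9, 20, 40]))
```

For this made-up data, Q1 = 4.25 and Q3 = 8.75 (IQR = 4.5), so 20 is a mild outlier and 40 an extreme outlier.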
Describing Continuous data & graphical presentation
- normal
- non-normal
Graphical presentation :
- histogram
- box plots
Normal distribution - mean +/- SD
Non-Normal distribution - median (IQR)
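The choice of summary can be sketched as a small function. This assumes the normality decision has already been made (the `normal` flag is a hypothetical parameter) and reports median (IQR) as median (Q1, Q3) using the same inclusive quartile convention as above.

```python
import statistics

def describe_continuous(data, normal):
    """Summarise continuous data: mean ± SD if normal, else median (Q1, Q3)."""
    if normal:
        # statistics.stdev is the sample SD (n - 1 denominator).
        return f"{statistics.mean(data):.1f} ± {statistics.stdev(data):.1f}"
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return f"{statistics.median(data):.1f} ({q1:.1f}, {q3:.1f})"

ages = [4, 8, 6, 5, 3, 7, 9]
print(describe_continuous(ages, normal=True))   # 6.0 ± 2.2
print(describe_continuous(ages, normal=False))  # 6.0 (4.5, 7.5)
```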
Types of distribution of Continuous data (5)
- Normal distribution
- Positively skewed (right)
- Negatively skewed (left)
- Bimodal distribution (U shape)
- Several peaks
Measures of central tendency
- mean
- median
Measures of variability
- SD
- IQR
How to ensure a sample will lead to reliable and valid inferences?
Random sampling
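A simple random sample gives every member of the sampling frame the same chance of selection. This is a minimal sketch, assuming a hypothetical sampling frame of patient IDs 1 to 500; the seed only makes the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the same sample can be re-drawn
    return rng.sample(population, n)

patient_ids = list(range(1, 501))  # hypothetical sampling frame
print(simple_random_sample(patient_ids, 20, seed=1))
```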
Types of inferential statistics (2)
- Parameter Estimation
- Hypothesis Testing
Population mean
Equals the mean of all possible sample means (the mean of the sampling distribution of the mean)
Central Limit Theorem
For sufficiently large sample sizes, the sampling distribution of the mean is approximately normally distributed
even if the underlying distribution of individual observations in the population is not normal
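The Central Limit Theorem can be illustrated by simulation. This sketch assumes an exponential population with rate 1 (mean 1, SD 1, strongly right-skewed), chosen only for illustration; as n grows, the mean of the sample means stays near 1, their SD approaches 1/sqrt(n), and their skewness shrinks towards 0.

```python
import random
import statistics

def skewness(values):
    """Moment coefficient of skewness: close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, num_samples=5000, seed=0):
    """Draw many samples of size n from a skewed population and summarise their means."""
    rng = random.Random(seed)
    # Exponential population (rate 1): population mean = 1, population SD = 1.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(num_samples)]
    return statistics.fmean(means), statistics.stdev(means), skewness(means)

if __name__ == "__main__":
    for n in (2, 30, 100):
        mean_of_means, sd_of_means, skew = simulate_sample_means(n)
        print(f"n={n:>3}: mean of means={mean_of_means:.3f}, "
              f"SD of means={sd_of_means:.3f} (1/sqrt(n)={1 / n ** 0.5:.3f}), "
              f"skewness={skew:.2f}")
```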
Factors affecting the width of CI (3)
- Confidence level (higher level, wider CI)
- Sample size (larger sample, narrower CI)
- Standard deviation (larger SD, wider CI)
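All three factors appear in the width of a CI for a mean. This is a minimal sketch, assuming a large-sample z-based interval, width = 2 × z × SD / sqrt(n); a t-based interval would be slightly wider for small samples. `ci_width` is a hypothetical helper.

```python
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Width of a z-based CI for a mean: 2 * z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return 2 * z * sd / n ** 0.5

print(ci_width(10, 100))        # baseline, about 3.92
print(ci_width(10, 100, 0.99))  # higher confidence level: wider
print(ci_width(10, 400))        # 4x the sample size: half the width
print(ci_width(20, 100))        # larger SD: wider
```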
p-value
- Probability that the observed result or a more extreme result would occur by chance alone, assuming Ho is true
- The smaller the p-value, the stronger the evidence against Ho
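As an illustration of the definition, a two-sided p-value can be computed when the test statistic follows a standard normal distribution under Ho (a z-test; this is an assumption, and other tests use other reference distributions). `two_sided_p_from_z` is a hypothetical helper.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """P(|Z| >= |z|) assuming Ho is true, where Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(two_sided_p_from_z(1.96))  # about 0.05
print(two_sided_p_from_z(3.0))   # more extreme result, smaller p-value
```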
Type I error
= alpha
- false positive
- reject Ho when Ho is true (there is truly no difference/no effect)
Type II error
= beta
- false negative
- failure to reject Ho when Ho is false (there is a true difference/effect)
Statistical power
1-beta
Probability of correctly rejecting Ho when Ho is false (there is a true difference/effect)
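Power can be computed in closed form for the simplest case. This sketch assumes a two-sided one-sample z-test with known SD, where delta is the true difference from the Ho value; `z_test_power` is a hypothetical helper, and power for other tests needs other formulas.

```python
from statistics import NormalDist

def z_test_power(delta, sd, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with true difference delta."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)  # rejection cut-off, about 1.96 for alpha = 0.05
    shift = abs(delta) * n ** 0.5 / sd   # expected z statistic when H1 is true
    # Probability the z statistic lands in either rejection region.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

print(z_test_power(0.5, 1, 32))  # about 0.81
print(z_test_power(0.5, 1, 64))  # larger sample, higher power
```

With delta = 0 (Ho true) the function returns alpha, which is the Type I error rate rather than power.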
Why is a CI more informative than a p-value? (2)
- Precision of the point estimate (shown by the width of the CI)
- Statistical significance (whether the CI includes the null value)
95% CI of difference
- If it does not include 0, the difference is statistically significant
- ie p<0.05
eg mean diff = 0.45, 95% CI (0.3,0.6)
95% CI of ratio
- If it does not include 1, the difference is statistically significant
- ie p<0.05
eg odds ratio = 2.51, 95% CI (1.04,3.28)
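The rule is the same for both cards; only the null value changes (0 for a difference, 1 for a ratio). A minimal sketch using the two examples above; `ci_excludes_null` is a hypothetical helper, and the match with p<0.05 assumes the CI and the test come from the same method.

```python
def ci_excludes_null(lower, upper, null_value):
    """True if the 95% CI does not contain the null value."""
    return not (lower <= null_value <= upper)

print(ci_excludes_null(0.3, 0.6, null_value=0))   # True: mean difference
print(ci_excludes_null(1.04, 3.28, null_value=1)) # True: odds ratio
```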
Assessing normality of continuous data (3) & hypotheses
Visual inspection of histogram or box plot
Shapiro-Wilk test (n<50)
Kolmogorov-Smirnov test (n>=50)
Ho = normal distribution
H1 = not normal distribution
(( comparing data ))
Test for :
- continuous data (normal)
- 2 groups
- independent
Independent t-test
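The independent t-test compares two group means relative to their pooled variability. This sketch computes only the Student's t statistic (equal-variance, pooled version) and its degrees of freedom; turning t into a p-value needs the t distribution, which packages such as SciPy provide (`scipy.stats.ttest_ind`). `independent_t` is a hypothetical helper.

```python
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group_a), len(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    se = (pooled * (1 / n1 + 1 / n2)) ** 0.5                   # SE of the mean difference
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    return t, n1 + n2 - 2

print(independent_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7]))  # t = 2.0 with 8 df
```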
(( comparing data ))
Test for :
- continuous data (normal)
- 2 groups
- paired
Paired samples t-test
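A paired t-test is a one-sample t-test on the within-pair differences. This sketch computes the t statistic only, assuming hypothetical before/after measurements on the same subjects; a p-value again needs the t distribution (eg `scipy.stats.ttest_rel`).

```python
import statistics

def paired_t(before, after):
    """t statistic and df for paired data: mean difference / (SD of differences / sqrt(n))."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of observations")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / n ** 0.5)
    return t, n - 1

print(paired_t([10, 12, 9, 11, 13], [12, 13, 11, 12, 16]))  # t about 4.81 with 4 df
```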
(( comparing data ))
Test for :
- continuous data (normal)
- > 2 groups
- independent
One-way ANOVA
(( comparing data ))
Test for :
- continuous data (non-normal) or ordinal data
- 2 groups
- independent
Wilcoxon rank sum test (Mann-Whitney U test)
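The rank sum test works on ranks instead of raw values, which is why it suits non-normal and ordinal data. This sketch pools both groups, ranks them (ties get the average rank) and converts group A's rank sum to U; the smaller U is then compared with tables or a normal approximation (eg `scipy.stats.mannwhitneyu`). Function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """U statistics for two independent groups, computed from pooled ranks."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n1, n2 = len(group_a), len(group_b)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u_a, n1 * n2 - u_a

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0): complete separation
```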
(( comparing data ))
Test for :
- continuous data (non-normal) or ordinal data
- 2 groups
- paired
Wilcoxon signed-rank test
(( comparing data ))
Test for :
- continuous data (non-normal) or ordinal data
- > 2 groups
- independent
Kruskal-Wallis test
(( comparing data ))
Test for :
- nominal data
- 2 groups
- independent
Chi-square test or Fisher’s exact test
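The chi-square test compares observed counts with the counts expected if group and outcome were unrelated. This sketch handles a 2x2 table without continuity correction (SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its value can differ); the minimum expected count is returned because a common rule of thumb switches to Fisher's exact test when expected counts are small (eg < 5).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, p-value (1 df) and smallest expected count for [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2, min_expected = 0.0, math.inf
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi2 = Z**2, so P(chi2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, min_expected

print(chi_square_2x2(20, 10, 10, 20))  # chi2 about 6.67, p about 0.0098, min expected 15
```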
(( comparing data ))
Test for :
- nominal data
- 2 groups
- paired
McNemar’s test
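McNemar's test uses only the discordant pairs of a paired 2x2 table: b = pairs that changed from yes to no, c = pairs that changed from no to yes. This sketch uses the uncorrected statistic (b - c)^2 / (b + c) with 1 df; a continuity-corrected version uses (|b - c| - 1)^2 instead. `mcnemar` is a hypothetical helper.

```python
import math

def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and p-value from discordant counts."""
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

print(mcnemar(15, 5))  # chi2 = 5.0, p about 0.025
```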
(( comparing data ))
Test for :
- nominal data
- > 2 groups
- independent
Chi-square test or Fisher-Freeman-Halton test
Describing Ordinal data
eg Cancer stages
n(%) : n = frequency, % = proportion
Graphical presentation :
- pie chart
- bar chart (simple, clustered, stacked & segmented)
Ordinal data & graphical presentation
Are cancer stages described the same way as a Likert scale?
No.
Cancer stages can only be described in
- frequency (n) and proportion (%)
Likert scale can be described in both
- frequency (n) and proportion (%)
- median & IQR
Direction of skew
Named after the side where the longer tail is (right tail = positive skew, left tail = negative skew)
Determine types of distribution from box plot
- rotate the box plot 90 degrees clockwise (higher values now on the right)
- check which whisker is longer: right = positively skewed, left = negatively skewed
If data are given as median (Q1, Q3), how do you differentiate between continuous and ordinal data?
- check the type of data collected
eg length of hospital stay vs Likert scale
Parameter Estimation
- estimates the value of a population parameter from sample data
eg “By how much does __ reduce blood pressure?”
(a) Point estimate
(b) Interval estimate (confidence interval)
Hypothesis Testing
- tests a supposition (hypothesis) about the population based on limited sample evidence
eg “Does __ reduce blood pressure?”
(a) Null hypothesis
(b) Alternative hypothesis
SD of the sample means
SD of population divided by square root of sample size
SD of sample means vs SD of sample scores
SD of sample means = SEM
SD of sample scores = sample SD
SEM
Standard Error of the Mean
- quantifies how much the sample mean varies from one sample to another
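The difference between the sample SD and the SEM is one division. This is a minimal sketch with made-up data; in practice the SEM is estimated by replacing the population SD with the sample SD. `sample_sd_and_sem` is a hypothetical helper.

```python
import statistics

def sample_sd_and_sem(sample):
    """Return (sample SD, SEM) so the two quantities can be compared side by side."""
    sd = statistics.stdev(sample)   # spread of individual scores (n - 1 denominator)
    sem = sd / len(sample) ** 0.5   # estimated spread of sample means: SD / sqrt(n)
    return sd, sem

print(sample_sd_and_sem([4, 8, 6, 5, 3, 7, 9]))  # SD about 2.16, SEM about 0.82
```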
Point estimate
eg sample mean (a single value used to estimate the population mean)
Interval estimate (2)
- provide a range of plausible values that is intended to contain the population parameter
- with a certain degree of confidence (usually 95%)
eg confidence interval
Confidence level affecting CI
- larger confidence level, wider CI
- smaller confidence level, narrower CI
(( correlation ))
Test for :
- continuous normally distributed data
Pearson Product-Moment Correlation (r)
(( correlation ))
Test for :
- continuous non-normally distributed data
- ordinal data
Spearman Rank Correlation (rs)
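Spearman's rs is Pearson's r applied to the ranks of the data, which is why it captures any monotonic relationship, not just a linear one. A minimal sketch with hypothetical helper names; ties get average ranks.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x, y = [1, 2, 3, 4, 5], [1, 4, 9, 16, 25]  # monotonic but not linear
print(pearson_r(x, y), spearman_rs(x, y))  # r about 0.98, rs = 1.0
```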
(( regression ))
Dependent variable is :
- continuous (normally or non-normally distributed)
Linear regression
- simple
- multiple / multivariable
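For simple linear regression (one independent variable), the least-squares slope and intercept have a closed form. This is a minimal sketch with a hypothetical helper; multiple regression needs matrix methods or a statistics package.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through (mean x, mean y)
    return slope, intercept

print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```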
(( regression ))
Dependent variable is :
- ordinal
Ordinal regression
- simple
- multiple / multivariable
(( regression ))
Dependent variable is :
- nominal (dichotomous/binary) variable
Logistic regression
- simple
- multiple / multivariable
When comparing data, consider (4)
- No. of groups
- Independent or paired/related groups
- Type of data (nominal, ordinal, or continuous normal/non-normal)
- Assumptions (esp for nominal data: chi-square vs Fisher’s exact test)
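The comparison cards above can be collected into one lookup. This is a sketch of the cards only: ordinal data are treated like non-normal continuous data (as in the cards), and designs the cards do not cover (eg more than 2 paired groups) return None. `choose_comparison_test` is a hypothetical helper.

```python
def choose_comparison_test(data_type, n_groups, paired):
    """Return the test the cards name for a design, or None if not covered.

    data_type: "normal" or "non-normal" (continuous), "ordinal", or "nominal".
    """
    if data_type == "ordinal":
        data_type = "non-normal"  # the cards group ordinal with non-normal data
    more_than_two = n_groups > 2
    tests = {
        ("normal", False, False): "Independent t-test",
        ("normal", False, True): "Paired samples t-test",
        ("normal", True, False): "One-way ANOVA",
        ("non-normal", False, False): "Wilcoxon rank sum test (Mann-Whitney U test)",
        ("non-normal", False, True): "Wilcoxon signed-rank test",
        ("non-normal", True, False): "Kruskal-Wallis test",
        ("nominal", False, False): "Chi-square test or Fisher's exact test",
        ("nominal", False, True): "McNemar's test",
        ("nominal", True, False): "Chi-square test or Fisher-Freeman-Halton test",
    }
    return tests.get((data_type, more_than_two, paired))

print(choose_comparison_test("ordinal", 3, paired=False))  # Kruskal-Wallis test
```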