L05 Descriptive & Inferential Statistics Flashcards
Differentiate the two branches of statistics.
1) Descriptive Statistics:
- Methods for organising and summarising a set of data that help to describe the attributes of a group or population
2) Inferential Statistics:
- Statistical methods used to draw conclusions from a sample & make inferences about the entire population
Differentiate the three types of statistical variables.
1) Continuous / Interval Variable:
- With real values that reflect order & relative magnitude
- e.g. age, weight, height
2) Ordinal Variable:
- With categories that are ordered / hierarchical
- e.g. cancer stages, pain rating, Likert scale data
3) Nominal / Categorical Variable:
- With categories that are not ordered
- e.g. gender, race, smoking status, blood groups
How are nominal/categorical variables presented in a study?
Numerically, summarised as frequency (n) AND proportion (%) i.e. n (%)
Graphically, can be presented as pie chart, bar chart
- e.g. stacked bar chart, clustered bar chart, segmented bar chart
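The n (%) summary above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `summarise_categorical` and the smoking-status data are made up for the example.

```python
from collections import Counter


def summarise_categorical(values):
    """Return {category: (n, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


# Hypothetical smoking-status data for 8 participants (illustration only)
smoking = ["never", "current", "never", "former",
           "never", "current", "never", "former"]

summary = summarise_categorical(smoking)
for category, (n, pct) in summary.items():
    # Printed in the n (%) style used in study tables
    print(f"{category}: {n} ({pct}%)")
```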
How are ordinal variables presented in a study?
Numerically, summarised as frequency (n) AND proportion (%) i.e. n (%)
Graphically, can be presented as pie chart, bar chart
- e.g. stacked bar chart, clustered bar chart, segmented bar chart
OR
Numerically, summarised as median AND interquartile range i.e. median (IQR)
Graphically, can be presented as a box-and-whiskers plot
How are continuous variables presented in a study?
Numerically, summarised as measure of central tendency (mean or median) AND measure of variability (standard deviation SD or IQR)
- Normal distribution = mean (SD)
- Non-normal distribution = median (IQR)
Graphically, can be presented as histogram, box-and-whiskers plot
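A small sketch of the two numerical summaries for continuous data, mean (SD) and median (IQR). The data sets and helper names are hypothetical. Quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`; other software may use a slightly different quartile convention and give slightly different values.

```python
import statistics


def mean_sd(data):
    """Mean and sample SD: the usual summary for roughly normal data."""
    return statistics.mean(data), statistics.stdev(data)


def median_iqr(data):
    """Median with (Q1, Q3): the usual summary for non-normal data."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return median, (q1, q3)


# Hypothetical data (illustration only)
ages = [41, 43, 45, 47, 49, 51, 53, 55, 57]   # roughly symmetric
stay_days = [1, 2, 2, 3, 3, 4, 5, 9, 30]      # one very long stay

age_mean, age_sd = mean_sd(ages)
stay_mean, _ = mean_sd(stay_days)
stay_median, (stay_q1, stay_q3) = median_iqr(stay_days)

print(f"Age: mean (SD) = {age_mean:.1f} ({age_sd:.1f})")
print(f"Stay: median (IQR) = {stay_median:.0f} ({stay_q1:.0f} to {stay_q3:.0f})")
print(f"Stay: mean = {stay_mean:.1f}, pulled upwards by the long stay")
```

In the skewed example the mean sits well above the median, which is why median (IQR) is preferred for non-normal data.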
Differentiate between ‘outliers’, ‘mild outliers’ & ‘extreme outliers’.
Outliers: Values more than 1.5 x IQR below Q1 or above Q3 (i.e. < Q1 - 1.5 x IQR or > Q3 + 1.5 x IQR)
Mild outliers: Values more than 1.5 x IQR but no more than 3 x IQR below Q1 or above Q3
Extreme outliers: Values more than 3 x IQR below Q1 or above Q3
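The outlier rules above can be sketched as a simple classifier. This is an illustration under assumptions: the function name and data are hypothetical, quartiles use the "inclusive" method of `statistics.quantiles`, and a value exactly 3 x IQR outside the box is treated as mild.

```python
import statistics


def classify_outliers(data):
    """Label each value as not an outlier, mild outlier, or extreme outlier."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    result = []
    for x in data:
        # Distance outside the box; zero or negative when x lies between Q1 and Q3
        distance = max(q1 - x, x - q3)
        if distance > 3 * iqr:
            label = "extreme outlier"
        elif distance > 1.5 * iqr:
            label = "mild outlier"
        else:
            label = "not an outlier"
        result.append((x, label))
    return result


# Hypothetical length-of-stay data in days (illustration only)
stay_days = [1, 2, 2, 3, 3, 4, 5, 11, 30]
labels = dict(classify_outliers(stay_days))

for value, label in labels.items():
    print(value, label)
```

Here Q1 = 2, Q3 = 5 and IQR = 3, so 11 (6 above Q3) is a mild outlier and 30 (25 above Q3) is an extreme outlier.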
List the possible types of distributions observed in histograms.
Normal
Positively skewed (i.e. long tail extends to the right)
Negatively skewed (i.e. long tail extends to the left)
Bimodal
Multimodal (i.e. several peaks)
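As a supplement to reading the histogram by eye, the direction of skew can be checked numerically. The sketch below uses the moment-based skewness coefficient, which is not part of the flashcards and is added here only as an illustration: positive values suggest a right tail, negative values a left tail, and values near zero suggest symmetry. The data are hypothetical.

```python
def skewness(data):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Hypothetical data (illustration only)
right_tailed = [1, 2, 2, 3, 3, 4, 5, 9, 30]
left_tailed = [-x for x in right_tailed]        # mirror image of the above
symmetric = [41, 43, 45, 47, 49, 51, 53, 55, 57]

print("right-tailed:", skewness(right_tailed) > 0)   # positively skewed
print("left-tailed:", skewness(left_tailed) < 0)     # negatively skewed
print("symmetric:", abs(skewness(symmetric)) < 1e-9)
```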
When a box plot is presented in a vertical direction, what should you do to interpret its type of distribution along a horizontal plane?
Rotate box plot clockwise by 90 degrees to determine type of distribution.
To ensure that a sample will lead to reliable and valid inferences, all statistical methods are built on the assumption that the individuals included in a sample represent a _____ sample from the underlying population.
random
What are the two approaches an investigator can adopt for statistical inference? Briefly explain each approach.
1) Parameter estimation
- Seeks an approximate calculation of a population parameter
- e.g. By how much does this new drug reduce BP?
- Described by point estimate and interval estimate
2) Hypothesis testing
- Seeks to validate a supposition based on limited evidence, inferred using a sample from the population
- e.g. Does this new drug reduce blood pressure?
- Described by null hypothesis (H0) & alternative hypothesis (H1)
Define ‘standard error of the mean’ (SEM).
Explain the significance of SEM in inferential statistics.
SEM:
Standard deviation of the sampling distribution of the sample mean; equal to the population standard deviation divided by the square root of the sample size, i.e. SEM = sigma / root(n)
Significance:
- Indicates how precisely the sample mean estimates the mean of the population from which the sample was drawn (the smaller the SEM, the more precise the estimate)
- Used in the calculation of confidence intervals, which give a range of plausible values for the true mean of the population from which the sample was drawn
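A minimal sketch of the SEM calculation. Because the population SD is rarely known, the first function substitutes the sample SD as an estimate; that substitution is an assumption of the sketch. The function names and data are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Estimate SEM as sample SD / sqrt(n) (sample SD stands in for sigma)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def sem_from_sd(sd, n):
    """SEM from a given SD and sample size."""
    return sd / math.sqrt(n)


# Hypothetical ages (illustration only)
ages = [41, 43, 45, 47, 49, 51, 53, 55, 57]
sem = standard_error(ages)
print(f"SEM for the age sample: {sem:.2f}")

# Quadrupling the sample size halves the SEM
sem_25 = sem_from_sd(10, 25)
sem_100 = sem_from_sd(10, 100)
print(sem_25, sem_100)
```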
Which theorem states that the sampling distribution of the mean is approximately normally distributed, for a sufficiently large sample size, even if the underlying distribution of individual observations in the population is not normal?
Central Limit Theorem
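The theorem can be illustrated with a small simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and SD 1), a sample size of 50 and 2000 repeated samples; all of these are arbitrary choices for the example. If the theorem holds, the sample means should centre on 1, have an SD close to 1 / root(50), and roughly 95% of them should lie within 1.96 SEM of the true mean.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

SAMPLE_SIZE = 50
REPEATS = 2000
TRUE_MEAN = 1.0          # mean of the exponential population
TRUE_SD = 1.0            # SD of the exponential population

# Draw many samples from a skewed population and keep each sample mean
sample_means = []
for _ in range(REPEATS):
    sample = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sem = TRUE_SD / math.sqrt(SAMPLE_SIZE)

# Share of sample means within 1.96 SEM of the true mean (about 0.95 if normal)
within = sum(
    abs(m - TRUE_MEAN) <= 1.96 * expected_sem for m in sample_means
) / REPEATS

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means: {sd_of_means:.3f} (expected {expected_sem:.3f})")
print(f"share within 1.96 SEM: {within:.3f}")
```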
What is the significance of the interval estimate in the parameter estimation approach of inferential statistics?
Also known as the confidence interval (CI).
Provides a range of reasonable values that are intended to contain the parameter of interest with a certain degree of confidence.
- e.g. 95% CI: If data collection and analysis could be replicated many times, the CI should include within it the true value of the measure 95% of the time.
- Provides information on the precision of the point estimate i.e. the narrower the 95% CI, the more precise the point estimate
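The "replicated many times" interpretation of the 95% CI can be checked by simulation. The sketch below assumes a normal population with a known mean and SD (hypothetical values), draws 2000 samples of size 30, builds a 95% CI around each sample mean using the 1.960 multiplier, and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

rng = random.Random(1)          # fixed seed so the run is repeatable

TRUE_MEAN = 120.0               # hypothetical population mean (e.g. systolic BP)
TRUE_SD = 15.0                  # hypothetical population SD, assumed known
N = 30
REPEATS = 2000
Z_95 = 1.960


def ci_95(sample):
    """95% CI for the mean, using the known population SD."""
    mean = statistics.fmean(sample)
    margin = Z_95 * TRUE_SD / math.sqrt(len(sample))
    return mean - margin, mean + margin


hits = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = ci_95(sample)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / REPEATS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should come out close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behaviour of the procedure.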
List the three factors influencing the width of CI.
1) Confidence level (e.g. 90%, 95%, 99%)
- The higher the confidence level, the wider the CI, the less precise the point estimate.
2) Sample size (n)
- The larger the sample size, the smaller the SEM value, the narrower the CI, the more precise the point estimate.
3) Standard deviation (sigma)
- The larger the SD, the wider the CI, the less precise the point estimate.
90% CI: sample mean - 1.645 [SD/root(n)] <= pop. mean <= sample mean + 1.645 [SD/root(n)]
95% CI: sample mean - 1.960 [SD/root(n)] <= pop. mean <= sample mean + 1.960 [SD/root(n)]
99% CI: sample mean - 2.576 [SD/root(n)] <= pop. mean <= sample mean + 2.576 [SD/root(n)]
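The three formulas above differ only in the multiplier, which is the standard normal quantile for the chosen confidence level. A minimal sketch, assuming the SD is known or the sample is large enough for the normal approximation; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sd, n, level=0.95):
    """CI for the mean: sample mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645, 1.960, 2.576
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical study: sample mean 120, SD 15, n = 100 (illustration only)
for level in (0.90, 0.95, 0.99):
    low, high = z_confidence_interval(120, 15, 100, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")

# Effect of sample size at a fixed 95% level
low_small, high_small = z_confidence_interval(120, 15, 25)
low_large, high_large = z_confidence_interval(120, 15, 400)
print(f"n = 25 width: {high_small - low_small:.2f}")
print(f"n = 400 width: {high_large - low_large:.2f}")
```

The output shows the three factors at work: the interval widens as the confidence level rises and narrows as n grows.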
Explain what the p-value means.
Probability of obtaining the observed result, or a more extreme one, by chance alone, assuming H0 is true.
- The smaller the p-value, the stronger the evidence against H0.
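A worked sketch of a p-value calculation. The flashcards do not name a specific test, so this example assumes a two-sided one-sample z-test with a known SD; the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, sd, n):
    """P(result at least this extreme | H0 true) for a one-sample z-test."""
    z = (sample_mean - null_mean) / (sd / math.sqrt(n))
    # Area in both tails of the standard normal beyond |z|
    return 2 * (1 - NormalDist().cdf(abs(z)))


# H0: population mean BP = 120. Observed: sample mean 123, SD 15, n = 100.
p = two_sided_p_value(123, 120, 15, 100)
print(f"p = {p:.4f}")

# A sample mean equal to the null value gives no evidence against H0
p_null = two_sided_p_value(120, 120, 15, 100)
print(f"p = {p_null:.1f}")
```

In the first case z = 3 / 1.5 = 2, giving p of about 0.0455: if H0 were true, a difference this large or larger would arise by chance in under 5% of samples.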