L5: Descriptive & Inferential Statistics Flashcards
What are the two branches of statistics?
Descriptive statistics:
Methods for organizing and summarizing a set of data that help to describe the attributes of a group or population.
Inferential statistics:
Statistical methods used to draw conclusions from a sample and make inferences about the entire population.
What are the three types of variables?
- Nominal, e.g. gender, race
- Ordinal, e.g. cancer stages, Likert scale data
- Continuous, e.g. height, weight
How do you describe nominal data numerically and graphically?
Numerically: n (%)
Graphically: pie chart, bar chart
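A minimal sketch of the n (%) summary for a nominal variable, using only the Python standard library. The function name `summarize_nominal` and the blood-type data are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def summarize_nominal(values):
    """Return {category: (count, percent)} for a list of nominal observations,
    ordered from most to least frequent."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.most_common()}


# Hypothetical sample of a nominal variable
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
for category, (n, pct) in summarize_nominal(blood_types).items():
    print(f"{category}: {n} ({pct}%)")
```

The same counts would be the bar heights (or pie slices) in the graphical summary.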
How do you describe ordinal data numerically and graphically?
For most ordinal data, e.g. cancer stages:
- numerically as n (%)
- graphically as pie chart, bar chart
For Likert scale data:
- numerically as n (%), graphically as pie chart, bar chart OR
- numerically as median (IQR), graphically as box plot
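A small sketch of the two options for Likert data on the same hypothetical responses: treating each level as a category (n (%)) versus using the ordering (median (IQR)). It assumes Python 3.8+ for `statistics.quantiles`; note that statistics packages differ in how they define quartiles, so Q1 and Q3 may not match other software exactly.

```python
import statistics
from collections import Counter

# Hypothetical responses on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree)
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Option 1: treat each level as a category and report n (%)
counts = Counter(responses)
n_total = len(responses)
for level in range(1, 6):
    n = counts[level]
    print(f"{level}: {n} ({100 * n / n_total:.1f}%)")

# Option 2: use the ordering of the levels and report median (IQR)
q1, median, q3 = statistics.quantiles(responses, n=4, method="inclusive")
print(f"median {median} (IQR {q1} to {q3})")
```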
How do you describe continuous data numerically and graphically?
Graphically: histogram, box plot
Numerically:
- if normally distributed, mean and SD
- if non-normally distributed, median (IQR)
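A minimal sketch of choosing the numerical summary for a continuous variable. The `normally_distributed` flag is an assumption: it is supplied by the analyst (e.g. after inspecting a histogram or running a normality test), and the function does not check it. The function name and height values are hypothetical.

```python
import statistics


def summarize_continuous(values, normally_distributed):
    """Mean (SD) if the data are treated as approximately normal, otherwise median (IQR).

    `normally_distributed` is decided by the analyst; it is not tested here.
    """
    if normally_distributed:
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
        return f"mean {mean:.1f} (SD {sd:.1f})"
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"median {median:.1f} (IQR {q1:.1f} to {q3:.1f})"


heights_cm = [158.0, 162.5, 165.0, 170.0, 171.5, 175.0, 180.0]  # hypothetical
print(summarize_continuous(heights_cm, normally_distributed=True))
print(summarize_continuous(heights_cm, normally_distributed=False))
```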
What are the two approaches to statistical inference?
- parameter estimation: estimates the approximate value of a population parameter from sample data
- hypothesis testing: uses evidence from a sample to assess whether a supposition (hypothesis) about the population is supported
What are a point estimate and an interval estimate (confidence interval)?
- a point estimate is a single number calculated from the sample data to estimate the parameter of interest (e.g. the sample mean estimates the population mean)
- an interval estimate (confidence interval) is a range of plausible values intended to contain the parameter of interest at a stated confidence level, most commonly 95%
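A sketch of a point estimate and a 95% interval estimate for a population mean. This assumes a normal (z) approximation, mean ± z × SD/√n; for small samples a t-based interval would be more appropriate, but the t distribution is not in the Python standard library. The function name and blood-pressure readings are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_ci(values, confidence=0.95):
    """Sample mean (point estimate) and a normal-approximation CI for the population mean."""
    n = len(values)
    point = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)        # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)      # about 1.96 for 95%
    return point, point - z * se, point + z * se


sbp = [118, 125, 131, 122, 140, 128, 135, 119, 127, 133]  # hypothetical readings (mmHg)
point, lower, upper = mean_ci(sbp)
print(f"mean {point:.1f} mmHg (95% CI {lower:.1f} to {upper:.1f})")
```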
Why does a confidence interval provide more information than a p-value?
- it shows the precision of the point estimate (narrower CI -> more precise point estimate; wider CI -> less precise point estimate)
- it also shows statistical significance (significant when the CI for a difference does not contain 0, or when the CI for a ratio does not contain 1)
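The significance rule above can be sketched as a check on whether the CI excludes the null value. The function name and the example intervals are hypothetical.

```python
def is_significant(ci_lower, ci_upper, measure):
    """True if the CI excludes the null value: 0 for a difference, 1 for a ratio."""
    null_value = {"difference": 0.0, "ratio": 1.0}[measure]
    return not (ci_lower <= null_value <= ci_upper)


# Hypothetical intervals
print(is_significant(0.4, 2.1, "difference"))  # True: CI excludes 0
print(is_significant(0.85, 1.30, "ratio"))     # False: CI contains 1
```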
What affects the width of a confidence interval?
- the higher the confidence level, the wider the CI
- the larger the sample size, the narrower the CI
- the higher the standard deviation, the wider the CI
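These three effects follow from the width of a normal-approximation CI for a mean, 2 × z × SD/√n. A sketch under that assumption (the function name and numbers are hypothetical):

```python
import math
from statistics import NormalDist


def ci_width(sd, n, confidence=0.95):
    """Width of a normal-approximation CI for a mean: 2 * z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sd / math.sqrt(n)


base = ci_width(sd=10, n=100)                           # about 3.92
print(ci_width(sd=10, n=100, confidence=0.99) > base)   # higher confidence level -> wider
print(ci_width(sd=10, n=400) < base)                    # larger sample size -> narrower
print(ci_width(sd=20, n=100) > base)                    # higher SD -> wider
```

Because n sits under a square root, quadrupling the sample size only halves the width.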
What is p-value?
Probability that the observed result or a more extreme result would occur by chance alone, assuming that H0 is true.
The smaller the p-value, the stronger the evidence against H0.
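A sketch of a two-sided p-value, assuming the test statistic follows a standard normal distribution when H0 is true (a z-test). The function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_from_z(z):
    """P(|Z| >= |z|) assuming H0 is true and Z is standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(round(two_sided_p_from_z(1.96), 3))  # 0.05
print(round(two_sided_p_from_z(3.0), 4))   # 0.0027: more extreme result, stronger evidence against H0
```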
What is a type I error?
False positive: rejecting H0 when there is truly no effect.
Probability of a type I error = significance level = alpha, conventionally set at 0.05
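A small simulation illustrating that, when H0 is true, a test at alpha = 0.05 produces false positives in about 5% of repeated studies. It assumes normally distributed data with a known SD of 1 and a two-sided z-test; the function name, sample size, and number of trials are hypothetical choices.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type1_rate(n_trials=20_000, n=30, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject a true H0 (mean = 0) with a z-test, known SD = 1."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # H0 is true by construction
        z = fmean(sample) / (1 / math.sqrt(n))        # standard error = SD / sqrt(n)
        if abs(z) > z_crit:
            rejections += 1                           # a false positive (type I error)
    return rejections / n_trials


rate = simulate_type1_rate()
print(rate)  # close to alpha = 0.05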
What is a type II error?
False negative: failing to reject H0 when there truly is an effect.
Probability of a type II error = beta, typically set at 0.2
What is statistical power?
Probability of correctly rejecting a false H0, i.e. detecting an effect when one truly exists.
Statistical power = 1 - beta, typically 0.8 (when beta = 0.2)
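A sketch of how power, beta, and sample size relate, assuming a two-sided one-sample z-test with a known SD. Real studies often use t-tests or dedicated power software, so treat these numbers as approximations. The function names, effect size, and SD are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test to detect a true mean shift of `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sd
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(effect, sd, alpha=0.05, power=0.8):
    """Smallest n giving roughly the target power (ignores the tiny opposite tail)."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)  # power = 1 - beta
    return math.ceil(((z_a + z_b) * sd / effect) ** 2)


# Hypothetical: detect a 5-unit shift in the mean when SD = 10, alpha = 0.05, power = 0.8
n = n_for_power(effect=5, sd=10)
power = z_test_power(effect=5, sd=10, n=n)
print(n, round(power, 3), "beta =", round(1 - power, 3))
```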
What is the difference between statistical and clinical significance?
- statistical significance depends heavily on sample size: with a large sample, even a small effect can be statistically significant, whereas with a small sample, even a large, clinically important effect may fail to reach statistical significance
- hence, DO NOT just check whether a result is statistically significant; look at the point estimate and confidence interval to judge whether the effect is clinically significant
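A sketch showing how sample size drives the p-value for a fixed effect. It assumes a two-sided z-test comparing two equal-sized groups with a common known SD; the function name, blood-pressure differences, and group sizes are hypothetical.

```python
import math
from statistics import NormalDist


def p_value_for_mean_difference(diff, sd, n_per_group):
    """Two-sided z-test p-value for a difference in means between two equal-sized groups."""
    se = sd * math.sqrt(2 / n_per_group)
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: a 1 mmHg difference (SD 15), probably not clinically important,
# becomes "significant" once the groups are large enough
for n in (50, 500, 5_000, 50_000):
    print(n, round(p_value_for_mean_difference(diff=1, sd=15, n_per_group=n), 4))

# Hypothetical: a 10 mmHg difference, likely clinically important, is not
# significant with only 10 participants per group
print(round(p_value_for_mean_difference(diff=10, sd=15, n_per_group=10), 3))
```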
Which normality test is used for which sample size?
- Shapiro-Wilk test for n < 50
- Kolmogorov-Smirnov test for n ≥ 50
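The rule of thumb above, written as a tiny helper. The function name is hypothetical and it only selects the test by name; if SciPy is available, the tests themselves are provided by `scipy.stats.shapiro` and `scipy.stats.kstest`.

```python
def choose_normality_test(n):
    """Pick a normality test by sample size, following the n < 50 rule of thumb."""
    if n < 1:
        raise ValueError("sample size must be positive")
    return "Shapiro-Wilk" if n < 50 else "Kolmogorov-Smirnov"


print(choose_normality_test(30))   # Shapiro-Wilk
print(choose_normality_test(120))  # Kolmogorov-Smirnov
```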