Inferential statistics Flashcards
why is randomisation important
To avoid selection bias and to spread confounders evenly across the groups being compared
sampling methods
simple random sampling
stratified sampling
convenience sampling
what is data provenance
Data provenance is the history of a dataset:
– Data cleaning processes
– Imputations for missing data
– How it was collected and by whom
– Access to previous versions, etc.
standard error of the mean
SE = s/√n, where s is the sample standard deviation and n is the sample size
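As a quick check of the formula, here is a minimal Python sketch using only the standard library. The function name `standard_error` and the measurements are invented for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical measurements, made up for this example.
heights_cm = [170.0, 172.0, 168.0, 174.0, 166.0]
print(standard_error(heights_cm))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.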
Hypothesis testing
A hypothesis should be something that is testable and falsifiable
procedure for hypothesis test
• a random sample is drawn from a population
• a null hypothesis is formulated
• a test statistic is calculated whose probability distribution under the null hypothesis is known
• p-value: the probability, if the null hypothesis is true, of a test statistic at least as extreme as the one observed, found by comparing the observed value with that distribution
• if the p-value < 0.05 (the conventional significance level), reject the null hypothesis
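The procedure can be sketched with a one-sample z test, chosen here only because its null distribution (standard normal) is available in the Python standard library. The sample values, `mu0`, and the assumption that the population standard deviation `sigma` is known are all hypothetical.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z test of H0: population mean == mu0, with sigma assumed known."""
    n = len(sample)
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Under H0 the statistic follows a standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical random sample, made up for this example.
sample = [52.1, 49.8, 53.4, 51.0, 54.2, 50.7, 52.9, 51.6, 53.1]
z, p_value, reject_h0 = one_sample_z_test(sample, mu0=50.0, sigma=2.0)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

When the population standard deviation is not known, a t test would normally replace the z test; the steps stay the same.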
normality testing
Shapiro-Wilk test
H0 = Null hypothesis: Data is normally distributed
H1 = Alternative hypothesis: Data is not normally distributed
A p-value < 0.05 rejects H0, i.e. the data look non-normal
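A minimal sketch of a Shapiro-Wilk check, assuming SciPy is installed (`scipy.stats.shapiro` returns the test statistic and the p-value). The two datasets are simulated and the helper name `looks_normal` is invented for illustration.

```python
import random
from scipy import stats  # assumes SciPy is available

random.seed(0)
# Simulated data: one sample drawn from a normal distribution, one from a skewed one.
normal_data = [random.gauss(10.0, 2.0) for _ in range(200)]
skewed_data = [random.expovariate(1.0) for _ in range(200)]

def looks_normal(data, alpha=0.05):
    """Return (p_value, verdict); verdict is True when H0 (normality) is not rejected."""
    statistic, p_value = stats.shapiro(data)
    return p_value, p_value >= alpha

for name, data in [("normal", normal_data), ("skewed", skewed_data)]:
    p_value, verdict = looks_normal(data)
    print(f"{name}: p = {p_value:.4g}, consistent with normality: {verdict}")
```

Failing to reject H0 does not prove the data are normal; it only means the test found no evidence against normality.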
T-test & p value
The p-value is the probability of seeing a difference between the two sample means at least as large as the one observed, if both samples come from populations with the same mean.
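For illustration, a two-sample t test can be run with `scipy.stats.ttest_ind`, assuming SciPy is installed. The two groups are made-up numbers, and `equal_var=False` (Welch's version, which does not assume equal variances) is a choice made for this sketch.

```python
from scipy import stats  # assumes SciPy is available

# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7]
group_b = [6.9, 7.2, 6.5, 7.0, 7.4, 6.8, 7.1, 6.6]

# H0: both groups come from populations with the same mean.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
if p_value < 0.05:
    print("Reject H0: the group means differ")
else:
    print("No evidence that the group means differ")
```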
z test and t test definition
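• Both test whether two means differ. H0: the two means are from the same population; H1: the two means are from different populations
• z test: the population standard deviation is known (or the sample is large); t test: the standard deviation is estimated from the sample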
Type I Error
– errors where the result is statistically significant despite the fact that the null hypothesis is true (a false positive)
– e.g., a diagnosis of cancer (“positive”) for a healthy subject
Solution: lower the alpha value, e.g. from 5% to 1%, at the cost of more Type II errors
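A small simulation can make the link between alpha and the Type I error rate concrete. This is a sketch under stated assumptions: the data are simulated from N(0, 1) so that the null hypothesis is true by construction, a z test with known standard deviation is used, and the function name, trial count, and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist

def false_positive_rate(alpha, trials=5000, n=30, seed=1):
    """Fraction of z tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: data come from N(0, 1) and we test mean == 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1  # a Type I error
    return rejections / trials

rate_5 = false_positive_rate(0.05)
rate_1 = false_positive_rate(0.01)
print(f"alpha = 5%: {rate_5:.3f}   alpha = 1%: {rate_1:.3f}")
```

The observed rejection rates should land close to the chosen alpha values, which is what alpha means: the long-run rate of Type I errors when the null hypothesis holds.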
Type II Errors
– errors where the result is NOT significant despite the fact that the alternative hypothesis is true (a false negative)
– e.g., a diagnosis of healthy for a subject who has cancer
Sensitivity and Specificity
• Sensitivity (power): proportion of the positives that are correctly identified by a test as being positive = TP / (TP + FN)
• Specificity: proportion of negatives that are correctly identified by a test as being negative = TN / (TN + FP)
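Both definitions can be computed directly from true and predicted labels. A minimal sketch follows; the function name and the ten example subjects are invented, with `True` standing for "has the condition" or "test positive".

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) from paired boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives (Type II)
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives (Type I)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical results for 10 subjects: 4 with the condition, 6 without.
actual    = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

The sketch assumes at least one actual positive and one actual negative; otherwise one of the denominators is zero.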