8. Comparing means: assumptions and transformations Flashcards
What are the assumptions of statistical inference from a normal distribution?
Data are sampled at random
■ for response variables conditioned on explanatory variables
Samples are independent.
The differences between observations and predictions (the residuals) are normally distributed.
The mean and variance of errors are independent of the explanatory variable(s).
One source of unmeasured random variance.
Variance among groups is equal
■ and if not, then you use an adjustment
What methods can you use when your response variable does not have a normal distribution?
First check how much it deviates from normality.
- use a normal quantile plot
- and a Shapiro-Wilk test
Then you can either:
- ignore the violations of the assumptions
- transform the data
- use a non-parametric method
- use a permutation test
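The first check above, a normal quantile plot, can be sketched with the Python standard library. This is only an illustration: the function name `normal_quantile_points`, the plotting position (i - 0.5)/n, and the sample values are assumptions, not part of the original notes. Points that fall close to a straight line suggest approximate normality.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position; one common convention (assumed)
        points.append((std_normal.inv_cdf(p), value))
    return points

# Made-up, right-skewed values for illustration only
sample = [2.1, 2.5, 2.8, 3.0, 3.3, 3.9, 5.2, 8.7]
for z, x in normal_quantile_points(sample):
    print(f"{z:6.2f}  {x:5.1f}")
```

Plotting the second column against the first gives the quantile plot; a curve that bends upward at the right end is typical of right-skewed data.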
What does a Shapiro-Wilk test do?
Evaluates the goodness of fit of a normal distribution
Can quantify deviation from normality
■ It cannot prove that the data are normally distributed: a non-significant result only means no deviation was detected
When is it sensible to ignore the violated assumptions of a normal distribution?
- when using robust statistics (central limit theorem)
- if the shapes of the distributions being compared are similar
- when using an accepted adjustment for a difference in variance/s.d. (Welch’s t-test)
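As a rough sketch of the Welch adjustment, the block below computes the Welch t statistic and its approximate (Welch-Satterthwaite) degrees of freedom from two groups. The function name `welch_t` and the data are hypothetical. It stops short of a p-value, which needs the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean, group a
    vb = variance(b) / len(b)  # squared standard error of mean, group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up groups with clearly unequal variances
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
```

Unlike the ordinary two-sample t-test, the variances are not pooled, so the degrees of freedom shrink when the group variances differ.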
What is the central limit theorem?
The sum or mean of a large random sample from a non-normal population is approximately normally distributed.
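A small simulation can illustrate this. The sketch below draws samples from an exponential population (strongly right-skewed) and checks how the skewness of the sample means changes with sample size. The choice of population, the helper names, and the number of repetitions are all assumptions made for illustration.

```python
import random
from statistics import mean

def skewness(values):
    """Sample skewness: third central moment / (second central moment)^1.5."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

def sample_means(n, reps=2000, rng=random):
    """Means of `reps` random samples of size n from an exponential population."""
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (1, 5, 50):
    means = sample_means(n, rng=rng)
    print(n, round(mean(means), 2), round(skewness(means), 2))
```

The skewness of the sample means should move toward 0 (the value for a normal distribution) as n grows, even though the population itself is far from normal.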
How can we transform data and when do we do each?
log
- ratios and products of variables
natural log (ln)
- ratios and products of variables
- skewed frequency distribution
- group with larger mean also has larger s.d.
- data span several orders of magnitude
arcsine (of the square root)
- proportions
square root
- counts and right-skewed data
reciprocal
- right-skewed data
exponential
- left-skewed data
square
- left-skewed data
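A few of the transformations above, written as small Python functions. This is a sketch with hypothetical function names and made-up data; each function is applied to every observation before the usual test is run on the transformed values.

```python
import math

def log_transform(x):
    """Natural log; x must be > 0."""
    return math.log(x)

def arcsine_transform(p):
    """Arcsine of the square root of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def sqrt_transform(count):
    """Square root, typically for counts."""
    return math.sqrt(count)

def reciprocal_transform(x):
    """Reciprocal; x must be non-zero."""
    return 1 / x

# Made-up data spanning several orders of magnitude
raw = [1, 10, 100, 1000]
print([round(log_transform(x), 2) for x in raw])

# Made-up proportions
print([round(arcsine_transform(p), 3) for p in (0.05, 0.5, 0.95)])
```

After a log transformation, values that were multiples of each other become evenly spaced, which is why it suits ratios and data spanning several orders of magnitude.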
What does a non-parametric method do?
Calculates probabilities in a way that does not depend on the response variable being normally distributed.
Less powerful than the equivalent parametric test, though.
What are two non-parametric methods?
Sign test
Mann-Whitney U-test
How do you do a sign test?
Turn the differences between data points into a binomial data set
- calculate each difference
- assign ‘+’ or ‘-’ based on whether the difference is > or < 0
- count the number of ‘+’ and ‘-’
- H0 expects #‘+’ == #‘-’
- use the binomial distribution to calculate the p-value for the test
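The steps above can be sketched as follows. The function name `sign_test` and the data are hypothetical; differences of exactly 0 are dropped, and the p-value is two-sided (both are assumptions, though common conventions).

```python
from math import comb

def sign_test(differences):
    """Two-sided sign test; returns (#plus, #minus, p-value)."""
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    n = plus + minus            # zero differences are dropped
    k = min(plus, minus)
    # Under H0, each sign is '+' with probability 0.5, so the count of
    # the rarer sign follows Binomial(n, 0.5). Sum one tail, then double.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

# Made-up paired differences: 8 positive, 2 negative
diffs = [1.2, 0.8, 2.5, -0.3, 1.1, 0.4, 3.0, -1.0, 0.9, 2.2]
print(sign_test(diffs))
```

Only the signs are used, not the sizes of the differences, which is the reason the test has low power.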
How do you do a Mann-Whitney U-test?
- order all the data from smallest to largest
- give each a rank, starting with 1
(if values are tied, use the average rank, e.g. ranks 3 and 4 both become 3.5)
- calculate the rank sum of each group
- calculate the U statistic for each group
- the larger U statistic is used as the test statistic
- compare to the critical value from a table
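A minimal sketch of these steps in plain Python. The function names and the data are hypothetical, and the formula used, U = n1*n2 + n(n+1)/2 - R for each group, is one common textbook form, assumed here. The final comparison with a tabulated critical value is left out.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (U, U1, U2), where U is the larger of U1 and U2."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])   # rank sum of group 1
    r2 = sum(ranks[n1:])   # rank sum of group 2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return max(u1, u2), u1, u2

# Made-up data for two groups
print(mann_whitney_u([1.1, 2.3, 2.9, 4.0], [3.5, 4.8, 5.2, 6.1]))
```

The two U statistics always add up to n1*n2, which is a quick check on the arithmetic.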
What are the assumptions of the Mann-Whitney U-test?
- data are randomly sampled
- tests whether the groups have different distributions
(not robust as a test of equal central tendency)
- distributions of the groups have the same shape
- low power because it does not use all the information in the data
(greater Type II error)
What is a permutation test (what else is it known by)?
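Uses a computer to repeatedly and randomly re-order (permute) the observed data, producing a null distribution from a large number of permuted samples.
Also known as a randomization test.
(Bootstrapping is a related resampling method, but it samples with replacement and is used mainly to estimate standard errors and confidence intervals.)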
What are the steps of a permutation test?
Create a permuted sample in which the values of the response variable are randomly re-ordered.
Calculate the measure of association for the permuted sample
● (e.g. the difference in means, medians, etc.)
Repeat the permutation process many times
● at least 1000 times, to create a null distribution
Compare the null distribution to the observed value of the test statistic calculated from the original data set
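The steps above can be sketched for a difference in means between two groups. The function name `permutation_test`, the default of 10000 permutations, the fixed seed, and the data are assumptions for illustration. The p-value is the proportion of permuted differences at least as extreme as the observed one (two-sided).

```python
import random
from statistics import mean

def permutation_test(group1, group2, n_perm=10000, seed=0):
    """Return (observed difference in means, two-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly re-order the response values
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Made-up data for two groups
obs, p = permutation_test([12.1, 9.8, 11.4, 10.9, 13.0], [8.2, 9.1, 7.7, 10.0, 8.8])
print(round(obs, 2), p)
```

Shuffling the pooled values breaks any real association between group and response, so the permuted differences show what to expect under H0.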