13. Dealing with assumptions Flashcards
Assumptions of t-tests
- random samples
- populations are normally distributed
- TWO SAMPLE T-TEST: populations have equal variances
Can we statistically fix random sampling?
NO!!!
Use of histograms for normality
Not normal if skewed, asymmetrical
**especially clear with a large data set
How can you check if something is normally distributed?
Check previous data/theory
Plot on a histogram - bell curve shape!
Quantile plot (QQ plot) expect dots to fall along a straight line if normal!
Shapiro-Wilk test (formal test of normality) - not particularly useful in deciding what test to use
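Sketch of the QQ plot idea in plain Python (not from the cards; the data and the helper names `qq_points` and `straightness` are made up for illustration). Each sorted observation is paired with the standard normal quantile expected at the same rank, and a correlation near 1 stands in for "dots fall along a straight line".

```python
import math
from statistics import NormalDist, mean

def qq_points(data):
    """Pair each sorted observation with the standard normal quantile
    expected at the same rank."""
    observed = sorted(data)
    n = len(observed)
    z = NormalDist()  # standard normal: mean 0, sd 1
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1.
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

def straightness(points):
    """Pearson correlation of the QQ points; close to 1 means a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: evenly spaced quantiles of a normal(10, 2), then a right-skewed version.
bell = [NormalDist(10, 2).inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [math.exp(y) for y in bell]

r_bell = straightness(qq_points(bell))      # essentially a perfect line
r_skewed = straightness(qq_points(skewed))  # visibly curved, so lower
```

The correlation is only a rough summary here; in practice you would plot the points and look at them.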
what are quantiles
Cut points that divide the range of a probability distribution into continuous intervals w/ equal probabilities
Ex. deciles:
10% of values fall below the 1st cut point
20% below the 2nd
30% below the 3rd, and so on
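Quick illustration of the "10% / 20% / 30% below this value" idea, using `statistics.quantiles` from the Python standard library. The data (integers 1 to 100) are made up.

```python
from statistics import quantiles

values = list(range(1, 101))     # made-up data: the integers 1..100
cuts = quantiles(values, n=10)   # 9 cut points split the data into 10 equal-probability bins

# Fraction of the data falling below each of the first three cut points.
fractions_below = [sum(v < c for v in values) / len(values) for c in cuts[:3]]
```

With `n=4` the same call would give quartiles instead of deciles.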
Shapiro-Wilk test!
Used to test statistically whether a set of data comes from a normal distribution
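NOT a good way to decide whether to use a t-test!
BECAUSE with lots of data it becomes more likely to reject a false null hypothesis, even when the departure from normality is tiny
BUT mostly we care about the distribution of sample means being normal, and with lots of data it usually is close enough
therefore the t-test (or other parametric tests) would often be fine even though Shapiro-Wilk says no!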
Strategies if NOT normal
if sample size is large, sometimes parametric tests work OK anyway!
transformations (ex log) - new set of values that may fit assumptions
non-parametric tests - makes fewer assumptions about distributions data came from, often based on ranks
permutation tests - ask if there's an association between two variables: shuffle the values of one variable many times to find the association you would get by chance, then compare to the actual association!
bootstrapping
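Minimal sketch of a permutation test from the list above, assuming the "association" is a difference in means between two groups. The function name `permutation_test`, the data, and the choice of 2000 shuffles are all made up for illustration.

```python
import random

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # mix up which value belongs to which group
        fake_a, fake_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(fake_a) / len(fake_a) - sum(fake_b) / len(fake_b))
        if diff >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never 0.
    return (hits + 1) / (n_perm + 1)

# Made-up data: clearly separated groups vs. identical groups.
p_different = permutation_test([10, 12, 11, 13, 12, 11, 14, 12],
                               [20, 22, 21, 23, 19, 24, 22, 21])
p_same = permutation_test([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
```

A small p-value means shuffled data rarely produce a difference as large as the real one.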
rule of thumb for sample size
if sample size > ~50, the normal approximations may work
Really great test:
Welch’s t-test (two-sample t-test that does NOT assume equal variances)
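Sketch of what Welch's t-test computes, assuming the standard formulas: the test statistic uses each group's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation. The function name `welch_t` and the data are made up; turning t and df into a p-value needs a t distribution, which this sketch leaves out.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean, using its own sample variance.
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Made-up data: equal variances, then very unequal variances.
t_eq, df_eq = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
t_uneq, df_uneq = welch_t([1, 2, 3, 4, 5], [0, 10, 20, 30, 40, 50])
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also returns the p-value.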
If sample sizes are equal and large, what sort of difference in variance is APPROXIMATELY OK?
ten-fold difference!
Requirements of data transformations
Same transformation applied to each individual (for a specific variable)
One-to-one correspondence between original values and transformed values - so NOT absolute values!!! have to be able to go backwards (back-transform)!
monotonic relationship w/original values (ex large values stay larger)
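Rough check of the requirements above on a set of observed values: a strictly monotonic function is automatically one-to-one on those values, so it can be undone. The helper name `is_valid_transformation` and the data are made up for illustration.

```python
import math

def is_valid_transformation(f, values):
    """True if f is strictly monotonic (hence one-to-one) on the observed values."""
    xs = sorted(set(values))
    ys = [f(x) for x in xs]
    increasing = all(y1 < y2 for y1, y2 in zip(ys, ys[1:]))
    decreasing = all(y1 > y2 for y1, y2 in zip(ys, ys[1:]))
    return increasing or decreasing

log_ok = is_valid_transformation(math.log, [1, 2, 5, 10])  # larger values stay larger
abs_ok = is_valid_transformation(abs, [-3, -1, 2, 3])      # -3 and 3 collide

# Going backwards: exp undoes ln.
round_trip = [math.exp(math.log(y)) for y in [1, 2, 5, 10]]
```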
Non-Parametric Tests
Assume less than parametric tests do about the underlying distributions
Most often RANK each data point in all samples from lowest to highest
Log transformation
Y’ = ln[Y]
Good when variable is likely to be the result of multiplication or division of various components - what was multiplicative, becomes additive
EX. growth - things grow 10% a year, not +10 mm a year, so size is log-normal, not normal!
Good for RIGHT skewed data, not left!!!
Good when variance becomes larger in groups where mean is larger
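Small illustration of the log transformation on made-up data where each value is double the previous one (a multiplicative process). The variable names are chosen for this sketch only.

```python
import math
from statistics import mean, median

sizes = [1, 2, 4, 8, 16, 32, 64]        # made-up: doubles at every step
logged = [math.log(y) for y in sizes]   # Y' = ln[Y]

# Before: right skewed, so the mean is pulled above the median.
raw_mean, raw_median = mean(sizes), median(sizes)

# After: multiplying by 2 has become adding ln(2), so the values are evenly spaced.
steps = [b - a for a, b in zip(logged, logged[1:])]
log_mean, log_median = mean(logged), median(logged)

# Back-transforming the mean of the logs gives the geometric mean.
back_transformed_mean = math.exp(log_mean)
```

Note that the back-transformed mean (8 here) is not the same as the mean of the raw values.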
Test to compare central tendencies of two groups using ranks
Mann-Whitney U test
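Sketch of how the Mann-Whitney U statistic comes from ranks: pool both groups, rank from lowest to highest (ties share the average rank), then sum the ranks of one group. The function name `mann_whitney_u` and the data are made up; no p-value is computed here.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the pooled ranks of the two samples."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_a, n_b = len(a), len(b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

u_separate = mann_whitney_u([1, 2, 3], [4, 5, 6])  # no overlap between groups
u_ties = mann_whitney_u([1, 2], [2, 3])            # one tied pair counts as half
```

`U_a` equals the number of (a, b) pairs in which the value from `a` is the larger one, with ties counted as one half. If SciPy is available, `scipy.stats.mannwhitneyu` runs the full test.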