Ch. 13 Flashcards
What to do when the assumptions are not true?
- Ignore the violations
- Transform the data
- Use a nonparametric method
- Use a permutation test
When is data not likely to be normal, graphically?
(pg. 371)
When the distribution is strongly skewed or strongly bimodal, or has outliers
Normal Quantile Plot
- What is it?
Compares each observation in the sample w/ its quantile expected from the standard normal distribution
If the data are normally distributed, the points should fall roughly along a straight line.
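A minimal sketch (made-up data) of drawing a normal quantile plot with scipy's probplot:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sample = rng.normal(loc=10, scale=2, size=40)  # made-up sample data

# probplot pairs each ordered observation with the quantile expected
# under a standard normal; roughly linear points suggest normality
stats.probplot(sample, dist="norm", plot=plt)
plt.show()
```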
Shapiro-Wilk test
What is it?
The Shapiro-Wilk test evaluates the goodness of fit of a normal distribution to a set of data randomly sampled from a population (the null hypothesis is that the data are normal)
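A minimal sketch (made-up data) of running the test with scipy.stats.shapiro:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.exponential(scale=1.0, size=30)  # made-up, right-skewed data

# Null hypothesis: the data were sampled from a normal distribution
w, p = stats.shapiro(sample)
print(f"W = {w:.3f}, p = {p:.4f}")  # small p: reject normality
```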
Recommended methods of evaluating assumption of normality?
- Can do a statistical test (ex. Shapiro-Wilk test), but this can give a false sense of security
- IDEALLY should use graphical methods & common sense to evaluate (frequency distribution histograms, normal quantile plots)
Robust
Def’n?
A statistical procedure is robust if the answer it gives is not sensitive to violations of the assumptions of the method.
Normal Approximation - Ignoring violations
- For what types of tests?
- Threshold of “ignorance”?
- For tests that use the mean (robustness due to the central limit theorem)
- Rule of thumb for the ignorance threshold is n > 50(ish)
- Also need to consider the shape of the distributions; the skew of the two distributions needs to be similar, and there should be no outliers. See pg. 376
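A small simulation sketch (made-up exponential data, illustrative only) of why the central limit theorem makes mean-based tests robust around this sample size:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)

# Strongly right-skewed population (exponential, skewness = 2)
n, n_sim = 50, 10_000
samples = rng.exponential(scale=1.0, size=(n_sim, n))

# The distribution of the 10,000 sample means is close to normal:
# nearly symmetric, with skewness near 0
means = samples.mean(axis=1)
print("skewness of sample means:", round(skew(means), 3))
```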
When can the assumption of equal standard deviations be ignored?
- If n > 30 for each group, and n is approximately the same for both groups, then unequal SDs can be ignored even w/ a greater than 3x difference
- Cannot be ignored when:
- n is not approximately equal between the groups
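When sample sizes are unequal and the SDs differ, a standard remedy (not named on this card) is Welch's t-test, which does not assume equal standard deviations. A minimal sketch with scipy and made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group1 = rng.normal(10, 1, size=15)   # made-up: smaller n, smaller SD
group2 = rng.normal(11, 4, size=60)   # made-up: larger n, ~4x the SD

# equal_var=False requests Welch's t-test, which drops the equal-SD assumption
t, p = stats.ttest_ind(group1, group2, equal_var=False)
print(f"t = {t:.3f}, p = {p:.4f}")
```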
Data transformation def’n
Data transformation changes each measurement by the same mathematical formula
Purpose of a transform?
- To attempt to make SD more similar and to improve the fit of the normal distribution to the data.
NOTE: The transform will affect all the data AND the hypotheses equally; i.e. everything gets transformed the same way. Also, you can't just take ln[s] of an existing summary statistic, for example; you have to re-calculate it from the transformed data (e.g. starting with the mean of all the logs).
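A quick numeric sketch (made-up data) of the re-calculation point: the mean and SD of the logs are not the same as the log of the original mean and SD:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 50.0])  # made-up, right-skewed data

log_x = np.log(x)

# Correct: compute summary statistics from the transformed values
print("mean of ln(X):", log_x.mean())        # ~1.61
print("SD of ln(X):  ", log_x.std(ddof=1))

# Wrong: transforming the summaries directly gives different numbers
print("ln(mean of X):", np.log(x.mean()))    # ~2.56, not the same
print("ln(SD of X):  ", np.log(x.std(ddof=1)))
```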
Examples of possible transformations?
(indicate top 3)
- Log transform
- Arcsine transform
- Square-root transform
- Square transform
- Antilog transform
- Reciprocal transform
Log transform - what does it do?
Converts each data point to its logarithm
Ex. Y = ln[X]
What to do if you want to try log transform but data has zero?
Try Y’ = ln[X + 1]
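A tiny sketch of the zero-safe version; note that numpy's log1p computes ln(x + 1) directly:

```python
import numpy as np

x = np.array([0.0, 1.0, 3.0, 10.0, 120.0])  # made-up data with a zero; ln(0) is undefined

# np.log1p(x) computes ln(x + 1) and handles the zeros
y = np.log1p(x)
print(y)
```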
Log transform - When is it useful?
- When measurements are ratios or products of variables
- When frequency distribution is right skewed
- When the group that has the larger mean (when comparing 2 groups) also has the larger SD
- When data spans several orders of magnitude
See pg. 378 for details
Arcsine transformation
- What does it look like?
- What is it needed for?
- How does it fix things?
- p’ = arcsin[sqrt(p)]
- Used for proportions
- Makes the distribution closer to normal and also makes the SDs more similar
Note: convert percentages into decimal proportions first before applying the transform
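A minimal sketch (made-up percentages) of the full recipe, proportions first, then arcsine square-root:

```python
import numpy as np

pct = np.array([5.0, 20.0, 50.0, 95.0])  # made-up percentages

# Convert percentages to proportions first, then apply arcsine square-root
p = pct / 100.0
p_prime = np.arcsin(np.sqrt(p))  # values are in radians, range [0, pi/2]
print(p_prime)
```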