Assumptions Flashcards
Detecting deviations from normality
1) Histograms
2) Normal quantile plots
Compares each observation in the sample with the corresponding quantile expected from a standard normal distribution
3) Formal test of normality (Shapiro-Wilk test)
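The three checks above can be sketched in Python with scipy; the data here are simulated (right-skewed) purely for illustration:

```python
# Hypothetical example: checking a sample for deviations from normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=50)  # clearly non-normal data

# 1) Histogram counts (drawing the plot would need matplotlib)
counts, bin_edges = np.histogram(sample, bins=10)

# 2) Normal quantile (Q-Q) plot quantities: theoretical vs. ordered sample
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")

# 3) Shapiro-Wilk test: a small p-value is evidence against normality
stat, p = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.4f}")
```

For a Q-Q plot, points falling close to the fitted line suggest normality; systematic curvature suggests skew.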
Ways to deal with violations of assumptions
1) Ignore the violation of the assumptions
2) Transform the data – use a mathematical transformation method to alter the distribution.
3) Use a non‐parametric method – these methods calculate probabilities in a way that does not depend on whether the response variable has a normal distribution.
4) Use a permutation test – (or bootstrapping) – use a computer to repeatedly randomly sample your sample to produce a null distribution with a large sample size
Ignoring violations of assumptions
A statistical method is robust if violations of its assumptions do not greatly affect its results.
One sample T-test:
- robust to skew if the sample size is large
- Never robust to outliers
Two sample T-test
- Robust to skew if skew of both samples is in the same direction and the sample size is above 30
- Robust to skew in different directions if the sample size is above 500
- Robust to up to a threefold difference in SD, as long as sample sizes are equal and greater than 30
- never robust to outliers
F test
- Always requires a normal distribution
Transforming the data
Log transformations
-> Right-skewed data
-> Group with the larger mean also has the larger SD
-> Data span multiple orders of magnitude
Arcsine transformation
-> Proportion data
Square-root transformation
-> Count data
-> Right skewed
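The three transformations above can be sketched with numpy; the arrays here are made up purely to show each case:

```python
# Sketch of the common transformations, applied to hypothetical data.
import numpy as np

skewed = np.array([1.0, 10.0, 100.0, 1000.0])  # spans several magnitudes
proportions = np.array([0.1, 0.5, 0.9])        # proportion data
counts = np.array([0, 1, 4, 9, 16])            # count data

log_t = np.log(skewed)                      # log (use log(x + 1) if zeros occur)
arcsin_t = np.arcsin(np.sqrt(proportions))  # arcsine-square-root for proportions
sqrt_t = np.sqrt(counts)                    # square root for counts
```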
Non-parametric tests
These methods calculate probabilities in a way that does not depend on whether the response variable has a normal distribution.
1) The Sign test (alternative to paired/ one sample T test)
This test compares the median of a sample to a constant specified in the null hypothesis
- Calculate the difference
- Assign +/- (if equal then ignore and reduce n)
- Count the number below 0 (negative)
- Use the binomial distribution to get the probability of that many negative values
- Multiply by 2 (for a two-tailed test)
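The steps above can be sketched as follows (the paired values are hypothetical):

```python
# Sketch of a two-tailed sign test on hypothetical paired data.
import numpy as np
from scipy import stats

before = np.array([12, 15, 11, 14, 13, 16, 10, 15])
after = np.array([14, 18, 11, 17, 15, 19, 13, 16])

diffs = after - before
diffs = diffs[diffs != 0]       # ties (differences of 0) are dropped, reducing n
n = len(diffs)
n_neg = np.sum(diffs < 0)       # count the negative differences

# Binomial probability of a result this extreme or more, times 2 (two-tailed)
k = min(n_neg, n - n_neg)
p = min(2 * stats.binom.cdf(k, n, 0.5), 1.0)
print(f"n = {n}, negatives = {n_neg}, p = {p:.4f}")
```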
2) Wilcoxon signed rank test (alternative to paired t test)
The Wilcoxon signed-rank test retains information about magnitudes—that is, how far above or below the hypothesized median each data point lies.
Assumes a symmetrical distribution
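In scipy this is `stats.wilcoxon`; the paired data below are hypothetical:

```python
# Hypothetical use of the Wilcoxon signed-rank test on paired data.
import numpy as np
from scipy import stats

before = np.array([12.1, 15.3, 11.0, 14.2, 13.5, 16.8, 10.4, 15.0])
after = np.array([14.0, 18.1, 11.9, 17.3, 15.2, 19.5, 13.0, 16.4])

# Uses the magnitudes of the differences, not just their signs;
# assumes the differences are symmetrically distributed.
stat, p = stats.wilcoxon(after, before)
print(f"W = {stat}, p = {p:.4f}")
```

Here every "after" value exceeds its "before" value, so the rank sum for the rarer sign is 0 and the test rejects.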
3) Mann- Whitney U-test (alternative to 2-sample T test)
This test compares the distributions of two groups.
- Rank the data and sum the ranks for each group
- Calculate U for the lower total
- Find U from a statistical table using the sample sizes
- Compare to the smaller of the calculated U values (to be safe)
Assumptions of the Mann-Whitney U test
- It assumes that the data are randomly sampled.
- Tests whether the data have different distributions. It is not a robust test of whether the data have the same measures of central tendency (i.e. means/medians).
- Mann‐Whitney can be used to test of similarity of means/medians only if the distributions have the same shape.
- Lower power (greater Type II error) because it does not use all of the available data
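In practice the ranking, U calculation, and table lookup are handled by `stats.mannwhitneyu`; the two groups below are hypothetical:

```python
# Hypothetical two-group comparison with the Mann-Whitney U test.
import numpy as np
from scipy import stats

group_a = np.array([3.1, 4.2, 2.8, 5.0, 3.7, 4.4])
group_b = np.array([6.3, 7.1, 5.9, 8.2, 6.8, 7.5])

u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```

Here every value in group_b exceeds every value in group_a, so U for group_a is 0, the most extreme possible value.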
Permutation test
A permutation test generates a null distribution for the association between two variables by repeatedly and randomly rearranging the values of one of the two variables in the data
- Create a permuted set of data
- Repeat at least 1000 times
- Create the null distribution
- Calculate the proportion of values in the null distribution that are as extreme or more extreme than the observed value
Assumptions of permutation test
- The data must be a random sample from the population
- For permutation tests that compare means or medians between groups, the distribution of the variable must have the same shape in every population
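The permutation-test steps above can be sketched for a difference in group means; the data are hypothetical:

```python
# Sketch of a permutation test for a difference in means (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([3.1, 4.2, 2.8, 5.0, 3.7, 4.4])
group_b = np.array([6.3, 7.1, 5.9, 8.2, 6.8, 7.5])

observed = group_b.mean() - group_a.mean()
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

null = []
for _ in range(10_000):                 # repeat many times (at least 1000)
    perm = rng.permutation(pooled)      # randomly rearrange group membership
    null.append(perm[n_a:].mean() - perm[:n_a].mean())
null = np.array(null)                   # the null distribution

# Proportion of null values as extreme or more extreme (two-tailed)
p = np.mean(np.abs(null) >= abs(observed))
print(f"observed difference = {observed:.3f}, p = {p:.4f}")
```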
Parametric vs non-parametric
Parametric: Statistical methods—such as the one-sample, paired, and two-sample t-tests—that make assumptions about the distribution of variables
Non-parametric: Methods that do not make assumptions about the distribution of variables
Assumptions when using a normal distribution for statistical inference
1) Data are sampled at random (for response variables conditioned on explanatory variables)
2) Samples are independent.
3) The difference between observations & predictions are normally distributed.
4) The mean and variance of errors are independent of the explanatory variable(s).
5) One source of unmeasured random variance.
6) Variance among groups is equal (and if not, then you use an adjustment)