Quiz 4 Flashcards
Three General Assumptions of Parametric Statistical Tests
- Normality of sampling distributions/population residuals
- When you compare more than one population, the variances of the populations are equal (homogeneity of variance)
- Data from your population are independent
Parametric Tests
- statistical tests used to estimate a specific parameter value (e.g. t-tests)
Normality
Inferential statistics assume that our sample data are drawn from normal sampling distributions
How can we know about normality?
- If we have a large enough sample (typically more than 30), we have met the assumption
- If we have a small sample, we examine our sample to infer normality of the sampling distribution
- If you have multivariate data, any linear combination of the variables (e.g. aX + bY) needs to be normal; in practice, examine each variable by itself to see if it is normally distributed
- If our sample data are normally distributed, it is likely that they came from a normally distributed population → the sampling distribution would be normal
Skewness
- When a distribution is perfectly normal, the values of skewness and kurtosis are zero
- Positive skewness means that there is a pile-up of cases on the left and a long right tail (skewed to the right)
- Negative skewness means that there is a pile-up of cases on the right and a long left tail (skewed to the left)
When does the CLT not work/apply?
- When distributions have thick tails
- If your sample is small
The Central Limit Theorem
- The CLT is one of the most remarkable results of the theory of probability
- In its simplest form, the theorem states that:
- The mean of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution
- Note: exception of distributions with heavy tails
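A quick illustrative sketch (not from the textbook): simulating many sample means from a right-skewed exponential population shows them coming out approximately normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each entry is the mean of one sample of 50 draws from a
# right-skewed (exponential) population.
sample_means = [rng.exponential(scale=2.0, size=50).mean()
                for _ in range(10_000)]

# The population is skewed, but the means pile up roughly
# symmetrically around the population mean (2.0), per the CLT.
print(np.mean(sample_means), np.std(sample_means))
```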
Testing Normality in Single Variables
TB p. 183-191
- Is the sample size big enough to assume that the sampling distribution is normally distributed?
- Look at histogram of each continuous variable → starting at visual inspection of normality
- Perform the Kolmogorov-Smirnov (K-S) test or the Shapiro-Wilk test
- Significant results would suggest that the data are NOT normally distributed
- Caveat: the power of the test depends on the sample size, and it is often a moot point because in large samples without thick tails we would assume normality anyway
- Shapiro-Wilk test is highly sensitive to even small deviations from normality in large samples
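A minimal sketch of both tests using scipy (the sample here is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=100, scale=15, size=40)  # illustrative small sample

# Kolmogorov-Smirnov test against a normal distribution using the
# sample's own mean and SD (note: estimating the parameters from the
# sample makes the standard K-S p-value only approximate).
ks_stat, ks_p = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))

# Shapiro-Wilk test
sw_stat, sw_p = stats.shapiro(x)

# Significant results (p < .05) suggest the data are NOT normal.
print(f"K-S p = {ks_p:.3f}, Shapiro-Wilk p = {sw_p:.3f}")
```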
Look at skewness and kurtosis statistics
Formula for skewness
- Convert raw scores into z scores
- Average of the z scores raised to the third power
→ Raising to the third power increases the influence of outliers
- If skewed to the right, we will get a positive skewness score
- If skewed to the left, we will get a negative value
- No skewness → Formula results in zero
Cutoffs: ± 2
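In symbols, that is skewness = (1/N) Σ zᵢ³ with zᵢ = (xᵢ − x̄)/s. A minimal sketch of the computation on toy data (scipy's uncorrected skew is the same quantity):

```python
import numpy as np
from scipy import stats

x = np.array([2, 3, 3, 4, 4, 4, 5, 5, 9, 14], dtype=float)  # toy right-skewed data

# Convert raw scores into z scores, cube them, then average.
z = (x - x.mean()) / x.std()   # population SD, matching the simple formula
skewness = np.mean(z ** 3)     # cubing magnifies the influence of outliers

# scipy's uncorrected skew is the same quantity; both come out
# positive (skewed right) here.
print(skewness, stats.skew(x, bias=True))
```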
If skewed to the right, we will get a _______ skewness score
positive
If skewed to the left, we will get a ______ value
negative
Formula for kurtosis (K4)
- Kurtosis values above zero indicate a distribution that is too peaked, with thick, heavy tails
- Kurtosis values below zero indicate a distribution that is too flat, with thinner tails (platykurtic)
Leptokurtic (thicker tails)→ positive kurtosis statistic
Platykurtic (thinner tails) → negative kurtosis statistic
Kurtosis value below zero is _________
platykurtic
Leptokurtic (thicker tails)→ ______ kurtosis statistic
positive
Platykurtic (thinner tails) → ________ kurtosis statistic
negative
Rule of thumb cutoffs for kurtosis
±7 → be concerned
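Analogous to skewness, excess kurtosis is the average of the z scores raised to the fourth power, minus 3 (so a normal distribution scores zero). A minimal sketch with illustrative distributions; scipy's kurtosis() reports this excess version by default:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
heavy = rng.standard_t(df=3, size=5_000)  # leptokurtic: thick tails
flat = rng.uniform(-1, 1, size=5_000)     # platykurtic: thin tails

# fisher=True (the default) subtracts 3, so a normal distribution
# scores 0, positive means leptokurtic, negative means platykurtic.
print(stats.kurtosis(heavy))  # positive
print(stats.kurtosis(flat))   # negative (uniform is about -1.2)
```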
Significance Tests for Skewness and Kurtosis
Step 1: convert skewness and kurtosis scores into z scores by dividing each statistic by its standard error
Step 2: compare z scores to critical values of ±1.96 for small samples and ±2.58 for large samples. If greater than the critical value, significant skewness/kurtosis
- More stringent for larger samples
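A minimal sketch of both steps, assuming the common large-sample approximations SE_skewness ≈ sqrt(6/N) and SE_kurtosis ≈ sqrt(24/N) (software packages may use exact formulas) and an illustrative n = 200 boundary between small and large samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=40)  # illustrative skewed sample
n = len(x)

# Step 1: divide each statistic by its (approximate) standard error.
z_skew = stats.skew(x) / np.sqrt(6 / n)
z_kurt = stats.kurtosis(x) / np.sqrt(24 / n)

# Step 2: compare to the critical value (±1.96 for small samples,
# ±2.58 for large; the n = 200 boundary here is illustrative).
crit = 1.96 if n < 200 else 2.58
print(abs(z_skew) > crit, abs(z_kurt) > crit)
```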
What is the Big Deal if a Distribution Isn’t “Normal?”
- We could get inaccurate results from our analysis
- Messes with Type I and Type II error rates
- Meaning that the null could be true when our stats tell us it isn’t or vice versa
What to do if normality assumption is NOT met
- Data transformation
  - Appropriate when there is skewness in the distribution
  - Replaces the data with a function of all the data within that variable
- Non-parametric tests
- Modern methods (e.g. bootstrapping)
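As one example of a modern method, a minimal percentile-bootstrap sketch for a mean (the data and resample counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=25)  # small, skewed sample

# Resample with replacement many times, recomputing the mean each
# time; the middle 95% of those means is the percentile bootstrap
# CI. No normality assumption is needed.
boot_means = [rng.choice(x, size=len(x), replace=True).mean()
              for _ in range(5_000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
```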
Data Transformation
Most common → square root transformation
Most useful when data are skewed to the right
Pulls more extreme values closer to the middle
Bigger impact on bigger values
Square Root Transformation is most useful when _______
data are skewed to the right
When data are skewed left what transformation can be done?
When data are skewed to the left:
- Reflect scores and then do a square root transformation
- To reflect, subtract each value from a large number (e.g. the highest score + 1)
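A minimal sketch of both cases on toy data (using max + 1 as the reflection constant):

```python
import numpy as np

right_skewed = np.array([1, 2, 2, 3, 4, 9, 16, 25], dtype=float)
left_skewed = np.array([1, 10, 17, 18, 19, 19, 20, 20], dtype=float)

# Right skew: the square root pulls big values in more than small ones.
sqrt_scores = np.sqrt(right_skewed)

# Left skew: reflect first (subtract from max + 1 so everything stays
# positive), then take the square root. Transformed scores are now
# reverse-keyed, so interpret them accordingly.
reflected = (left_skewed.max() + 1) - left_skewed
sqrt_reflected = np.sqrt(reflected)
print(sqrt_scores, sqrt_reflected)
```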
Log transformations
For extreme positive skew → reduces positive skew
Pulls in values to a greater degree than the square root transformation
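A minimal sketch (the +1 shift is only needed when zeros are present, since log(0) is undefined):

```python
import numpy as np

x = np.array([0, 1, 2, 5, 20, 100], dtype=float)  # extreme positive skew

# log(0) is undefined, so shift by a constant first; the log then
# compresses large values much harder than the square root would.
log_scores = np.log(x + 1)
print(log_scores)
```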
Inverse transformation
- Transforms data with extreme positive skew to normal
- 1 / (value of data)
- Need to add a constant to shift all values away from zero (1/0 is undefined); values CAN be negative as long as none equal zero
- Table 6.1 in TB
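A minimal sketch (the +1 constant is illustrative; it just keeps every value away from zero):

```python
import numpy as np

x = np.array([1, 2, 2, 3, 8, 40], dtype=float)  # extreme positive skew

# 1/0 is undefined, so add a constant if zeros are possible. Note the
# transformation reverses the ordering of scores (large becomes small).
inv_scores = 1 / (x + 1)
print(inv_scores)
```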