DT Flashcards
Ungrouped data is?
What type of analyses?
What to consider?
One or more continuous independent variables (e.g. height)
• Use correlations, regressions
• Need to check for normality/skewness
Grouped data
What type of analyses?
What to consider? (2 things)
One or more categorical independent variables
t-test, ANOVA
(1) Normality is assumed if sample sizes within cells are 20 or greater (central limit theorem)
BUT sample sizes within cells tend to be small
(2) Outliers can exert too much influence
What is central limit theorem?
The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal.
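A minimal simulation sketch of the idea, not from the class notes: the exponential population, the sample size of 50, and the helper name sample_means are all assumptions chosen for illustration. It draws many samples with replacement from a clearly skewed population and compares the sample means with what the theorem predicts (centre μ, spread σ/√n).

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Assumed example population: strongly right-skewed, so clearly not normal.
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

means = sample_means(population, sample_size=50, n_samples=2_000)

# The means should centre on mu with a spread close to sigma / sqrt(n).
print("population mean:", round(mu, 3), "mean of sample means:", round(statistics.fmean(means), 3))
print("sigma / sqrt(n):", round(sigma / 50 ** 0.5, 3), "SD of sample means:", round(statistics.stdev(means), 3))
```

A histogram of means would look roughly bell-shaped even though the population itself is skewed.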
What techniques can be used for assessing normality?
1) Normal probability plots
2) Normality tests
3) Histograms
What are the normality tests discussed in class?
1) Tests for skewness
2) Shapiro-Wilk
3) Kolmogorov-Smirnov
What is the test for skewness and how is it interpreted?
z = skewness statistic / standard error of skewness
-> compared to a critical z value that varies based on sample size
-> positive = positively skewed, negative = negatively skewed (if outside the critical range)
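A rough sketch of the ratio, assuming the adjusted skewness statistic and standard error formulas that SPSS-style packages report (the card does not say which formulas the class uses). The function name skewness_z is hypothetical, and no critical value is hard-coded because it depends on sample size.

```python
import math
import statistics

def skewness_z(values):
    """Return (skewness statistic, its standard error, their ratio z)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Assumed formula: adjusted Fisher-Pearson skewness coefficient.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)
    # Assumed formula: standard error of skewness, a function of n only.
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se

skew, se, z = skewness_z([1, 1, 1, 2, 2, 3, 10])
print(round(skew, 3), round(se, 3), round(z, 3))
```

The sign of z gives the direction of the skew; whether it counts as significant depends on the critical z for that sample size.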
When is the Kolmogorov-Smirnov test valid? When is it not?
Valid when testing whether a set of observations is from a completely specified continuous distribution
Meaning: if one or more parameters must be estimated from the sample, then the standard critical-value tables are not valid
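A small sketch of the valid case, where the reference distribution is fixed in advance. The scores and the hypothesised normal distribution (mean 100, SD 15) are made-up values, and ks_statistic is a hypothetical helper. It computes only the test statistic D, the largest gap between the empirical and the hypothesised cumulative distribution functions.

```python
from statistics import NormalDist

def ks_statistic(values, dist):
    """Largest gap between the empirical CDF and dist.cdf (one-sample KS D)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Parameters are stated before looking at the data, not estimated from it.
hypothesised = NormalDist(mu=100, sigma=15)
scores = [82, 91, 95, 99, 100, 104, 108, 112, 121, 130]
d = ks_statistic(scores, hypothesised)
print(round(d, 3))
```

If mu and sigma were replaced by the sample mean and SD, D could still be computed, but it could no longer be compared against the standard tables.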
What types of probability plots are used to assess normality?
Q-Q Plot
P-P Plot
Detrended normal probability plots
What is a Q-Q plot?
What are Q-Q plots better for?
A Q-Q plot plots the actual values of the variable against the theoretical values (quantiles) expected under the normal distribution.
A Q-Q plot is better at finding deviations in the tails (Q has a tail)
What are detrended normal probability plots?
Deviations from the diagonal are plotted, meaning that the positive linear trend is eliminated
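One way to sketch the deviations, under stated assumptions: observed values are standardised with the sample mean and SD, expected z-scores use the (i - 0.5)/n percentile convention, and detrended_points is a hypothetical name. Statistical packages may compute these points somewhat differently.

```python
from statistics import NormalDist, fmean, stdev

def detrended_points(values):
    """Return (observed value, deviation from the normal line) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    mean, sd = fmean(ordered), stdev(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        expected_z = NormalDist().inv_cdf((i - 0.5) / n)  # z expected if normal
        observed_z = (x - mean) / sd                      # standardised data value
        points.append((x, observed_z - expected_z))       # the diagonal becomes 0
    return points

for value, deviation in detrended_points([1, 1, 1, 2, 2, 3, 10]):
    print(value, round(deviation, 3))
```

For roughly normal data the deviations scatter around zero with no pattern; a systematic curve or a large deviation at one end points to skew or outliers.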
What are the 4 steps for producing a normal probability plot?
- arranging the data from smallest to largest.
- determining the percentile of each data value.
- determining the corresponding z-scores from these percentiles based on the normal distribution.
- plotting each z-score against its corresponding data value.
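The four steps can be sketched as follows. The percentile rule (i - 0.5)/n is an assumption, one of several common conventions, and normal_probability_points is a hypothetical name. The function returns the coordinate pairs rather than drawing the plot.

```python
from statistics import NormalDist

def normal_probability_points(values):
    """Return (z-score, data value) pairs for a normal probability plot."""
    # Step 1: arrange the data from smallest to largest.
    ordered = sorted(values)
    n = len(ordered)
    # Step 2: percentile of each data value (assumed convention: (i - 0.5) / n).
    percentiles = [(i - 0.5) / n for i in range(1, n + 1)]
    # Step 3: z-score for each percentile under the standard normal distribution.
    z_scores = [NormalDist().inv_cdf(p) for p in percentiles]
    # Step 4: pair each z-score with its data value, ready for plotting.
    return list(zip(z_scores, ordered))

for z, value in normal_probability_points([12, 7, 9, 15, 10]):
    print(round(z, 3), value)
```

If the data are close to normal, the pairs fall near a straight line.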
What is a P-P plot?
What are P-P plots better for?
A P-P plot plots the corresponding areas under the curve (cumulative distribution function) for those values.
P-P plot is better at finding deviations from normality in the center of the distribution
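A comparable sketch for the P-P plot, assuming the normal curve is fitted with the sample mean and SD and that observed cumulative proportions use the (i - 0.5)/n convention; pp_points is a hypothetical name. Both coordinates are cumulative probabilities between 0 and 1.

```python
from statistics import NormalDist, fmean, stdev

def pp_points(values):
    """Return (observed cumulative proportion, expected normal CDF) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    observed = [(i - 0.5) / n for i in range(1, n + 1)]  # empirical proportions
    expected = [fitted.cdf(x) for x in ordered]          # area under fitted curve
    return list(zip(observed, expected))

for obs, exp in pp_points([12, 7, 9, 15, 10]):
    print(round(obs, 3), round(exp, 3))
```

Because both axes are squeezed into the 0 to 1 range, differences in the extreme tails are compressed, which is consistent with the P-P plot being more sensitive to the centre of the distribution.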
Which plot tends to be preferred in research situations?
Q-Q (over P-P)
What would need to be done if there is a large number of subjects that are contributing to the skew?
What would you consider if there is only a small number of subjects contributing to it?
Mathematical transformation
Winsorizing
What is Winsorizing?
A method for minimizing the influence of outliers by
(1) assigning the outlier a lower weight
OR
(2) changing the outlier value so that it is closer to the other values in the set
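A minimal sketch of option (2), assuming the common variant in which the k lowest and k highest values are replaced by the nearest remaining value. The function name winsorize and the parameter k are illustrative choices, not from the class notes.

```python
def winsorize(values, k=1):
    """Pull the k smallest and k largest values in to the nearest kept value."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must be non-negative and leave at least one value unchanged")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]  # nearest values that are kept as-is
    # Clamp every value into [low, high]; the original order is preserved.
    return [min(max(x, low), high) for x in values]

print(winsorize([1, 2, 3, 4, 100], k=1))  # → [2, 2, 3, 4, 4]
```

Unlike trimming, no cases are dropped: the sample size stays the same and only the extreme values move.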