DT Flashcards
Ungrouped data is?
What type of analyses?
What to consider?
One or more continuous independent variables (e.g. height)
• Use correlations, regressions
• Need to check for normality/skewness
Grouped data
What type of analyses?
What to consider? (2 things)
One or more categorical independent variables
t-test, ANOVA
(1) Normality is assumed if sample sizes within cells are 20 or greater (central limit theorem)
BUT sample sizes within cells tend to be small
(2) Outliers can exert too much influence
What is central limit theorem?
The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed
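A quick simulation can make the theorem concrete. This is only a sketch: the exponential population, the sample size of 30, and the helper name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical, strongly right-skewed population (exponential, mean about 1).
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=30, n_samples=2_000)

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# The CLT predicts: mean of the sample means is close to mu,
# and their standard deviation is close to sigma / sqrt(n).
print("population mean vs mean of sample means:", round(mu, 3), round(statistics.fmean(means), 3))
print("sigma / sqrt(n) vs SD of sample means:  ", round(sigma / 30 ** 0.5, 3), round(statistics.stdev(means), 3))
```

Even though the population is skewed, the two pairs of numbers should come out close, and a histogram of `means` should look roughly bell-shaped.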
What techniques can be used for assessing normality?
1) Normal probability plots
2) Normality tests
3) Histograms
What are the normality tests discussed in class?
1) Tests for skewness
2) Shapiro-Wilk
3) Kolmogorov-Smirnov
What is the test for skewness and how is it interpreted?
skewness statistic
OVER
standard error of skewness
- > compared to a critical z value that varies based on sample size
- > positive = positively skewed, neg = negatively skewed (if outside of critical range)
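The ratio above can be sketched in code. The assumptions here: the skewness statistic is the small-sample adjusted version, the standard error uses a commonly cited formula that depends only on n, and the scores are made up.

```python
import math

def skewness_z(data):
    """Return (skewness, standard error of skewness, z = skewness / SE)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    g1 = m3 / m2 ** 1.5                               # unadjusted skewness
    adj = g1 * math.sqrt(n * (n - 1)) / (n - 2)       # small-sample adjustment
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return adj, se, adj / se

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 14, 21]   # made-up, right-skewed
skew, se, z = skewness_z(scores)
print(f"skewness={skew:.2f}, SE={se:.2f}, z={z:.2f}")
```

The resulting z is then compared to whatever critical value fits the sample size; a positive z beyond that value points to positive skew.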
When is the Kolmogorov-Smirnov test valid? when is it not?
Valid when testing whether a set of observations comes from a completely specified continuous distribution
Meaning: if one or more parameters must be estimated from the sample, then the standard tables of critical values are not valid
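A small sketch of the distinction, assuming a hand-rolled `ks_statistic` helper (hypothetical name) and a simulated sample. It only computes the K-S distance D, not a p-value.

```python
import random
from statistics import NormalDist, fmean, stdev

def ks_statistic(data, dist):
    """Largest gap between the empirical CDF of data and the CDF of dist."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

rng = random.Random(1)
sample = [rng.gauss(100, 15) for _ in range(50)]   # simulated scores

# Valid use: the reference distribution is completely specified in advance.
d_specified = ks_statistic(sample, NormalDist(100, 15))

# Problem case: mean and SD are estimated from the same sample.
d_estimated = ks_statistic(sample, NormalDist(fmean(sample), stdev(sample)))

print(round(d_specified, 3), round(d_estimated, 3))
```

Fitting the mean and SD to the sample tends to pull the reference curve toward the data, so D tends to come out smaller than the standard tables expect. That is why those tables do not apply in the second case.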
What types of probability plots are used to assess normality?
Q-Q Plot
P-P Plot
Detrended normal probability plots
What is a Q-Q plot?
What are Q-Q plots better for?
The Q-Q is plotting the actual values of the variable against the theoretical values for the normal distribution.
Q-Q plot is better at finding deviations in the tails (Q has a tail)
What are detrended normal probability plots?
Deviations from the diagonal are plotted, which means the positive linear trend is eliminated
What are the 4 steps for producing a normal probability plot?
- arranging the data from smallest to largest.
- determining the percentile of each data value.
- determining the corresponding z-scores from these percentiles based on the normal distribution.
- plotting each z-score against its corresponding data value.
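The four steps can be sketched as follows. The `(i - 0.5) / n` rule for the percentile is one common plotting-position convention and is an assumption here; the function name and data are hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Return (z-score, data value) pairs for a normal probability plot."""
    ordered = sorted(data)                                       # step 1: sort
    n = len(ordered)
    percentiles = [(i - 0.5) / n for i in range(1, n + 1)]       # step 2: percentiles
    z_scores = [NormalDist().inv_cdf(p) for p in percentiles]    # step 3: z-scores
    return list(zip(z_scores, ordered))                          # step 4: pairs to plot

points = normal_probability_points([12, 15, 9, 20, 14])
for z, value in points:
    print(f"{z:6.2f}  {value}")
```

If the data are close to normal, the printed pairs fall close to a straight line when plotted.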
What is a P-P plot?
What are P-P plots better for?
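A P-P plot plots the observed cumulative probabilities of the variable against the cumulative probabilities expected under the normal distribution (areas under the curve, i.e. the cumulative distribution function), rather than the values themselves.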
P-P plot is better at finding deviations from normality in the center of the distribution
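A minimal sketch of the points behind a P-P plot. It assumes the normal curve is fitted with the sample mean and SD and that observed proportions use the `(i - 0.5) / n` convention; `pp_points` is a hypothetical helper.

```python
from statistics import NormalDist, fmean, stdev

def pp_points(data):
    """Return (observed cumulative proportion, expected cumulative probability) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    return [((i - 0.5) / n, fitted.cdf(x)) for i, x in enumerate(ordered, start=1)]

for observed, expected in pp_points([12, 15, 9, 20, 14]):
    print(f"{observed:.3f}  {expected:.3f}")
```

Both coordinates run from 0 to 1, which compresses the tails. That fits with the plot being more sensitive to deviations in the center.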
Which plot tends to be preferred in research situations?
Q-Q (over P-P)
What would need to be done if there is a large number of subjects that are contributing to the skew?
What would you consider if there is only a small number of subjects contributing to it?
Mathematical transformation
Winsorizing
What is Winsorizing?
a method for minimizing the influence of outliers by
(1) assigning the outlier a lower weight
OR
(2) changing the outlier value so that it is closer to the other values in the set
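Option (2) can be sketched like this, assuming the rule "replace the k most extreme values at each end with the nearest value that is kept". The function name, the choice of k, and the scores are illustrative only.

```python
def winsorize(data, k=1):
    """Replace the k lowest and k highest values with their nearest retained neighbours."""
    if 2 * k >= len(data):
        raise ValueError("k is too large for this data set")
    ordered = sorted(data)
    low, high = ordered[k], ordered[-k - 1]
    # Clamp every value into [low, high], keeping the original order of cases.
    return [min(max(x, low), high) for x in data]

scores = [4, 5, 5, 6, 7, 7, 8, 42]   # made-up scores with one extreme value
print(winsorize(scores))             # [5, 5, 5, 6, 7, 7, 8, 8]
```

The case with the score of 42 stays in the data set but no longer dominates the mean or the skew.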
What do the Kolmogorov–Smirnov test and Shapiro–Wilk test do?
How are they interpreted?
What is a potential concern?
They compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation
If the test is non-significant (p > .05) it tells us that the distribution of the sample is not significantly different from a normal distribution
Larger sample sizes increase the chance of getting significant results from small deviations from normality that may not be important
How do you interpret QQ or PP plots?
If the line sags consistently below or rises consistently above the diagonal, the problem is kurtosis
If the line is S-shaped, the issue is skewness
What side is the hump on in a positive skew?
left hand side (remember turn to the right and it makes a P for positive)
What side is the hump on in a negative skew?
Right hand side
What does positive kurtosis vs negative kurtosis look like?
positive kurtosis is tall and skinny (pointier than normal)
completely negative is a bowl shape (partly negative would be flatter than normal)
What are the 3 main types of transformations? What can be done with all of them?
- Square root
- Logarithmic
- Inverse
They can all be reflected
When would you use a reflection in a transformation? Why?
What to consider?
If you need to normalize a negative skew
- > a reflection turns it into a positive skew and then the positive skew transformations can be applied
- > must keep in mind during interpretation that the variable was reflected (or reflect it back after transforming; not clear to me which)
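A sketch of reflecting before transforming. It assumes the common convention of subtracting each score from (largest score + 1), so the smallest reflected value is 1; the scores are made up.

```python
import math

def reflect(data):
    """Reflect scores: new = (max + 1) - old, so the largest score becomes 1."""
    pivot = max(data) + 1
    return [pivot - x for x in data]

scores = [1, 6, 8, 9, 9, 10, 10, 10]           # made-up, negatively skewed
reflected = reflect(scores)                     # now positively skewed
transformed = [math.sqrt(x) for x in reflected]

# Reflection reverses the ordering: high original scores are now low values.
print(reflected)
print([round(t, 2) for t in transformed])
```

Because the order is flipped, a high transformed value now means a low original score, which is the interpretation point noted above.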
What are Tabachnick and Fidell’s suggestions for Moderate, Substantial, and Severe positive skewness?
Moderate: square root
Substantial: logarithmic
Severe: inverse
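To see how the three transformations differ in strength, here is a sketch that applies each one to the same made-up, positively skewed scores and reports the resulting skewness. The data and the simple (unadjusted) skewness helper are assumptions.

```python
import math

def skewness(data):
    """Unadjusted skewness: third central moment over SD cubed."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15, 40]   # made-up; all values > 0

transforms = {
    "raw": lambda x: x,
    "square root": math.sqrt,
    "log10": math.log10,
    "inverse": lambda x: 1 / x,   # note: also reverses the order of scores
}
results = {name: skewness([fn(x) for x in scores]) for name, fn in transforms.items()}
for name, value in results.items():
    print(f"{name:12s} skewness = {value:.2f}")
```

The log and inverse need strictly positive scores, so a constant is usually added first if the variable contains zeros or negative values.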
What is the Box-Cox Power Transformation?
A procedure for identifying the best exponent to use in a transformation in order to get the best normal shape
- > Lambda = the power that each value is raised to
- > Lambda is chosen as the best-performing value between -5 and +5
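A sketch of the search for lambda. It assumes "best" means the lambda that maximizes the log-likelihood under a normal model, which is one standard criterion, and that lambda is tried on a grid in steps of 0.1. Function names and data are hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def log_likelihood(data, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped)."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(x) for x in data)

def lambda_grid(low, high, step):
    """Evenly spaced candidate lambdas from low to high inclusive."""
    steps = round((high - low) / step)
    return [round(low + i * step, 10) for i in range(steps + 1)]

def best_lambda(data, low=-5.0, high=5.0, step=0.1):
    """Return the grid value of lambda with the highest log-likelihood."""
    return max(lambda_grid(low, high, step), key=lambda lam: log_likelihood(data, lam))

scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15, 40]   # made-up, positive skew
print(best_lambda(scores))
```

Calling `best_lambda(scores, low=-2, high=1)` restricts the search to a narrower grid of 31 candidate values.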
What is Templeton’s two stage approach to data transformation?
It is a procedure for transforming continuous variables to normal
Step 1: Rank the data
Step 2: match the ranks with the corresponding values from a normal distribution
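The two steps can be sketched as a rank-based inverse-normal transform. The assumptions: ties share their average rank, ranks are divided by n + 1 so they stay strictly between 0 and 1, and the target normal distribution keeps the original mean and SD. `two_step_normalize` is a hypothetical name.

```python
from statistics import NormalDist, fmean, stdev

def two_step_normalize(data):
    """Rank the data, then map each rank to the matching normal value."""
    n = len(data)
    ordered = sorted(data)

    def fractional_rank(x):
        # Step 1: average rank of x (ties share a rank), scaled into (0, 1).
        first = ordered.index(x) + 1
        last = first + ordered.count(x) - 1
        return (first + last) / 2 / (n + 1)

    # Step 2: value of a normal distribution at that cumulative probability.
    target = NormalDist(fmean(data), stdev(data))
    return [target.inv_cdf(fractional_rank(x)) for x in data]

scores = [1, 1, 2, 3, 5, 8, 40]    # made-up, positively skewed
print([round(v, 2) for v in two_step_normalize(scores)])
```

The order of cases is preserved, but the spacing between scores is replaced by the spacing of a normal distribution.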
What does excluding cases listwise mean? pairwise?
Listwise: Only subjects that have data for all the selected variables are included in the analysis (i.e. if you are looking at 4 variables and a participant only had data for 3 and has missing data for the other, this participant would be excluded from the analysis entirely - excluded for all 4 variables being looked at)
Pairwise: Variables are evaluated individually, therefore any subject with data for that variable will be included in the analysis for that variable -> this means you could have an unequal number of participants included in the analysis of each of the variables.
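A small sketch of the difference in sample sizes. The four participants and the variable names `a` to `d` are made up, with `None` standing in for a missing value.

```python
rows = [
    {"id": 1, "a": 5, "b": 7,    "c": 2,    "d": 9},
    {"id": 2, "a": 3, "b": None, "c": 4,    "d": 8},   # missing b
    {"id": 3, "a": 6, "b": 2,    "c": None, "d": 7},   # missing c
    {"id": 4, "a": 4, "b": 5,    "c": 3,    "d": 6},
]
variables = ["a", "b", "c", "d"]

# Listwise: keep only participants with data on every selected variable.
listwise = [r for r in rows if all(r[v] is not None for v in variables)]
listwise_n = {v: len(listwise) for v in variables}

# Pairwise: each variable uses every participant who has data on it.
pairwise_n = {v: sum(r[v] is not None for r in rows) for v in variables}

print(listwise_n)   # {'a': 2, 'b': 2, 'c': 2, 'd': 2}
print(pairwise_n)   # {'a': 4, 'b': 3, 'c': 3, 'd': 4}
```

Listwise gives the same n for every variable but discards participants 2 and 3 entirely. Pairwise keeps more data but leaves unequal n across variables.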
What is Raynald’s implementation of the Box-Cox transformation?
Looks at 31 values of lambda from -2 to 1 (increments of 0.1)