Final Exam- Pearson's Correlation Flashcards
Why Screen Data?
- Avoiding erroneous conclusions by checking accuracy of data
- Use SPSS (PASW) frequency procedure
- Avoiding missing data (from entry, participants, equipment, etc.)
- Avoiding extreme values (outliers). So extreme that they distort results.
- Meeting assumptions of particular tests
Stem and Leaf Display
Like a grouped frequency distribution without loss of information
- Stem: the intervals on the left
- Leaf: digits on the right; each leaf is the final digit of one score, so the leaves show both the individual values and how many scores fall in each interval
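As a rough illustration of the stem and leaf idea, the sketch below builds a display in Python. It assumes non-negative whole-number scores, with the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` and the scores are made up for the example.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Return display lines: stem (tens) on the left, leaves (units) on the right."""
    groups = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)  # e.g. 58 -> stem 5, leaf 8
        groups[stem].append(leaf)
    lines = []
    # Walk every stem in order so that empty intervals still appear as rows.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


scores = [52, 47, 61, 58, 45, 63, 58, 70, 49, 55]  # hypothetical scores
display = stem_and_leaf(scores)
print("\n".join(display))
```

Each row works like one interval of a grouped frequency distribution, but the individual scores can still be read back from the leaves.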
Why does data go missing?
- Measurement Equipment Fails
- Participants do not complete all trials or all items
- Errors occur during data entry
Missing Data
If missing data are not randomly distributed, there can be systematic problems
What do you do with missing data?
- Analyze difference between groups (those with missing and those without)
- Delete cases and/or items
- Estimate missing values using:
  - Prior knowledge
  - Means calculated from the available data
  - Regression analyses to predict values
How do we find missing data?
- Analyze -> Descriptive Statistics -> Frequencies and…
- Analyze -> Descriptive Statistics -> Explore
Replacing Missing Data
- Transform -> Replace Missing Values
- Options include replacing with the series mean, the mean or median of nearby points, and other imputation methods
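To make two of these replacement options concrete, here is a minimal Python sketch. It is not SPSS's own procedure: missing values are marked with `None`, the function names and the `span` parameter are hypothetical, and the nearby-points version simply uses whatever valid neighbours are available on each side.

```python
from statistics import mean


def replace_with_series_mean(series):
    """Fill every missing value with the mean of all observed values."""
    observed = [x for x in series if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in series]


def replace_with_nearby_mean(series, span=2):
    """Fill each missing value with the mean of up to `span` valid values on each side."""
    filled = []
    for i, x in enumerate(series):
        if x is not None:
            filled.append(x)
            continue
        before = [v for v in series[:i] if v is not None][-span:]
        after = [v for v in series[i + 1:] if v is not None][:span]
        neighbours = before + after
        # Leave the value missing if there is nothing nearby to average.
        filled.append(mean(neighbours) if neighbours else None)
    return filled


series = [4, 6, None, 10, 12, 16]  # hypothetical scores, one missing
by_series_mean = replace_with_series_mean(series)
by_nearby_mean = replace_with_nearby_mean(series, span=2)
print(by_series_mean[2], by_nearby_mean[2])
```

The two methods can give different estimates for the same gap, which is one reason to check how sensitive the results are to the replacement method.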
Causes For Outliers
- Data-Entry Errors were made by the researcher
- The participant is not a member of the population for which the sample is intended
- The participant is simply different from the remainder of the sample
Why are outliers problematic?
- Can have disproportionate influence on results (many tests take squared deviations from mean)
- Statistical Tests are sensitive to outliers
- Can lead to both Type I and Type II errors
How do we identify outliers in SPSS?
- Explore Menu (under Descriptive Statistics) can give you frequencies, highest and lowest scores, boxplots, and stem and leaf plots.
What should you do with outliers?
- Conduct analyses with and without the outliers and compare the results
- Some outliers are of interest (e.g., they can call attention to a poorly worded question)
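The sketch below illustrates running the same descriptive analysis with and without outliers. The cutoff used here (more than 1.5 interquartile ranges beyond the quartiles, the usual boxplot rule) is an assumption for the example and not a rule stated in these notes; the data are made up, and quartile methods differ between packages.

```python
from statistics import mean, quantiles, stdev

scores = [12, 14, 15, 15, 16, 17, 18, 19, 95]  # hypothetical data with one extreme value

# Quartiles via the stdlib default ("exclusive") method.
q1, _median, q3 = quantiles(scores, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in scores if x < lower_fence or x > upper_fence]
trimmed = [x for x in scores if lower_fence <= x <= upper_fence]

# Same analysis twice: once with all scores, once without the flagged ones.
mean_with, sd_with = mean(scores), stdev(scores)
mean_without, sd_without = mean(trimmed), stdev(trimmed)

print("outliers:", outliers)
print(f"with:    mean={mean_with:.2f}, sd={sd_with:.2f}")
print(f"without: mean={mean_without:.2f}, sd={sd_without:.2f}")
```

Because the standard deviation is built from squared deviations, a single extreme score inflates it far more than it shifts the mean.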
Are data normal?
Examine both univariate (individual variables) and multivariate (combination of variables) normality
Ways to assess normality
- Skewness: Degree of symmetry of a distribution around the mean
- Kurtosis: Degree of peakedness of distribution
- When the distribution is normal, both values are equal to zero
- Kolmogorov-Smirnov statistic
Kolmogorov-Smirnov statistic
Tests the null hypothesis that the population is normally distributed
- A significant result on this test indicates non-normal data
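As a rough sketch of what the statistic measures, the Python below computes the largest gap between the sample's cumulative distribution and a normal curve fitted with the sample mean and standard deviation. The function name and data are hypothetical, and no p-value is computed: estimating the mean and standard deviation from the same sample changes the reference distribution for the significance test, so that step is best left to the statistics package.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(scores):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(scores)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    gaps = []
    for i, x in enumerate(ordered, start=1):
        expected = fitted.cdf(x)
        # The empirical CDF jumps at x, so check the gap just after and just before the jump.
        gaps.append(max(i / n - expected, expected - (i - 1) / n))
    return max(gaps)


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical, evenly spread
skewed = [1, 1, 1, 1, 2, 2, 3, 10, 30]    # hypothetical, long right tail

d_symmetric = ks_statistic_normal(symmetric)
d_skewed = ks_statistic_normal(skewed)
print(f"D symmetric = {d_symmetric:.3f}, D skewed = {d_skewed:.3f}")
```

A larger statistic means the sample departs further from the fitted normal curve, which is what a significant test result reflects.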
Normal distribution
A symmetrical, bell-shaped distribution having half the scores above the mean and half the scores below the mean
- Most of the scores are clustered near the middle of the continuum of observed scores
- Resembles a bell-shaped curve
Variability
The extent to which scores spread out around the mean
Range
A measure of variability that is computed by subtracting the smallest score from the largest score
Variance
A single number that represents the total amount of variation in a distribution; computed as the average of the squared deviations from the mean
Standard Deviation
The standard deviation is the square root of the variance. It has important relations to the normal curve.
- Most commonly used measure of dispersion
- Approximately how far on the average a score is from the mean
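A small worked example of the three measures of variability, using made-up scores. The population formulas (dividing by N) are shown because they give round numbers here; the sample versions (dividing by N - 1) are included for comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

score_range = max(scores) - min(scores)   # largest minus smallest

# Population formulas: divide the sum of squared deviations by N.
pop_variance = pvariance(scores)
pop_sd = pstdev(scores)                   # square root of the variance

# Sample formulas: divide by N - 1 instead.
sample_variance = variance(scores)
sample_sd = stdev(scores)

print("range:", score_range)
print("population variance and SD:", pop_variance, pop_sd)
print(f"sample variance and SD: {sample_variance:.3f}, {sample_sd:.3f}")
```

The squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the standard deviation is 2: a typical score sits about 2 points from the mean of 5.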
Skewed Distribution
Most of the scores are clustered on one end of the continuum
- Positively skewed: scores cluster at the lower end of the continuum (skewness statistic greater than zero)
- Negatively skewed: scores cluster at the higher end of the continuum (skewness statistic less than zero)
Kurtosis
Measure of the degree of peakedness of a distribution
Leptokurtosis
Distribution is too peaked, tall and thin (kurtosis statistic greater than zero)
Platykurtosis
Distribution is too flat with many cases in the tail(s) (kurtosis statistic less than zero)
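The signs of the skewness and kurtosis statistics can be checked with a short Python sketch. It uses the simple moment-based formulas, with 3 subtracted from kurtosis so that a normal distribution scores zero; statistics packages typically apply small-sample adjustments, so their values will differ somewhat. The function names and data are hypothetical.

```python
from statistics import mean


def central_moment(scores, k):
    """Average of the k-th powers of the deviations from the mean."""
    m = mean(scores)
    return sum((x - m) ** k for x in scores) / len(scores)


def skewness(scores):
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    # Subtract 3 so that a normal distribution has a value of zero.
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


positive_skew = [1, 1, 1, 2, 2, 3, 10]      # scores cluster at the low end
negative_skew = [10, 10, 10, 9, 9, 8, 1]    # scores cluster at the high end
flat = [1, 2, 3, 4, 5]                      # evenly spread, no peak
peaked = [5, 5, 5, 5, 5, 5, 5, 1, 9]        # most scores at the centre

print(f"skewness: {skewness(positive_skew):.2f}, {skewness(negative_skew):.2f}")
print(f"kurtosis: {excess_kurtosis(flat):.2f}, {excess_kurtosis(peaked):.2f}")
```

The first pair of values comes out positive then negative, and the flat sample gives a negative kurtosis while the peaked one gives a positive kurtosis, matching the sign conventions on the cards.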
Multimodal shapes
Scores tend to congregate around more than one point
Bimodal shapes
Scores are clustered in two places