Week 2 Flashcards
What should I do before conducting any data analysis?
- “Screen & Clean” Data
- SPSS cannot tell if data is ridiculous
- Only the researcher knows this, so I need to make sure I screen the data for errors
Six reasons for conducting exploratory data analysis
- Checking for data entry errors
- Obtaining a thorough descriptive analysis of your data.
- Examining patterns that are not otherwise obvious
- Analysing and dealing with missing data
- Checking for outliers
- Checking assumptions
EXPLORE command in SPSS
- The Explore command in SPSS covers all bases.
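A minimal sketch of the underlying syntax (the Explore dialog runs SPSS's EXAMINE command; the variable names score and group are hypothetical):

    * Descriptives, extreme values, plots, and normality tests in one pass.
    EXAMINE VARIABLES=score BY group
      /PLOT BOXPLOT STEMLEAF NPPLOT
      /STATISTICS DESCRIPTIVES EXTREME(5)
      /COMPARE GROUPS
      /MISSING LISTWISE
      /NOTOTAL.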
Who was John Tukey?
- A very practical statistician
- The language and techniques of EDA were developed by him
What does Screening and Cleaning involve?
- Computing new variables from existing ones (see the sketch after this list)
- Recoding variables
- Dealing with missing data
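For the first point, a hedged COMPUTE sketch (the item names stress1 to stress10 are hypothetical; recoding and missing data are covered in later cards):

    * Total scale score summed from ten questionnaire items.
    COMPUTE stress_total = SUM(stress1 TO stress10).
    EXECUTE.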
Checking Data Entry Errors
- Use the Frequencies command to check for data entry errors in categorical/nominal variables
- Use the outliers option in the Explore command to check for data entry errors in continuous/scale variables
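A sketch of both checks in syntax, assuming a nominal variable sex and a scale variable age (both hypothetical):

    * A frequency table exposes impossible category codes.
    FREQUENCIES VARIABLES=sex
      /ORDER=ANALYSIS.
    * The extreme-values table exposes impossible scale values.
    EXAMINE VARIABLES=age
      /PLOT NONE
      /STATISTICS EXTREME(5).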
Options for Dealing with Data Entry Errors
- Remove data
- Make ‘educated guesses’ about what was intended
Obtaining a thorough descriptive analysis of data
- The Explore command provides more descriptive statistics than any other SPSS procedure
- Multiple measures of central tendency
- Multiple measures of variability
- Quantitative measures of shape
- Confidence intervals
- Percentiles
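A sketch requesting all of these through EXAMINE (score is hypothetical; /CINTERVAL and /PERCENTILES are standard subcommands):

    EXAMINE VARIABLES=score
      /PLOT NONE
      /STATISTICS DESCRIPTIVES
      /CINTERVAL 95
      /PERCENTILES(5,10,25,50,75,90,95).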
Examining Patterns that are not otherwise obvious
- Stem and Leaf Plots
- Box and Whisker Plots
Analysing and dealing with missing data
- Do you leave it out, or do you substitute in the mean?
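If substituting the mean, one option is SPSS's Replace Missing Values command; a sketch (score is hypothetical):

    * Creates score_m, with missing cases replaced by the series mean.
    RMV /score_m=SMEAN(score).

Note that mean substitution shrinks the variable's variability, so neither option is cost-free.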
Outliers
- Are they legitimate, or are they error?
- Do you keep them? Do you discard them?
- Balancing act, as with so much in data analysis
Assumptions
- All parametric procedures (e.g., t-tests, ANOVAs, correlation) operate under certain assumptions
- Main two:
- Normality
- Assumes that your data comes from a population that is normally distributed
- Homogeneity of variance
- Assumes that, if your data is to be divided into groups, the level of variability in the groups will be approximately equal (i.e., not significantly different).
Normality is tested in four ways:
- Visual inspection of histograms and stem and leaf plots
- Visual inspection of normal and detrended normal Q-Q plots
- Normality tests
- Skewness divided by SE skewness
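One EXAMINE call produces material for all four checks (score hypothetical): HISTOGRAM and STEMLEAF feed the visual inspections, NPPLOT requests the normal and detrended normal Q-Q plots plus the Tests of Normality table (Kolmogorov-Smirnov and Shapiro-Wilk), and DESCRIPTIVES reports skewness with its standard error:

    EXAMINE VARIABLES=score
      /PLOT HISTOGRAM STEMLEAF NPPLOT
      /STATISTICS DESCRIPTIVES.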
Subjective Normality Tests
- Visual inspection of histograms and stem and leaf plots
- Visual inspection of normal and detrended normal Q-Q plots
- Not influenced by sample size
Objective Normality Tests
- Normality tests
- Skewness divided by SE skewness
- Objective but influenced by sample size
- With a large sample, even trivial deviations from normality will indicate a violation of the assumption
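A sketch of the ratio check (score hypothetical; the ±1.96 cut-off is the usual z-score convention, not from these notes):

    * Skewness and its standard error appear in the output table.
    DESCRIPTIVES VARIABLES=score
      /STATISTICS=MEAN STDDEV SKEWNESS KURTOSIS.
    * E.g., skewness .42 with SE .21 gives .42/.21 = 2.0 > 1.96: significant skew.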
The least important assumption
- Normality
- Statistical tests are very robust to even gross deviations from normality, particularly when you have a large sample
Why is normality the LEAST important assumption?
- Statistical tests are based on sampling distributions
- By the central limit theorem, sampling distributions tend towards normality regardless of the underlying population distribution
- They get more normal as the sample size increases
When is normality a problem?
- When the deviation is strong and the sample is small
Tests 3 and 4 are sensitive to sample size
- Ironically, they indicate a violation of normality with a large sample
- Even if the deviation is trivial
- Yet normality is rarely a problem when the sample size is large
Transformation
- Applying a simple mathematical operation to the data to deal with violations of assumptions
Common Transformations
- Log 10
- Square root
- Reciprocal
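Hedged COMPUTE sketches of all three (score hypothetical; constants added where zeroes would break the function):

    * Log10 transformation.
    COMPUTE score_log = LG10(score + 1).
    * Square root transformation.
    COMPUTE score_sqrt = SQRT(score).
    * Reciprocal transformation.
    COMPUTE score_recip = 1 / (score + 1).
    EXECUTE.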
Homogeneity of variance assumption
- More serious, particularly when the design is unbalanced (unequal group sizes)
- Tested using a dedicated test: Levene's test
- Dealt with by power transformation
Levene’s Test
- Used to test if k samples have equal variances.
- Equal variances across samples is called homogeneity of variance.
- Some statistical tests, for example the analysis of variance, assume that variances are equal across groups or samples.
- The Levene test can be used to verify that assumption.
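Levene's test is printed automatically in independent-samples t-test output, or can be requested via ONEWAY; a sketch with hypothetical names:

    ONEWAY score BY group
      /STATISTICS HOMOGENEITY.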
Some general comments about transformations
- They are no “magic bullet”.
- They can’t cope with zeroes (need to add a constant)
- They are unpredictable (e.g., a transformation designed to deal with normality can fix up homogeneity of variance and vice versa)
- Some data is “untransformable”
Recoding Data
- Often need to recode for a whole host of reasons:
- Reducing numbers of groups
- Reverse scoring
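Hedged RECODE sketches for both reasons (names, codes, and cut-offs all hypothetical):

    * Reducing numbers of groups: collapse age into three bands.
    RECODE age (LO THRU 35=1) (36 THRU 55=2) (56 THRU HI=3) INTO agegrp.
    * Reverse scoring negatively worded 5-point items.
    RECODE q2 q5 (1=5) (2=4) (3=3) (4=2) (5=1) INTO q2r q5r.
    EXECUTE.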