Lecture Week 3 Flashcards
6 reasons for conducting exploratory data analysis
- Check for data entry errors
- Obtain a descriptive analysis of the data
- Find patterns that are not obvious
- Analyse and deal with missing data
- Checking for outliers
- Checking assumptions
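The checks above can be sketched in a few lines. This is a minimal illustration in Python with pandas (not the course’s SPSS workflow), on a made-up score column containing a deliberate entry error and a missing value:

```python
import pandas as pd
import numpy as np

# Hypothetical dataset: exam scores with one entry error (590) and one missing value
df = pd.DataFrame({"score": [55, 61, 72, 68, 590, np.nan, 64, 70]})

summary = df["score"].describe()       # descriptive analysis of the data
n_missing = df["score"].isna().sum()   # how much missing data is there?

# Simple screen for outliers / entry errors: flag values beyond 1.5 * IQR
q1, q3 = df["score"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df.loc[
    (df["score"] < q1 - 1.5 * iqr) | (df["score"] > q3 + 1.5 * iqr), "score"
]
```

Here the screen flags the impossible score of 590 and counts one missing value, covering several of the reasons listed above in one pass.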
What is Central Tendency
Tendency for the values of a random variable to cluster round its mean, mode, or median.
Multiple Measures of Central Tendency
- Summary statistic that represents the center point value of a dataset
- Three most common measures of central tendency are the mean, median, and mode.
Multiple Measures of Variability
- Define how far away the data points tend to fall from the center.
- Range, Interquartile Range, Variance and Standard Deviation
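As a sketch in Python with NumPy/SciPy (rather than SPSS), the measures of central tendency and variability above can all be computed on a small made-up dataset:

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])   # made-up scores

# Central tendency
mean = data.mean()                            # 5.0
median = np.median(data)                      # 4.5
mode = stats.mode(data, keepdims=False).mode  # 4 (most frequent value)

# Variability
data_range = data.max() - data.min()          # 7
iqr = stats.iqr(data)                         # interquartile range
variance = data.var(ddof=1)                   # sample variance (n - 1 denominator)
sd = data.std(ddof=1)                         # sample standard deviation
```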
Quantitative measures of Shape
- The distribution shape of quantitative data can be described because the values have a logical order, so the ‘low’ and ‘high’ end values on the x-axis of the histogram can be identified.
- Histograms
- Kurtosis
- Skewness
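A brief illustration (Python/SciPy, simulated data) of how skewness and kurtosis quantify shape:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
symmetric = rng.normal(size=5000)          # bell-shaped, symmetric
right_skewed = rng.exponential(size=5000)  # long right tail

skew_sym = stats.skew(symmetric)       # near 0 for symmetric data
skew_exp = stats.skew(right_skewed)    # clearly positive for a right tail
kurt_sym = stats.kurtosis(symmetric)   # excess kurtosis, near 0 for normal data
```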
Confidence Intervals
- For 95% confidence intervals, an average of 19 out of 20 contain the population parameter.
- Suppose you have a 95% confidence interval of [5, 10] for the mean.
- You can be 95% confident that the population mean falls between 5 and 10
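The interval is computed from the sample mean and its standard error. A minimal sketch in Python with SciPy’s t interval, on a simulated sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=7.5, scale=4.0, size=30)  # made-up sample of 30 scores

mean = sample.mean()
se = stats.sem(sample)  # standard error of the mean
# 95% confidence interval for the population mean, using the t distribution
lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=se)
```

On average, 19 out of 20 intervals built this way will contain the true population mean (7.5 here).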
Normality & Sample Size
- Tests of Normality, and the ratio of Skewness divided by SE Skewness, are impacted by sample size
- If there is a really large sample, tests become hypersensitive
- Even trivial deviations from normality will violate the assumption of normality
- Can create a false positive
- The skewness statistic itself doesn’t really change though - it is its standard error that shrinks as the sample grows
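A small demonstration of the hypersensitivity point, assuming SciPy’s D’Agostino normality test as a stand-in for the tests discussed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
base = rng.normal(size=100_000)
# A trivial deviation from Normality (skewness of only about 0.1)
mild = base + 0.02 * base ** 2

# With a huge sample the test flags even this tiny deviation (a false positive)...
stat_large, p_large = stats.normaltest(mild)
# ...while a modest sample of the very same data usually passes
stat_small, p_small = stats.normaltest(mild[:40])
```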
Monte Carlo Tests
- Simulated studies that change the characteristics of the data show that even gross deviations from Normality don’t impact the statistical significance of the tests.
- Most tests can withstand even gross deviations from Normality - they are called robust
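A miniature Monte Carlo study along these lines - a Python sketch with made-up settings, running a one-sample t-test on grossly non-normal exponential data where the null hypothesis is actually true:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n, alpha = 2000, 50, 0.05

# Population: exponential (grossly non-normal) with true mean 1,
# so H0: mean == 1 is true in every simulated study
false_positives = 0
for _ in range(n_sims):
    sample = rng.exponential(scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=1.0)
    false_positives += p < alpha

type1_rate = false_positives / n_sims  # stays close to the nominal 5%
```

The false-positive rate staying near 5% despite the skewed population is what “robust” means here.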
Central Limit Theorem
- If you have a population with mean μ and standard deviation σ and large random samples
- The distribution of the sample means will be approximately normally distributed.
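A quick simulation of the theorem (Python, hypothetical exponential population with mean 2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Heavily skewed population: exponential with mu = 2 (its sigma is also 2)
mu, n, n_samples = 2.0, 100, 5000

# Draw 5000 independent samples of size 100 and take each sample's mean
sample_means = rng.exponential(scale=mu, size=(n_samples, n)).mean(axis=1)

center = sample_means.mean()              # close to mu
spread = sample_means.std(ddof=1)         # close to sigma / sqrt(n) = 0.2
skew_of_means = stats.skew(sample_means)  # far below the population's skew of 2
```

Even though the population is strongly right-skewed, the sample means pile up symmetrically around μ.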
What to do if you decide that data is not Normal?
- No simple answer
- You could do a transformation
- Sometimes transforming the data can fix a problem of Normality
- If transforming the data successfully produces normality, you can be confident in treating the transformed data as Normally Distributed
Transformation
Applying a simple mathematical operation to data to deal with violations of assumptions
Common Transformations
- Log 10
- Square Root
- Reciprocal
Log10 Transformation
- Base 10 Logarithm
- Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: lg10(Variable)
Square Root Transformation
Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: sqrt(variable)
Reciprocal Transformation
Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: 1/Variable
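The three SPSS formulas have direct NumPy equivalents. A small sketch on made-up positive scores:

```python
import numpy as np
from scipy import stats

# Made-up positively skewed scores (all > 0, as these transformations require)
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 64.0])

log10_x = np.log10(x)  # SPSS: LG10(variable)
sqrt_x = np.sqrt(x)    # SPSS: SQRT(variable)
recip_x = 1.0 / x      # SPSS: 1/variable

# The log transformation pulls in the long right tail, reducing skewness
```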
Homogeneity of Variance
- This is problematic when we have an unbalanced sample
- Tested Using Levene’s Test
- Assumption underlying both t tests and F tests
- Population variances of two or more samples are considered equal.
- Corrected using a transformation called a Power Transformation
Levene’s Test
- Tests Homogeneity of Variance
- Must include a grouping variable - that is a variable which can be placed into groups (such as gender)
- Formula in SPSS: Analyse/Descriptive Statistics/Explore/Dependent List: variable/Factor List: grouping variable/Plots/Spread vs Level with Levene Test/Power Estimation
- In the output look at Test of Homogeneity of Variance and then at the Based on Mean row, alongside the group standard deviations
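SciPy offers the same test. A hedged sketch with two simulated groups of deliberately unequal variance (the `center="mean"` option mirrors the “Based on Mean” row of the SPSS output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Two simulated groups (e.g. levels of a grouping variable such as gender)
# with deliberately unequal population variances
group_a = rng.normal(loc=10, scale=1, size=40)
group_b = rng.normal(loc=10, scale=5, size=40)

# center="mean" corresponds to the "Based on Mean" version of Levene's Test
stat, p = stats.levene(group_a, group_b, center="mean")
# A small p rejects equal variances: Homogeneity of Variance is violated
```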
Power Transformation
- Used to transform data when the assumption of Homogeneity of Variance has been violated
- This is raising figures to a Power Value such as squared or cubed
- In SPSS: Transform/Compute Variable/Rename: Target Variable/Numeric Expression: variable ** power value (e.g. variable ** 2)
Spread vs Level Plot
- The plot itself is of little direct use
- In the fine print it says: Power for Transformation
- This is the number to use when doing a Power Transformation to fix a problem with Levene’s Test
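A sketch of the idea in Python, using made-up groups whose variance grows with their mean; the power value of .5 is an assumption here, as if read from the plot’s fine print:

```python
import numpy as np

rng = np.random.default_rng(9)
# Made-up groups whose variance grows in proportion to their mean -
# a classic violation of Homogeneity of Variance
group_a = rng.normal(loc=20, scale=20 ** 0.5, size=50)
group_b = rng.normal(loc=200, scale=200 ** 0.5, size=50)

ratio_before = group_b.var(ddof=1) / group_a.var(ddof=1)  # far from 1

# Suppose the Spread vs Level fine print read "Power for Transformation: .500";
# apply it exactly as SPSS's  variable ** power
power = 0.5
ratio_after = (group_b ** power).var(ddof=1) / (group_a ** power).var(ddof=1)
```

After the power transformation the variance ratio sits near 1, so the groups now satisfy the assumption.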
General comments about transformations
- They are not a magic bullet
- They don’t cope with zeros (add a constant, such as 10, to every score first)
- They are unpredictable and can affect Normality and Homogeneity of Variance even if you weren’t planning to.
- Some data is “untransformable”
- Only use transformed figures to fix statistical tests
- Only report from the true data
Acceptable Skewness level to achieve Normality Assumption
Skewness statistic divided by its standard error should fall between -2 and +2; a ratio > +2 or < -2 indicates a violation of Normality
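This ratio can be computed directly. A sketch assuming the SPSS-style adjusted skewness statistic and its standard-error formula:

```python
import numpy as np
from scipy import stats

def skew_z(data):
    """Skewness statistic divided by its standard error."""
    n = len(data)
    skew = stats.skew(data, bias=False)  # adjusted skewness, as SPSS reports it
    # Standard error of skewness (the formula SPSS uses)
    se = np.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew / se

rng = np.random.default_rng(2)
z_normal = skew_z(rng.normal(size=100))       # typically within +/-2
z_skewed = skew_z(rng.exponential(size=100))  # well above +2
```

Ratios beyond ±2 (as for the exponential sample) indicate that the Normality assumption is in trouble.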