Lecture Week 3 Flashcards
6 reasons for conducting exploratory data analysis
- Check for data entry errors
- Obtain a descriptive analysis of the data
- Find patterns that are not obvious
- Analyse and deal with missing data
- Check for outliers
- Check assumptions
What is Central Tendency?
The tendency for the values of a random variable to cluster around its mean, median, or mode.
Multiple Measures of Central Tendency
- A measure of central tendency is a summary statistic that represents the center point of a dataset
- The three most common measures of central tendency are the mean, median, and mode.
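A minimal sketch of the three measures, using Python's standard `statistics` module. The `scores` list is a made-up example, not data from the lecture.

```python
import statistics

# Hypothetical sample (illustrative values only)
scores = [2, 3, 3, 5, 7, 8, 14]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print(mean_score, median_score, mode_score)
```

Here the large value 14 pulls the mean above the median, which is typical of right-skewed data.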
Multiple Measures of Variability
- Describe how far the data points tend to fall from the center.
- Common measures: range, interquartile range, variance, and standard deviation
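The four measures can be sketched with Python's standard library. The `scores` list is again hypothetical. Note that quartile conventions differ between packages, so the interquartile range here (Python's default "exclusive" method) may not match other software exactly.

```python
import statistics

# Hypothetical sample (illustrative values only)
scores = [2, 3, 3, 5, 7, 8, 14]

data_range = max(scores) - min(scores)           # largest minus smallest value
q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
iqr = q3 - q1                                    # spread of the middle 50%
variance = statistics.variance(scores)           # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)               # square root of the sample variance

print(data_range, iqr, round(variance, 2), round(std_dev, 2))
```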
Quantitative measures of Shape
- The shape of a distribution can be described for quantitative data because the values have a logical order, so the ‘low’ and ‘high’ ends of the histogram's x-axis can be identified.
- Histograms
- Kurtosis
- Skewness
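As a rough illustration, skewness and kurtosis can be computed from the central moments of the data. This sketch uses the simple moment-based formulas; statistical packages often report bias-adjusted versions, so their values may differ slightly. The function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail

print(skewness(symmetric), skewness(right_skewed), excess_kurtosis(symmetric))
```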
Confidence Intervals
- For 95% confidence intervals, an average of 19 out of 20 contain the population parameter.
- Suppose you have a 95% confidence interval of [5, 10] for the mean.
- You can be 95% confident that the population mean falls between 5 and 10
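The "19 out of 20" idea can be checked with a small simulation. This is a sketch under stated assumptions: the population mean and standard deviation are made up, the population is normal, and the standard deviation is treated as known so that a simple z-based interval applies (in practice it is usually estimated and a t-based interval is used).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50.0, 10.0   # hypothetical population parameters
N, STUDIES = 30, 2000           # sample size per study, number of simulated studies

z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
margin = z * POP_SD / N ** 0.5              # half-width of each interval

covered = 0
for _ in range(STUDIES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_mean = statistics.fmean(sample)
    # Does this study's interval contain the true population mean?
    if sample_mean - margin <= POP_MEAN <= sample_mean + margin:
        covered += 1

coverage = covered / STUDIES
print(f"Proportion of intervals containing the population mean: {coverage:.3f}")
```

The proportion should land close to 0.95, that is, roughly 19 intervals in every 20.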
Normality & Sample Size
- Normality tests and the ratio of skewness to its standard error (skewness divided by SE skewness) are affected by sample size
- With a very large sample, these tests become hypersensitive
- Even trivial deviations from normality are flagged as violating the assumption of normality
- This can create a false positive
- The skewness value itself doesn’t really change with sample size, though
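A short sketch of why this happens: the standard error of skewness shrinks as the sample grows, so the same skewness value gives a larger ratio. The skewness of 0.2 and the sample sizes are hypothetical, and the 1.96 cut-off is the usual two-sided 5% criterion, assumed here for illustration.

```python
import math

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

skew = 0.2  # hypothetical, fairly trivial amount of skewness

z_by_n = {}
for n in (30, 300, 3000):
    z_by_n[n] = skew / se_skewness(n)  # skewness divided by SE skewness
    flagged = abs(z_by_n[n]) > 1.96    # assumed 5% criterion
    print(f"n={n:5d}  SE={se_skewness(n):.3f}  ratio={z_by_n[n]:.2f}  flagged={flagged}")
```

The skewness stays at 0.2 throughout, yet only the largest sample is flagged as non-normal.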
Monte Carlo Tests
- Simulated studies that vary the characteristics of the data show that even gross deviations from Normality don’t affect the statistical significance of the tests.
- Most tests can withstand even gross deviations from Normality; such tests are called robust
Central Limit Theorem
- Take large random samples from a population with mean μ and standard deviation σ
- The distribution of the sample means will be approximately normal, with mean μ and standard deviation σ/√n
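The theorem can be illustrated with a simulation. This sketch assumes an exponential population (mean 1, standard deviation 1, strongly right-skewed) and arbitrary choices for the sample size and number of samples.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # observations per sample (assumed "large enough")
NUM_SAMPLES = 5000   # number of samples drawn

# Exponential population with rate 1.0: mean 1, sd 1, far from normal
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)   # should be close to 1
sd_of_means = statistics.stdev(sample_means)     # should be close to 1 / sqrt(50)
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is heavily skewed.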
What to do if you decide that data is not Normal?
- No simple answer
- You could do a transformation
- Sometimes transforming the data can fix a problem of Normality
- If the transformation succeeds, the transformed variable (not the original data) can be treated as Normally distributed, and the analysis is run on the transformed values
Transformation
Applying a simple mathematical operation to data to deal with violations of assumptions
Common Transformations
- Log10
- Square Root
- Reciprocal
Log10 Transformation
- Base 10 Logarithm
- Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: lg10(Variable)
Square Root Transformation
- Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: sqrt(variable)
Reciprocal Transformation
- Formula in SPSS: Transform/Compute Variable/rename Target Variable/Numeric Expression: 1/Variable
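For comparison, the three transformations can be sketched in Python. This is an illustrative equivalent of the SPSS expressions, not part of the lecture; the `values` list is hypothetical and strictly positive, since log10 needs values above zero, the square root needs non-negative values, and the reciprocal is undefined at zero.

```python
import math

# Hypothetical positively skewed variable (all values > 0)
values = [1, 4, 10, 100, 1000]

log10_values = [math.log10(v) for v in values]     # like lg10(Variable)
sqrt_values = [math.sqrt(v) for v in values]       # like sqrt(variable)
reciprocal_values = [1 / v for v in values]        # like 1/Variable

print(log10_values)
print(sqrt_values)
print(reciprocal_values)
```

The log10 and square root transformations pull in the long right tail while keeping the order of the values; the reciprocal reverses the order, so large original values become small.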