Lecture 3: Starting Points in Data Analysis Flashcards
What are the quartiles on a boxplot?
first quartile (Q1): divides the lowest 25% of the data from the highest 75% (25th percentile or lower quartile)
second quartile (Q2): divides the data in half (50th percentile or median)
third quartile (Q3): divides the highest 25% of the data from the lowest 75% (75th percentile or upper quartile)
What is the interquartile range (IQR)?
IQR = Q3 - Q1
sometimes referred to as the middle 50% (the range covered by the middle half of the data)
a measure of variability
the median and IQR are often reported for variables that are not normally distributed
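A minimal sketch of the quartile and IQR cards, using Python's standard `statistics` module. It assumes the "exclusive" (n + 1)p interpolation rule; software packages (SPSS included) can use different interpolation rules, so quartiles from other tools may differ slightly. The data are made up for illustration.

```python
import statistics

def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR) using the (n + 1)p interpolation rule."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q1, q2, q3, q3 - q1

data = [2, 4, 4, 5, 7, 8, 9, 12, 15, 21]  # hypothetical sample
q1, q2, q3, iqr = quartiles_and_iqr(data)
print(q1, q2, q3, iqr)  # 4.0 7.5 12.75 8.75
```

Q2 here equals `statistics.median(data)`, and the IQR is simply the distance between the upper and lower quartiles.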
What are outliers?
extreme observations in your variable of interest
How does SPSS identify outliers?
according to Tukey’s fences method
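- values more than 1.5×IQR but no more than 3×IQR beyond the box (below Q1 - 1.5×IQR or above Q3 + 1.5×IQR) –> these are marked with an O (outliers)
- values more than 3×IQR beyond the box (below Q1 - 3×IQR or above Q3 + 3×IQR) –> these are marked with an * (extreme values)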
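A minimal sketch of Tukey's fences as described above, flagging each value with the symbol SPSS would draw. It assumes the quartiles are computed with Python's "exclusive" quantile rule; SPSS draws its box edges with its own hinge definition, so borderline points may be classified differently. The function name `tukey_outliers` and the data are hypothetical.

```python
import statistics

def tukey_outliers(values):
    """Return (value, symbol) pairs: 'O' beyond 1.5×IQR, '*' beyond 3×IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr      # outer fences
    flagged = []
    for v in values:
        if v < outer_lo or v > outer_hi:
            flagged.append((v, "*"))  # extreme value
        elif v < inner_lo or v > inner_hi:
            flagged.append((v, "O"))  # outlier
    return flagged

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 30, 60]  # hypothetical sample
print(tukey_outliers(data))  # [(30, 'O'), (60, '*')]
```

Here Q1 = 12 and Q3 = 18 (IQR = 6), so the inner fences are 3 and 27 and the outer fences are -6 and 36: 30 lies between the fences and 60 lies beyond the outer one.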
Quantile-Quantile Plots (Q-Q Plots)
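graphically checks whether a variable comes from a specified distribution (e.g. the normal distribution)
plots the sample quantiles against the theoretical quantiles; points lying close to a straight line suggest a good fit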
Stem and leaf plot
displays how often values in each class (stem) appear in the data
can be used to examine the distribution of the data as well as extreme values
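A minimal sketch of a stem-and-leaf display, assuming non-negative integer data with the tens digit(s) as the stem and the units digit as the leaf. SPSS's version can split stems and lists extremes on a separate line; this sketch does neither. The function names and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return [(stem, sorted leaves)] for every stem from the smallest to the largest."""
    groups = defaultdict(list)
    for v in sorted(values):           # assumes non-negative integers
        stem, leaf = divmod(v, 10)     # e.g. 23 -> stem 2, leaf 3
        groups[stem].append(leaf)
    # include empty stems so gaps in the data stay visible
    return [(stem, groups.get(stem, [])) for stem in range(min(groups), max(groups) + 1)]

def print_stem_and_leaf(values):
    print("Freq  Stem | Leaf")
    for stem, leaves in stem_and_leaf(values):
        print(f"{len(leaves):>4}  {stem:>4} | {''.join(map(str, leaves))}")

print_stem_and_leaf([12, 15, 23, 23, 27, 31, 38, 41, 52, 58])  # hypothetical data
```

Each row's length (and the frequency column) shows how many values fall in that class, so the display doubles as a sideways histogram that still shows the individual values.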
What are the statistical tests for normality?
Shapiro-Wilk test
Kolmogorov-Smirnov (KS) test
What is the Shapiro-Wilk test?
tests null hypothesis that data came from a normally distributed population
more accurate when the sample size is < 2000
What is the Kolmogorov-Smirnov test?
goodness-of-fit test –> tests the null hypothesis that a sample comes from a specified distribution
more accurate when the sample size is large (n ≥ 2000)
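A minimal sketch of a one-sample KS test against a fully specified normal distribution (mean and SD fixed in advance), using only the standard library. The p-value uses the asymptotic Kolmogorov distribution with Stephens' small-sample adjustment. This is an assumption-laden illustration: the KS row in SPSS's "Tests of Normality" table estimates the mean and SD from the sample and applies the Lilliefors correction, so its p-values will not match this sketch. The Shapiro-Wilk test is not sketched here because its coefficients need a specialised approximation; a library routine such as `scipy.stats.shapiro` is the practical choice. All function names, the sample and the `NormalDist(mu=5.5, sigma=0.8)` parameters are hypothetical.

```python
import math
from statistics import NormalDist

def ks_statistic(values, dist):
    """D = largest vertical gap between the sample's empirical CDF and dist.cdf."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF steps from (i-1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def kolmogorov_sf(lam):
    """P(K > lam) for the asymptotic Kolmogorov distribution."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # This series converges quickly for small lam; it gives P(K <= lam).
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2)) for k in range(1, 20))
        return min(1.0, max(0.0, 1.0 - math.sqrt(2 * math.pi) / lam * s))
    # Alternating series, which converges quickly for larger lam.
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 20))
    return min(1.0, max(0.0, 2 * s))

def ks_test(values, dist):
    """Return (D, approximate p-value) for H0: the sample comes from dist."""
    n = len(values)
    d = ks_statistic(values, dist)
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d  # Stephens' small-sample adjustment
    return d, kolmogorov_sf(lam)

sample = [4.1, 4.8, 5.0, 5.3, 5.5, 5.6, 5.9, 6.1, 6.4, 7.0]  # made-up values
d, p = ks_test(sample, NormalDist(mu=5.5, sigma=0.8))
print(f"D = {d:.3f}, p = {p:.3f}")
print("reject H0: not normal" if p < 0.05 else "fail to reject H0: normality can be assumed")
```

The final line applies the usual decision rule: a p-value below 0.05 means the data fail the normality test.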
What does it mean when you reject the null hypothesis in the test for normality?
It means that you fail the normality test: the p-value is significant (< 0.05), so the data are unlikely to have come from a normally distributed population.
What does it mean when you accept (more precisely, fail to reject) the null hypothesis in the test for normality?
It means that you pass the normality test: the p-value is non-significant (> 0.05), so there is no evidence against normality and the data can be treated as coming from a normally distributed population (the test does not prove normality).
What are some other measures used to assess normality?
Skewness and Kurtosis
What is skewness?
measure of asymmetry
- a normal distribution has a skewness of 0
What is kurtosis?
measure of tail density relative to a normal distribution
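- a normal distribution has a kurtosis of 3 (SPSS reports excess kurtosis, i.e. kurtosis - 3, so a normal distribution shows 0 in SPSS output)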
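A minimal sketch of how sample skewness and kurtosis can be computed from the central moments m2, m3 and m4. It assumes the bias-adjusted G1 (skewness) and G2 (excess kurtosis) formulas commonly reported by statistics packages such as SPSS; add 3 to G2 to get kurtosis on the scale where a normal distribution equals 3. The function names and data are hypothetical.

```python
import math
import statistics

def sample_skewness(values):
    """Bias-adjusted skewness G1 (needs n >= 3); 0 for perfectly symmetric data."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis G2 (needs n >= 4); about 0 for normal data."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

data = [1, 2, 3, 4, 5]  # symmetric, flat-topped toy data
print(sample_skewness(data), sample_excess_kurtosis(data))  # 0.0 and approximately -1.2
```

The toy data are symmetric, so the skewness is 0, and they have no tails at all, so the excess kurtosis is negative (platykurtic).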
On SPSS, what is the ideal measurement of skewness and kurtosis for a normally distributed data?
skewness between -1 and +1 (absolute value < 1)
kurtosis between 0 and 3
Platykurtic
kurtosis value < 3
light (thin) tails and a flatter peak than a normal distribution
Mesokurtic
kurtosis value of about 3, with the tail density you would expect to see in a normal distribution
Leptokurtic
kurtosis value > 3
heavy-tailed distribution
sharper, taller peak than a normal distribution
Skewness <0
negatively skewed
longer tail to the left
Skewness >0
positively skewed
longer tail to the right
Skewness = 0
symmetric, as in a normal distribution (zero skewness alone does not guarantee normality)
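A minimal sketch that turns the shape cards above into a labelling function. It assumes SPSS-style inputs (excess kurtosis, where normal = 0), converts them back to the scale where a normal distribution has kurtosis 3, and uses a hypothetical tolerance `tol` of 0.5 because real sample values are almost never exactly 0 or 3. The tolerance is an illustrative choice, not a rule from the lecture.

```python
def describe_shape(skewness, excess_kurtosis, tol=0.5):
    """Label skew direction and kurtosis type; tol is a hypothetical 'close enough' band."""
    kurtosis = excess_kurtosis + 3  # scale on which a normal distribution has kurtosis 3
    if skewness < -tol:
        skew_label = "negatively skewed (tail to the left)"
    elif skewness > tol:
        skew_label = "positively skewed (tail to the right)"
    else:
        skew_label = "approximately symmetric"
    if kurtosis < 3 - tol:
        kurt_label = "platykurtic"
    elif kurtosis > 3 + tol:
        kurt_label = "leptokurtic"
    else:
        kurt_label = "mesokurtic"
    return skew_label, kurt_label

print(describe_shape(2.0, 4.0))  # a right-skewed, heavy-tailed sample
```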
When do we do a data transformation?
when data are not normally distributed, it is common to apply a transformation to try to improve normality
How do you transform right-skewed data? (in order of severity, strongest first)
- Reciprocal transformation: t = 1/x
- Log transformation: t = log10(x)
- Square root transformation: t = sqrt(x)
How do you transform left-skewed data? (in order of severity, strongest first)
- Cubic transformation: t = x^3
- Square transformation: t = x^2
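A minimal sketch for checking whether a transformation actually reduces skewness: apply each candidate from the two lists above and recompute the skewness. It assumes the bias-adjusted skewness (G1) formula and made-up data. Note the domain limits: the log requires x > 0, the reciprocal requires x ≠ 0 (and reverses the order of positive values), and squaring reorders values if the data contain negatives. The dictionary and function names are hypothetical.

```python
import math
import statistics

def skewness(values):
    """Bias-adjusted sample skewness (G1)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)

RIGHT_SKEW_TRANSFORMS = {
    "reciprocal": lambda x: 1 / x,  # needs x != 0; reverses the order of positive values
    "log10": math.log10,            # needs x > 0
    "sqrt": math.sqrt,              # needs x >= 0
}
LEFT_SKEW_TRANSFORMS = {
    "cube": lambda x: x ** 3,
    "square": lambda x: x ** 2,     # reorders values if the data contain negatives
}

def compare_transforms(values, transforms):
    """Skewness of the original data and of each transformed version."""
    results = {"original": skewness(values)}
    for name, f in transforms.items():
        results[name] = skewness([f(v) for v in values])
    return results

right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 8, 15, 30]  # made-up values
for name, s in compare_transforms(right_skewed, RIGHT_SKEW_TRANSFORMS).items():
    print(f"{name:>10}: {s:+.2f}")
```

Whichever transformation brings the skewness closest to 0 (without overshooting into the opposite skew) is usually the one to keep; rechecking with a Q-Q plot or normality test afterwards is still advisable.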