Lecture 3: Starting Points in Data Analysis Flashcards
What are the quartiles on a boxplot?
first quartile (Q1): divides the lowest 25% of the data from the highest 75%. 25th percentile or lower quartile
second quartile (Q2): divides the data in half. 50th percentile or median
third quartile (Q3): divides the highest 25% of the data from the lowest 75%. 75th percentile or upper quartile
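As a rough illustration of the three cut points, here is a minimal Python sketch using only the standard library. The data values are made up for the example, and the "inclusive" interpolation rule is an assumption: SPSS and other packages may use a different rule, so quartiles on small samples can differ slightly.

```python
import statistics

# Illustrative values only; they do not come from the lecture.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 21]

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
# method="inclusive" is one of several interpolation rules (an assumption here).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print("Q1 (25th percentile):", q1)
print("Q2 (median):         ", q2)
print("Q3 (75th percentile):", q3)
print("IQR (Q3 - Q1):       ", q3 - q1)
```

Q2 from this call matches statistics.median(data), which is a quick check that the second quartile really is the median.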
What is the interquartile range (IQR)?
Q3-Q1
sometimes referred to as the middle 50%, because it is the range covered by the central half of the data
measure of variability
median and IQR often reported for variables that are not normally distributed
What are outliers?
extreme observations in your variable of interest
How does SPSS identify outliers?
according to Tukey’s fences method
- values below Q1 - (1.5 × IQR) or above Q3 + (1.5 × IQR) -> these are marked with an O
- values below Q1 - (3 × IQR) or above Q3 + (3 × IQR) -> these are marked with an * (more extreme values)
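The two fences can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of SPSS: the function name tukey_outliers and the data are hypothetical, and the quartile rule used here may differ from the one SPSS applies when it draws a boxplot.

```python
import statistics

def tukey_outliers(values):
    """Classify values as mild (beyond 1.5*IQR) or extreme (beyond 3*IQR) outliers."""
    # Quartile rule is an assumption; SPSS may compute the box edges differently.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme = [], []
    for v in values:
        if v < q1 - 3 * iqr or v > q3 + 3 * iqr:
            extreme.append(v)  # beyond the outer fence (the * marker)
        elif v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr:
            mild.append(v)     # beyond the inner fence only (the O marker)
    return mild, extreme

# Made-up values: Q1 = 5, Q3 = 15, so IQR = 10.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 21, 32, 50]
mild, extreme = tukey_outliers(data)
print("mild outliers:   ", mild)     # upper inner fence is 15 + 15 = 30
print("extreme outliers:", extreme)  # upper outer fence is 15 + 30 = 45
```

Checking the outer fence first keeps each value in exactly one group, which mirrors how a value gets either an O or an *, never both.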
Quantile-Quantile Plots (Q-Q Plots)
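used to assess whether a variable comes from a specified distribution (e.g. normal) by plotting the sample quantiles against the theoretical quantiles; points lying close to a straight line suggest a good fit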
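To show what a Q-Q plot actually plots, the sketch below computes the (theoretical, observed) pairs for a normal Q-Q plot without drawing anything. The helper name normal_qq_points, the sample values, and the plotting positions (i + 0.5)/n are assumptions made for illustration; statistical packages use several slightly different position formulas.

```python
import statistics

def normal_qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Reference distribution: a normal with the sample's own mean and SD.
    fitted = statistics.NormalDist(statistics.mean(ordered), statistics.stdev(ordered))
    # Plotting positions (i + 0.5) / n are one common choice (an assumption).
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

sample = [4.1, 5.0, 5.6, 6.2, 7.3]  # illustrative values
for theoretical, observed in normal_qq_points(sample):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the pairs fall near the line y = x, the sample quantiles agree with the normal quantiles; systematic curvature away from that line points to skew or heavy tails.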
Stem and leaf plot
displays the frequency at which certain classes of values appear in the data
can be used to examine distribution of data as well as extreme values
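A stem and leaf plot is simple enough to build by hand. The sketch below is one possible text layout, assuming non-negative whole numbers with the tens digit as the stem and the units digit as the leaf; the function name stem_and_leaf and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    ordered = sorted(values)
    rows = defaultdict(list)
    for v in ordered:
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    first, last = ordered[0] // 10, ordered[-1] // 10
    # Empty stems are kept so that gaps in the distribution stay visible.
    return [f"{stem:>2} | {' '.join(rows[stem])}".rstrip() for stem in range(first, last + 1)]

scores = [12, 15, 21, 21, 24, 38, 40, 47, 62]  # illustrative values
for line in stem_and_leaf(scores):
    print(line)
```

The length of each row shows how often values in that class occur, and an isolated row far from the rest (here the 60s) flags a possible extreme value.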
What are the statistical tests for normality?
Shapiro-Wilk test
Kolmogorov-Smirnov (KS) test
What is the Shapiro-Wilk test?
tests null hypothesis that data came from a normally distributed population
more accurate when sample sizes are small (n < 2000)
What is the Kolmogorov-Smirnov test?
Goodness of fit test -> tests null hypothesis that a sample comes from a specified distribution
More accurate when sample sizes are large (n ≥ 2000)
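The idea behind the Kolmogorov-Smirnov test can be sketched by computing its statistic D, the largest vertical gap between the sample's empirical CDF and the CDF of the specified distribution. The code below is only an illustration: the function name and data are hypothetical, it compares against a normal fitted to the sample itself, and it deliberately returns no p-value, because estimating the mean and SD from the same data changes the reference distribution of D (statistical packages handle this with a correction).

```python
import statistics

def ks_statistic_normal(values):
    """KS statistic D between the sample and a normal fitted to that sample."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = statistics.NormalDist(statistics.mean(ordered), statistics.stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # illustrative, roughly symmetric
skewed = [1, 1, 1, 1, 2, 2, 3, 10, 50]       # illustrative, strongly right-skewed
print("D (symmetric):", round(ks_statistic_normal(symmetric), 3))
print("D (skewed):   ", round(ks_statistic_normal(skewed), 3))
```

A larger D means the sample departs further from the reference distribution; the test turns D and the sample size into the p-value that the normality decision is based on.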
What does it mean when you reject the null hypothesis in the test for normality?
It means that the data fail the normality test: the p-value is significant (<0.05), so the data are unlikely to have come from a normally distributed population.
What does it mean when you accept the null hypothesis or fail to reject it in the test for normality?
It means that the data pass the normality test: the p-value is non-significant (>0.05), so there is no evidence against normality and the data are treated as coming from a normally distributed population.
What are some other parameters to assess normality?
Skewness and Kurtosis
What is skewness?
measure of asymmetry
- normal distribution has skewness of 0
What is kurtosis?
measure of how heavy the tails are relative to a normal distribution
- normal distribution has kurtosis of 3 (equivalently, excess kurtosis of 0, where excess kurtosis = kurtosis - 3)
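Both measures come from the central moments of the data, which the following sketch makes explicit. It uses the plain moment formulas (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2) as an assumption; statistical packages typically add small-sample corrections and often report excess kurtosis (kurtosis minus 3), so their output will not match these numbers exactly. The function name and data are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, kurtosis, excess kurtosis) from uncorrected central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (population form)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt, kurt - 3  # excess kurtosis is 0 for a normal distribution

symmetric = [1, 2, 3, 4, 5]          # illustrative values
right_skewed = [1, 1, 2, 2, 3, 10]   # illustrative values
print(skewness_and_kurtosis(symmetric))
print(skewness_and_kurtosis(right_skewed))
```

A symmetric sample gives skewness 0, while a long right tail gives a positive value; knowing whether a program reports kurtosis or excess kurtosis decides whether "normal" corresponds to 3 or to 0.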
In SPSS, what are the ideal values of skewness and kurtosis for normally distributed data?
skewness <1
kurtosis between 0 and 3