Lecture 3: Starting Points in Data Analysis Flashcards
What are the quartiles on a boxplot?
first quartile (Q1): separates the lowest 25% of the data from the highest 75% (25th percentile or lower quartile)
second quartile (Q2): divides the data in half (50th percentile or median)
third quartile (Q3): separates the highest 25% of the data from the lowest 75% (75th percentile or upper quartile)
What is the interquartile range (IQR)?
Q3-Q1
sometimes referred to as the middle 50% (the range that contains the central half of the data)
measure of variability
median and IQR often reported for variables that are not normally distributed
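A minimal sketch of the quartile and IQR calculation, using only Python's standard library. The data values are hypothetical, and the "inclusive" interpolation method is an assumption: statistical packages (SPSS included) may interpolate slightly differently, so results can differ in small samples.

```python
import statistics

def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR) for a list of numbers."""
    # n=4 asks for the three cut points that split the data into quarters.
    # method="inclusive" is one of several accepted interpolation rules.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1

# Hypothetical example data (not from the lecture)
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]
q1, q2, q3, iqr = quartiles_and_iqr(data)
print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", iqr)
```

Q2 returned here is the same value as the median of the data.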
What are outliers?
extreme observations in your variable of interest
How does SPSS identify outliers?
according to Tukey’s fences method
- values below Q1 - (1.5 × IQR) or above Q3 + (1.5 × IQR) -> these are marked with an O
- values below Q1 - (3 × IQR) or above Q3 + (3 × IQR) -> these are marked with an * (more extreme values)
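The two fences can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SPSS's own routine: the data are hypothetical and the quartiles use the standard library's "inclusive" method, which may not match the quartile rule SPSS applies.

```python
import statistics

def tukey_outliers(values):
    """Split values into mild outliers (O) and extreme outliers (*)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    mild, extreme = [], []
    for v in values:
        if v < outer_low or v > outer_high:
            extreme.append(v)   # beyond 3 x IQR: the "*" cases
        elif v < inner_low or v > inner_high:
            mild.append(v)      # beyond 1.5 x IQR only: the "O" cases
    return mild, extreme

# Hypothetical example data (not from the lecture)
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 18, 30, 45]
mild, extreme = tukey_outliers(data)
print("mild:", mild, "extreme:", extreme)
```

The extreme check runs first so that a value beyond the outer fence is not also counted as a mild outlier.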
Quantile-Quantile Plots (Q-Q Plots)
used to assess whether a variable comes from a specified distribution (points falling close to the reference line suggest a good fit)
Stem and leaf plot
displays the frequency at which certain classes of values appear in the data
can be used to examine distribution of data as well as extreme values
What are the statistical tests for normality?
Shapiro-Wilk test
Kolmogorov-Smirnov (KS) test
What is the Shapiro-Wilk test?
tests null hypothesis that data came from a normally distributed population
more accurate when the sample sizes are <2000
What is Kolmogorov-Smirnov test?
Goodness of fit test –> tests null hypothesis that a sample comes from a specified distribution
More accurate when sample sizes are large (n ≥ 2000)
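As a rough illustration of what a goodness-of-fit test measures, the sketch below computes the KS statistic D (the largest gap between the sample's cumulative distribution and a normal curve) and an approximate p-value from the asymptotic Kolmogorov series. It assumes the reference normal distribution is fully specified in advance; when the mean and SD are estimated from the same sample, this p-value is too lenient and a correction is needed. The function name and the synthetic samples are hypothetical.

```python
import math
from statistics import NormalDist

def ks_test_normal(values, mu, sigma, terms=100):
    """One-sample KS test against Normal(mu, sigma). Returns (D, approx p)."""
    xs = sorted(values)
    n = len(xs)
    dist = NormalDist(mu, sigma)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # Compare the normal CDF with the empirical step just after and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series below converges slowly here; the p-value is effectively 1
        return d, 1.0
    # Asymptotic Kolmogorov series (an approximation that improves as n grows)
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return d, min(1.0, max(0.0, p))

# Synthetic samples built from quantiles (illustrative only, not real data)
n = 200
normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

d_norm, p_norm = ks_test_normal(normal_like, 0, 1)
d_skew, p_skew = ks_test_normal(skewed, 1, 1)
print("normal-like: D =", round(d_norm, 4), "p =", round(p_norm, 4))
print("skewed:      D =", round(d_skew, 4), "p =", round(p_skew, 6))
```

The normal-like sample gives a small D and a non-significant p-value, while the right-skewed sample gives a large D and a significant one.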
What does it mean when you reject the null hypothesis in the test for normality?
It means that you fail the normality test: the p-value is significant (<0.05), so the data are unlikely to have come from a normally distributed population.
What does it mean when you accept the null hypothesis or fail to reject it in the test for normality?
It means that you pass the normality test: the p-value is non-significant (>0.05), so there is no evidence that the data depart from a normal distribution, and the data are treated as normally distributed.
What are some other parameters to assess normality?
Skewness and Kurtosis
What is skewness?
measure of asymmetry
- normal distribution has skewness of 0
What is kurtosis?
measure of tail density relative to a normal distribution
- normal distribution has kurtosis of 3 (equivalently, excess kurtosis of 0; SPSS reports excess kurtosis, i.e. kurtosis minus 3)
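A minimal sketch of how skewness and kurtosis can be computed from the central moments of a sample. These are the simple population-moment formulas; SPSS applies small-sample adjustments, so its output will differ slightly. The example data are hypothetical.

```python
def skewness_kurtosis(values):
    """Return (skewness, kurtosis, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (population form)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt, kurt - 3  # excess kurtosis is 0 for a normal distribution

# Hypothetical example data (not from the lecture)
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
print(skewness_kurtosis(symmetric))
print(skewness_kurtosis(right_skewed))
```

The symmetric sample has skewness 0, while the sample with one large value has positive skewness.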
In SPSS, what are the ideal values of skewness and kurtosis for normally distributed data?
skewness <1
kurtosis between 0-3
Platykurtic
kurtosis value <3
light (thin) tails
Mesokurtic
average tail density that you would expect to see in a normal distribution (kurtosis value = 3)
Leptokurtic
kurtosis value >3
heavy tailed distribution
sharper, narrower peak
Skewness <0
negatively skewed
longer tail extends to the left
Skewness >0
positively skewed
longer tail extends to the right
Skewness = 0
symmetric distribution (as expected for normally distributed data)
When do we do a data transformation?
when data are not normally distributed, it is common to apply a transformation to attempt to improve normality
How do we transform right-skewed data? (strongest transformation first)
- Reciprocal transformation: t = 1/x
- Log transformation: t = log10(x)
- Square root transformation: t = sqrt(x)
How do we transform left-skewed data? (strongest transformation first)
- Cubic transformation: t = x^3
- Square transformation: t = x^2
When can’t we use reciprocal and log transformation?
on data that contain zero values (fix: add a small constant to every value before transforming); the log transformation is also undefined for negative values
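The transformations above can be sketched as a single helper. The function name, the `shift` parameter, and the example values are hypothetical; `shift` stands in for the small constant added to every value when zeros are present, and the choice of constant is an assumption left to the analyst.

```python
import math

def transform(values, kind, shift=0.0):
    """Apply one transformation to every value, after adding `shift`."""
    funcs = {
        "reciprocal": lambda x: 1 / x,   # right skew, strongest
        "log10": math.log10,             # right skew
        "sqrt": math.sqrt,               # right skew, mildest
        "cube": lambda x: x ** 3,        # left skew, strongest
        "square": lambda x: x ** 2,      # left skew
    }
    f = funcs[kind]
    return [f(x + shift) for x in values]

# Hypothetical right-skewed data containing a zero (not from the lecture)
intakes = [0, 9, 99, 999]

# math.log10(0) raises ValueError, so shift every value by a constant first
logged = transform(intakes, "log10", shift=1)
print(logged)
```

Without the shift, the log transformation fails on the zero value with a `ValueError`, and the reciprocal fails with a `ZeroDivisionError`.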
How do we energy adjust macronutrients?
express intake as proportion of total energy intake (% calories from total fat)
How do we energy adjust micronutrients?
intake per 1000kcal
How do we energy adjust food groups?
intake per 1000kcal
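A short sketch of both energy-adjustment approaches. The conversion factor of 9 kcal per gram of fat is the standard Atwater factor, which is an assumption added here and not stated in the flashcards; the intake figures are hypothetical.

```python
KCAL_PER_G_FAT = 9  # Atwater factor for fat (assumption, not from the lecture)

def percent_energy(grams, kcal_per_gram, total_kcal):
    """Macronutrient intake as a percentage of total energy intake."""
    return grams * kcal_per_gram / total_kcal * 100

def per_1000_kcal(intake, total_kcal):
    """Micronutrient or food-group intake per 1000 kcal (nutrient density)."""
    return intake / total_kcal * 1000

# Hypothetical daily intake: 70 g fat, 800 mg calcium, 2000 kcal in total
fat_pct = percent_energy(70, KCAL_PER_G_FAT, 2000)
calcium_density = per_1000_kcal(800, 2000)
print("% energy from fat:", round(fat_pct, 1))
print("calcium, mg per 1000 kcal:", round(calcium_density, 1))
```

Both measures divide by total energy, so two people with the same diet composition get the same adjusted values even if one of them eats more overall.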
What is hypothesis testing?
method of determining if results from your study are meaningful
how likely is it that the results arose by chance
What is a hypothesis?
educated guess about your variables of interest
What is the null hypothesis (H0) vs the alternate hypothesis (H1)?
Null: there is NO statistically significant difference between the population parameter and the sample statistic being compared
Alternate: statistically significant difference exists between the population parameter and the sample statistic being compared
What is a Type 1 error (α)?
the rejection of a true null hypothesis (also known as a “false positive” finding)
What is a Type 2 error (β)?
the failure to reject a false null hypothesis (also known as a “false negative” finding)
How is rejection of a false null hypothesis represented?
1 - β (power of the test), which increases with the sample size
How is acceptance of a true null hypothesis represented?
1 - α (confidence level)
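To illustrate how power depends on sample size, here is a minimal sketch for a two-sided one-sample z-test. It assumes a known population SD and a normally distributed test statistic; the function name and the effect size used are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect: true difference between the population mean and the null value
    sigma:  known population standard deviation
    n:      sample size
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region
    return std.cdf(-z_crit + shift) + std.cdf(-z_crit - shift)

# Hypothetical effect of 0.5 SD at several sample sizes
for n in (10, 25, 50, 100):
    print(n, round(z_test_power(0.5, 1.0, n), 3))
```

Power rises toward 1 as n grows. When the true effect is 0, the function returns alpha, which is the Type 1 error rate.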
What is the p value?
probability value
- based off the assumption that H0 is true
- gives the probability of obtaining results at least as extreme as those observed simply by chance, if H0 were true
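As a small worked example, the sketch below converts a z statistic into a two-sided p-value. It assumes the test statistic follows a standard normal distribution when H0 is true (a z-test); other tests use other reference distributions. The function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme, in either direction, under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 2.58):
    print(z, round(two_sided_p_value(z), 4))
```

A z statistic of about 1.96 gives a p-value of about 0.05, and larger statistics give smaller p-values.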
What is alpha?
the significance level: probability of rejecting the null hypothesis when the null is true
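1%: p < 0.01 (two-tailed critical z = 2.58)
5%: p < 0.05 (two-tailed critical z = 1.96)
10%: p < 0.10 (two-tailed critical z = 1.645, often rounded to 1.65)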
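The numbers in parentheses are the two-tailed critical z-values for each significance level. A minimal sketch of where they come from, using the standard normal distribution in Python's standard library (the function name is hypothetical):

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Critical z-value for a given significance level."""
    # A two-tailed test splits alpha equally between the two tails
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(critical_z(alpha), 3))
```

For a one-tailed test all of alpha sits in one tail, so the 5% critical value drops to about 1.645.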
One-tailed test vs Two-tailed test
one tailed test
is testing the possibility of a relationship in one direction only (comparing means) (u>u0 or u