Quiz #2 Flashcards
Standard deviations on a normal curve
Standard deviations on a normal curve: the 68-95-99.7 rule
Within ±1 SD of the mean → 68.27% of population
Within ±2 SD → 95.45%
Within ±3 SD → 99.73%
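The percentages above can be checked from the normal distribution itself. A minimal sketch using only the Python standard library; the function name `coverage_within` is made up for this example:

```python
import math

def coverage_within(k):
    """Proportion of a normal population lying within ±k SD of the mean."""
    # For a normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k} SD: {coverage_within(k) * 100:.2f}%")
# prints 68.27%, 95.45%, 99.73%
```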
Quartile divisions and how they can be displayed
1st quartile divides lowest 25% from highest 75%
25th percentile = lower quartile
2nd quartile divides data in half
50th percentile = median
3rd quartile divides highest 25% from lowest 75%
75th percentile = upper quartile
Can be used with non-normally distributed data
Displayed as a boxplot
Boxplot anatomy
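Interquartile range (IQR) = Q3 - Q1, the spread of the middle 50% of the data
- Measure of variability reported for non-normally distributed data (where variance and standard deviation are not appropriate summaries)
Whiskers = extend to the most extreme values that lie within 1.5 x IQR of the box edges (Q1 and Q3)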
Tukey’s fences - method used by SPSS to identify outliers
1) Below Q1 - 1.5xIQR or above Q3 + 1.5xIQR → outlier, marked with open circle
2) Below Q1 - 3xIQR or above Q3 + 3xIQR → extreme value, marked with star (the wider 3xIQR fences single out only the most distant points)
Requires justification for outlier removal
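The fence rules above can be sketched in a few lines. This is an illustration only: the function names are made up, and the quartiles use Python's "inclusive" method, which is an assumption. SPSS may compute its hinges differently, so borderline points can be classified differently.

```python
import statistics

def tukey_fences(values):
    """Return Q1, Q3, IQR and the inner (1.5 x IQR) and outer (3 x IQR) fences."""
    # Assumption: "inclusive" quartiles; SPSS may compute hinges differently.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3 * iqr, q3 + 3 * iqr),
    }

def classify_points(values):
    """Split values into outliers (open circle) and extremes (star)."""
    fences = tukey_fences(values)
    inner_low, inner_high = fences["inner"]
    outer_low, outer_high = fences["outer"]
    outliers, extremes = [], []
    for v in values:
        if v < outer_low or v > outer_high:
            extremes.append(v)      # beyond the 3 x IQR fences
        elif v < inner_low or v > inner_high:
            outliers.append(v)      # between the 1.5 x and 3 x IQR fences
    return outliers, extremes

sample = [10, 12, 13, 14, 15, 16, 17, 25, 40]
print(tukey_fences(sample))     # Q1 = 13, Q3 = 17, IQR = 4
print(classify_points(sample))  # ([25], [40])
```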
Q-Q Plot what it does, y and x axis
Quantile-Quantile (Q-Q) Plot: compares data to a theoretical standard distribution to determine normality
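- Dots show how far the data fall from the reference line (a perfectly normal distribution)
- Points that stay close to the line, with only small departures at the tails = low deviation from normality
Y axis = expected normal, X axis = observed value
Can also be detrended - removes the trend line so that only the deviations from normal are plotted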
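To make the axes concrete, here is a rough sketch of how the plotted points could be computed. The function names and the plotting position `(rank - 0.5) / n` are assumptions for illustration (SPSS offers its own proportion-estimation formulas), and the detrended version assumes the deviation is the observed z-score minus the expected z-score.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Return (observed, expected_z) pairs: x = observed value, y = expected normal."""
    data = sorted(values)
    n = len(data)
    standard_normal = NormalDist()  # mean 0, SD 1
    points = []
    for rank, observed in enumerate(data, start=1):
        # Assumption: simple plotting position (rank - 0.5) / n.
        proportion = (rank - 0.5) / n
        points.append((observed, standard_normal.inv_cdf(proportion)))
    return points

def detrended_points(values):
    """Return (observed, deviation) pairs: observed z-score minus expected z."""
    center, spread = mean(values), stdev(values)
    return [
        (observed, (observed - center) / spread - expected_z)
        for observed, expected_z in normal_qq_points(values)
    ]

for observed, expected_z in normal_qq_points([5, 1, 3, 2, 4]):
    print(observed, round(expected_z, 3))
```

In the detrended version, points scattered evenly around 0 with no pattern suggest little departure from normality.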
Stem and Leaf Plot purpose, organization and settings
Stem and leaf plot: displays frequency at which certain classes of values appear (like a histogram turned on its side)
Organization: Frequency, Stem = first digit(s), Leaf = last digit(s)
Settings:
Width = the magnitude (place value) of the stem
Stem width of 10: 104 is shown as 10 . 4, 50 as 5 . 0, 5 as 0 . 5 (stem . leaf)
Each leaf = # cases (how many cases each listed leaf digit represents)
- Can be used to examine distribution and extreme values
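The Frequency / Stem / Leaf layout can be reproduced with a short sketch. It assumes non-negative whole numbers and one case per leaf; `stem_leaf_rows` and its `stem_width` parameter are names invented for this example, not SPSS settings.

```python
from collections import defaultdict

def stem_leaf_rows(values, stem_width=10):
    """Return (frequency, stem, leaves) rows for non-negative integers."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, remainder = divmod(value, stem_width)
        # Leaf = first digit after the stem, e.g. 104 with width 10 gives leaf 4.
        leaf = remainder * 10 // stem_width
        rows[stem].append(leaf)
    return [
        (len(leaves), stem, "".join(str(leaf) for leaf in leaves))
        for stem, leaves in sorted(rows.items())
    ]

print("Frequency  Stem . Leaf")
for frequency, stem, leaves in stem_leaf_rows([104, 50, 5, 52, 57, 108]):
    print(f"{frequency:>9}  {stem:>4} . {leaves}")
```

With a stem width of 10, the values 50, 52 and 57 share stem 5 and appear as leaves 0, 2 and 7 on one row.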
Tests for normality (no details)
Shapiro-Wilk Test
Kolmogorov-Smirnov Test
Skewness
Kurtosis
Shapiro-Wilk Test
Tests H0 that population data is normally distributed
More accurate for n < 2000
p > 0.05 → fail to reject H0; data are consistent with a normal distribution (p < 0.05 → reject H0; data are unlikely to have come from a normal distribution)
Kolmogorov-Smirnov Test
Goodness-of-fit test: tests H0 that the sample comes from a population with a specified distribution (the comparison distribution)
Best for n ≥ 2000
SPSS will choose between this and Shapiro-Wilk based on n
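As an illustration of what the Kolmogorov-Smirnov statistic measures, the sketch below computes D, the largest gap between the sample's cumulative distribution and a normal curve fitted to the same data. It is a simplified sketch, not the SPSS procedure: `ks_statistic_normal` is a made-up name, the mean and SD are estimated from the sample (an assumption), and no p-value is computed.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    # Assumption: the comparison normal uses the sample's own mean and SD.
    reference = NormalDist(mean(data), stdev(data))
    d = 0.0
    for rank, x in enumerate(data, start=1):
        cdf = reference.cdf(x)
        # The empirical CDF jumps from (rank - 1)/n to rank/n at x; check both sides.
        d = max(d, rank / n - cdf, cdf - (rank - 1) / n)
    return d

normal_like = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [2 ** i for i in range(10)]
print(ks_statistic_normal(normal_like))  # small D: close to normal
print(ks_statistic_normal(skewed))       # larger D: far from normal
```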
Skewness vs. kurtosis
Skewness: measure of asymmetry
Normal distribution skewness = 0 (absolute value < 1 in SPSS output)
Kurtosis: measure of tail density relative to normal distribution
Normal distribution kurtosis = 3 (0-3 in SPSS output)
Heavy tails, kurtosis > 3 = leptokurtic
Light tails, kurtosis < 3 = platykurtic
Kurtosis = 3 = mesokurtic
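Both measures can be computed from the moments of the data. The sketch below uses the simple population-moment formulas, which is an assumption: statistical packages typically apply small-sample corrections and often report excess kurtosis (kurtosis minus 3), so their output will not match these numbers exactly. Function names are invented for the example.

```python
def moment_shape(values):
    """Return (skewness, kurtosis, excess kurtosis) from population moments."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    m4 = sum((x - center) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3

def kurtosis_label(kurtosis):
    """Name the tail behaviour relative to a normal distribution (kurtosis 3)."""
    if kurtosis > 3:
        return "leptokurtic"   # heavier tails than normal
    if kurtosis < 3:
        return "platykurtic"   # lighter tails than normal
    return "mesokurtic"

skew, kurt, excess = moment_shape([1, 2, 3, 4, 5])
print(skew, kurt, kurtosis_label(kurt))  # symmetric, light-tailed data
```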
Data transformation for positively skewed data
Reciprocal: t = 1/x (severe)
Log transformation: t = log10(x) (moderate)
Square root transformation: t = sqrt(x) (light)
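A small sketch of the three transformations, keyed by severity as in the list above. The function name and the severity labels as arguments are made up for this example. It assumes strictly positive values, since the reciprocal and log are undefined at 0; note also that the reciprocal reverses the ordering of the values.

```python
import math

def transform_positive_skew(x, severity):
    """Apply the transformation suggested for the given severity of positive skew."""
    if x <= 0:
        # Assumption: only strictly positive values are handled here.
        raise ValueError("x must be > 0")
    if severity == "severe":
        return 1 / x            # reciprocal (reverses the order of values)
    if severity == "moderate":
        return math.log10(x)    # log transformation
    if severity == "light":
        return math.sqrt(x)     # square root transformation
    raise ValueError("severity must be 'severe', 'moderate' or 'light'")

for level in ("severe", "moderate", "light"):
    print(level, transform_positive_skew(100, level))
# severe 0.01, moderate 2.0, light 10.0
```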
Data transformation for negatively skewed data
Cubic: t = x^3 (severe)
Squared: t = x^2 (less severe)
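The power transformations for negative skew are equally short. A sketch with a made-up function name; it assumes non-negative values, because squaring does not preserve the order of negative numbers (cubing does).

```python
def transform_negative_skew(x, severity):
    """Apply the power transformation suggested for negative skew."""
    if x < 0:
        # Assumption: non-negative data, so that squaring keeps values in order.
        raise ValueError("x must be >= 0")
    if severity == "severe":
        return x ** 3   # cubic
    if severity == "less severe":
        return x ** 2   # squared
    raise ValueError("severity must be 'severe' or 'less severe'")

print(transform_negative_skew(4, "severe"))       # 64
print(transform_negative_skew(4, "less severe"))  # 16
```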
Data transformation for dietary intake values
When working with dietary intake values → helpful to adjust each variable for total caloric intake to improve normality of data, AKA the nutrient density method
Macronutrients - express intake as % of total energy (ex. % kcal from total fat)
Micronutrients - intake per 1000 kcal
Food groups - intake per 1000 kcal
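The nutrient density calculations are simple ratios. The sketch below assumes the standard energy factors of 9 kcal/g for fat and 4 kcal/g for carbohydrate and protein; the function names and the example intakes are hypothetical.

```python
# Assumption: standard energy factors (kcal per gram).
KCAL_PER_GRAM = {"fat": 9, "carbohydrate": 4, "protein": 4}

def percent_energy(grams, macronutrient, total_kcal):
    """Macronutrient intake expressed as % of total energy."""
    return grams * KCAL_PER_GRAM[macronutrient] / total_kcal * 100

def per_1000_kcal(amount, total_kcal):
    """Micronutrient or food-group intake expressed per 1000 kcal."""
    return amount / total_kcal * 1000

# Hypothetical day: 2000 kcal with 80 g fat; 2500 kcal with 15 mg iron.
print(percent_energy(80, "fat", 2000))  # about 36 (% kcal from fat)
print(per_1000_kcal(15, 2500))          # about 6 (mg per 1000 kcal)
```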
If H null is true but we say it is false?
Type I error (alpha)
If H null is false but we say it is true?
Type II error (beta)
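The two error types can be laid out as a small decision table. A sketch with an invented function name, pairing the true state of H0 with the decision made:

```python
def classify_decision(h0_is_true, rejected_h0):
    """Name the outcome of a test decision given the true state of H0."""
    if h0_is_true and rejected_h0:
        return "Type I error (alpha)"    # H0 true, but we said it was false
    if not h0_is_true and not rejected_h0:
        return "Type II error (beta)"    # H0 false, but we said it was true
    return "correct decision"

for truth in (True, False):
    for rejected in (True, False):
        print(f"H0 true={truth}, rejected={rejected}: {classify_decision(truth, rejected)}")
```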
p-value definition
Probability of obtaining results at least as extreme as those observed by chance alone, assuming that H0 is TRUE
The smaller the value, the more “unusual” the results
alpha definition
alpha = significance level: probability of making a type I error (incorrectly rejecting H0)
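To show how the p-value and alpha work together, here is a minimal sketch for a two-sided z-test. The choice of a z-test, the function names and the default alpha of 0.05 are assumptions made for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme, assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_h0(p_value, alpha=0.05):
    """Reject H0 when the p-value falls below the significance level alpha."""
    return p_value < alpha

for z in (1.0, 1.96, 2.5):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f}, reject H0: {reject_h0(p)}")
```

A larger |z| gives a smaller p-value, meaning the result would be more unusual if H0 were true.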