Distributions and Histograms Flashcards
Visually representing continuous variables
Histogram (predictor variable is continuous); column chart with bins instead of categories; each bin counts the values from its minimum up to, but not including, its maximum
Bar height
Number of observations in bin
Bar width
Range of values of the continuous variable
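A minimal sketch of how bin counts become bar heights, assuming equal-width bins that include their lower edge and exclude their upper edge (a common convention, not the only one). The function name histogram_counts and the data are made up for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in half-open bins [lo, hi): lo is included, hi is not."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # which bin this value falls in
        if 0 <= index < n_bins:            # values outside all bins are ignored
            counts[index] += 1
    return counts


# Hypothetical heights in inches; bins are [60, 65), [65, 70), [70, 75).
heights = [60, 62, 65, 65, 67, 70, 70, 71, 74, 75]
counts = histogram_counts(heights, start=60, width=5, n_bins=3)
print(counts)  # bar heights, one per bin
```

Note that 65 lands in the second bin, not the first, and 75 falls outside the last bin because the upper edge is excluded.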
Patterns of distributions
Bell-shaped/normal, U-shaped, Right-skewed/positive, Left-skewed/negative
Bell-Shaped/Normal:
Symmetric/unimodal
Skewness value of 0
Many naturally occurring characteristics are normally distributed
U-Shaped:
Symmetric, bimodal (two distributions)
Right-skewed/Positive:
Long right tail
Skew = running into a maximum or minimum possible value; a variable could have a minimum and no maximum, or the opposite
Asymmetry
Left-skewed/Negative:
Long left tail
Skew=outliers
Mean is influenced by outliers
Asymmetry
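A rough sketch of how a skewness value separates the shapes above. It assumes the moment-based formula (third central moment divided by the second central moment raised to 1.5); statistics packages may apply slightly different corrections. The data are made up.

```python
def skewness(values):
    """Moment-based skewness: 0 if symmetric, > 0 right-skewed, < 0 left-skewed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]           # one large value makes a long right tail
left_skewed = [-v for v in right_skewed]     # mirror image has a long left tail

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, round(skewness(data), 3))
```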
Describing distributions with descriptive statistics
Central tendency, variability
Central Tendency
mean (average/normal), median (middle value/skewed), mode (most frequent value/bimodal)
Variability
Standard deviation (normal), interquartile range (skewed), range (high-low)
Standard Deviation
Shows how closely points cluster around the mean; “average” amount the points deviate, or differ, from the mean
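A small sketch of that idea, using two made-up data sets with the same mean. It assumes the population form of the standard deviation (divide by n); the sample form divides by n - 1 instead. Strictly, the standard deviation is the square root of the average squared deviation, which is why "average" is only approximate.

```python
import math
import statistics

clustered = [4, 5, 5, 5, 6]   # points sit close to the mean
spread = [1, 3, 5, 7, 9]      # same mean, points far from it


def population_sd(values):
    """Square root of the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


sd_clustered = population_sd(clustered)
sd_spread = population_sd(spread)
print(sd_clustered, sd_spread)

# The standard library computes the same quantity.
print(statistics.pstdev(clustered), statistics.pstdev(spread))
```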
Median
Found by sorting the values and taking the middle one; with an even number of values, average the two middle values; less sensitive to outliers or skew than the mean
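A minimal sketch of the usual definition of the median: sort the values and take the middle one, or average the two middle values when the count is even. The data are made up; the last lines show why the median is less sensitive to an outlier than the mean.

```python
import statistics


def median(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_count = [7, 1, 5]
even_count = [1, 3, 5, 7]
with_outlier = [1, 3, 5, 7, 100]

print(median(odd_count), median(even_count))

# The outlier pulls the mean far more than the median.
print(statistics.mean(with_outlier), median(with_outlier))
```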
IQR
Quartiles are values that cut the distribution in quarters; IQR = Q3 - Q1
Box and Whisker Plot
Box plots represent the min, Q1, median, Q3, and max; outliers are drawn as individual dots beyond the ends of the whiskers
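A sketch of the numbers behind a box plot, on made-up data. Two assumptions are not from the cards: quartiles use the "inclusive" method of Python's statistics.quantiles (other tools may give slightly different quartiles), and outliers are flagged with the common 1.5 x IQR rule.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles cut the sorted data into quarters.
q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points more than 1.5 * IQR outside the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print("min, Q1, median, Q3, max:", min(data), q1, med, q3, max(data))
print("IQR:", iqr)
print("outliers:", outliers)
```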
Main Effects
One outcome and some number of predictors
Associated (related):
Outcome is diff. by levels of predictor
Independent (not related):
Outcome is same by levels of predictor
Interaction:
Effect of one predictor is different depending on level of other predictors
Additive:
Effect of one predictor is same regardless of level of other predictors
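One way to check additive versus interaction on a 2 x 2 table of group means is a "difference of differences". This is a sketch with made-up means; the name interaction_contrast and the predictors A and B are hypothetical.

```python
def interaction_contrast(cell_means):
    """cell_means[a][b] = mean outcome at level a of predictor A, level b of predictor B.

    Returns how much the effect of A changes between the two levels of B.
    Zero means additive; nonzero means the predictors interact.
    """
    effect_of_a_at_b0 = cell_means[1][0] - cell_means[0][0]
    effect_of_a_at_b1 = cell_means[1][1] - cell_means[0][1]
    return effect_of_a_at_b1 - effect_of_a_at_b0


additive = [[10, 15],
            [14, 19]]      # A adds 4 at both levels of B
interacting = [[10, 15],
               [14, 25]]   # A adds 4 at one level of B but 10 at the other

print(interaction_contrast(additive))
print(interaction_contrast(interacting))
```

With real data the contrast is rarely exactly zero, so deciding whether an interaction is present also depends on variability.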
Effect Size
The magnitude (strength) of the association between predictor and outcome; based on difference between means and variability
Effect Size and Mean Differences
If variability is the same, but one pair of means is farther apart, then there is less overlap, a bigger effect size, and the effect is easier to detect
Effect Size and Variability
If mean differences are the same, but one pair has smaller variability, then there is less overlap, a bigger effect size, and the effect is easier to detect
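Both cards can be illustrated with one standardized effect size. The sketch below assumes Cohen's d with a pooled standard deviation, which is one common choice and is not named in the cards; the groups are made up.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference between means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / pooled_sd


# Same variability, different mean differences.
control = [4, 5, 6]
near = [5, 6, 7]    # mean is 1 higher than control
far = [7, 8, 9]     # mean is 3 higher than control
d_near = cohens_d(control, near)
d_far = cohens_d(control, far)

# Same mean difference as control vs. near, but smaller variability.
control_tight = [4.5, 5, 5.5]
near_tight = [5.5, 6, 6.5]
d_tight = cohens_d(control_tight, near_tight)

print(d_near, d_far, d_tight)
```

A larger mean difference and a smaller spread both raise d, matching the "less overlap" idea.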
Ceiling effect/Floor effect
Runs into upper/lower limit
Converting continuous values into categories
Count frequency of each category and create column chart
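A minimal sketch of this conversion: assign each continuous value to a category, then count how many fall in each. The age cutoffs and labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def categorize(age):
    """Map a continuous age to a category (cutoffs are illustrative assumptions)."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


ages = [5, 17, 18, 30, 64, 65, 80]
frequencies = Counter(categorize(age) for age in ages)

# Each count would be the height of one column in the chart.
for category in ["child", "adult", "senior"]:
    print(category, frequencies[category])
```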