Distributions and Histograms Flashcards
Visually representing continuous variables
Histogram (predictor variable is continuous); a column chart with bins instead of categories; each bin counts the number of values at or above its minimum, up to but not including its maximum
Bar height
Number of observations in the bin
Bar width
Range of values of the continuous variable
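The binning rule can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts`, the sample heights, and the convention that a bin includes its lower edge and excludes its upper edge are assumptions made for the example.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count values into n_bins equal-width bins starting at `lower`.

    Assumed convention: each bin includes its lower edge and excludes
    its upper edge, i.e. [edge, edge + width).
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - lower) // width)  # which bin the value falls into
        if 0 <= index < n_bins:            # ignore values outside all bins
            counts[index] += 1
    return counts


# Made-up data: ten heights in inches.
heights = [61, 63, 64, 65, 65, 66, 67, 68, 70, 74]
counts = histogram_counts(heights, lower=60, width=5, n_bins=3)

# Text histogram: bar height = number of observations in the bin,
# bar width = the range of values the bin covers.
for i, count in enumerate(counts):
    low = 60 + 5 * i
    print(f"{low}-{low + 5}: {'#' * count}")
```

With these values the three bins hold 3, 5, and 2 observations. Note that 65 is counted in the 65-70 bin, not the 60-65 bin.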
Patterns of distributions
Bell-shaped/normal, U-shaped, Right-skewed/positive, Left-skewed/negative
Bell-Shaped/Normal:
Symmetric/unimodal
Skewness value of 0
Many naturally occurring characteristics are normally distributed
U-Shaped:
Symmetric, bimodal (two distributions)
Right-skewed/Positive:
Long right tail
Skew can arise when values run into a minimum or maximum possible value; a variable could have a minimum and no maximum, or the opposite
Asymmetry
Left-skewed/Negative:
Long left tail
Skew can also be caused by outliers
Mean is influenced by outliers
Asymmetry
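The sign of the skewness value, and the way outliers pull the mean, can be checked numerically. The sketch below uses one common definition of skewness (the average cubed z-score, using the population standard deviation). Other definitions exist, and the helper name `skewness` and the data are made up for illustration.

```python
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]     # one large value: long right tail
left_skewed = [-v for v in right_skewed]     # mirror image: long left tail

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name,
          "skewness:", round(skewness(data), 2),
          "mean:", statistics.fmean(data),
          "median:", statistics.median(data))
```

The symmetric data has a skewness of approximately 0, the right-skewed data a positive value, and the left-skewed data a negative value. In the right-skewed set the single outlier pulls the mean (4.75) above the median (3).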
Describing distributions with descriptive statistics
Central tendency, variability
Central Tendency
Mean (average; suited to normal distributions), median (middle value; suited to skewed distributions), mode (most frequent value; suited to bimodal distributions)
Variability
Standard deviation (suited to normal distributions), interquartile range (suited to skewed distributions), range (highest value minus lowest value)
Standard Deviation
Shows how closely points cluster around the mean; “average” amount the points deviate, or differ, from the mean
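A small worked example makes the "average deviation from the mean" idea concrete. The scores are made up. The sketch computes both the population version (divide by n) and the sample version (divide by n - 1), since the card does not say which one is intended.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

mean = sum(scores) / len(scores)                 # 5.0
squared_devs = [(x - mean) ** 2 for x in scores]  # squared distance from the mean

# Population SD: average the squared deviations, then take the square root.
population_sd = math.sqrt(sum(squared_devs) / len(scores))

# Sample SD: same idea, but divide by n - 1 instead of n.
sample_sd = math.sqrt(sum(squared_devs) / (len(scores) - 1))

print(population_sd)  # matches statistics.pstdev(scores)
print(sample_sd)      # matches statistics.stdev(scores)
```

Here the squared deviations sum to 32, so the population standard deviation is the square root of 32 / 8, which is 2.0.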
Median
Found by sorting the values and taking the middle one; with an even number of values, average the two middle values; less sensitive to outliers or skew than the mean
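As a minimal sketch, the median is the middle value of the sorted data, or the average of the two middle values when the count is even. The function name `median` and the sample values are illustrative. Python's built-in `statistics.median` does the same job.

```python
import statistics


def median(values):
    """Middle value of the sorted data; average the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_count = [7, 1, 3]
with_outlier = [1, 3, 7, 100]   # made-up data with one extreme value

print(median(odd_count))               # middle of 1, 3, 7
print(median(with_outlier))            # average of 3 and 7
print(statistics.fmean(with_outlier))  # the mean is pulled up by 100
```

For the second list the median is 5.0 while the mean is 27.75, which shows why the median is preferred for skewed data.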
IQR
Quartiles are values that cut the distribution into quarters; IQR = Q3 - Q1
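The quartiles and IQR can be computed with Python's `statistics.quantiles`. This is a sketch with made-up commute times. Software packages use slightly different interpolation rules for quartiles, so results from other tools may differ a little from the default ("exclusive") method used here.

```python
import statistics

# Made-up commute times in minutes, with one extreme value (90).
commute_minutes = [10, 12, 15, 18, 20, 22, 25, 30, 35, 40, 90]

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(commute_minutes, n=4)

iqr = q3 - q1
value_range = max(commute_minutes) - min(commute_minutes)

print("Q1:", q1, "median:", q2, "Q3:", q3)
print("IQR:", iqr)
print("Range:", value_range)
```

With this data Q1 is 15, the median is 22, and Q3 is 35, giving an IQR of 20. The range is 80, because it is driven entirely by the single 90-minute value, while the IQR is unaffected by it.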
Box and Whisker Plot
Box plots represent the min, Q1, median, Q3, and max; outliers are drawn as individual black dots beyond the ends of the whiskers
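The numbers behind a box plot can be sketched as follows. The card does not say how outliers are chosen, so this example assumes the common 1.5 x IQR rule: points more than 1.5 IQRs below Q1 or above Q3 are flagged, and the whiskers stop at the most extreme non-flagged values. The function name `box_plot_summary`, the `fence_factor` parameter, and the data are hypothetical.

```python
import statistics


def box_plot_summary(values, fence_factor=1.5):
    """Five-number summary plus outliers under the (assumed) 1.5 x IQR rule."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low_fence = q1 - fence_factor * iqr
    high_fence = q3 + fence_factor * iqr
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "whisker_low": inside[0],     # whiskers end at the last non-outlier
        "whisker_high": inside[-1],
        "outliers": outliers,         # drawn as separate dots
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up data
summary = box_plot_summary(data)
for key, value in summary.items():
    print(f"{key}: {value}")
```

For this data the box spans 3 to 9 with the median at 6, the upper whisker ends at 10, and 30 is plotted as a separate outlier dot.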