Week 4 Flashcards
Frequency definition
How often a value appears in data
Histogram definition
Visualizes how data are distributed
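A minimal sketch of the two cards above, assuming a small made-up sample: it counts how often each value appears and prints a crude text histogram with one `#` per occurrence. The sample values and variable names are illustrative only.

```python
from collections import Counter

data = [2, 3, 3, 5, 5, 5, 8]  # made-up sample, not from the course

# Frequency: how many times each value appears in the data.
frequency = Counter(data)

# Crude text histogram: one '#' per occurrence of each value.
for value in sorted(frequency):
    print(f"{value}: {'#' * frequency[value]}")
```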
Mode
The value that appears most often in the data (the value with the highest frequency)
Median
The middle value of the sorted data, dividing it into two groups with the same number of data points
Mean
Sum of data values / number of data points
(1st moment)
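The three measures of center can be compared on one made-up sample. This is a sketch using Python's standard `statistics` module; the sample and variable names are assumptions for illustration.

```python
import statistics

data = [2, 3, 3, 5, 5, 5, 8]  # made-up sample, not from the course

mode_value = statistics.mode(data)      # most frequent value: 5
median_value = statistics.median(data)  # middle of the sorted data: 5
mean_value = sum(data) / len(data)      # 31 / 7, about 4.43

print(mode_value, median_value, round(mean_value, 2))
```

Note that the mode here is 5, not the largest value 8: the mode is about frequency, not size.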
Spread
The difference between the highest and lowest values in a data set (the range)
Quantiles
Quantiles are cut points that divide the sorted data into sections with the same number of data points.
There can be any number N of sections
Quartiles
When there are 4 sections in total, they are called quartiles
Median is 2nd quartile
Percentiles
When there are 100 sections in total, they are called percentiles.
Median is 50th percentile
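A short sketch of quartiles and percentiles, assuming Python 3.8+ where `statistics.quantiles` is available. The sample is made up. Different quantile methods interpolate differently, so other tools may return slightly different cut points for the same data.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10]  # made-up sample, not from the course

# n=4 gives the 3 cut points between 4 sections: Q1, Q2, Q3.
quartiles = statistics.quantiles(data, n=4)

# n=100 gives the 99 cut points between 100 sections.
percentiles = statistics.quantiles(data, n=100)

median_value = statistics.median(data)
print(quartiles[1] == median_value)     # 2nd quartile is the median
print(percentiles[49] == median_value)  # 50th percentile is the median
```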
Variance
Sum of (distance from mean, squared) over all data points / number of data points
2nd moment
Standard Deviation
Standard deviation (SD) is the square root of variance
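The variance and SD definitions can be worked through by hand on a small made-up sample. This sketch divides by the number of data points, as on the card (the population form); a sample variance would divide by one less.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, not from the course

n = len(data)
mean = sum(data) / n  # 40 / 8 = 5.0

# Variance: average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / n  # 32 / 8 = 4.0

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)  # 2.0

print(mean, variance, sd)
```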
Z Score
Z-score expresses a deviation in units of SD, which enables fair comparisons across data sets
Z-score = (Value - Mean) / SD
The larger the absolute Z-score, the further the value deviates from the mean
Outlier if z-score is greater than 3 or less than -3
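A minimal sketch of the z-score formula, assuming the population SD (`statistics.pstdev`). The helper name `z_score` and the sample are hypothetical, chosen only for illustration.

```python
import statistics

def z_score(value, data):
    """Return (value - mean) / SD, using the population SD of data."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return (value - mean) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample: mean 5, SD 2

z_high = z_score(9, data)  # (9 - 5) / 2 = 2.0
z_low = z_score(2, data)   # (2 - 5) / 2 = -1.5

# Flag a value as an outlier when its z-score is beyond +3 or -3.
is_outlier = abs(z_high) > 3
print(z_high, z_low, is_outlier)
```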
Skewness
Skewness measures the degree of asymmetry
3rd moment
Sum of (distance from mean, cubed) over all data points / number of data points, then divided by SD cubed
Positive skewness: longer tail on the right, bulk of the data shifted left
Negative skewness: longer tail on the left, bulk of the data shifted right
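The sign of skewness can be checked on small made-up samples. This sketch assumes the standardized form (third moment about the mean divided by SD cubed, with the population SD); the function name `skewness` is hypothetical.

```python
def skewness(data):
    """Third moment about the mean, divided by the population SD cubed."""
    n = len(data)
    mean = sum(data) / n
    sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    third_moment = sum((x - mean) ** 3 for x in data) / n
    return third_moment / sd ** 3

right_tailed = [1, 2, 2, 3, 10]           # long tail on the right
left_tailed = [-x for x in right_tailed]  # mirror image: tail on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # positive skewness
print(skewness(left_tailed) < 0)   # negative skewness
print(skewness(symmetric))         # zero for symmetric data
```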
Kurtosis
Kurtosis measures the sharpness of the peak and the heaviness of the tails of a distribution
4th moment
Sum of (distance from mean, to the 4th power) over all data points / number of data points, then divided by SD to the 4th power
Kurtosis is never negative by definition. A normal distribution has a kurtosis of 3, so normally we subtract 3. The result is known as excess kurtosis
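A sketch of kurtosis and excess kurtosis, assuming the standardized form (fourth moment about the mean divided by the squared variance). The function names and the sample are hypothetical.

```python
def kurtosis(data):
    """Fourth moment about the mean, divided by the variance squared."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    return fourth_moment / variance ** 2

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return kurtosis(data) - 3

flat = [1, 2, 3, 4, 5]  # made-up, evenly spread sample

print(kurtosis(flat))         # 6.8 / 4, about 1.7: never negative
print(excess_kurtosis(flat))  # about -1.3: flatter than a normal curve
```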
Outliers
Extreme values relative to the bulk of values in a data set
Outlier if z-score is greater than 3 or less than -3
Outlier if the value is more than 1.5 x IQR above the 3rd quartile, or more than 1.5 x IQR below the 1st quartile (IQR = 3rd quartile - 1st quartile)
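The IQR rule can be sketched as follows, assuming Python 3.8+ and the default method of `statistics.quantiles`. The sample is made up, and other quartile methods may place the fences slightly differently.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 30]  # made-up sample with one extreme value

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range

# Fences: 1.5 x IQR below the 1st quartile and above the 3rd quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [30]
```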