Week 4 Flashcards
what is frequency
how often a value appears in data
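A minimal Python sketch of counting frequencies with the standard library's `collections.Counter`. The colour data is hypothetical.

```python
from collections import Counter

# Hypothetical survey answers (a nominal variable)
data = ["red", "blue", "red", "green", "blue", "red"]

# Counter maps each distinct value to the number of times it appears
frequencies = Counter(data)
print(frequencies["red"], frequencies["blue"], frequencies["green"])  # 3 2 1
print(frequencies.most_common(1))  # [('red', 3)]
```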
what does a histogram show
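how data is distributed: the range of values is split into bins, and each bar's height is the frequency of values falling in that bin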
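A minimal sketch of the counting behind a histogram, drawn as text bars. The scores and bin edges are hypothetical, and the helper `histogram_counts` is an illustrative name, not a library function. Bins are half-open, except that the last bin also includes its right edge.

```python
def histogram_counts(values, bin_edges):
    """Count values per bin. Bins are [edge_i, edge_i+1), except the
    last bin, which also includes its right edge. Values outside all
    bins are ignored."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(bin_edges) - 2  # index of the final bin
    for v in values:
        for i in range(len(bin_edges) - 1):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            if lo <= v < hi or (i == last and v == hi):
                counts[i] += 1
                break
    return counts


# Hypothetical exam scores, grouped into bins of width 10
scores = [52, 67, 71, 74, 78, 83, 85, 91, 95, 100]
edges = [50, 60, 70, 80, 90, 100]
counts = histogram_counts(scores, edges)

# Text histogram: one '#' per data point in each bin
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo}-{hi}: {'#' * c}")
```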
what is the mode
the most frequently occurring value in the data
can be used for all types of variables, but is mostly used for nominal and ordinal variables
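A short sketch using Python's `statistics` module on hypothetical nominal data. The mode only needs counts, which is why it works even when values cannot be ordered or averaged. `multimode` returns every value tied for the highest count.

```python
import statistics

# Hypothetical nominal data
pets = ["cat", "dog", "cat", "fish", "cat", "dog"]
most_common_pet = statistics.mode(pets)
print(most_common_pet)  # cat

# When several values tie, multimode lists all of them
tied = statistics.multimode([1, 1, 2, 2, 3])
print(tied)  # [1, 2]
```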
what is the median
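the middle value when the data are sorted in order (the average of the two middle values if there is an even number of data points)
cannot be used for nominal variables, because their categories have no natural order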
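A minimal sketch of the median, written by hand and checked against `statistics.median`. The function name `median` here is just illustrative. Sorting is the step that needs ordered data, which nominal variables lack.

```python
import statistics

def median(values):
    s = sorted(values)  # the median needs the data in order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # odd count: the single middle value
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle values

print(median([7, 1, 3]))                 # 3
print(median([7, 1, 3, 10]))             # 5.0
print(statistics.median([7, 1, 3, 10]))  # 5.0
```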
what is the mean
mean = sum of all values / number of data points
can only be used for interval and ratio variables
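The formula above as a one-line Python sketch, on hypothetical interval/ratio data.

```python
def mean(values):
    # sum of all values divided by the number of data points
    return sum(values) / len(values)

print(mean([2, 4, 6, 8]))  # 5.0
```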
how similar are median and mean
they are close for symmetric data, but one extreme outlier can hugely affect the mean while having little effect on the median
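A small demonstration with hypothetical income figures: adding a single extreme value pulls the mean far upward, while the median barely moves.

```python
import statistics

incomes = [30, 32, 35, 38, 40]   # hypothetical, in thousands
with_outlier = incomes + [1000]  # add one extreme value

mean_before = statistics.mean(incomes)          # 35
median_before = statistics.median(incomes)      # 35
mean_after = statistics.mean(with_outlier)      # about 195.8
median_after = statistics.median(with_outlier)  # 36.5

print(mean_before, median_before)
print(mean_after, median_after)
```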
what is spread
how spread out the data values are, e.g. measured by the range (max - min), interquartile range, variance or standard deviation
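A sketch comparing the range of two hypothetical data sets that share the same mean (50) but differ in spread.

```python
# Two hypothetical data sets, both with mean 50
data_a = [48, 49, 50, 51, 52]
data_b = [10, 30, 50, 70, 90]

# Range = max - min: the simplest measure of spread
range_a = max(data_a) - min(data_a)  # 4
range_b = max(data_b) - min(data_b)  # 80
print(range_a, range_b)
```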
what are quantiles, quartiles and percentiles
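- quantile: the cut points that divide sorted data into sections containing equal numbers of data points
- quartile: the quantiles when the data are split into 4 sections (3 cut points: Q1, Q2 = median, Q3)
- percentile: the quantiles when the data are split into 100 sections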
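A sketch using `statistics.quantiles` (Python 3.8+), which returns the n - 1 cut points for n sections. The data is hypothetical. Python's default interpolation method is `"exclusive"`; other tools may use different methods, so quartiles can differ slightly between software.

```python
import statistics

data = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11

# n=4: the 3 cut points Q1, Q2 (median), Q3
quartiles = statistics.quantiles(data, n=4)
print(quartiles)  # [3.0, 6.0, 9.0]

# n=100: the 99 cut points that define the percentiles
percentiles = statistics.quantiles(data, n=100)
print(len(percentiles))  # 99
```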
what is variance
- 2nd central moment (moment about the mean)
- sum of (distance of each data point from the mean)^2 / number of data points
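A minimal sketch of the population variance (dividing by the number of data points, as on this card), checked against `statistics.pvariance`. Note that `statistics.variance` instead computes the sample variance, which divides by n - 1. The data is hypothetical.

```python
import statistics

def variance(values):
    """Population variance: the 2nd central moment."""
    m = sum(values) / len(values)
    # average of the squared distances from the mean
    return sum((x - m) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, mean = 5
print(variance(data))              # 4.0
print(statistics.pvariance(data))  # same value from the standard library
```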
what is standard deviation
the square root of the variance
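The same hypothetical data as a sketch: taking the square root of the population variance gives the standard deviation, which `statistics.pstdev` computes directly. Unlike the variance, it is in the same units as the data.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
var = statistics.pvariance(data)  # population variance: 4
sd = math.sqrt(var)               # standard deviation: 2.0
print(sd, statistics.pstdev(data))
```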
what is the z-score
how many standard deviations a value lies above (positive) or below (negative) the mean
z = (value - mean) / SD
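A minimal sketch of z-scores, assuming the population standard deviation (as in the variance card). `z_scores` is an illustrative helper name, and the data is hypothetical (mean 5, SD 2).

```python
import statistics

def z_scores(values):
    mu = statistics.fmean(values)      # mean
    sigma = statistics.pstdev(values)  # population standard deviation
    # distance from the mean, measured in standard deviations
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(data))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```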
what is skewness
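the degree of asymmetry of the distribution
3rd central moment
sum of (distance of each data point from the mean)^3 / number of data points
skewness = 3rd moment / SD^3
symmetric data have zero skewness; positive skewness means a longer right tail, negative skewness a longer left tail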
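A minimal sketch of the moment-based skewness defined above, on hypothetical data. Other software may apply a small-sample correction, so its values can differ slightly. `central_moment` and `skewness` are illustrative helper names.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)^k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    sd = central_moment(values, 2) ** 0.5  # population SD
    return central_moment(values, 3) / sd ** 3

print(skewness([1, 2, 3, 4, 5]))   # 0.0 (symmetric)
print(skewness([1, 1, 1, 2, 10]))  # positive: long right tail
```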
what is kurtosis
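how heavy the tails of the distribution are (often described as how sharp the peak is)
4th central moment
sum of (distance of each data point from the mean)^4 / number of data points
kurtosis = 4th moment / SD^4 (a normal distribution has kurtosis 3; excess kurtosis = kurtosis - 3)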
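A sketch of the moment-based kurtosis, using the fact that SD^4 equals the square of the 2nd central moment. The evenly spaced data is hypothetical; being flat with no long tails, it gives a kurtosis below 3.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)^k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def kurtosis(values):
    # SD^4 equals (2nd central moment)^2
    return central_moment(values, 4) / central_moment(values, 2) ** 2

k = kurtosis([1, 2, 3, 4, 5])
print(k)      # 1.7 (flat, light-tailed data: below 3)
print(k - 3)  # excess kurtosis, negative here
```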
what are outliers
extreme values relative to the bulk of values in a data set
- one common rule: a value whose z-score is greater than 3 or less than -3
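A minimal sketch of the z-score rule, assuming the population standard deviation. `zscore_outliers` is a hypothetical helper, and it assumes the values are not all identical (otherwise the SD is 0).

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose |z-score| exceeds the threshold."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumes sigma > 0
    return [x for x in values if abs((x - mu) / sigma) > threshold]

data = [10] * 19 + [50]       # hypothetical: 19 typical values, one extreme
print(zscore_outliers(data))  # [50]
```

Two caveats apply. The outlier itself inflates the SD. Also, with the population SD, no value can have |z| greater than sqrt(n - 1), so in data sets of 10 or fewer points this rule can never flag anything.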
what is a box plot
a plot summarising quartile-based statistics of a data set
includes
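- location of the quartiles (a box from Q1 to Q3, with a line at the median)
- whiskers showing the range of data excluding outliers
- outliers detected using the quartiles (commonly values more than 1.5 × IQR below Q1 or above Q3, where IQR = Q3 - Q1)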
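A sketch of the numbers a box plot draws, assuming the common 1.5 × IQR rule for outliers and Python's default quartile method. `box_plot_summary` is a hypothetical helper, and the data is illustrative.

```python
import statistics

def box_plot_summary(values):
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        # whiskers: range of the data excluding outliers
        "whisker_low": min(inliers), "whisker_high": max(inliers),
        "outliers": [x for x in values if x < low_fence or x > high_fence],
    }

data = list(range(1, 12)) + [40]  # hypothetical: 1..11 plus one extreme value
summary = box_plot_summary(data)
print(summary)
```

For this data, Q1 = 3.25, the median is 6.5, Q3 = 9.75 and IQR = 6.5. That puts the upper fence at 19.5, so 40 is flagged as an outlier and the upper whisker stops at 11.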