descriptive statistics (w4) Flashcards
what things can be used to describe data
histograms, central tendency, spread, shape, outliers, box plots
what are central tendencies
mode, median, mean
what are spreads of data
quantile/quartile/percentile
variance and standard deviation
z-score
what is the shape in terms of describing data
skewness, kurtosis
purpose of a histogram
to visualise how data are distributed
what is the mode, what types of variables can it be used for
most frequently occurring answer (tallest stack in the histogram), there can be multiple modes
all types of variables
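A minimal sketch of finding the mode(s) with Python's standard library. The survey answers are made up for illustration; they are a nominal variable, which shows that the mode works even when values have no order.

```python
from statistics import multimode

# Hypothetical survey answers (nominal data), chosen only for illustration.
answers = ["cat", "dog", "dog", "cat", "fish"]

# multimode returns every value tied for the highest count,
# in the order each value was first seen.
modes = multimode(answers)
print(modes)  # ['cat', 'dog']
```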
what is the median, what types of variables can it be used for
the middle value of the ordered data, dividing it into 2 groups containing the same number of data points
only ordinal, interval, ratio (ordered variables)
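A small sketch of the median, assuming made-up numeric samples. With an even number of values, Python's `statistics.median` averages the two middle values, which is one common convention.

```python
from statistics import median

# Hypothetical samples, chosen only for illustration.
odd_sample = [7, 1, 3]       # sorted: 1, 3, 7
even_sample = [7, 1, 3, 5]   # sorted: 1, 3, 5, 7

# Odd count: the single middle value of the sorted data.
median_odd = median(odd_sample)
# Even count: the average of the two middle values (3 and 5).
median_even = median(even_sample)

print(median_odd)   # 3
print(median_even)  # 4.0
```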
what is the mean, what types of variables can it be used for
mean = (∑ coin values) / (number of coins)
(sum of all values / total number of values)
only interval and ratio
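The coin version of the mean can be sketched as follows. The coin values are hypothetical and only serve as an example.

```python
# Hypothetical coin values (ratio data), chosen only for illustration.
coin_values = [1, 5, 10, 10, 25]

# Mean = sum of all values divided by the number of values.
total = sum(coin_values)           # 51
number_of_coins = len(coin_values) # 5
mean_value = total / number_of_coins

print(mean_value)  # 10.2
```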
which central tendency (mean, median, mode) would an outlier affect most
the mean, because it depends on the actual values (the median and mode depend only on order and frequency)
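A quick illustration of the outlier effect, using made-up data in which one value is replaced by an extreme one. Only the mean moves noticeably here; other data sets could behave differently.

```python
from statistics import mean, median, mode

# Hypothetical data, chosen only for illustration.
original = [1, 2, 2, 3, 4]
with_outlier = [1, 2, 2, 3, 100]  # the 4 is replaced by an extreme value

# The mean uses the actual values, so the outlier drags it upward.
mean_before, mean_after = mean(original), mean(with_outlier)
# The median and mode are unchanged in this example.
median_before, median_after = median(original), median(with_outlier)
mode_before, mode_after = mode(original), mode(with_outlier)

print(mean_before, mean_after)      # 2.4 21.6
print(median_before, median_after)  # 2 2
print(mode_before, mode_after)      # 2 2
```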
why need spread of data
distributions can have the same mean/median, but one may be much more spread out than the other
how to calculate spread
divide the data into sections containing the same number of data points
what are quantiles
cut-off points dividing the data into equal sections, for N sections they are called N-quantiles (N-1 values)
what are quartiles
when there are 4 sections in total, they are called quartiles (1st-3rd), median is 2nd quartile
what are percentiles
when there are 100 sections in total, they are called percentiles (1st-99th), median is 50th percentile
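A sketch of quartiles and percentiles using `statistics.quantiles`, assuming a made-up data set of the integers 1 to 11. Note that several interpolation conventions exist, so other tools may return slightly different cut-off points for the same data.

```python
from statistics import median, quantiles

# Hypothetical data: the integers 1..11, chosen only for illustration.
data = list(range(1, 12))

# 4 sections -> 3 cut-off points (the quartiles).
quartiles = quantiles(data, n=4)
# 100 sections -> 99 cut-off points (the percentiles).
percentiles = quantiles(data, n=100)

print(quartiles)         # [3.0, 6.0, 9.0]
print(len(percentiles))  # 99

# The 2nd quartile and the 50th percentile are both the median.
print(quartiles[1], percentiles[49], median(data))  # 6.0 6.0 6
```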
what is the 2nd moment
how hard it is to spin the data around the mean
= ∑ [distance from the mean to each data point]² / number of data points
= variance
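The variance formula can be sketched by hand and checked against the standard library. The data are made up for illustration. Dividing by the number of data points, as on the card, gives the population variance (`statistics.pvariance`); the sample variance (`statistics.variance`) divides by N-1 instead.

```python
from statistics import pvariance

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
data_mean = sum(data) / len(data)  # 5.0

# Step 2: squared distance from the mean for each data point.
squared_distances = [(x - data_mean) ** 2 for x in data]

# Step 3: average the squared distances (divide by the number of points).
variance_by_hand = sum(squared_distances) / len(data)

print(variance_by_hand)  # 4.0
print(pvariance(data))   # same value from the standard library
```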