descriptive statistics (w4) Flashcards
what things can be used to describe data
histograms, central tendency, spread, shape, outliers, box plots
what are central tendencies
mode, median, mean
what are spreads of data
quantile/quartile/percentile
variance and standard deviation
z-score
what is the shape in terms of describing data
skewness, kurtosis
purpose of a histogram
to visualise how data are distributed
what is the mode, what types of variables can it be used for
most frequently occurring value (highest stack in the histogram), there can be multiple modes
all types of variables
what is the median, what types of variables can it be used for
the middle value, dividing the ordered data into 2 groups with the same number of data points
only ordinal, interval, ratio (ordered variables)
what is the mean, what types of variables can it be used for
= (∑ coin value) / (number of coins)
(sum of all values / total number of values)
only interval and ratio
which central tendency (mean, median, mode) would an outlier affect most
the mean, because it depends on the actual values (median and mode depend only on order and frequency)
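A minimal sketch of the outlier effect, using Python's standard `statistics` module. The numbers are made up for illustration and the helper name `summarize` is hypothetical; the point is only to compare the three central tendencies before and after one extreme value is added.

```python
import statistics

# Illustrative values only (not taken from the flashcards).
values = [1, 2, 2, 3, 4]
with_outlier = values + [100]  # same data plus one extreme value


def summarize(data):
    """Return the three central tendencies for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # multimode returns a list, since there can be several modes
        "modes": statistics.multimode(data),
    }


before = summarize(values)
after = summarize(with_outlier)
print(before)
print(after)
```

In this example the mean moves a long way, the median shifts only slightly, and the mode does not change.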
why need spread of data
distributions can have same mean/median but one may be much more spread
how to calculate spread
divide the ordered data into sections containing the same number of data points
what are quantiles
cut-off points dividing the data into equal sections, for N sections they are called N-quantiles (N-1 values)
what are quartiles
when there are 4 sections in total, they are called quartiles (1st-3rd), median is 2nd quartile
what are percentiles
when there are 100 sections in total, they are called percentiles (1st-99th), median is 50th percentile
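A short sketch of quantiles with `statistics.quantiles` from the Python standard library. The data are made up, and the exact cut-off values depend on the interpolation convention (this uses the function's default), so treat the numbers as one possible answer rather than the only one.

```python
import statistics

# Illustrative values only.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

# n sections give n - 1 cut-off points.
quartiles = statistics.quantiles(data, n=4)      # 3 values: 1st to 3rd quartile
percentiles = statistics.quantiles(data, n=100)  # 99 values: 1st to 99th percentile

median = statistics.median(data)
print(len(quartiles), len(percentiles))
print(quartiles[1], percentiles[49], median)  # 2nd quartile, 50th percentile, median
```

The 2nd quartile and the 50th percentile are both the median, which is what the last line prints.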
what is the 2nd moment
how hard it is to spin the data around the mean
= ∑ [distance from mean to each data point]^2 / number of data points
= variance
what is standard deviation
square root of variance, the standard distance from the mean
what does mean +- SD show
where the centre is and how spread out the data points are around it
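A minimal sketch of the variance and standard deviation calculation, assuming the "divide by the number of data points" definition used in these cards (the population variance). The data are made up for illustration. Note that the sample variance, `statistics.variance`, divides by N-1 instead and gives a slightly larger value.

```python
import math
import statistics

# Illustrative values only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# 2nd moment about the mean: average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# Cross-check against the standard library's population versions.
library_variance = statistics.pvariance(data)
library_sd = statistics.pstdev(data)

print(f"mean +- SD: {mean} +- {sd}")
```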
what is the z-score and what’s it for
given the SD, a distance from the mean can be described as a ratio with respect to the SD, which is the z-score: z = (value - mean) / SD
enables fair comparisons of deviations
whose height deviates more from the mean:
female: 5.3 +- 0.3ft, male: 5.8 +- 0.4ft (mean +- SD)
mary is 5.6, dave is 6.1
mary: (5.6-5.3)/0.3 = 1
dave: (6.1-5.8)/0.4 = 0.75
mary's height deviates more from the mean than dave's
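The height comparison above can be sketched in a few lines of Python. The function name `z_score` is just a hypothetical helper; the means, SDs and heights are the ones from the card.

```python
def z_score(value, mean, sd):
    """Distance from the mean, expressed in units of the standard deviation."""
    return (value - mean) / sd


mary = z_score(5.6, mean=5.3, sd=0.3)  # compared against the female distribution
dave = z_score(6.1, mean=5.8, sd=0.4)  # compared against the male distribution

# Rounded for display, because floating-point subtraction is not exact.
print(round(mary, 2), round(dave, 2))
print("mary deviates more" if abs(mary) > abs(dave) else "dave deviates more")
```

Both are 0.3ft above their group mean, but because the female SD is smaller, mary's z-score is larger.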
what is skewness
measures degree of asymmetry
what is 3rd moment, how do you make it dimensionless
3rd moment = ∑ [data point - mean]^3 / number of data points (the sign of each distance is kept)
to make it dimensionless, divide it by SD^3
ie: skewness = 3rd moment/SD^3
what does: 0 and high skewness mean
0 = data symmetrically distributed
high (in magnitude) = distribution highly asymmetrical
what does + and - skewness mean
+ right-skewed: long tail on the right, bulk of the data on the left
- left-skewed: long tail on the left, bulk of the data on the right
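A small sketch of skewness = 3rd moment/SD^3, dividing by the number of data points as in these cards. The function name `skewness` and the datasets are made up for illustration. Under the usual convention, a positive result means the longer tail is on the right and a negative result means the longer tail is on the left.

```python
import math


def skewness(data):
    """3rd moment about the mean divided by SD^3 (dimensionless)."""
    n = len(data)
    mean = sum(data) / n
    second_moment = sum((x - mean) ** 2 for x in data) / n  # variance
    third_moment = sum((x - mean) ** 3 for x in data) / n   # keeps the sign
    sd = math.sqrt(second_moment)
    return third_moment / sd ** 3


# Illustrative values only.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 10]    # one large value stretches the right tail
left_tail = [-x for x in right_tail]   # mirror image of right_tail

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```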
what is kurtosis
how sharply peaked and heavy-tailed the distribution is