M2 - Tutorial Descriptive Statistics Flashcards
Arithmetic mean
average of all values (sum of the values divided by their number n)
Median
- odd n
- even n
- for odd n, xmed is the value in the middle of the sorted list
- for even n, xmed is the arithmetic mean of the two values in the middle of the sorted list
Mode
most frequent value of the variable in the data
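As a minimal sketch, the three measures of location above can be computed with Python's standard `statistics` module. The sample values are made up for illustration only.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
odd_sample = [3, 1, 4, 1, 5]        # n = 5 (odd)
even_sample = [3, 1, 4, 1, 5, 9]    # n = 6 (even)

# Arithmetic mean: (3 + 1 + 4 + 1 + 5) / 5 = 2.8
mean_value = statistics.mean(odd_sample)

# Median, odd n: sorted list is 1, 1, 3, 4, 5, so the middle value is 3
median_odd = statistics.median(odd_sample)

# Median, even n: sorted list is 1, 1, 3, 4, 5, 9, so (3 + 4) / 2 = 3.5
median_even = statistics.median(even_sample)

# Mode: the value 1 occurs twice, every other value once
mode_value = statistics.mode(odd_sample)

print(mean_value, median_odd, median_even, mode_value)
```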
p-quantile
e.g.
a value xp that divides the n values into two parts: at least a fraction p of the data is less than or equal to xp, and at least a fraction 1-p is greater than or equal to xp
e.g. 10% quantile: at least 10% of the data are less than or equal to x10%, and at least 90% of the data are greater than or equal to x10%
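The definition above can be checked directly in code. The sketch below assumes one common textbook convention for picking xp (if n·p is not an integer, take the next sorted value; if it is, take the midpoint of the two neighbouring values); the card itself does not fix a convention, and the function names `p_quantile` and `is_p_quantile` are hypothetical.

```python
import math

def is_p_quantile(data, x, p):
    """Check the definition: at least a fraction p of the data is <= x
    and at least a fraction 1 - p is >= x."""
    n = len(data)
    count_le = sum(1 for v in data if v <= x)
    count_ge = sum(1 for v in data if v >= x)
    eps = 1e-9  # guards against floating-point rounding in p * n
    return count_le >= p * n - eps and count_ge >= (1 - p) * n - eps

def p_quantile(data, p):
    """One possible convention (an assumption, not from the card), for 0 < p < 1."""
    xs = sorted(data)
    k = len(xs) * p
    if abs(k - round(k)) < 1e-9:
        # n*p is an integer: any value between the two neighbours
        # satisfies the definition; take their midpoint.
        k = int(round(k))
        return (xs[k - 1] + xs[k]) / 2
    # n*p is not an integer: take the ceil(n*p)-th sorted value.
    return xs[math.ceil(k) - 1]

data = list(range(1, 11))        # hypothetical data: 1, 2, ..., 10
x10 = p_quantile(data, 0.10)     # midpoint of 1 and 2
x25 = p_quantile(data, 0.25)     # third sorted value
print(x10, is_p_quantile(data, x10, 0.10))
print(x25, is_p_quantile(data, x25, 0.25))
```

Note that the quantile is not always unique: for the 10% quantile of this data, every value between 1 and 2 satisfies the definition.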
variance s²
measures the spread of the data around the mean, as the average squared deviation from the mean
standard deviation
- if small
- if large
- advantage over variance
square root of the variance; used to quantify the amount of variation in a set of data values
- if small, the data points are close to the mean
- if large, the data points are spread out over a wider range
- it is expressed in the same unit as the data (unlike variance)
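A short sketch of variance and standard deviation using the standard `statistics` module. The cards do not say whether s² divides by n or by n-1, so both variants are shown; the data are hypothetical.

```python
import statistics

# Hypothetical data with mean 5; squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions: divide by n.
pop_var = statistics.pvariance(data)    # 32 / 8 = 4
pop_std = statistics.pstdev(data)       # sqrt(4) = 2

# Sample versions: divide by n - 1.
sample_var = statistics.variance(data)  # 32 / 7
sample_std = statistics.stdev(data)     # sqrt(32 / 7)

# The variance is in squared units; the standard deviation is in the
# same unit as the data, which makes it easier to interpret.
print(pop_var, pop_std, sample_var, sample_std)
```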
advantage of the coefficient of variation
it is independent of scale (standard deviation relative to the mean), so it is appropriate for comparing the spread of data sets with different units or magnitudes
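To illustrate the scale independence, the sketch below computes the coefficient of variation (standard deviation divided by the mean) for the same hypothetical heights expressed in centimetres and in metres. The helper name `coefficient_of_variation` is an assumption, and the sample standard deviation is used.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation relative to the mean (sensible for a positive mean)."""
    return statistics.stdev(values) / statistics.mean(values)

# The same hypothetical heights in two different units.
heights_cm = [160, 170, 180]
heights_m = [1.60, 1.70, 1.80]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# The standard deviations differ by a factor of 100,
# but the coefficients of variation agree (up to rounding).
print(statistics.stdev(heights_cm), statistics.stdev(heights_m))
print(cv_cm, cv_m)
```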
skewness & coeff. of skewness
- left
- right
- symmetric
- left-skewed: the bulk of the distribution is concentrated on the right, gm < 0
- right-skewed: the bulk of the distribution is concentrated on the left, gm > 0
- symmetric: right and left halves are approximately mirror images, gm = 0
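The sign rule for gm can be tried out with a small sketch. It assumes that gm is the moment coefficient of skewness, m3 / m2^(3/2), where mk is the k-th central moment; the card does not spell out the formula, and the function name and data are hypothetical.

```python
def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (assumed definition of gm)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 1, 2, 2, 3, 10]           # long tail to the right
left_skewed = [-v for v in right_skewed]     # mirrored: long tail to the left
symmetric = [1, 2, 3, 4, 5]

g_right = moment_skewness(right_skewed)      # positive
g_left = moment_skewness(left_skewed)        # negative
g_sym = moment_skewness(symmetric)           # zero
print(g_right, g_left, g_sym)
```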
Kurtosis / Peakedness
- peaked
- flattened
- normal
- how sharp is the peak of the distribution compared with the normal distribution?
- peaked: leptokurtic distribution, γ > 0
- flattened: platykurtic distribution, γ < 0
- normal: mesokurtic distribution, γ = 0
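A minimal sketch of the sign rule, assuming the measure on the card is the excess kurtosis m4 / m2² - 3 (so that a normal distribution gives 0); the card does not state the formula, and the function name and data are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Evenly spread values: flat shape, platykurtic.
flat = [1, 2, 3, 4, 5]
# Most values at the centre plus two extreme values: peaked, leptokurtic.
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]

k_flat = excess_kurtosis(flat)       # 6.8 / 4 - 3, i.e. about -1.3
k_peaked = excess_kurtosis(peaked)   # 125 / 25 - 3 = 2
print(k_flat, k_peaked)
```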
Digression: Data types
- nominal: a few possible values (categories) without a natural order
- ordinal: a few possible values with a natural ranking
- interval: any value within a certain interval; no meaningful zero
- ratio: numeric values with a meaningful zero, which denotes the absence of the measured quantity
creation of dummy variables
- why
- how
- why? when a binary variable (1/0; yes/no) is needed as an indicator, e.g. for a condition on a continuous variable
- how? create one dummy for every state: set it to 1 for observations in that state and to 0 for all others
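The "how" step can be sketched in plain Python. The function name `make_dummies` and the example states are hypothetical; the sketch simply creates one 0/1 column per distinct state.

```python
def make_dummies(values):
    """Return one 0/1 indicator list per distinct state (category)."""
    states = sorted(set(values))
    return {
        state: [1 if v == state else 0 for v in values]
        for state in states
    }

# Hypothetical observations of a variable with three states.
observations = ["A", "B", "A", "C"]
dummies = make_dummies(observations)

for state, column in dummies.items():
    print(state, column)
```

Each observation has exactly one dummy equal to 1, so the dummies of a single observation always sum to 1.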