Descriptive Statistics (L8) Flashcards
what are the 2 types of descriptive statistics?
- Central Tendency: mean, median, mode
- Variability: variance, SD, standard error, z-score, range, interquartile range
what is central tendency?
the value representing the center of a distribution (mean, median, mode)
characteristics of a normal distribution
*Bell-shaped and symmetric
*Majority of scores lie around the center of the distribution
*One peak (unimodal)
*As we get further away from the center, scores become less frequent
*Originally known as the Gaussian distribution but was used so often it became ‘normal’
*The normal distribution is one of the most fundamental and useful statistical tools
why are normal distributions ‘normal’?
*Normal distributions reflect life’s complexity
*The things we study in the real world are often complicated and are the sum of many small factors
*When you sum many small things together, they look ‘Gaussian’ or ‘normal’
*This is because when you take the average value of your measure, each individual sample can be thought of as a small fluctuation from the average
*When you start adding fluctuations, they cancel each other out
*The more fluctuations you have, the better the chance that any fluctuation will be cancelled by one in the opposite direction
what is the Central Limit Theorem
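*Central Limit Theorem: when you add up (or average) many independent fluctuations, the distribution of the result approaches a normal distribution
*The most likely sum is one where all fluctuations are cancelled out, and this would be a sum of zero (relative to the mean)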
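The cancelling-out idea can be checked with a small simulation. This is only a sketch: it assumes each fluctuation is an independent draw from a uniform distribution between -1 and 1, and the function name `sum_of_fluctuations`, the 50 factors, and the 10,000 repeats are illustrative choices, not from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sum_of_fluctuations(n_factors):
    """Add up n_factors small, independent fluctuations around zero."""
    return sum(random.uniform(-1, 1) for _ in range(n_factors))

# Repeat the "many small factors" experiment many times
sums = [sum_of_fluctuations(50) for _ in range(10_000)]

centre = statistics.mean(sums)   # should sit close to zero
spread = statistics.stdev(sums)  # typical size of a leftover fluctuation

# Share of sums within one SD of the centre (about 0.68 for a normal distribution)
within_one_sd = sum(abs(s - centre) <= spread for s in sums) / len(sums)

print(round(centre, 2), round(spread, 2), round(within_one_sd, 2))
```

Each single fluctuation is flat (uniform), not bell-shaped, yet the sums pile up around zero with roughly the share within one SD that a normal distribution would give.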
what is the mode
*Score that most frequently occurs in the dataset
*Easy to spot = tallest bar
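A quick sketch of finding the mode by counting, using made-up scores. The counts play the role of bar heights in a histogram, so the mode is the score with the largest count.

```python
from collections import Counter
import statistics

scores = [3, 5, 5, 6, 7, 5, 3]  # made-up example scores

# Count how often each score occurs (the "bar heights")
counts = Counter(scores)

# The tallest bar: the score with the highest count
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)   # score 5 occurs 3 times
print(statistics.mode(scores))  # the standard library gives the same score
```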
what is the median
middle score when scores are ranked in order of magnitude
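A short sketch with made-up scores. It assumes the usual convention (not stated on the card) that with an even number of scores the median is the average of the two middle scores.

```python
import statistics

odd_scores = [7, 1, 5, 3, 9]  # made-up, 5 scores
even_scores = [7, 1, 5, 3]    # made-up, 4 scores

# Rank the scores, then pick the middle one
ranked = sorted(odd_scores)          # [1, 3, 5, 7, 9]
middle = ranked[len(ranked) // 2]    # 5

# With an even count there is no single middle score,
# so statistics.median averages the two middle ones: (3 + 5) / 2
even_median = statistics.median(even_scores)

print(middle, even_median)
```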
comparing median vs mean
MEDIAN
*Relatively unaffected by extreme values
*Relatively unaffected by skewed distributions
MEAN
*Influenced by extreme values
*Influenced by skewed distributions
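The contrast is easy to see by adding one extreme value to a small dataset. The scores and the outlier of 100 are made up for illustration.

```python
import statistics

scores = [2, 3, 4, 5, 6]        # made-up scores
with_outlier = scores + [100]   # same scores plus one extreme value

mean_before = statistics.mean(scores)            # 20 / 5 = 4
median_before = statistics.median(scores)        # middle score = 4

mean_after = statistics.mean(with_outlier)       # 120 / 6 = 20
median_after = statistics.median(with_outlier)   # (4 + 5) / 2 = 4.5

print(mean_before, median_before)
print(mean_after, median_after)
```

One extreme score moves the mean from 4 to 20, while the median only moves from 4 to 4.5.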
what is the range and how do you calculate it?
Subtract the smallest score from the largest score
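As a one-line sketch, using the five scores from the variance example in this set:

```python
scores = [2, 5, 6, 4, 3]

# Range = largest score minus smallest score
score_range = max(scores) - min(scores)  # 6 - 2

print(score_range)  # 4
```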
interquartile range
*The interquartile range (IQR) addresses the range's problem of being influenced by extreme values
*For the IQR we cut off the top and bottom 25% of the data and are left with the middle 50%
*Quartiles are the 3 values that sort the data into 4 equal parts
*It isn’t affected by extreme scores at either end
*The only problem: you lose a lot of data!
*Shown in graphs called boxplots (box-and-whisker plots)
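A sketch of the interquartile range on made-up scores. Quartiles can be computed under several conventions that give slightly different answers; `method="inclusive"` in Python's `statistics.quantiles` is an assumed choice here, not necessarily the one used in the lecture.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # made-up scores
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # same, but the top score is extreme

def iqr(data):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

# The range reacts strongly to the extreme score...
print(max(scores) - min(scores), max(with_extreme) - min(with_extreme))  # 8 vs 89

# ...while the interquartile range stays the same
print(iqr(scores), iqr(with_extreme))  # 4.0 and 4.0
```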
how to measure variance
Measure variance by finding the average squared difference of each value from the mean
Sum of Squares (SS)
*Square each difference from the mean so that negative differences don't cancel out positive ones
*NOTE: It is a total value, not an average.
Example (mean x̄ = 4):
x    (x - x̄)²     SS
2    (2 - 4)²     4
5    (5 - 4)²     1
6    (6 - 4)²     4
4    (4 - 4)²     0
3    (3 - 4)²     1
SS = 4 + 1 + 4 + 0 + 1 = 10
Variance (V)
*We actually don’t calculate the average as we normally would (10/5 = 2).
*We divide by N-1 (10/4 = 2.5)
V = 2.5
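The worked example can be reproduced step by step. This sketch uses the five scores from the card; the variable names are illustrative.

```python
import statistics

scores = [2, 5, 6, 4, 3]
n = len(scores)

mean = sum(scores) / n                              # 20 / 5 = 4.0

# Sum of squares: squared differences from the mean, added up (a total)
squared_diffs = [(x - mean) ** 2 for x in scores]   # [4.0, 1.0, 4.0, 0.0, 1.0]
ss = sum(squared_diffs)                             # 10.0

# Variance: divide SS by N - 1 rather than N
variance = ss / (n - 1)                             # 10 / 4 = 2.5

print(ss, variance)
print(statistics.variance(scores))  # the standard library also divides by N - 1
```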
z-score
A z-score is a standardized value that tells you how far away a single value is from the mean of a set of values, measured in standard deviations: z = (value - mean) / SD
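A minimal sketch of the calculation, using the standard formula z = (value - mean) / SD. The helper name `z_score` is hypothetical, and using the sample SD (dividing by N-1, as on the variance card) is an assumption.

```python
import statistics

scores = [2, 5, 6, 4, 3]  # same scores as the variance example

def z_score(value, data):
    """How many standard deviations `value` lies from the mean of `data`."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD, i.e. square root of SS / (N - 1)
    return (value - mean) / sd

z_at_mean = z_score(4, scores)   # a score equal to the mean has z = 0
z_above = z_score(6, scores)     # positive: above the mean
z_below = z_score(2, scores)     # negative: below the mean

print(z_at_mean, round(z_above, 2), round(z_below, 2))
```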