week 3 -- Summarizing data & Comparing distributions Flashcards
descriptive vs inferential statistics
d = what we can do to DESCRIBE the observed sample (the data we actually have); i = what we can do to MAKE A GENERALIZED STATEMENT about the entire population based on the observed sample
shape (descriptive)
symmetry?
modes (peaks)?
skew?
extreme values?
center (descriptive)
MODE
MEDIAN (splits dataset down the middle – robust for extreme values)
MEAN “balances” the data (i.e., takes into account every value and how often it occurs); can be misleading for datasets with extreme values
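A small sketch of why the median is called robust while the mean is not. The scores are made-up numbers chosen for illustration, not data from the course.

```python
from statistics import mean, median, mode

# Hypothetical data: five typical scores, then the same scores plus one extreme value
scores = [2, 3, 3, 4, 5]
scores_with_extreme = scores + [100]

# (mode, median, mean) before and after adding the extreme value
center_before = (mode(scores), median(scores), mean(scores))
center_after = (
    mode(scores_with_extreme),
    median(scores_with_extreme),
    mean(scores_with_extreme),
)

print(center_before)  # (3, 3, 3.4)
print(center_after)   # (3, 3.5, 19.5)
```

The mode and median barely move, while the mean jumps from 3.4 to 19.5 because of a single value.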
spread (descriptive)
RANGE (max-min)
INTERQUARTILE range (IQR = Q3 - Q1; spread of the middle half of the data)
Median separates halves of the data
5-number summary
min, Q1, median, Q3, max. Q1 to Q3 = middle half of data. max = unusual data point? or just the highest point in the data?
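A minimal sketch of computing a 5-number summary, plus the range and IQR, with Python's standard library. The data are hypothetical, and the "inclusive" quartile convention is an assumption: textbooks and software use several slightly different quartile rules, so values can differ a little on other datasets.

```python
from statistics import quantiles

# Hypothetical sample, already sorted for readability
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]

# Quartiles under the "inclusive" convention (one of several in common use)
q1, med, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, med, q3, max(data))
data_range = max(data) - min(data)  # RANGE = max - min
iqr = q3 - q1                       # IQR = Q3 - Q1, spread of the middle half

print(five_number_summary)  # (1, 5.0, 9.0, 13.0, 17)
print(data_range, iqr)      # 16 8.0
```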
outlier vs extreme value
“outlier” is a DEFINED term (in a boxplot, a point lying beyond the 1.5×IQR fences), “extreme value” is a general term
If you draw a card at random from a well-shuffled deck, is getting an ace independent of the suit? Explain.
Yes. There is the same number of aces in each suit, so no matter which suit you draw, the probability of getting an ace does not change.
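The answer above can be checked by counting. This sketch builds a standard 52-card deck and compares P(ace) with P(ace | suit) for every suit; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Overall probability of drawing an ace
p_ace = Fraction(sum(rank == "A" for rank, _ in deck), len(deck))

# Probability of an ace given each suit
p_ace_given_suit = {}
for suit in suits:
    cards_in_suit = [rank for rank, s in deck if s == suit]
    p_ace_given_suit[suit] = Fraction(cards_in_suit.count("A"), len(cards_in_suit))

# Independence: conditioning on the suit does not change the probability
independent = all(p == p_ace for p in p_ace_given_suit.values())

print(p_ace)        # 1/13
print(independent)  # True
```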
When are descriptive stats the same as inferential?
When population = sample (known as a census)
What does a bimodal pattern reflect?
e.g., two separate populations being lumped together – recheck trials/participants (is there anything that can explain the data?)
Draw a boxplot
1) Draw small lines at the median, Q1 and Q3 – make a box from Q1 to Q3.
2) “Fence in the data” at 1.5×IQR below Q1 and above Q3 (just for reference, not part of boxplot)
3) Draw a “whisker” from each end of the box to the most extreme value found within the fences
4) Add outliers (points outside the fences) as individual points
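The four steps can be sketched as a small calculation, using the quartiles Q1 and Q3 for the box edges. The helper name `boxplot_stats`, the data, and the "inclusive" quartile convention are all assumptions made for illustration; plotting software may use a different quartile rule.

```python
from statistics import quantiles

def boxplot_stats(data):
    """Return the numbers needed to draw a boxplot (hypothetical helper)."""
    values = sorted(data)
    # Step 1: median and quartiles define the box
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Step 2: fences at 1.5 x IQR beyond the quartiles (reference only, not drawn)
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Step 3: whiskers reach the most extreme values still inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Step 4: anything outside the fences is plotted as an outlier
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "box": (q1, med, q3),
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical sample with one unusually large value
stats = boxplot_stats([1, 3, 5, 7, 9, 11, 13, 15, 40])
print(stats["box"])       # (5.0, 9.0, 13.0)
print(stats["fences"])    # (-7.0, 25.0)
print(stats["whiskers"])  # (1, 15)
print(stats["outliers"])  # [40]
```

Note that the upper whisker stops at 15, the largest value inside the fence, not at the fence itself (25).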
What do boxplots tell us?
They help us compare the distributions of different groups or categories
small IQR shows consistent performance
asymmetrical whiskers or box shows skew
outliers always deserve our attention! examine them in context of the data – is it an error or just an extreme value?
What to compare between boxplots
shapes (symmetric or skewed)
medians (which is higher? any pattern?)
IQRs (which group is more spread out? any pattern to IQR change?)
outliers
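A rough sketch of comparing two groups on the points listed above, without drawing anything. The two groups are invented numbers, `summarize` is a hypothetical helper, and the "inclusive" quartile convention is an assumption.

```python
from statistics import quantiles

def summarize(data):
    """Median, IQR, box shape, and outliers for one group (hypothetical helper)."""
    q1, med, q3 = quantiles(sorted(data), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Shape hint: compare the two halves of the box around the median
    if q3 - med > med - q1:
        shape = "right-skewed box"
    elif q3 - med < med - q1:
        shape = "left-skewed box"
    else:
        shape = "symmetric box"
    return {
        "median": med,
        "iqr": iqr,
        "shape": shape,
        "outliers": [v for v in data if v < low or v > high],
    }

# Hypothetical groups: A is tight and symmetric, B is more spread out
groups = {
    "A": [10, 11, 12, 13, 14, 15, 16],
    "B": [5, 8, 12, 15, 20, 26, 40],
}
summaries = {name: summarize(values) for name, values in groups.items()}

for name, s in summaries.items():
    print(name, s)
```

Here group B has the higher median (15 vs 13) and a much larger IQR (13 vs 3), so it is both centered higher and less consistent than group A.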