Lecture 2 Flashcards
Measures of Central Tendency
Distribution
A way of describing how the values in a dataset are spread out (the overall shape of the data)
Measures of Central Tendency
Mean, Median and Mode. The one that is used depends on the distribution of the data
Another name for central tendency
The Average
Mean
Also known as the arithmetic mean; the most common measure of a midpoint. Useful if the data are spread fairly evenly (e.g. normally distributed). Calculated as the sum of all values divided by the total number of values.
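Worked example (a minimal sketch): the calculation of the mean, written out in Python. The values are made up purely for illustration.

```python
from statistics import mean

# Hypothetical sample values, chosen only for illustration.
values = [4, 8, 6, 5, 7]

# Sum of all values divided by the total number of values.
manual_mean = sum(values) / len(values)

# The standard library function gives the same answer.
library_mean = mean(values)

print(manual_mean, library_mean)
```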
What is the symbol for the mean if we have data on a whole population?
μ (pronounced ‘mu’)
What is the symbol for the mean if we have data on a sample (from a larger population)?
𝒙̅ (pronounced ‘x bar’)
Why is the mean only useful when data is normally distributed?
It is strongly affected by outliers and skew: a few extreme values pull the mean away from where most of the data lie
Median
The point at which half of the values (for a given variable in a sample/population) lie below and half of the values lie above
Odd vs Even Number of Values
If odd, the median is the middle ordered value. If even, the median is the average (mean) of the two middle ordered values. Values must be in order to work out the median!
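Worked example (a sketch, not the only way to do it): the odd/even rule as a small Python function. The function name and the example values are illustrative assumptions.

```python
def median(values):
    """Return the median using the odd/even rule described on the card."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")

    # Values must be in order before picking the middle.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2

    if n % 2 == 1:
        # Odd number of values: the single middle value.
        return ordered[mid]

    # Even number of values: the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([7, 1, 5])      # ordered: 1, 5, 7
even_example = median([7, 1, 5, 3])  # ordered: 1, 3, 5, 7
```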
What kind of measure is the median?
A resistant measure of the data’s centre: it is not affected by outliers. Used instead of the mean if the data are skewed
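Worked example (a sketch with made-up numbers): swapping one value for an extreme outlier moves the mean a long way but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical data, chosen only for illustration.
typical = [20, 22, 23, 25, 30]

# Same data, but the largest value is replaced by an extreme outlier.
with_outlier = [20, 22, 23, 25, 300]

mean_before, mean_after = mean(typical), mean(with_outlier)
median_before, median_after = median(typical), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```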
Median and Quartiles
-The 1st quartile has ¼ of the data below it
-The 3rd quartile has ¾ of the data below it
-The interquartile range (IQR) contains the middle ½ of the sample data – the data between the 1st and 3rd quartile
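Worked example (a sketch): quartiles and the IQR using Python's standard library. The data are made up, and the choice of `method="inclusive"` is an assumption; textbooks and software use slightly different quartile conventions, so results can differ a little between tools.

```python
from statistics import quantiles

# Hypothetical sample, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four equal parts and returns the three cut points:
# 1st quartile, median (2nd quartile), 3rd quartile.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# The IQR is the distance between the 1st and 3rd quartiles.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```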
Box and Whisker Plots
Very top and bottom lines (ends of the whiskers) = maximum and minimum values (excluding outliers!). Middle line = median. 2nd and 4th lines (edges of the box) = upper and lower quartiles
Mode
The most frequently occurring event/observation (data point). We can have unimodal, bimodal and multimodal data. Useful for nominal variables such as eye colour.
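Worked example (a sketch): finding the mode of a nominal variable in Python. The eye-colour lists are made up for illustration; `multimode` returns every value tied for most frequent, so it also handles bimodal data.

```python
from statistics import multimode

# Hypothetical eye-colour observations, chosen only for illustration.
one_mode = ["brown", "blue", "brown", "green", "blue", "brown"]
two_modes = ["brown", "blue", "brown", "blue", "green"]

# A single most frequent category.
modes_single = multimode(one_mode)

# Two categories tied for most frequent (bimodal).
modes_double = multimode(two_modes)

print(modes_single, modes_double)
```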