Descriptive Statistics Flashcards
Name 3 types of data
(CDC)
Categorical
Discrete
Continuous
Categorical
Binary or Nominal
Has two or more categories with no ordering to them.
E.g. Hair colour, Job title
Continuous
Ratio or Interval variables
Can take any fractional value
E.g. Reaction times
Discrete
Ordinal, Ratio, or Interval variables
Takes only fixed, separate values that have a logical order
E.g. Shoe size, Score out of 10
Median Cons?
Ignores a lot of the data
Difficult to calculate without a computer for large datasets (the data must be sorted first)
Can’t use this with NOMINAL data
Median Pros?
Insensitive to outliers
Often gives a real, meaningful data value
Useful for ordinal data, and skewed interval/ratio data
What are the 3 measures of central tendency?
Mean- sum of data points ÷ number of data points
Median- middle score in an ordered data set
Mode- most frequent score in a data set
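As a quick illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up example data, not taken from the cards.

```python
import statistics

# Hypothetical scores out of 10 (illustrative data only)
scores = [2, 3, 3, 5, 7, 8, 9]

mean_score = statistics.mean(scores)      # sum of values / number of values
median_score = statistics.median(scores)  # middle value once sorted
mode_score = statistics.mode(scores)      # most frequent value
```

For this data the median is 5 and the mode is 3, while the mean (37 ÷ 7, about 5.29) is not a score anyone actually obtained.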
Median:
The middle value in an ordered dataset, or the mean of the middle two values. It can be found as:
Odd number of values: the value at position (n+1)÷2
Even number of values: add the middle two values (positions n÷2 and n÷2+1), then ÷2
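The position rule for the median can be sketched in Python as follows. The function name `median` and the error handling for empty input are choices made for this example, not part of the card.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # the rule only works on ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the median sits at position (n + 1) / 2, counting from 1
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: average the values at positions n/2 and n/2 + 1
    upper = n // 2
    return (ordered[upper - 1] + ordered[upper]) / 2
```

For example, `median([7, 1, 5])` picks the value at position 2 of the sorted data, and `median([4, 1, 3, 2])` averages the two middle values 2 and 3.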
What is the equation for calculating the mean?
Sum of individual data points
÷
sample size
Mean Pros?
Uses all of the data
Is most effective for normally distributed datasets
Mean Cons?
Sensitive to outliers
Values are not always meaningful (we can’t get a score of 6.74 out of 10!).
Only meaningful for RATIO and INTERVAL data
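To see what "sensitive to outliers" means in practice, the sketch below compares the mean and median before and after adding one extreme value. The reaction times are invented for illustration.

```python
import statistics

# Hypothetical reaction times in milliseconds (illustrative data only)
reaction_times = [310, 325, 330, 340, 355]
with_outlier = reaction_times + [2000]  # one very slow trial

mean_before = statistics.mean(reaction_times)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(reaction_times)
median_after = statistics.median(with_outlier)
```

With this data the mean jumps from 332 to 610, while the median only moves from 330 to 335.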
Measures of spread:
Mode
no measures of spread
Measures of spread:
Median
‘distance-based’ measures such as range and interquartile range
Measures of spread:
Mean
‘centre-based’ measures of spread such as variance and standard deviation
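A minimal sketch of the two centre-based measures is given below. It assumes the sample versions (dividing by n - 1), which the card does not specify. Dividing by n instead would give the population versions. The function names are chosen for this example.

```python
import math


def sample_variance(values):
    """Average squared distance from the mean, dividing by n - 1 (assumed)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(sample_variance(values))
```

For the values 1, 2, 3, 4, 5 the mean is 3, the squared deviations sum to 10, and the sample variance is 10 ÷ 4 = 2.5.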
Interquartile range IQR:
Pros and cons are identical to the median
Similar to the range (highest score - lowest score),
but ignores the most extreme values by using only the middle 50% of the data
Lower quartile = median of lower half of the data
Upper quartile = median of upper half of the data
Interquartile range = Upper quartile - Lower quartile
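The "median of each half" method can be sketched in Python as below. One assumption is made that the card leaves open: when the dataset has an odd number of values, the middle value is left out of both halves. Other conventions exist and can give slightly different quartiles. The function name is chosen for this example.

```python
import statistics


def interquartile_range(values):
    """IQR using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: for odd n, skip the middle value when forming the upper half
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    lower_quartile = statistics.median(lower_half)
    upper_quartile = statistics.median(upper_half)
    return upper_quartile - lower_quartile
```

For the values 1, 2, 3, 4, 5, 6, 7, 100 the range is 99, but the interquartile range is 6.5 - 2.5 = 4, showing how the extreme value is ignored.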