Descriptive Statistics Flashcards
What are descriptive statistics?
Describe the data you have
What is the population?
Entire group of people you are interested in
What is a sample?
Subset of population
Its size is usually denoted n (the sample size)
What is categorical data?
Usually nominal or ordinal
Two or more categories; nominal categories have no inherent order, ordinal categories do
What are examples of categorical data?
Hair colour
Marital status
What is discrete data?
Usually ordinal, ratio or interval variables
Takes fixed, separate values with a logical order
What are examples of discrete data?
Shoe size
Score out of 10
What is continuous data?
Usually ratio or interval variables
Can take any fractional value
What are examples of continuous data?
Reaction times
How can categorical data be presented in a frequency distribution?
As its raw frequency or as a percentage frequency
How can discrete data be presented in a frequency distribution?
As raw frequency or percentage
As cumulative frequency or percentage
If there are many distinct values, use frequency ranges (bins) instead, grouped in a meaningful way
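A minimal sketch of the raw, percentage and cumulative frequencies described above, using only the Python standard library. The scores are made-up illustrative data and the column names are assumptions, not part of the original cards.

```python
from collections import Counter

# Hypothetical discrete data: scores out of 10 for twelve participants.
scores = [7, 8, 8, 5, 9, 7, 8, 6, 7, 8, 10, 5]

counts = Counter(scores)  # raw frequency of each distinct score
n = len(scores)

# Build one row per distinct score, in ascending order.
table = []
cumulative = 0
for value in sorted(counts):
    freq = counts[value]
    cumulative += freq
    table.append({
        "score": value,
        "freq": freq,                      # raw frequency
        "pct": 100 * freq / n,             # percentage frequency
        "cum_freq": cumulative,            # cumulative frequency
        "cum_pct": 100 * cumulative / n,   # cumulative percentage
    })

print("score freq    pct  cum  cum_pct")
for row in table:
    print(f"{row['score']:>5} {row['freq']:>4} {row['pct']:>6.1f} "
          f"{row['cum_freq']:>4} {row['cum_pct']:>8.1f}")
```

The last row's cumulative frequency always equals n, which is a quick check that no score was dropped.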
What are measures of central tendency?
Sometimes we want to condense an entire frequency distribution into a single number
They describe where the centre (the typical value) of the data lies
What are three types of measures of central tendency?
Mode
Median
Mean
What is the mode?
Score occurring most often in dataset
Sometimes takes more than one value (bimodal and multimodal distributions)
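As an illustration of a dataset with more than one mode, here is a short sketch using `statistics.multimode` from the Python standard library (Python 3.8 or later). The hair colour list is invented example data.

```python
from statistics import multimode

# Hypothetical nominal data: hair colour of six people.
hair_colour = ["brown", "black", "brown", "blonde", "black", "red"]

# multimode returns every value tied for the highest frequency,
# in the order each first appears in the data.
modes = multimode(hair_colour)
print(modes)  # ['brown', 'black'] -> a bimodal dataset

# With a single most frequent value the list has one element.
print(multimode([1, 2, 2, 3]))  # [2]
```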
What data is the mode used for?
Nominal data
What is the median?
Middle value of the sorted dataset
If there are two middle values, the mean of those two
How do you work out the median for odd value datasets?
Sort the data; the median is the value at position (n+1) / 2
How do you work out the median for even value datasets?
Sort the data; the median is (sum of the middle two values) / 2
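The odd and even cases can be sketched in a few lines of Python. The function name `median_by_hand` is a hypothetical helper for illustration; in practice `statistics.median` does the same job.

```python
def median_by_hand(values):
    """Return the median using the odd/even rules from the cards."""
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand([3, 1, 7, 5, 9]))  # 5   (odd n: the 3rd sorted value)
print(median_by_hand([3, 1, 7, 5]))     # 4.0 (even n: mean of 3 and 5)
```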
What are the pros of the median?
Insensitive to outliers
Often gives real, meaningful data value
What data is the median used for?
Ordinal data
Skewed interval/ratio data
What are the cons of the median?
Ignores a lot of data
Difficult to calculate without a computer
Can’t use with nominal data
What is the mean?
Sum of data points divided by number of data points
What are the pros of the mean?
Uses all of the data
Most effective for normally distributed datasets
What are the cons of the mean?
Sensitive to outliers
Values not always meaningful
Only meaningful for ratio and interval data
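To make the outlier point concrete, the sketch below compares the mean and the median before and after adding one extreme value. The reaction times are invented for illustration.

```python
from statistics import mean, median

# Hypothetical reaction times in milliseconds.
reaction_times = [310, 325, 330, 340, 345]
with_outlier = reaction_times + [1200]  # one very slow trial added

print(mean(reaction_times), median(reaction_times))  # 330 330
print(mean(with_outlier), median(with_outlier))      # 475 335.0
```

A single extreme score moves the mean by 145 ms but the median by only 5 ms, which is why the median is preferred for skewed data.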
What measure of spread is used for the mode?
None
What measure of spread is used for the median?
“Distance based” measures
Range, IQR
What measures of spread are used for the mean?
“Centre-based” measures
Variance, standard deviation
What is the interquartile range?
Similar to range but ignores most extreme values
Range of scores within middle 50% of scores
UQ - LQ
What is the lower quartile?
Median of lower half of data
What is the upper quartile?
Median of upper half of data
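A minimal sketch of the "median of each half" definition above. It assumes one common convention: when n is odd, the overall median is left out of both halves. Other conventions exist, so statistical software may report slightly different quartiles. The function name `quartiles_by_halves` and the scores are hypothetical.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (LQ, UQ) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]
lq, uq = quartiles_by_halves(scores)
iqr = uq - lq
print(lq, uq, iqr)  # 4.0 9.5 5.5
```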
What are the pros of the IQR?
Insensitive to outliers
Often gives real, meaningful data value
Useful for ordinal data and skewed interval/ratio data
What are the cons of the IQR?
Ignores a lot of data
Difficult to calculate without a computer
Can’t use with nominal data
What is the deviance?
The mean subtracted from each score (score - mean)
A score equal to the mean has a deviance of 0
Shows how far a score is from the mean
What is the sum of squared errors (SS)?
Each deviance is squared and the squared deviances are summed
More data points = bigger SS
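The deviance and SS calculations can be sketched as follows, using a small invented dataset whose mean is a whole number so the arithmetic is easy to follow.

```python
# Hypothetical scores; their mean is 20 / 5 = 4.0.
scores = [2, 4, 4, 5, 5]
mean_score = sum(scores) / len(scores)

# Deviance: how far each score lies from the mean (score - mean).
deviances = [x - mean_score for x in scores]
print(deviances)       # [-2.0, 0.0, 0.0, 1.0, 1.0]

# Raw deviances cancel out, which is why they are squared first.
print(sum(deviances))  # 0.0

# Sum of squared errors (SS): square each deviance, then add them up.
ss = sum(d ** 2 for d in deviances)
print(ss)              # 6.0
```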
What is the variance?
“Average” of the sum of squared errors: SS divided by N (population) or by n - 1 (sample)
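A short sketch of the variance as an "average" of SS, assuming the usual two versions: dividing by the number of scores for a whole population, or by one less than that for a sample. The data are invented, and the standard-library functions `pvariance` and `variance` are used only as a cross-check.

```python
from math import isclose
from statistics import pvariance, variance

scores = [2, 4, 4, 5, 5]  # hypothetical data
n = len(scores)
mean_score = sum(scores) / n
ss = sum((x - mean_score) ** 2 for x in scores)  # sum of squared errors

population_variance = ss / n      # treat the scores as the whole population
sample_variance = ss / (n - 1)    # treat the scores as a sample

print(population_variance, sample_variance)

# Cross-check against the standard library.
assert isclose(population_variance, pvariance(scores))
assert isclose(sample_variance, variance(scores))
```

Dividing by the count removes the "more data points = bigger SS" problem, so variances from datasets of different sizes can be compared.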
What are the pros of the variance?
Uses all data
Forms basis of several other tests/statistics
What are the cons of the variance?
Most informative for roughly normal (symmetric) data
Sensitive to outliers
Units are squared, so not directly interpretable
What is the standard deviation?
Measure of spread expressed in the same units as the DV (dependent variable)
Square root of the variance
Can be calculated for a whole population (σ) or estimated from a sample (s)
Dividing SS by n - 1 instead of n gives an unbiased estimate of the population variance when only a sample is available; s is its square root
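As a sketch of the two versions of the standard deviation, the code below takes the square root of each variance and checks the results against `statistics.pstdev` (population) and `statistics.stdev` (sample). The scores are invented example data.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

scores = [2, 4, 4, 5, 5]  # hypothetical data
n = len(scores)
mean_score = sum(scores) / n
ss = sum((x - mean_score) ** 2 for x in scores)

# Population SD: the scores are the entire population.
population_sd = sqrt(ss / n)

# Sample SD: the scores are a sample used to estimate the population's spread.
sample_sd = sqrt(ss / (n - 1))

print(round(population_sd, 3), round(sample_sd, 3))

# Cross-check against the standard library.
assert isclose(population_sd, pstdev(scores))
assert isclose(sample_sd, stdev(scores))
```

The sample version is always slightly larger than the population version, and the gap shrinks as n grows.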