Descriptive Statistics Flashcards
Name 3 types of data
(CDC)
Categorical
Discrete
Continuous
Categorical
Binary or Nominal
Has two or more categories with no ordering to them.
E.g. Hair colour, Job title
Continuous
Ratio or Interval variables
Can take any fractional value
E.g. Reaction times
Discrete
Ordinal, Ratio, or Interval variables
Takes only fixed, separate values with a logical order
E.g. Shoe size, Score out of 10
Median Cons?
Ignores a lot of the data
Difficult to calculate without a computer
Can’t use this with NOMINAL data
Median Pros?
Insensitive to outliers
Often gives a real, meaningful data value
Useful for ordinal data, and skewed interval/ratio data
What are the 3 measures of central tendency?
Mean- sum of data points ÷ sample size
Median- middle score in an ordered data set
Mode- most frequent value in a data set
Median:
The middle value in an ordered dataset, or the mean of the middle two values. It can be found as:
Odd-sized datasets: the value at position (n+1)÷2
Even-sized datasets: add the middle two values, then ÷2
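The two rules on this card can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `median` and the example scores are made up for the sketch.

```python
def median(values):
    """Return the median using the odd/even rules from the card."""
    ordered = sorted(values)  # the values must be put in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the median sits at position (n + 1) / 2, counting from 1.
        position = (n + 1) // 2
        return ordered[position - 1]  # convert to a 0-based index
    # Even n: add the two middle values and divide by 2.
    upper_middle = n // 2
    return (ordered[upper_middle - 1] + ordered[upper_middle]) / 2


# Hypothetical example scores.
print(median([7, 3, 9, 5, 1]))  # odd n: 3rd of the 5 ordered values
print(median([7, 3, 9, 5]))     # even n: mean of the two middle values
```

With these example scores the odd-sized dataset gives 5 (a real data value) and the even-sized dataset gives 6.0 (halfway between 5 and 7).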
What is the equation for calculating the mean?
Sum of individual data points
÷
sample size
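As a quick check of the equation, here is a minimal Python sketch. The function name `mean` and the scores are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Sum of the individual data points divided by the sample size."""
    if len(values) == 0:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical scores out of 10.
scores = [6, 7, 8, 5, 9]
print(mean(scores))  # 35 / 5
```

For these scores the sum is 35 and the sample size is 5, so the mean is 7.0.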
Mean Pros?
Uses all of the data
Is most effective for normally distributed datasets
Mean Cons?
Sensitive to outliers
Values are not always meaningful (we can’t get a score of 6.74 out of 10!).
Only meaningful for RATIO and INTERVAL data
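To see what "sensitive to outliers" means in practice, the sketch below compares the mean and median before and after one extreme value is introduced. It uses Python's standard `statistics` module; both datasets are hypothetical.

```python
import statistics

# Hypothetical data: the second list swaps the largest score for an outlier.
scores = [4, 5, 5, 6, 7]
with_outlier = [4, 5, 5, 6, 70]

# The mean uses every value, so one extreme score drags it upwards.
print(statistics.mean(scores), statistics.mean(with_outlier))

# The median only looks at the middle of the ordered data, so it does not move.
print(statistics.median(scores), statistics.median(with_outlier))
```

Here the mean jumps from 5.4 to 18, while the median stays at 5 in both cases.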
Measures of spread:
Mode
no measures of spread
Measures of spread:
Median
‘distance-based’ measures such as range and interquartile range
Measures of spread:
Mean
‘centre-based’ measures of spread such as variance and standard deviation
Interquartile range IQR:
Pros and cons are the same as for the median
Like the range (highest score - lowest score), but ignores the most extreme values
Lower quartile = median of the lower half of the data
Upper quartile = median of the upper half of the data
Interquartile range = Upper quartile - Lower quartile
IQR example:
Lower quartile = 14th/15th value
=5 (Five times a day)
Upper quartile = 43rd/44th value
= 7 (Seven times a day)
IQR = 7-5 = 2
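The "median of each half" recipe can be sketched in Python as follows. This is one of several quartile conventions, and two things here are assumptions: for an odd-sized dataset the middle value is left out of both halves (the cards do not say how to handle it), and the data are hypothetical. The helper names `median_of_sorted` and `iqr` are made up for the sketch.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def iqr(values):
    """Upper quartile minus lower quartile, using medians of the two halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to split into halves")
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: with an odd n, the middle value belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(upper_half) - median_of_sorted(lower_half)


# Hypothetical data with one extreme value (40).
data = [1, 3, 4, 5, 5, 6, 7, 9, 40]
print(max(data) - min(data))  # range: dominated by the extreme value
print(iqr(data))              # IQR: unaffected by the extreme value
```

For this data the range is 39 but the IQR is only 4.5 (lower quartile 3.5, upper quartile 8.0), which shows how the IQR ignores the most extreme values.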
Variance Pros?
Uses all of the data
Forms the basis of several other tests
Deviance-
take each score and subtract the mean from it (score - mean)
Variance Cons?
Most meaningful for normally distributed data
Sensitive to outliers
Units are not sensible (can we explain variance as scores²?)
Sum of squared errors-
total the squared errors
Squared errors-
take each deviance score and square it
Variance-
average of the squared errors (sum of squared errors ÷ number of scores)
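The chain deviance, squared errors, sum of squared errors, variance can be followed step by step in Python. This is a sketch under stated assumptions: the scores are hypothetical, and the function name `variance_steps` is made up. The cards describe variance as an average, so the sketch divides by N; many textbooks and packages divide by N - 1 instead when estimating from a sample, so both are shown.

```python
def variance_steps(scores):
    """Return (deviances, squared errors, sum of squares, variance)."""
    n = len(scores)
    if n == 0:
        raise ValueError("variance of an empty dataset is undefined")
    mean = sum(scores) / n
    deviances = [score - mean for score in scores]   # score minus mean
    squared_errors = [d ** 2 for d in deviances]     # square each deviance
    sum_of_squares = sum(squared_errors)             # total the squared errors
    variance = sum_of_squares / n                    # average squared error
    return deviances, squared_errors, sum_of_squares, variance


# Hypothetical scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
deviances, squared_errors, sum_of_squares, variance = variance_steps(scores)
print(deviances)
print(squared_errors)
print(sum_of_squares, variance)

# Sample version (an alternative convention): divide by N - 1 instead of N.
sample_variance = sum_of_squares / (len(scores) - 1)
print(sample_variance)
```

For these scores the sum of squared errors is 32.0 and the variance is 4.0, in units of scores².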
Which measure of spread is in the same units as the dependent variable?
Standard Deviation (SD)
How is SD calculated?
Calculated using the square root of variance.
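A short Python sketch of that calculation, assuming the "divide by N" form of variance and hypothetical scores. The function name `standard_deviation` is made up for the sketch.

```python
import math


def standard_deviation(scores):
    """Square root of the variance (variance taken as sum of squares / N)."""
    n = len(scores)
    if n == 0:
        raise ValueError("standard deviation of an empty dataset is undefined")
    mean = sum(scores) / n
    variance = sum((score - mean) ** 2 for score in scores) / n
    return math.sqrt(variance)  # back in the original units of measurement


# Hypothetical scores: variance is 4.0 (scores squared), so SD is 2.0 (scores).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(scores))
```

Taking the square root undoes the squaring step, which is why SD is expressed in the same units as the scores themselves.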
What does Ordinal data mostly use?
Median and IQR
What does Categorical data mostly use?
Mode