Descriptive Statistics Flashcards
Parameter
Describes a population → summary of real behaviour/data present in the population
Statistic
Describes a sample of data → we infer something about the population (parameters) from what we know about the sample (statistics)
Descriptive statistics
Summarise patterns within the sample
- Central tendency (location)
- Dispersion (spread)
Inferential statistics
Allow us to draw inferences about the population based on our sample
Discrete variable
- Scores can only take on certain values e.g. 5-point Likert-type rating scale
- Usually whole numbers but e.g. UK shoe sizes are also discrete: 7, 7 1/2, 8 etc
Continuous variable
- Scores can take any value
- Any level of precision
- E.g. age, time, weight etc
Types of data
- Nominal: labels/names/categories e.g. gender
- Ordinal: categories with a meaningful rank order, but the gaps between ranks are not necessarily equal e.g. order of finishing in a race
- Interval: measurements are made on a scale; differences between points on the scale are equal but there is no ‘natural’ zero point (it is arbitrary) e.g. temperature in °C or °F
- Ratio: same as interval data but there is a ‘natural’ zero point e.g. height/length, weight, time etc
Measures of central tendency
- Mode: the most frequently occurring score
  - Advantage: simple to calculate and easy to understand
  - Disadvantage: can easily be unrepresentative of the data set as a whole
- Median: middle score when scores are ordered by size
  - Advantage: relatively unaffected by untypical extreme scores
  - Disadvantage: does not use the value of every score, so it may not reflect the whole data set
- Mean: add together all the scores and divide by the total number of scores
  - Advantage: the only one of these three measures that uses every single score
  - Disadvantage: easily distorted by single, extreme scores (outliers)
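As an illustration of the three measures above, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made up for the example (it is not from the cards) and includes one deliberate outlier to show how the mean reacts compared with the median.

```python
import statistics

# Hypothetical scores; 40 is a deliberate outlier
scores = [2, 3, 3, 4, 5, 6, 40]

mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # middle score once the scores are sorted
mean = statistics.mean(scores)      # sum of scores / number of scores

print("mode:", mode)
print("median:", median)
print("mean:", mean)
```

With these made-up scores the mode is 3 and the median is 4, while the mean is pulled up to 9 by the single outlier.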
Measures of dispersion
- Range: how far apart the highest and lowest scores are
  - Advantage: easy to calculate
  - Disadvantage: affected by extreme scores
  - Interquartile range (IQR): ‘trimmed’ range = 75th percentile minus the 25th percentile
- Standard deviation: roughly the average amount by which scores differ from the mean
  - Takes into account all values in the data set
  - Square root of the variance
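The measures of dispersion above could be computed as in the sketch below. The `scores` list is hypothetical. Two assumptions are worth flagging: the cards do not say whether the standard deviation is the population or the sample version, so this sketch uses the population functions (`pvariance`, `pstdev`, dividing by N); and quartiles can be calculated by several conventions, so the IQR here reflects the default method of `statistics.quantiles` and other tools may give slightly different values.

```python
import math
import statistics

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest score minus lowest score
score_range = max(scores) - min(scores)

# IQR: 75th percentile minus 25th percentile
# (quantiles with n=4 returns the three quartile cut points)
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Population variance and standard deviation (assumption: divide by N)
variance = statistics.pvariance(scores)
sd = statistics.pstdev(scores)

print("range:", score_range)
print("IQR:", iqr)
print("variance:", variance)
print("SD:", sd, "(same as sqrt of variance:", math.sqrt(variance), ")")
```

If the sample versions are wanted instead, `statistics.variance` and `statistics.stdev` divide by N - 1.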
Sum of squared errors (SS)
The sum of the squared deviations from the mean: take each data point's difference from the mean, square it, then add the squares together.
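A small worked sketch of the calculation, using made-up scores, may help show the order of the steps (subtract the mean, square, then sum). Dividing SS by the number of scores to get the variance is the population convention and is an assumption here; the cards do not specify which convention is intended.

```python
import math

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)

# Deviation of each score from the mean
deviations = [x - mean for x in scores]

# Square each deviation first, then add them up
ss = sum(d ** 2 for d in deviations)

# Assumption: population variance (SS divided by N)
variance = ss / len(scores)
sd = math.sqrt(variance)

print("mean:", mean)
print("SS:", ss)
print("variance:", variance)
print("SD:", sd)
```

Summing the raw deviations without squaring them would always give zero, which is why the squaring step comes before the sum.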
Central limit theorem
- If we take lots of samples from a population and take the mean of each sample, these means will be approximately normally distributed, even if the population itself is not
- The mean of all sample means will approximate the population mean
- Rule of thumb: each sample needs a size of at least 30
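The idea can be checked with a rough simulation. Everything in this sketch is an assumption made for illustration: the population is a made-up, strongly skewed (exponential) distribution, the number of samples (2,000) is arbitrary, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, clearly non-normal population (skewed to the right)
population = [random.expovariate(1.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)

sample_size = 30  # the rule-of-thumb minimum from the card

# Draw many samples and record the mean of each one
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)

print("population mean:", population_mean)
print("mean of sample means:", mean_of_means)
```

Plotting a histogram of `sample_means` should show a roughly bell-shaped curve centred near the population mean, even though the population itself is skewed.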
Standard error
The standard deviation of the sample means = the variability in sample means around the population mean
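As a sketch of this definition, the simulation below draws many samples from a made-up population, takes the standard deviation of their means, and compares it with the standard formula SE = σ / √n. That formula is not stated on the card and is included here as a cross-check. The population values (mean 100, SD 15), the sample size, and the number of samples are all hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: roughly normal, mean 100, SD 15
population = [random.gauss(100, 15) for _ in range(20_000)]

n = 25  # size of each sample

# Mean of each of many samples of size n
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(4_000)
]

# Standard error as defined on the card: SD of the sample means
empirical_se = statistics.pstdev(sample_means)

# Cross-check against the standard formula: population SD / sqrt(n)
formula_se = statistics.pstdev(population) / math.sqrt(n)

print("SD of sample means:", empirical_se)
print("sigma / sqrt(n):", formula_se)
```

The two values should be close but not identical, since the first is estimated from a finite number of random samples.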