Descriptive Statistics Flashcards
Nominal data
- No quantitative value
- Categorised
- No inherent order
- Mode
(e.g. male or female, eye colour)
Ordinal
- Ranked scale
- Non-numeric concepts
- Difference between values not known
- Can’t define mean
- Central tendency = median or mode
(e.g. level of satisfaction: ‘very happy’, ‘happy’, etc.)
Interval
- Known differences between values
- No true zero
(e.g. temperature in °C or °F)
Ratio
- Order
- Known differences between values
- True zero
(e.g. height, weight)
Converting between types of data
- Divide numerical data into categories (e.g. small, medium, large)
- Or use interval/ratio data to rank individuals
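The two conversions above can be sketched in Python. This is only an illustration: the heights, the cut-offs (160 cm and 180 cm) and the function name size_category are assumptions, not part of the original cards.

```python
# Hypothetical heights in cm (ratio data); values are illustrative only.
heights = [171, 150, 193, 162, 185]

def size_category(height_cm):
    """Map a numeric height to an ordinal category using assumed cut-offs."""
    if height_cm < 160:
        return "small"
    if height_cm < 180:
        return "medium"
    return "large"

# Conversion 1: numerical data divided into categories.
categories = [size_category(h) for h in heights]

# Conversion 2: numerical data used to rank individuals
# (rank 1 = shortest; this sample has no tied values).
ordered = sorted(heights)
ranks = [ordered.index(h) + 1 for h in heights]

print(categories)
print(ranks)
```

Both conversions discard information: once heights become categories or ranks, the exact differences between individuals can no longer be recovered.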
What are the three types of descriptive statistics?
- Central tendencies
- Dispersion
- Skewness and kurtosis
Why are descriptive statistics important?
- First step in analysis
- Provide an overview of characteristics of data
- Summarise and describe a sample or population
Central tendencies
- Grouping around the middle value
- Mean, median, mode
Dispersion
- Measure of variation
- Range, interquartile range, standard deviation
Skewness and kurtosis
- Describe the shape of the distribution
Describe mean
- Sum of values divided by number of observations
- Interval or ratio
- Good for large data set with expected normal distribution
What are the advantages/disadvantages of using mean?
+ All values are considered
- Distorted by extremes
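A small sketch of how one extreme value distorts the mean, using Python's standard statistics module. The sample values are hypothetical.

```python
from statistics import mean

values = [4, 5, 5, 6, 7]        # hypothetical sample
with_outlier = values + [60]    # same sample plus one extreme value

mean_plain = mean(values)            # 27 / 5
mean_outlier = mean(with_outlier)    # 87 / 6

print(mean_plain, mean_outlier)
```

Every value contributes to the sum, so the single extreme observation pulls the mean well above where most of the data lie.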
Describe median
- Values are ranked by magnitude
- Median is the middle value (or half way between two middle values)
- Ordinal, interval, ratio
- Most useful when data are skewed or contain extreme values
What are the advantages/disadvantages of using median?
+ Not influenced by extremes
- Widely differing data may have the same median
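The points above can be illustrated with a short sketch using the standard statistics module. All three samples are hypothetical.

```python
from statistics import median

tight = [9, 10, 11]      # values close together
spread = [1, 10, 100]    # values far apart, same middle value
even = [2, 4, 6, 8]      # even count: no single middle value

median_tight = median(tight)
median_spread = median(spread)
median_even = median(even)   # halfway between the two middle values, 4 and 6

print(median_tight, median_spread, median_even)
```

The first two samples share a median despite very different spreads, which is why the median is usually reported alongside a measure of dispersion.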
Describe mode
- The most frequent value
- Nominal, ordinal
What are the advantages/disadvantages of using mode?
+ Useful to identify ‘typical figure’
- Not useful where no values reoccur (usually with interval/ratio)
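A brief sketch of both cases, using multimode from the standard statistics module (Python 3.8 or later). The eye colours and measurements are made-up examples.

```python
from statistics import multimode

# Nominal data: one category clearly occurs most often.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]
colour_modes = multimode(eye_colours)

# Ratio data measured precisely: no value recurs, so every value
# ties for "most frequent" and the mode tells us nothing.
measurements = [1.52, 1.67, 1.71, 1.80]
measurement_modes = multimode(measurements)

print(colour_modes)
print(measurement_modes)
```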
What are measures of dispersion?
- Relate to the spread of values
- Complement central tendencies in providing a more complete descriptive summary
Describe range
- Difference between the smallest and largest values
What are the advantages/disadvantages of range?
+ Simple
+ Indicates degree of spread
- Least informative: depends only on the two extreme values
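A minimal sketch of the range and its dependence on extremes. The sample values are hypothetical.

```python
values = [4, 5, 5, 6, 7]        # hypothetical sample
with_outlier = values + [60]    # same sample plus one extreme value

# Range = largest value minus smallest value.
range_plain = max(values) - min(values)
range_outlier = max(with_outlier) - min(with_outlier)

print(range_plain, range_outlier)
```

Only two observations enter the calculation, so a single extreme value changes the result completely.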
Describe inter-quartile range
- Difference between the upper quartile (Q3) and the lower quartile (Q1) of the ranked values
- Measures spread about the median
What do small/large interquartile ranges indicate?
Small - values are clustered around the median
Large - greater degree of spread
What are the advantages/disadvantages of interquartile range?
+ Not influenced by extremes: only the middle 50% of data is used
+ Anomalies are not considered
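The sketch below compares the interquartile range with the range when one value is replaced by an anomaly. It uses statistics.quantiles with method="inclusive"; several quartile conventions exist and other methods or software may give slightly different quartiles. The data are hypothetical.

```python
from statistics import quantiles

def iqr(data):
    """Interquartile range: upper quartile (Q3) minus lower quartile (Q1)."""
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
anomalous = [1, 2, 3, 4, 5, 6, 7, 8, 90]   # largest value replaced by an anomaly

iqr_clean = iqr(clean)
iqr_anomalous = iqr(anomalous)
range_clean = max(clean) - min(clean)
range_anomalous = max(anomalous) - min(anomalous)

print(iqr_clean, iqr_anomalous)
print(range_clean, range_anomalous)
```

The interquartile range is unchanged by the anomaly, while the range grows from 8 to 89.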
Describe standard deviation
- Dispersion around the mean
- Considers all values
- Square root of variance
What are the advantages/disadvantages of standard deviation?
+ Most reliable index of dispersion
+ Square root gives positive value - other mathematical uses
+ Based on the squared deviations from the mean across the whole set
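A step-by-step sketch of the standard deviation as the square root of the variance, checked against the standard statistics module. The sample values are hypothetical, and the choice between the population form (divide by n) and the sample form (divide by n - 1) is an assumption that depends on the data.

```python
from math import sqrt
from statistics import pstdev, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data

m = sum(values) / len(values)

# Raw deviations from the mean cancel out, which is why they are squared.
deviation_sum = sum(x - m for x in values)

squared_devs = [(x - m) ** 2 for x in values]
variance = sum(squared_devs) / len(values)   # population variance
sd_manual = sqrt(variance)                   # standard deviation

sd_population = pstdev(values)   # library equivalent of sd_manual
sd_sample = stdev(values)        # divides by n - 1 instead of n

print(m, variance, sd_manual, sd_sample)
```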
Describe coefficient of variation
- Expresses variability as a percentage of the mean (standard deviation ÷ mean × 100)
- Dispersion relative to size of observation
What are the advantages/disadvantages of coefficient of variation?
+ Can be used to compare variables in different units of measurement
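A short sketch of the coefficient of variation, comparing two variables measured in different units. The data and the helper name coefficient_of_variation are illustrative assumptions; the sample standard deviation is used here, though the population form could equally be chosen.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Standard deviation expressed as a percentage of the mean."""
    return stdev(data) / mean(data) * 100

heights_cm = [160, 170, 180]   # hypothetical heights
weights_kg = [60, 70, 80]      # hypothetical weights

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(round(cv_height, 2), round(cv_weight, 2))
```

Both samples have the same standard deviation (10 in their own units), but relative to its mean the weight data are more variable, which the standard deviations alone would not show.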
Describe skewness
- Indicator of distribution around the central value
- Negative - tail extends to the left (mean < median)
- Positive - tail extends to the right (mean > median)
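A minimal sketch of the sign of skewness, using the moment-based coefficient (third central moment divided by the 1.5 power of the second). This is one common definition and other formulas apply small-sample corrections. The data are hypothetical, and the mean versus median comparison is a rule of thumb rather than a guarantee.

```python
from statistics import mean, median

def skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample correction)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

long_right_tail = [1, 2, 2, 3, 3, 3, 4, 10]
long_left_tail = [-x for x in long_right_tail]   # mirror image of the same data

skew_right = skewness(long_right_tail)   # positive
skew_left = skewness(long_left_tail)     # negative

print(skew_right, mean(long_right_tail), median(long_right_tail))
print(skew_left, mean(long_left_tail), median(long_left_tail))
```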
What is a normal curve?
A theoretical distribution representing a symmetrically distributed data set
Describe kurtosis
The extent to which a frequency distribution is peaked or flat
What are the three types of kurtosis?
- Leptokurtic (+) – tall, narrow
- Platykurtic (-) – lower, wider
- Mesokurtic (0) – normal curve
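The signs above refer to excess kurtosis, which can be sketched as the fourth central moment divided by the squared second moment, minus 3 so that a normal curve scores 0. This is one common definition without small-sample correction, and both samples are made up for illustration.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]      # tall centre with two far values

kurt_flat = excess_kurtosis(flat)       # negative: platykurtic
kurt_peaked = excess_kurtosis(peaked)   # positive: leptokurtic

print(kurt_flat, kurt_peaked)
```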