Topic 2 Flashcards
What is the purpose of descriptive statistics?
To describe or summarise the overall pattern of data
How do you describe numerical data?
The three S’s - shape, centre and spread (plus outliers)
How do you describe categorical data?
Table of frequencies or proportions
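Example (a minimal sketch; the eye-colour data and the function name frequency_table are made up for illustration): a table of frequencies and proportions could be built like this.

```python
from collections import Counter

# Hypothetical categorical data (illustrative only, not from the course notes)
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue"]


def frequency_table(values):
    """Return {category: (count, proportion)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}


table = frequency_table(eye_colours)
for category, (count, proportion) in table.items():
    # One row per category: frequency, then proportion of all cases
    print(f"{category}: {count} ({proportion:.2f})")
```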
What is a symmetrical shape?
The right and left sides mirror each other; the shape can also be bell-shaped
What is a shape that is skewed to the left?
Left side extends further out than the right side
What is a shape that is skewed to the right?
Right side extends further out than the left side
What is a symmetrical, bimodal shape?
Symmetrical with two peaks
What is a symmetrical, uniform shape?
Symmetrical and flat
What is an outlier? What may be the cause of them?
Observations that deviate from the overall pattern of the distribution. They may be caused by natural variation or measurement error.
What are numerical summaries for centre or location? (3)
Mode, median, mean
What are numerical summaries for spread? (3)
Range, inter-quartile range (IQR), standard deviation
What is mode?
The most common value or peak of data
What is median?
The middle; the value that divides an ordered data set into two equal halves
For what types of variables would you find the median?
Ordinal, discrete and continuous
What is mean?
The average of the data, found by adding all values and dividing by the number of cases
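Example (a minimal sketch using Python's standard statistics module; the data values are made up for illustration): the three summaries of centre for one small data set.

```python
import statistics

# Hypothetical data set (illustrative only)
data = [2, 3, 3, 5, 7]

# Mean: add all values and divide by the number of cases
mean_value = sum(data) / len(data)

# Median: the middle value of the ordered data
median_value = statistics.median(data)

# Mode: the most common value
mode_value = statistics.mode(data)

print(mean_value, median_value, mode_value)
```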
What does the ‘x bar’ symbol represent?
Mean
Is mean or median resistant to outliers/skewness and why?
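Median, because it depends only on the middle position of the ordered data, not on how extreme the largest or smallest values are. The mean uses every value, so it is pulled towards outliers and the longer tail.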
Mean ? median in symmetrical data?
Mean ≈ median (exactly equal if the data are perfectly symmetrical)
Mean ? median in skewed left data?
Mean < median
Mean ? median in skewed right data?
Mean > median
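Example (a minimal sketch; the three data sets and the function name compare_mean_median are made up for illustration): one extreme value pulls the mean towards the tail while the median stays in the middle.

```python
import statistics


def compare_mean_median(data):
    """Describe how the mean compares with the median for one data set."""
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    if mean_value < median_value:
        return "mean < median"
    if mean_value > median_value:
        return "mean > median"
    return "mean = median"


# Hypothetical data sets (illustrative only)
symmetrical = [1, 2, 3, 4, 5]     # mean 3, median 3
skewed_left = [1, 47, 48, 49, 50]  # low value drags the mean down to 39
skewed_right = [1, 2, 3, 4, 50]    # high value drags the mean up to 12

for name, data in [("symmetrical", symmetrical),
                   ("skewed left", skewed_left),
                   ("skewed right", skewed_right)]:
    print(name, "->", compare_mean_median(data))
```

With real data the mean and median of a roughly symmetrical distribution are rarely exactly equal, so expect them to be close rather than identical.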
What is the ‘range’ of data?
The difference between the largest and smallest values in the data set
What are the first, second and third quartiles?
Q1 - 25% of data below Q1
Q2 - 50% of data below Q2 - aka the median
Q3 - 75% of data below Q3
How do you calculate quartiles? (4 steps)
- Arrange data from lowest to highest
- Calculate the median (M)
- Calculate Q1 - the median of the first half of the data (excluding M)
- Calculate Q3 - the median of the second half of the data (excluding M)
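Example (a minimal sketch of the four steps above; the function name quartiles and the data are made up for illustration, and the function assumes at least two values). Software packages may use other quartile conventions and can give slightly different answers.

```python
import statistics


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    # Step 1: arrange data from lowest to highest
    ordered = sorted(values)
    n = len(ordered)

    # Step 2: calculate the median (M)
    median_value = statistics.median(ordered)

    # Split into halves; when n is odd, M is a data point and is left out of both
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]

    # Step 3: Q1 is the median of the first half
    q1 = statistics.median(lower_half)
    # Step 4: Q3 is the median of the second half
    q3 = statistics.median(upper_half)
    return q1, median_value, q3


# Hypothetical data (illustrative only)
print(quartiles([7, 1, 3, 9, 5, 11, 13]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```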
How do you find the interquartile range (IQR)?
IQR = Q3 - Q1
What is the 1.5IQR rule used for?
A criterion used to identify outliers
How do you find the lower threshold to identify any low outliers?
Q1 - 1.5 × IQR (any observation below this value is flagged as a low outlier)
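Example (a minimal sketch; the function names and data are made up for illustration, and Q1 = 2 and Q3 = 10 are the quartiles of this data set found by the median-of-halves steps): calculating the lower threshold and flagging low outliers.

```python
def lower_threshold(q1, q3):
    """Lower cut-off from the 1.5 x IQR rule: Q1 - 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr


def low_outliers(values, q1, q3):
    """Return the observations that fall below the lower threshold."""
    threshold = lower_threshold(q1, q3)
    return [value for value in values if value < threshold]


# Hypothetical data (illustrative only); for this set Q1 = 2 and Q3 = 10
data = [-20, 1, 3, 5, 7, 9, 11, 13]

print(lower_threshold(2, 10))     # IQR = 8, so 2 - 1.5 * 8
print(low_outliers(data, 2, 10))
```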