Descriptive Statistics 2 Flashcards
What is a measure of variation
It is a way to describe the distribution or dispersion of data, showing how far data points are from one another
Why do we use the measure of variation
We use it because just investigating the average can oversimplify the data so we must consider the variability in scores around the mean
What is a model
It is a simple representation of a complex thing
What are the three common measures of variation
1: The range
2: The interquartile range
3: The standard deviation
What are the two concepts that underlie the standard deviation
1: Sum of squares
2: The variance
What is the range
It is the simplest descriptive measure of variation for a numerical variable
It is the difference between the largest and smallest values, which measures the total spread of the data
E.g: 19 - 5 = 14
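A minimal Python sketch of the calculation above. The list `scores` is hypothetical; only its smallest (5) and largest (19) values come from the example.

```python
# Hypothetical test scores; only the minimum (5) and maximum (19) come from the card.
scores = [5, 8, 12, 15, 19]


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


print(value_range(scores))  # 14
```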
The range: Advantages and disadvantages
Advantages
1: Very simple measure
Disadvantages
1: Doesn’t take account of all scores
2: Can oversimplify the data
3: Doesn’t take into account how values are distributed
4: Greatly affected by outliers
What is the interquartile range
It measures the spread of the middle 50% of scores in a dataset, so extremely high and low scores do not distort it, which is why it’s commonly used
To find it you exclude the lowest 25% of scores and the highest 25% of scores, which removes most outliers
It’s the distance between Q1 and Q3
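As a rough sketch, the interquartile range can be computed in Python with `statistics.quantiles`. The scores are made up for illustration, and textbooks differ on exactly how quartiles are calculated, so other methods may give slightly different values for Q1 and Q3.

```python
from statistics import quantiles

# Hypothetical scores (not from the cards).
scores = [2, 4, 5, 6, 7, 9, 10]


def interquartile_range(values):
    """Return Q3 - Q1 using the default ('exclusive') quartile method."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


q1, _, q3 = quantiles(scores, n=4)
print(q1, q3)                       # Q1 and Q3
print(interquartile_range(scores))  # distance between Q1 and Q3
```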
The interquartile range: Advantages and disadvantages
Advantages
1: The range is derived from the middle 50% of a distribution
2: Less likely to be influenced by extreme scores
3: Provides a better and more stable measure of variability than the range
Disadvantages
1: Only regards the middle 50% of scores and disregards the rest
2: Crude measure of variability
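To illustrate why the interquartile range is described as more stable than the range, the sketch below turns one score into an extreme value. Both datasets are hypothetical, and the quartiles use the default method of `statistics.quantiles`, which is only one of several conventions.

```python
from statistics import quantiles


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1 using the default ('exclusive') quartile method."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


typical = [2, 4, 5, 6, 7, 9, 10]       # hypothetical scores
with_outlier = [2, 4, 5, 6, 7, 9, 30]  # same scores, but the top one is extreme

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    print(label, "range:", value_range(data), "IQR:", interquartile_range(data))
```

For this data the range jumps from 8 to 28, while the interquartile range stays at 5.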
What is standard deviation
It is a measure of how dispersed the data is in relation to the mean
How do we see how far each number (results in a class test) is from the mean
We subtract the mean of the results from each score and then add up the differences
E.g: 3, 5, 6, 7, 9 has a mean of 6
3 - 6 = -3
5 - 6 = -1
6 - 6 = 0
7 - 6 = 1
9 - 6 = 3
Sum of the differences = 0 (the differences from the mean always cancel out)
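The same subtraction can be sketched in Python, using the five scores from the example. The variable names are only illustrative.

```python
scores = [3, 5, 6, 7, 9]

mean = sum(scores) / len(scores)         # 6.0
deviations = [x - mean for x in scores]  # each score minus the mean

print(deviations)       # [-3.0, -1.0, 0.0, 1.0, 3.0]
print(sum(deviations))  # 0.0
```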
What is the sum of squares
When we want to measure how far each score on a test is from the mean, the differences always add up to 0, so we square each difference before adding them up
E.g: 3, 5, 6, 7, 9 has a mean of 6
3 - 6 = -3, squared = 9
5 - 6 = -1, squared = 1
6 - 6 = 0, squared = 0
7 - 6 = 1, squared = 1
9 - 6 = 3, squared = 9
Sum of squares = 20
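A short Python sketch of the sum of squares for the same five scores. The variable names are assumptions made for illustration.

```python
scores = [3, 5, 6, 7, 9]

mean = sum(scores) / len(scores)  # 6.0

# Square each difference from the mean so negatives no longer cancel positives.
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

print(squared_deviations)  # [9.0, 1.0, 0.0, 1.0, 9.0]
print(sum_of_squares)      # 20.0
```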
What is the average sum of squares
We divide the sum of squares by the number of data points (N)
E.g: SoS = 20 and N = 5
20 / 5 = 4
It shows that the average squared distance of a data point from the mean is 4 squared units
This is called the variance
What is the formula for variance
$\text{Variance} = \frac{\sum (x - \bar{x})^2}{N}$
$x$ (each data point)
$\bar{x}$ (the mean)
$N$ (number of data points)
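The formula can be sketched as a small Python function. This follows the cards in dividing by N (the population form); the function name `variance` is hypothetical, and `statistics.pvariance` from the standard library is used only as a cross-check.

```python
from statistics import pvariance

scores = [3, 5, 6, 7, 9]


def variance(values):
    """Population variance: the sum of squared deviations divided by N."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / n


print(variance(scores))   # 4.0
print(pvariance(scores))  # same value from the standard library
```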
What is the problem with variance
It’s expressed in squared units, unlike the mean, which makes it hard to interpret
E.g: Mean = 6 and Variance = 4 squared units
How do we fix the issue with Variance
We use the same formula as for the variance but take the square root of the whole thing
The value that is left is called the standard deviation and will always be a non-negative value
E.g: $\sqrt{4} = 2$
This represents, roughly, the typical distance of each score from the mean
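Putting the steps together, a minimal Python sketch of the standard deviation for the example scores. It assumes the divide-by-N (population) form used in these cards, and `statistics.pstdev` is included only as a cross-check.

```python
import math
from statistics import pstdev

scores = [3, 5, 6, 7, 9]

n = len(scores)
mean = sum(scores) / n                               # 6.0
variance = sum((x - mean) ** 2 for x in scores) / n  # 20 / 5 = 4.0
standard_deviation = math.sqrt(variance)             # back in the original units

print(standard_deviation)  # 2.0
print(pstdev(scores))      # standard-library equivalent
```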