Measures of Dispersion Flashcards
How do you calculate a range?
Highest value - Lowest value
How do you calculate an interquartile range?
Q3 - Q1
How do you calculate a semi-interquartile range?
½ × (Q3 − Q1)
or
½ × IQR
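The three measures above can be checked with a short Python sketch. The function name spread_summary is made up for illustration. Note that quartile conventions differ between textbooks and software, so the Q1 and Q3 values from statistics.quantiles (default "exclusive" method) may not match a hand calculation on every data set.

```python
import statistics


def spread_summary(values):
    """Return (range, IQR, semi-IQR) for a list of numbers."""
    data_range = max(values) - min(values)
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
    # Assumption: the module's default "exclusive" quartile method is acceptable.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return data_range, iqr, iqr / 2


data_range, iqr, semi_iqr = spread_summary([1, 2, 3, 4, 5, 6, 7])
print(data_range, iqr, semi_iqr)
```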
What are the two issues with a range and an interquartile range?
They do not:
i) Take into account the whole set of values.
ii) Have decent mathematical properties
What do we use to overcome problems from the range and interquartile range?
We use Variance and Standard Deviation
How else can you represent standard deviation?
The square root of the variance
How do you calculate variance?
Σx²/n − (Σx/n)²
or: (sum of x²) / (number of observations), minus the mean squared
What is the, often forgotten, other way of representing variance?
Sxx/n, where Sxx = Σ(x − mean)²
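As a quick check that the two forms of the variance agree, here is a minimal Python sketch. The function name variance_forms and the sample data are illustrative only; Sxx is taken to be the sum of squared deviations from the mean.

```python
import math


def variance_forms(xs):
    """Return the variance computed two ways: (sum x^2)/n - mean^2, and Sxx/n."""
    n = len(xs)
    mean = sum(xs) / n
    # Form 1: mean of the squares minus the square of the mean.
    v1 = sum(x * x for x in xs) / n - mean ** 2
    # Form 2: Sxx / n, where Sxx is the sum of squared deviations from the mean.
    sxx = sum((x - mean) ** 2 for x in xs)
    v2 = sxx / n
    return v1, v2


v1, v2 = variance_forms([2, 4, 4, 4, 5, 5, 7, 9])
sd = math.sqrt(v1)  # standard deviation is the square root of the variance
print(v1, v2, sd)
```

For this data set both forms give a variance of 4, so the standard deviation is 2.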
How is variance represented in units? Why is this a reason for using standard deviation?
Because it uses x², variance is in squared units, e.g. inches² rather than inches.
Squared units such as inches² are hard to interpret, so we take the square root of the variance to get the standard deviation, which is back in inches.
How do we calculate variance for a frequency distribution?
Σfx²/n − mean², where n = Σf
What do we do in order to make finding variance and SD possible with grouped frequency?
Use the midpoints of each group as your x value
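The grouped-frequency method can be sketched in Python as below. The function name grouped_variance, the class boundaries and the frequencies are all made up for illustration. Because midpoints stand in for the real values, the result is only an estimate.

```python
import math


def grouped_variance(classes, freqs):
    """Estimate mean and variance from grouped data.

    classes: list of (lower, upper) class boundaries
    freqs:   matching list of frequencies
    """
    # Use the midpoint of each class as its x value.
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)  # n is the total frequency
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    variance = sum(f * x * x for f, x in zip(freqs, midpoints)) / n - mean ** 2
    return mean, variance


mean, variance = grouped_variance([(0, 10), (10, 20), (20, 30)], [2, 5, 3])
sd = math.sqrt(variance)
print(mean, variance, sd)
```

Here the midpoints are 5, 15 and 25, giving an estimated mean of 16, variance of 49 and standard deviation of 7.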
What is coding?
A way of simplifying statistical calculations.
Original data is coded to make a new set of values which are easier to work with.
What formula do we use for coding?
y = (x − a)/b
If you want to find the mean of y, use
mean y = (mean x − a)/b
If you have values of 332, 355, 306, 317 and 340, what coding can you use?
y = (x-300)/10
This would give you values of 3.2, 5.5, 0.6, 1.7, 4.0
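The coding in this card can be reproduced with a few lines of Python. The helper name code_values is hypothetical; a = 300 and b = 10 are the values used in the card.

```python
import math


def code_values(xs, a, b):
    """Apply the coding y = (x - a) / b to every value."""
    return [(x - a) / b for x in xs]


xs = [332, 355, 306, 317, 340]
ys = code_values(xs, a=300, b=10)

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# The mean of the coded values should equal (mean x - a) / b.
print(ys, mean_x, mean_y)
```

The mean of x is 330 and the mean of y is 3, which matches (330 − 300)/10.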
What is the definition of an outlier?
An extreme value that lies outside the overall pattern of data.
Which values do we use to find whether a piece of data is an outlier?
Above Q3 + 1.5 × (Q3 − Q1)
Below Q1 − 1.5 × (Q3 − Q1)
or
More than 2 standard deviations above or below the mean
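Both outlier rules can be sketched in Python as follows. The function names and data are illustrative. The quartiles are passed in directly, since the method for finding them varies. The standard deviation is assumed to be the divide-by-n version used in the variance cards (statistics.pstdev).

```python
import statistics


def iqr_outliers(values, q1, q3):
    """Values above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


def sd_outliers(values):
    """Values more than 2 standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population SD (divides by n)
    return [v for v in values if abs(v - mean) > 2 * sd]


print(iqr_outliers([1, 10, 12, 14, 30], q1=10, q3=14))
print(sd_outliers([10] * 9 + [100]))
```

In the first call the fences are 4 and 20, so 1 and 30 are flagged. In the second the mean is 19 and the standard deviation is 27, so only 100 is more than 54 away from the mean.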
Why aren’t outliers always unreasonable values?
They may lie far from the rest of the data but still be possible. For example, someone may be 112 years old, which would be an outlier but is still possible.
What are anomalies? Give an example.
Values that are clearly errors and would make the data misleading if kept in.
For example, someone aged 360.
Anomalies are always outliers, but outliers aren’t always anomalies.
What is a possible result of removing an anomaly/outlier?
The correlation coefficient (r) may become stronger, i.e. move closer to +1 or −1.
What are histograms used for?
Grouped continuous distributions.
What is the definition of continuous data?
Data that can take any value in a given range
What is the definition of discrete data?
Data that can take only specific values in a given range.
How do you form a frequency polygon?
By joining the midpoints of the top of each bar in a histogram.
What is the formula for the area under each histogram bar?
Area = k × frequency
OR
Area is proportional to frequency.
All bars are in proportion so twice the area makes twice the frequency.
How do we represent the height of each bar in a histogram?
By using frequency density.
How do we calculate frequency density?
Frequency density = frequency / class width
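A minimal Python sketch of the frequency density calculation, using made-up class boundaries and frequencies. The function name frequency_densities is hypothetical.

```python
def frequency_densities(classes, freqs):
    """Return frequency / class width for each (lower, upper) class."""
    return [f / (upper - lower) for (lower, upper), f in zip(classes, freqs)]


classes = [(0, 10), (10, 30), (30, 35)]
freqs = [20, 30, 10]
densities = frequency_densities(classes, freqs)

# Bar area = height * width; with these heights the area equals the frequency (k = 1).
areas = [d * (upper - lower) for d, (lower, upper) in zip(densities, classes)]
print(densities, areas)
```

The widest class (10 to 30) has the highest frequency but the lowest bar, which is why height alone should not be read as frequency when class widths differ.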
What are numerical observations?
Quantitative variables, e.g. height.
What are non-numerical observations?
Qualitative variables, e.g. colour.
What do we do if data is skewed?
If the data is skewed, i.e. contains extreme values, use the median and IQR.
What two measures do we typically comment on when comparing box plots?
The median of both.
The size of the IQR of both.