Lecture 3 Flashcards

Question 1

Q

Why is knowing the dispersion of data important?

Answer

A

Two different data sets can share the same mean yet have entirely different dispersion (spread), so the mean alone does not describe how the data are distributed.
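A small Python illustration, using two made-up data sets (not from the lecture) that share a mean of 50 but differ greatly in spread, measured here by the range. The helper name `summarise` is hypothetical.

```python
from statistics import mean

def summarise(values):
    """Return (mean, range) for a list of numbers."""
    return mean(values), max(values) - min(values)

tight = [48, 49, 50, 51, 52]    # hypothetical: values clustered near 50
spread = [10, 30, 50, 70, 90]   # hypothetical: same mean, much wider spread

for name, data in [("tight", tight), ("spread", spread)]:
    m, r = summarise(data)
    print(f"{name}: mean={m:g}, range={r}")
# tight: mean=50, range=4
# spread: mean=50, range=80
```

Both sets report the same mean, so only a measure of dispersion tells them apart.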

Question 2

Q

Measures of Dispersion

Answer

A

Range (including the interquartile range, IQR), variance and standard deviation

Question 3

Q

Range

Answer

A

Calculated by subtracting the smallest value from the largest value. Strongly affected by outliers, because it depends only on the two most extreme values.
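A minimal Python sketch of the range calculation, using hypothetical scores, shows how adding a single extreme value changes it. The function name `data_range` is illustrative only.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

scores = [12, 14, 15, 15, 16, 18]   # hypothetical data
with_outlier = scores + [60]        # the same data plus one extreme value

print(data_range(scores))        # 6
print(data_range(with_outlier))  # 48
```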

Question 4

Q

IQR x 1.5 rule

Answer

A

We call an observation a suspected outlier if it falls more than 1.5 x IQR above the third quartile (Q3) or below the first quartile (Q1), where IQR = Q3 - Q1. The rule can be used to justify excluding outlier data.
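The rule can be sketched in Python with the standard library's `statistics.quantiles`. This assumes its default "exclusive" quartile method; textbooks and other software compute quartiles in slightly different ways, so the fences may differ a little from hand calculations. The helper `suspected_outliers` and the data are hypothetical.

```python
from statistics import quantiles

def suspected_outliers(values, k=1.5):
    """Return values more than k x IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample
print(suspected_outliers(data))  # [30]
```

The strict `<` and `>` comparisons match "more than 1.5 x IQR" on the card.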

Question 5

Q

Variance

Answer

A

The average squared difference between each data point and the mean; it summarises how far the data points spread out from the mean (and therefore from each other).

Question 6

Q

How do you calculate variance?

Answer

A

Find the difference between each value and the mean.
Square these differences.
Add the squared differences together.
Divide by the number of data points minus 1 (n - 1) if working with a sample; if working with the whole population, divide by the number of data points (N).
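The steps above can be sketched in Python and checked against the standard library's `statistics.variance` (sample) and `statistics.pvariance` (population). The helper `manual_variance` and the data set are hypothetical.

```python
from statistics import pvariance, variance

def manual_variance(values, sample=True):
    """Variance computed by following the steps on the card."""
    n = len(values)
    m = sum(values) / n
    squared_diffs = [(x - m) ** 2 for x in values]  # difference from the mean, squared
    total = sum(squared_diffs)                      # add the squared differences
    # divide by n - 1 for a sample, or by n for a whole population
    return total / (n - 1) if sample else total / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(manual_variance(data, sample=False))  # 4.0 (population)
print(manual_variance(data))                # 32 / 7, about 4.571 (sample)
print(pvariance(data), variance(data))      # library results for comparison
```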

Question 7

Q

Why do we calculate standard deviation rather than variance?

Answer

A

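Because we square the differences between the individual data points and the mean, variance is expressed in squared units (e.g. squared centimetres for heights measured in centimetres), which is hard to interpret. Standard deviation is the square root of the variance, so it is back in the original units of measurement.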
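A short Python sketch of the units point, using hypothetical heights in centimetres: the sample variance comes out in cm squared, and taking its square root returns to cm, matching `statistics.stdev`.

```python
import math
from statistics import stdev, variance

heights_cm = [160, 165, 170, 175, 180]  # hypothetical heights in centimetres

var_cm2 = variance(heights_cm)  # sample variance: units are cm squared
sd_cm = math.sqrt(var_cm2)      # square root brings the units back to cm

print(f"variance = {var_cm2} cm^2")                     # variance = 62.5 cm^2
print(f"SD = {sd_cm:.2f} cm")                           # SD = 7.91 cm
print(f"statistics.stdev: {stdev(heights_cm):.2f} cm")  # same value
```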

Question 8

Q

Standard Deviation

Answer

A

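The square root of the variance, expressed in the same units as the data. A very useful measure of dispersion IF our data are normally distributed: about 95% of values/observations fall within 2 standard deviations either side of the mean, and nearly all (about 99.7%) fall within 3.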

Question 9

Q

The 68-95-99.7 Rule

Answer

A

For normally distributed data, approximately 68% of values lie within 1 SD of the mean, 95% lie within 2 SDs of the mean, and 99.7% lie within 3 SDs of the mean.
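The rule can be checked against the exact normal distribution using Python's standard-library `statistics.NormalDist` (Python 3.8+). This sketch computes the share of a normal distribution lying within k standard deviations of its mean; it shows that the percentages in the rule are rounded (for example, the exact share within 2 SDs is about 95.4%). The helper `proportion_within` is hypothetical.

```python
from statistics import NormalDist

def proportion_within(k, dist=NormalDist(mu=0, sigma=1)):
    """Share of a normal distribution lying within k SDs of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The result does not depend on the particular mean and SD chosen, which is why the rule applies to any normal distribution.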