Week 3 Flashcards

1
Q

Mean

1/ Sample mean
2/ Population mean

A

Suppose you live in a small street with 9 families. The annual income of these families is reported below (in 000) in the order the data was collected.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Outliers

The mean is seriously affected by extreme values called ‘outliers’.

A

Example: As soon as a very high income earner into the
street, average income poorly represents the income of the household in the street.

Suppose: Income of the new family $200 (000)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Median

A

Median

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Mode

The mode of a set of observations is the value that
occurs most frequently.

A

MODE: Newspaper readership

Mode = Most frequently read newspaper N2.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Mode 2

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Mode 3 - Histogram

A

Histogram of the Neighbourhood income example

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Mean, Median, Mode
Which is the best?

Rule of Thumb (Numerical Data):

  • Generally use the mean.
  • But use the median for variables with skewed distributions and/or if there are outliers in the data (Recall median is not as sensitive to extreme values as is the mean).
  • The mode is seldom the best measure of central location.
A

Ordinal/Nominal Data

  • The calculation of the mean is not valid.
  • Median is appropriate for ordinal data (which is ranked)
  • For nominal data, a mode calculation is useful for determining highest frequency.

* Central location is not meaningful for nominal data as the data doesn’t have an order (categories can be put in any order which will shift the mode).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Summary: Which measure to use?

A

1/ Compute the mean to: Describe the central location of a single set of numerical (or interval) data.

2/ Compute the median to: Describe the central location of a single set of numerical or ordinal (ranked) data.

3/ Compute the mode to: Describe a single set of nominal (or categorical) data but it is not a measure a central location in this case

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Relationship between Mean, Median and Mode

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Measures of Variability

Problem: Measures of central locations (Mean, median, mode). do not tell us anything the spread of the observation (distribution) around the mean.

i.e. how much are the observations spread out around the mean value?

Mean =50

The mean of the two classes is the same (=50), but, class 1 has greate variability than class 2.

A

Solution: Measures of variability
• Range
• Variance
• Standard deviation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

The limit of Mean 2

Example: Consider two data sets (Same mean but different spread)

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Range

A

The range is the simplest measure of variability, calculated as:
Range = Largest observation – Smallest observation

Example:

  • Data: {4, 4, 4, 4, 4, 50} Range = 50 - 4= 46
  • Data: {4, 8, 15, 24, 39, 50} Range = 50 – 4= 46
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Variance

Variance: Most important statistics used to measure variability.

A

Variance: the average of squared deviation of all observations from the mean.

Consider two small populations (N=5):

Both populations have the same mean, but the B is spread more around its means than A.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Standard deviation:

Standard deviation: Play a vital role in almost all statistical inference procedures.

A

The standard deviation of a set of measurements is the square root of the variance of the measurements.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

When should we use Coefficient of Variation?

1

A

When should we use Coefficient of Variation?

2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Interpreting Standard Deviation

A
17
Q

Empirical Rule

A

If a histogram is bell shaped, the rule says that:

1) Approximately 68% of all observations fall within one
standard deviation
of the mean.

2) Approximately 95% of all observations fall within two
standard deviations
of the mean.

3) Approximately 99.7% of all observations fall within three standard deviations of the mean.

18
Q

Example: Empirical rule (Risky investment)

A statistician wants to describe the way returns on investment are distributed.
- The mean return = 10%
- The standard deviation of the return = 3%
- The histogram is bell-shaped.
How can the statistician use the mean and the standard
deviation
to describe the distribution?

A

  • Approximately 68% of the returns lie between 7% and 13% (within one standard deviation of the mean)
  • Approximately 95% of the returns lie between 4% and 16%. (two)
  • Approximately 99.7% of the returns lie between 1% and 19%. (three)
19
Q

Chebyshev’s theorem

A

Apply Chebyshev’s theorem when the distribution is not bell-shaped.

Chebyshev’s theorem:

For any data distribution, given that k is greater that 1, the proportion of observations that lie within standard deviations of the mean is at least 1–1/k2.

e.g. k=2, 1–1/22=0.75 or 75%,

k = 3, 1–1/32=0.89 or 89%

20
Q

Chebyshev’s theorem

The average score on a math test: mean = 84

A standard deviation of 4 points.

A