Sections 11-14 Sum, Mean, Median, Mode, Variability and Interquartile Range Flashcards

1
Q

Summation Σ and Order of Operation

A

Σ = The Greek letter sigma, stands for SUM in statistics and indicates that we need to sum a set of numbers represented by algebraic terms.


Ex: Suppose that we are told that X represents a sequence of four numbers: 1, 2, 3, 5. If we are asked to find ΣX, then we need to find the sum of all of the X values, as follows: ΣX=1+2+3+5=11.

Math ORDER of OPERATION review:

Rule 1: When there are parentheses, perform the operations inside the parentheses first.

Rule 2: Unless parentheses indicate otherwise, multiply and divide before adding and subtracting.

Rule 3: If there are both parentheses () and brackets [] or { }, solve first within the parentheses, then within the brackets, then perform any remaining operations.

Rule 4: Unless parentheses indicate otherwise, SUMMATION is done before other addition or subtraction, but after all other operations.

Note that the following expressions all produce different results:

Given that X = 1, 2, 3, 5

A. ΣX+10 = 1+2+3+5 +10 = 21

B. Σ(X + 10) = (1 + 10) + (2 + 10) + (3 + 10) + (5 + 10) = 51

C. Σ(X)(10) = (1)(10) + (2)(10) + (3)(10) + (5)(10) = 110

D. ΣX/10 = (1+2+3+5)/10 = 1.1

E. ΣX² = 1² + 2² + 3² + 5² = 39

F. (ΣX)² = (1 + 2 + 3 + 5)² = 121
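As a quick check of expressions A through F, here is a minimal Python sketch in which the built-in `sum()` stands in for Σ and a generator expression applies an operation to each score before summing. The variable names `a` through `f` are illustrative only.

```python
# X holds the four scores from the example above.
X = [1, 2, 3, 5]

a = sum(X) + 10               # A. ΣX + 10: sum first, then add 10
b = sum(x + 10 for x in X)    # B. Σ(X + 10): add 10 to each score, then sum
c = sum(x * 10 for x in X)    # C. Σ(X)(10): multiply each score by 10, then sum
d = sum(X) / 10               # D. ΣX/10: sum first, then divide
e = sum(x ** 2 for x in X)    # E. ΣX²: square each score, then sum
f = sum(X) ** 2               # F. (ΣX)²: sum first, then square

for label, value in zip("ABCDEF", [a, b, c, d, e, f]):
    print(label, value)
```

The only difference between E and F is whether squaring happens inside or outside the summation, which is exactly what Rule 4 is about.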

Rule 5: After parentheses and brackets, evaluate exponents, then multiplication and division.

Ex: Square Root of (ΣXY² + 10), where X = 1, 2, 3, 5 and Y = 2, 3, 4, 6 (See proper notation at the bottom)

Step 1: There are no parentheses (in the correct notation below), so we begin by squaring each of the Y-values.

2² = 4, 3² = 9, 4² = 16, 6² = 36

Step 2: Our next step is to multiply each set of the X- and squared Y-values (in order).

(1)(4) = 4, (2)(9) = 18, (3)(16) = 48, (5)(36) = 180

Step 3: Next, we follow the summation sign and sum the products from Step 2.

4 + 18 + 48 + 180 = 250

Step 4: The next step is to add 10 to our sum.

250 + 10 = 260

Step 5: Finally, we take the square root to obtain the final answer.

Square Root of (260) = 16.12 (rounded to two decimal places)
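The five steps can be traced in a short Python sketch. It assumes X and Y are paired in order (the first X with the first Y, and so on), as Step 2 describes; `math.sqrt` performs the final step.

```python
import math

X = [1, 2, 3, 5]
Y = [2, 3, 4, 6]  # Y values as used in Step 1

# Steps 1-3: square each Y, multiply by the matching X, then sum the products.
sum_xy2 = sum(x * y ** 2 for x, y in zip(X, Y))

# Step 4: add 10 to the sum. Step 5: take the square root.
result = math.sqrt(sum_xy2 + 10)

print(sum_xy2)           # 250
print(round(result, 2))  # 16.12
```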

2
Q

Mean

A

The most popular AVERAGE is the MEAN.

* It is so popular that it is sometimes simply called the average. However, the term average is ambiguous because there are several types of averages used in statistics.

To get the MEAN, simply sum the scores and divide by the number of scores.

Ex: Scores: 5, 6, 7, 10, 12, 15

Sum of scores: 55

Number of scores: 6

Computation of mean: 55/6 = 9.1666... ≈ 9.17

* In scientific work, the mean is usually rounded to two decimal places.

SYMBOLS for MEAN:

M = mean for an ENTIRE POPULATION

* Another symbol for the POPULATION mean is µ (pronounced ‘mu’)

m = mean for a SAMPLE drawn from that entire population

* Another symbol for SAMPLE mean is X̄ ("X-bar": an upper-case X with a line over the top of it)

* Although calculating the mean is simple, it is important to become familiar with the symbols in the formula because they will be used later in this book in other formulas. The formula for the mean is M = ΣX / N, where:

M = POPULATION MEAN

ΣX = SUM of all Scores

N = NUMBER of cases or participants
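The formula M = ΣX / N translates directly into a few lines of Python. The function name `mean` is a hypothetical helper written for this card; Python's standard library also offers `statistics.mean`, which gives the same result.

```python
def mean(scores):
    """Return M = ΣX / N for a list of scores (hypothetical helper)."""
    if not scores:
        raise ValueError("mean requires at least one score")
    return sum(scores) / len(scores)  # ΣX divided by N


scores = [5, 6, 7, 10, 12, 15]
m = mean(scores)
print(round(m, 2))  # 9.17, rounded to two decimal places as in scientific work
```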

3
Q

Measure of Central Tendency and Deviation from the Mean

A

* The MEAN is the BALANCE POINT of the DISTRIBUTION – the point around which the deviations sum to zero.

* That is why another term for the AVERAGE is “Measure of Central Tendency”.

Ex (refer to Table 1 at the bottom): The sum of the scores is 60. Dividing this by 5 yields a mean of 12.00. Subtracting the mean from each score produces the DEVIATIONS from the mean. These deviations sum to zero. (Notice that the negatives cancel out the positives when summing.)
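Table 1 is not reproduced here, so this sketch borrows the six scores from the Mean card (5, 6, 7, 10, 12, 15); that choice of data is an assumption. It uses Python's `fractions.Fraction` so the deviations sum to exactly zero instead of a tiny floating-point remainder.

```python
from fractions import Fraction

# Scores borrowed from the Mean card (Table 1 is not available here).
scores = [5, 6, 7, 10, 12, 15]

# Exact arithmetic: the mean is 55/6, kept as a fraction rather than 9.17.
m = Fraction(sum(scores), len(scores))

# Deviation of each score from the mean: X - M.
deviations = [x - m for x in scores]

print(sum(deviations))  # 0
```

Any set of scores gives the same zero total, which is what makes the mean the balance point.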

A DRAWBACK of the MEAN: The mean is drawn in the DIRECTION of EXTREME scores. This is a PROBLEM if there are either some extremely high scores that pull it up or some extremely low scores that pull it down (i.e., when a distribution is HIGHLY SKEWED).

Ex: Following are the contributions, in cents, that two groups of children made to a charity.

Group A: 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11

Mean for Group A = 5.52

Group B: 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200

Mean for Group B = 21.24

* Overall, the two distributions look quite similar. Yet the mean for Group B is much higher than the mean for Group A because of two students who gave extremely high contributions of 150 cents and 200 cents – these extreme and unrepresentative contributions in the distributions are referred to as OUTLIERS.

* If only the mean for Group B is reported, and not all of the individual contributions, it suggests that the average student in Group B gave about 21 cents when, in fact, none of the students made a contribution of about this amount. Thus, the mean is misleading in describing the average contribution in Group B.

* There are other AVERAGES that are appropriate to accurately represent distributions that the mean fails to represent. These will be presented in future sections.
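The two group means can be reproduced with Python's standard `statistics.mean`. The last line, which drops the two outliers from Group B, is an added illustration of how far those two contributions pull the mean up.

```python
from statistics import mean

group_a = [1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11]
group_b = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200]

print(round(mean(group_a), 2))  # 5.52
print(round(mean(group_b), 2))  # 21.24

# Illustration only: remove the two outliers (150 and 200) from Group B.
without_outliers = [x for x in group_b if x < 150]
print(round(mean(without_outliers), 2))  # 5.05
```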

4
Q

Mean, Median, and Mode

A

The MEAN is the BALANCE POINT in a DISTRIBUTION (most frequently used average)

The MEDIAN (another type of average) is the value in a distribution that has 50% of the CASES ABOVE it and 50% of the CASES BELOW it. Thus, it is the middle point in a distribution.

* An advantage of the median is that it is INSENSITIVE to EXTREME scores.

The MODE (another type of average) is defined as the MOST FREQUENTLY OCCURRING score.

* A DISadvantage of the mode is that there may be more than one mode for a given distribution.

These THREE AVERAGES TOGETHER (Mean, Median, Mode) give a MORE ACCURATE picture of the distribution than any of the three can on their own.

Ex 1. Scores: 3, 3, 4, 7, 10, 12, 15

Mean = (3+3+4+7+10+12+15) / 7 = 54 / 7 = 7.71

Median = middle number when scores are listed from least to most (or vice-versa) = 7 (NOTE: if there were an EVEN number of scores, the Median would be the AVERAGE of the middle two scores)

Mode = 3
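Python's standard `statistics` module has functions for all three averages, so Ex 1 can be checked directly. The two extra lists at the end are made-up data, included only to show the even-count median rule and a distribution with more than one mode.

```python
from statistics import mean, median, mode, multimode

scores = [3, 3, 4, 7, 10, 12, 15]

print(round(mean(scores), 2))  # 7.71 (54 / 7)
print(median(scores))          # 7 (4th of 7 ordered scores)
print(mode(scores))            # 3 (occurs twice)

# Made-up data: with an even number of scores, median() averages the middle two.
print(median([3, 3, 4, 7, 10, 12]))  # 5.5
# Made-up data: multimode() returns every mode when there is a tie.
print(multimode([1, 1, 2, 2, 3]))    # [1, 2]
```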

GUIDELINES for CHOOSING an AVERAGE:

* Calculate all 3, but if forced to use just one for some reason, choose the MEAN, because more powerful statistical tests (described later in this book) can be applied to it than to the other averages. (However, the mean is appropriate only for approximately symmetrical distributions and is inappropriate for describing the average of a highly skewed distribution, i.e., a distribution with some extreme scores on one side.)

* Choose the MEDIAN when the mean is inappropriate. The exception is nominal data, which cannot be put in an ordered sequence and therefore has no median.

* Choose the MODE when an average is needed to describe NOMINAL DATA (which is data that cannot be put in an ordered sequence, like political affiliation, ethnicity, and so on). Note that when describing nominal data, it is often not necessary to use an average. For instance, if there are more Democrats than Republicans in a community, the best way to describe this is to give the percentage of people registered in each party. To simply state that the MODAL (most frequently occurring) political affiliation is Democratic is much less informative than reporting percentages.

5
Q

Averages in Normal and Skewed Distributions

A

In a perfectly symmetrical distribution (such as the NORMAL DISTRIBUTION), the mean, median, and mode all have the SAME value.

In SKEWED DISTRIBUTIONS, their VALUES are DIFFERENT.

POSITIVE SKEW – the MEAN has the HIGHEST VALUE because it is pulled in the direction of the extremely high scores.

NEGATIVE SKEW – the MEAN has the LOWEST VALUE because it is pulled in the direction of the extremely low scores.

* Do NOT use the mean when a distribution is highly skewed. See Table 1 on the next page for a review of these concepts.
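The positive-skew pattern shows up in Group B from the charity example. The `symmetric` list is made-up illustrative data, not from the source. In Group B the mode (6) happens to sit slightly above the median (5), so the sketch only checks the claim on this card: under positive skew the mean has the highest value.

```python
from statistics import mean, median, mode

# Group B from the charity example: two very high contributions (positive skew).
group_b = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200]

# Made-up, perfectly symmetrical data for comparison.
symmetric = [1, 2, 3, 3, 4, 5]

for name, data in [("group_b", group_b), ("symmetric", symmetric)]:
    print(name, round(mean(data), 2), median(data), mode(data))
```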

6
Q

Variability: The Range and the Interquartile Range

A

VARIABILITY measures how much the scores vary from each other.

* VARIABILITY is important because it tells you HOW WELL the MEAN APPROXIMATES the GROUP you’re looking at: is the group really represented by the mean or not? The less VARIABILITY there is, the better the mean represents the group.

The RANGE (a measure of variability) tells you how spread out the data you are studying are:

RANGE = High Value - Low Value

Example 1. Scores 2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20

The RANGE = 18 (20 - 2)

* A weakness of the range is that it is based on only two scores, which may not reflect the true variability of all the scores.

Example 2. Scores 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 7, 20

The RANGE = 18 (20 - 2)

* Example 2 also has a RANGE of 18, but if you look closely, you’ll see that the variability of Example 1 is much greater than that of Example 2. The ranges match only because Example 2 has one score (20) that lies far outside the bulk of the scores. A score this far away from the rest is called an OUTLIER, and it can have undue influence on the range.

* Thus, the range is usually inappropriate for describing a distribution with outliers. A better measure of variability is the interquartile range (IQR).
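The range formula is a one-liner. The helper is named `score_range` (a hypothetical name) to avoid shadowing Python's built-in `range`. The last line, which drops the outlier from Example 2, is an added illustration of how much that single score inflates the range.

```python
def score_range(scores):
    """RANGE = High Value - Low Value (hypothetical helper)."""
    return max(scores) - min(scores)


example_1 = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]
example_2 = [2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 7, 20]

print(score_range(example_1))       # 18
print(score_range(example_2))       # 18
# Illustration only: Example 2 without its outlier (the last, highest score).
print(score_range(example_2[:-1]))  # 5
```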

INTERQUARTILE RANGE is defined as the RANGE OF THE MIDDLE 50% of the cases.

To find the interquartile range:

Step 1. Put the scores in order from low to high. Then, determine how many scores constitute one-quarter of the scores.

Using Example 2. Scores 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 7, 20

* There are 12 scores, so one-quarter of them (12 ÷ 4) equals 3.

Step 2. Count UP from the LOWEST score the number of scores you calculated (which was “3”).

* After counting three scores from the bottom, you end up between the 2 and the 3. (2, 2, 2, (here) 3, 4, 4, 5, 5, 5, 6, 7, 20)

* That point will be represented by the average of the two scores surrounding it, so the average of 2 and 3 = 2.5.

Step 3. Count DOWN from the HIGHEST score the number of scores you calculated (which was “3”).

* Then count three scores from the top, and you end up between the 5 and the 6. (2, 2, 2, (Q1) 3, 4, 4, 5, 5, 5, (here) 6, 7, 20)

* That point will be represented by the average of the two scores surrounding it, so the average of 5 and 6 = 5.5.

Step 4. Subtract the answer to Step 2 (2.5) from the answer to Step 3 (5.5) to get (5.5- 2.5 = 3.0).

* Thus, 3.0 is the value of the INTERQUARTILE RANGE for this set of scores.

* When you report 3.0 to your audience, they will know that the range of the middle 50% of the participants is only 3 points. (Note that the undue influence of the outlier of 20 has been overcome by using the interquartile range instead of the range).
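Steps 1 to 4 can be written as a small Python function. It is a sketch of this card's counting method only, with a hypothetical name, and it assumes the number of scores is a multiple of 4 so that each quartile point falls exactly between two scores. Library routines such as `statistics.quantiles` use interpolation rules that can give slightly different quartile values.

```python
def interquartile_range(scores):
    """IQR by the counting method above (hypothetical helper).

    Assumes the number of scores is a multiple of 4.
    """
    ordered = sorted(scores)  # Step 1: order from low to high
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch expects a number of scores divisible by 4")
    quarter = n // 4  # one-quarter of the scores

    # Step 2: count up `quarter` scores; average the two scores around that point.
    lower = (ordered[quarter - 1] + ordered[quarter]) / 2
    # Step 3: count down `quarter` scores from the top; average the two around it.
    upper = (ordered[n - quarter - 1] + ordered[n - quarter]) / 2
    # Step 4: subtract the lower point from the upper point.
    return upper - lower


example_1 = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]
example_2 = [2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 7, 20]

print(interquartile_range(example_2))  # 3.0
print(interquartile_range(example_1))  # 6.5
```

Unlike the range (18 for both), the IQR ranks Example 1 as more variable than Example 2, matching the earlier observation.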

INTERQUARTILE RANGE and the MEDIAN – The interquartile range may be thought of as a first cousin of the median. (Remember that to calculate the median, you count to the middle of the distribution.) Thus, when the median is reported as the average for a set of scores, it is customary to report the interquartile range as the measure of variability.
