6) Statistics Flashcards

Average (arithmetic mean), median, range, mode, standard deviation

1
Q

What do you need to be able to calculate average (arithmetic mean) for a set of numbers?

A

NON EVENLY SPACED SETS:
> (1) Total fixed # of terms in the set (N)
> (2) SUM of the terms in the set
(don’t need individual terms)

EVENLY SPACED SETS:
> First term and last term
(shortcut: if you know the increment, the end points let you deduce both the # of terms and the SUM; even if you don’t know the increment, the end points alone give the average)

2
Q

Maximization / minimization average problems

A

To MAXIMIZE the possible value of one variable, we need to MINIMIZE the other variables

Similarly, to MINIMIZE the possible value of one variable, we need to MAXIMIZE the other variables

You must maximize or minimize in the context of CONSTRAINTS set by the question (e.g., “if each remaining child owns at least 4 goldfish” … the minimum number of goldfish per remaining child becomes 4)

2
Q

Evenly spaced sets: What are they?

How do you find the # of terms in an evenly spaced set?

A

Evenly spaced sets are sets in which EACH PAIR of consecutive numbers in the set has the SAME DIFFERENCE (+ or -)

# of terms in an evenly spaced set (inclusive of both end points) = (Last term - First term)/(increment) + 1

> make sure “last” and “first” term are INCLUSIVE numbers of the set (and ADJUST the endpoints before using the formula)

e.g., if finding the number of multiples of 3 between 1 and 100 inclusive, we need to adjust the end points to find the HIGHEST # DIVISIBLE BY 3 and the LOWEST # DIVISIBLE BY 3 [3, 99]

Even if the end points in the set you are trying to find are NOT inclusive, you can always adjust them so that the above formula works
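A minimal Python sketch of the counting formula above, including the end-point adjustment. The function names `count_terms` and `count_multiples` are made up for illustration.

```python
def count_terms(first, last, increment):
    """Number of terms in an evenly spaced set, both end points inclusive."""
    return (last - first) // increment + 1


def count_multiples(k, low, high):
    """Count multiples of k in [low, high] after adjusting the end points."""
    first = -(-low // k) * k   # smallest multiple of k that is >= low
    last = (high // k) * k     # largest multiple of k that is <= high
    if first > last:
        return 0               # no multiple of k falls inside the range
    return count_terms(first, last, k)


# Multiples of 3 between 1 and 100 inclusive: end points adjust to [3, 99]
print(count_multiples(3, 1, 100))
```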

2
Q

Evenly spaced sets: How do you calculate the average (arithmetic mean)?

A

Average = (First term + Last term)/2 = Median

> just need to arrange the numbers in a way that creates an evenly spaced set

> for a set with an ODD number of terms, the average will be equal to a TERM

> for a set with an EVEN number of terms, the average will be the average of the two middle terms

Why does this formula work?
> because you can PAIR UP the terms (first with last, second with second-to-last, and so on) so that every pair has the SAME SUM

e.g., 5, 22, 39, 56, 73, 90, and 107
> common difference is 17
> 107 + 5 = 112
> 22 + 90 = 112
> 39 + 73 = 112
> middle term is 56
> average = (112*3 + 56)/7
= 392/7 = 56 = median = (107 + 5)/2
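The example can be checked in a few lines of Python. This is only a sketch; `Fraction` is used so the comparison is exact rather than floating point.

```python
from fractions import Fraction

# Example set from the card: common difference of 17
terms = [5, 22, 39, 56, 73, 90, 107]

mean_from_sum = Fraction(sum(terms), len(terms))         # sum / N
mean_from_endpoints = Fraction(terms[0] + terms[-1], 2)  # (first + last) / 2
median = sorted(terms)[len(terms) // 2]                  # odd N, so the middle term

print(mean_from_sum, mean_from_endpoints, median)
```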

2
Q

Evenly spaced sets: How do you calculate the sum of terms?

A

Shortcut using averages: Sum = average * N

For evenly spaced sets, we know short cuts for Average and N

Sum = [(first + last)/2] * [(last - first)/increment + 1]

LAST RESORT: Add up individual terms
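As a sketch, the shortcut can be compared against the last-resort approach of adding every term. The function name `evenly_spaced_sum` is hypothetical, and the set 5, 22, ..., 107 (increment 17) is used only as sample input.

```python
def evenly_spaced_sum(first, last, increment):
    """Sum = average * N, using the evenly spaced shortcuts for both."""
    n = (last - first) // increment + 1   # number of terms, inclusive
    return (first + last) * n // 2        # (first + last)/2 * n, kept in integers


shortcut = evenly_spaced_sum(5, 107, 17)
brute_force = sum(range(5, 108, 17))      # last resort: add up individual terms
print(shortcut, brute_force)
```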

2
Q

How do you count the # of multiples of EITHER integer A or integer B in a set of consecutive integers?

e.g., determine the # of multiples of 3 OR 4 from 1 to 90, inclusive

A

= # of multiples of A + # of multiples of B - # multiples of LCM(A,B)

> need to REMOVE duplicated numbers to avoid double counting (aka multiples of A AND B = multiples of LCM of A and B)

LCM!!!!
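A short Python sketch of this count, applied to the multiples of 3 OR 4 from 1 to 90. The helper names are invented for illustration. The per-integer count uses floor division, which gives the same result as adjusting the end points and applying the # of terms formula.

```python
from math import gcd


def count_multiples(k, low, high):
    """Multiples of k in [low, high], for positive integers."""
    return high // k - (low - 1) // k


def count_either(a, b, low, high):
    """Multiples of a OR b: add both counts, subtract the double-counted LCM multiples."""
    lcm = a * b // gcd(a, b)
    return (count_multiples(a, low, high)
            + count_multiples(b, low, high)
            - count_multiples(lcm, low, high))


print(count_either(3, 4, 1, 90))  # 30 + 22 - 7
```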

2
Q

How do you count the # of multiples of EITHER integer A or integer B, BUT NOT BOTH of those integers, in a set of consecutive integers?

e.g., determine the # of integers from 10 to 100, inclusive that are multiples of 2 OR 3, but not of both?

A

> the tricky part of this type of Q is that we want to remove ALL instances of multiples of LCM(A,B), not just duplicates

= # of multiples of A + # of multiples of B - 2*(# of multiples of LCM(A,B))
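The same idea as a Python sketch, applied to the 10 to 100 example with A = 2 and B = 3. Function names are hypothetical, and a brute-force count is included as a cross-check.

```python
from math import gcd


def count_multiples(k, low, high):
    """Multiples of k in [low, high], for positive integers."""
    return high // k - (low - 1) // k


def count_exactly_one(a, b, low, high):
    """Multiples of a or b but NOT both: subtract the LCM multiples twice."""
    lcm = a * b // gcd(a, b)
    return (count_multiples(a, low, high)
            + count_multiples(b, low, high)
            - 2 * count_multiples(lcm, low, high))


formula = count_exactly_one(2, 3, 10, 100)   # 46 + 30 - 2*15
brute_force = sum(1 for n in range(10, 101) if (n % 2 == 0) != (n % 3 == 0))
print(formula, brute_force)
```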

3
Q

Weighted average

A

WA = Sum of weighted terms / total frequency

= [(data point 1 * frequency 1) + (data point 2 * frequency 2) …] / total frequency of data points

= [(data point 1 * % frequency) + (data point 2 * % frequency)…], where total frequency sums to 100%

> same formula as Simple Average, except terms are not necessarily equally occurring

Tip:
> create a table with “N” and “data point”
> the frequency in the DENOMINATOR does not have to be count (N) of terms … it is whatever we want to WEIGHT THE AVERAGE BY —-> pay attention to what the question is asking for (e.g., miles per gallon = total miles / total gallons)

e.g., weight the average by TIME, weight the average by DISTANCE travelled, weight the average by # of items

> if you are given two end points and the weights and total are all unknown, always re-express the total as its components

e.g., 0.2x + 0.15y = 0.18z and we know x+y = z … replace z with x+y

0.2x + 0.15y = 0.18(x + y) —> allows you to identify relationship/ratio between the two components
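Carried through step by step, the substitution gives the ratio directly (x and y are the two component quantities from the example):

```latex
\begin{align*}
0.2x + 0.15y &= 0.18(x + y) \\
0.2x + 0.15y &= 0.18x + 0.18y \\
0.02x &= 0.03y \\
\frac{x}{y} &= \frac{0.03}{0.02} = \frac{3}{2}
\end{align*}
```

So x : y = 3 : 2, meaning x makes up 60% of the total and y makes up 40%.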

4
Q

Boundaries of simple versus weighted average of two different data points

A

The BOUNDARIES for average must be set based on the data points provided
(cannot be outside these boundaries!)
e.g., if 15% of sophomores use laptops and 10% of freshmen use laptops, then the % of the total group that uses laptops must be between 10 and 15

The SIMPLE AVERAGE of two different data points will be the MEDIAN (middle value)

** The WA of two different data points will be CLOSER to the data point with the GREATER # of observations or greater weighted percentage

Implication:
> Therefore, you KNOW which data point has the HIGHER FREQUENCY OF OBSERVATIONS (or greater weighted percentage)

e.g., Tickets to a play cost $10 for children and $25 for adults. If the average revenue per ticket was $18.25, which is greater than the simple average of $17.50, then MORE adult tickets than children’s tickets must have been sold

Also note:
> when there are only two data points, the sum of the weighted percentages = 100%
> If you think about a teeter-totter with endpoints set by the “boundaries” A and B…

(1) (distance from A to the WA) / (distance from A to B) = % B
(2) (distance from the WA to B) / (distance from A to B) = % A

(we KNOW the ratio of the QUANTITY of the two data points)
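A small Python sketch of the teeter-totter idea, using the ticket example ($10 children, $25 adults, weighted average $18.25). The function name is hypothetical, and each distance is divided by the full distance between the two boundaries to turn it into a share.

```python
from fractions import Fraction


def shares_from_weighted_average(a, b, wa):
    """Return (share of A, share of B) given data points a, b and their weighted average."""
    a, b, wa = Fraction(a), Fraction(b), Fraction(wa)
    span = b - a                 # distance between the two boundaries
    share_b = (wa - a) / span    # distance from A to WA gives B's share
    share_a = (b - wa) / span    # distance from WA to B gives A's share
    return share_a, share_b


child_share, adult_share = shares_from_weighted_average(10, 25, "18.25")
print(child_share, adult_share)  # fractions of all tickets sold
```

The adult share comes out larger, consistent with the weighted average sitting closer to $25 than to $10.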

5
Q

How do you calculate weighted average if the percentages (that represent frequency) do not add up to 100%?

e.g., In a department store, 12% of the customers spend exactly $10 each, 18% spend exactly $20 each, and the rest spend more than $20 each. What is the average amount spent per person for all those who spend $20 or less?

A

Divide the sum BY the percentages we DO have
> can think of it as having 100 items and being asked about a subset of them

e.g., (12*10 + 18*20)/(12 + 18) = 480/30 = $16
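The department store example, sketched in Python. `weighted_average` is a made-up helper; the point is that it divides by the weights actually supplied (12 and 18), not by 100.

```python
def weighted_average(pairs):
    """pairs: (value, weight) tuples; the weights do not need to sum to 100."""
    pairs = list(pairs)
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight


# 12% of customers spend $10 and 18% spend $20; the rest are excluded
avg = weighted_average([(10, 12), (20, 18)])
print(avg)
```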

6
Q

Percentages and ratios

A

If you KNOW the percentage of two data points, you KNOW the ratio of the data points

Similarly, if you KNOW the ratio, you KNOW the percentage of the data points

e.g., Data point A has frequency of 20% while data point B has a frequency of 80%

Therefore, A = 20% = 1/5
B = 80% = 4/5

Ratio of A:B = 1:4
A/B = 1/4 = 20%/80%

You can ALSO re-express relationship as A = B/4 and replace A
> For WA calculations, you will notice that “B” cancels out in the numerator and denominator, leaving you with a numerical answer for weighted average!

7
Q

Using ratios and fractions when solving weighted averages

A

Given the value of two data points AND the ratio or fraction of the quantity of two data points, we can calculate the weighted average of the two data points

Similarly, given the value of two data points AND the weighted average of the two data points, we can calculate the RATIO or fraction of the quantity of the two data points
(you just need one ratio to know the other ratio since there are only two data points)

Why this works?
> linear equation with one unknown

e.g., A = 1000, B = 2000, and ratio of quantity of A to B is 1/2 …

A/B = 1/2 —-> A = B/2

WA = (1000A + 2000B)/(A+B) —> replace A with B/2

= (1000B/2 + 2000B)/(B/2 + B) —-> B’s cancel out

= (500 + 2000)/(3/2)
= 5000/3
= 1666.67

** Key characteristic for this problem type:
> two data points only,
> known boundaries
> AND either WA or ratio of quantities

(but if you know WA AND ratio of quantities, and ONE boundary, then you can solve for the remaining boundary’s data point)
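To see the cancellation concretely, the sketch below recomputes the example for several arbitrary choices of the quantity B and shows that the weighted average does not depend on it. The function name and the trial quantities are illustrative assumptions.

```python
from fractions import Fraction


def weighted_average_from_ratio(value_a, value_b, ratio_a_to_b, quantity_b):
    """Weighted average of two data points when quantity A = ratio * quantity B."""
    quantity_a = ratio_a_to_b * quantity_b
    return (value_a * quantity_a + value_b * quantity_b) / (quantity_a + quantity_b)


# Values 1000 and 2000, quantity ratio A:B = 1:2, tried with different B's
results = {
    weighted_average_from_ratio(1000, 2000, Fraction(1, 2), b)
    for b in (2, 10, 600)
}
print(results)  # a single value, because B cancels out
```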

8
Q

What does median mean?

How do you calculate the POSITION of the median value?

A

Median is the value that is the MIDDLE of the ARRANGED set (numerical order from lowest to highest)

> 50% of the data points fall BELOW the median and 50% of the data points fall ABOVE the median (NOT including the median value itself)

To find the median, we need to find its POSITION in the set:

(A) Sets with an ODD number of terms (N): Round UP N/2 to the nearest whole number
> e.g., 7/2 = 3.5 –> median is located at position 4

Alternatively, median is at the (N+1)/2 position

(B) Sets with an EVEN number of terms (N): ADD 0.5 to N/2
e.g., 6/2 = 3 –> median is located at position 3.5 (average of values at position 3 and 4)

Alternatively, median averages the values at the N/2 and (N+2)/2 positions
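A minimal sketch of the position rules in Python. The function name `median_by_position` and the sample lists are illustrative; positions are 1-based as on the card, so 1 is subtracted to index a Python list.

```python
def median_by_position(values):
    """Median found via the (N+1)/2 rule (odd N) or the N/2 and (N+2)/2 rule (even N)."""
    ordered = sorted(values)             # the set must be ARRANGED first
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]     # position (N+1)/2
    lower = ordered[n // 2 - 1]              # position N/2
    upper = ordered[(n + 2) // 2 - 1]        # position (N+2)/2
    return (lower + upper) / 2


print(median_by_position([7, 1, 5, 3, 9, 11, 13]))  # N = 7, position 4
print(median_by_position([4, 1, 3, 2, 6, 5]))       # N = 6, positions 3 and 4
```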

9
Q

What is the bare minimum you need to be able to calculate the Median of a set?

A

For sets with an ODD number of terms –> need to know the value of the MIDDLE term

For sets with an EVEN number of terms –> need to know the values of the PAIR OF MIDDLE terms

Therefore, you do NOT need to know all the values of a set (just the middle ones)!

For DS Qs, it is helpful to test different positions of unknown terms to see if that changes the value of the median

10
Q

Mode - can you have more than one? Can you have none?

e.g., what is the mode of the set {1,2,3,1,2,3}

A

You can have zero modes, one mode, or more than one mode

Mode = the number that appears most frequently in a data set (need to track FREQUENCY of appearance of each data point)

> if each number appears the SAME NUMBER OF TIMES (1x, 2x, 3x etc.), then the set has NO MODE

e.g., set {1,2,3,1,2,3} has each number occurring 2x, so there is NO MODE

Helpful tip is to create a table with frequency (N) and data point
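The frequency table can be sketched with `collections.Counter`. The function below follows this card’s convention that a set whose values all appear equally often has NO mode; the function name is hypothetical and it assumes a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return the list of modes (possibly empty) for a non-empty list of values."""
    counts = Counter(values)                  # frequency table: value -> count
    highest = max(counts.values())
    all_equal = all(c == highest for c in counts.values())
    if len(counts) > 1 and all_equal:
        return []                             # every value equally frequent: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3, 1, 2, 3]))  # each value occurs 2x
print(modes([1, 1, 2, 3]))        # one mode
print(modes([1, 1, 2, 2, 3]))     # more than one mode
```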

11
Q

How do you find the RANGE of a set?

What does RANGE of a set tell you? What does RANGE NOT tell you?

A

Range = Highest value - Lowest value

Visually, it is the DISTANCE between the two numbers (the further apart the numbers are, the greater the range)
> to increase range, move the end points farther away from each other

Range tells us HOW DISPERSED (spread out) the numbers are > similar to standard deviation

What does RANGE NOT TELL US?
> Total # of terms in a set
> every number in the set

12
Q

What does standard deviation measure?

When does standard deviation equal 0?

A

Measures the SPREAD (dispersion) of values in a data set
> how FAR are the data points from the MEAN of the data set
> a helpful proxy for standard deviation is the DIFFERENCE each data point is from the MEAN

Higher standard deviation = most points are FAR from the mean

Lower standard deviation = most points are CLOSE to the mean

Standard deviation = 0 when all the data points equal the mean (and are the same value)
> no difference between any of the terms and the mean

Also note:
> Standard deviation has the same units as the mean
> Standard deviation is always positive or equal to 0 (non negative)

—> if the data points are ALL EQUAL –> standard deviation is 0
—> if the data points are NOT all equal –> standard deviation is > 0

13
Q

How do you find the range of values (high and low bound) based on the mean and on the # of standard deviations from the mean?

A

HIGH VALUE = mean + (# of standard deviations)*SD

LOW VALUE = mean - (# of standard deviations)*SD

“Within k standard deviations of the mean” is an INCLUSIVE range (both bounds count)
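As a quick sketch, the two bounds in Python. The function name and the numbers (mean 50, SD 4, 2 standard deviations) are illustrative assumptions, not taken from the card.

```python
def sd_bounds(mean, sd, k):
    """Low and high values lying k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd


low, high = sd_bounds(mean=50, sd=4, k=2)
print(low, high)
```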

14
Q

Set transformations and impact on:

> standard deviation
> mean
> range
> median

A: Adding/subtracting the same value to/from all terms in a set

B: Multiplying/dividing data by the same factor

C: Adding a new term that is EQUAL TO THE MEAN

A

Rule 1: Adding/subtracting the SAME VALUE to/from ALL terms in a set (delta) DOES NOT change the standard deviation
> shifting terms up or down by equal amount, shifting mean up or down by equal amount, and distribution stays the same
> Mean changes by delta
> Median changes by delta
> range also does NOT change

Rule 2: Multiplying/dividing the data by the same factor (k) CHANGES the standard deviation; the standard deviation is multiplied by that SAME FACTOR (by | k | if the factor is negative)
> if you visualize a number line, multiplying data by the same factor > 1 will cause the data points to be further dispersed from the mean
> Mean changes by *k
> Median changes by *k
> Range CHANGES by *| k | (range is never negative)

e.g., {1, 2, 3} x 100 becomes {100, 200, 300} —> data points are further apart than before

Rule 3: Adding a NEW term that is equal to the mean will DECREASE the standard deviation (ASSUMES that the standard deviation > 0 so it has room to decrease)
> new data point lowers the overall dispersion of the data points
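The three rules can be checked numerically with Python’s `statistics` module. This is a sketch using the {1, 2, 3} example; the variable names, the delta of 10 and the negative factor of -2 are illustrative choices, and `pstdev` is the population standard deviation.

```python
from math import isclose
from statistics import mean, median, pstdev


def summary(values):
    """Mean, median, range and population standard deviation of a list."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": pstdev(values),
    }


base = [1, 2, 3]
shifted = [x + 10 for x in base]    # Rule 1: add the same delta to every term
scaled = [x * 100 for x in base]    # Rule 2: multiply every term by k = 100
flipped = [x * -2 for x in base]    # Rule 2 with a negative factor, k = -2
with_mean = base + [mean(base)]     # Rule 3: append a term equal to the mean

for name, data in [("base", base), ("shifted", shifted), ("scaled", scaled),
                   ("flipped", flipped), ("with_mean", with_mean)]:
    print(name, summary(data))

# Standard deviation scales by |k|: compare with a float tolerance
print(isclose(summary(scaled)["sd"], 100 * summary(base)["sd"]))
```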

15
Q

Over the span of 3 lacrosse games, Sara scored x, y, and z goals, respectively. The standard deviation of the number of goals scored per game was n. Was the standard deviation of the number of goals scored over the next 3 games greater than n?

(1) Over the next 3 games Sara scored x+k, y+k, z+k goals respectively

(2) Sara scored two goals in each of the next 3 games

A

DS standard deviation
> UNDERSTAND THE SITUATION

> Sara played 3 games and her # of goals in each game is represented by the set {x goals, y goals, z goals}
Standard deviation is n
She then plays 3 MORE GAMES and scores {a goals, b goals, c goals}
Assume the standard deviation is p

is p > n?

(1) We know based on the transformation of sets rule that Standard Deviation does NOT change –> p = n –> ans is No –> Sufficient

(2) {a, b, c} = {2, 2, 2} —> p = 0

Case1: If {x, y, z} = {1, 1, 1} –> n = 0 = p –> ans is No

Case 2: If {x, y, z} are not all the same number –> n > 0 –> p = 0 is STILL NOT GREATER THAN n –> ans is No

Since ans is always No –> Sufficient

  • don’t fall for the trap (this is not asking whether n > p)
16
Q

If every term of a set changes by a constant difference, what happens to:

> the mean of the set
> the median of the set
> the range
> the standard deviation of the set

A

Statistics - set transformations

If EVERY term of the set increases/decreases by delta (e.g., +3, -4)

Mean changes by delta
Median changes by delta
Range DOES NOT CHANGE
Standard deviation DOES NOT CHANGE

17
Q

If every term of a set changes by a constant factor, what happens to:

> the mean of the set
> the median of the set
> the range
> the standard deviation of the set

A

Statistics - set transformations

If EVERY term of the set increases/decreases by a factor of k (e.g., *3, /4, *-2)

Mean changes by *k
Median changes by *k
Range CHANGES by *| k |
Standard Deviation CHANGES by *| k |

** range and standard deviation are NEVER NEGATIVE (so multiply both by | k |)

18
Q

Helpful tip to keep track of numbers in statistics questions

A

1) mean formula
2) standard deviation
3) number lines
4) range
5) median
6) then figure out the unknowns

19
Q

Comparing standard deviations of sets with equal number of data points (without actually having to calculate standard deviation)

A

** ONLY APPLIES TO SETS WITH EQUAL # of TERMS

> Involves computing the ABSOLUTE DIFFERENCE between the mean of a set and each data point

(1) Calculate the mean of each set

(2) For each set, find the absolute difference between the mean of that set and each data point in that set

(3) SUM the absolute differences in each set

(4) Standard deviation correlates well to the size of the absolute differences
(the set with the smallest sum has the smallest standard deviation while the set with the largest sum has the largest standard deviation)

Why this works?
> Take the standard deviation formula, Sqrt[(sum of squared differences)/n]
> since n is equal, we can just compare the sums of the differences
> if n is not equal, then this short cut does not work!!
e.g., {1,2,3} and {1,3} have different standard deviations even though the sum of absolute differences is the same

ALSO tips:
> check range first (if range is smaller –> standard deviation is probably smaller)
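A sketch of the shortcut in Python, treating it as a heuristic rather than an exact rule. The two equal-size sets are illustrative assumptions; the last two lines reproduce the card’s {1,2,3} versus {1,3} counterexample. `pstdev` is the population standard deviation from the standard library.

```python
from statistics import mean, pstdev


def sum_abs_deviations(values):
    """Steps (1)-(3): sum of absolute differences between each point and the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values)


# Equal number of terms: the smaller sum goes with the smaller standard deviation
tight = [4, 5, 6]
wide = [1, 5, 9]
print(sum_abs_deviations(tight), pstdev(tight))
print(sum_abs_deviations(wide), pstdev(wide))

# Unequal number of terms: same sum, different standard deviations
print(sum_abs_deviations([1, 2, 3]), pstdev([1, 2, 3]))
print(sum_abs_deviations([1, 3]), pstdev([1, 3]))
```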

20
Q

What are indicators that all the data points in a set are the SAME?

A

1) range of a set is zero
> means max = min

2) Standard deviation of a set = 0

3) Largest value = mean
(can be proven mathematically)

4) Smallest value = mean (can be proven mathematically)

e.g., For a set of 11 numbers where the max value is 40 and the mean is 40:

mean = sum/n
40 = sum/11
Sum = 440
Unknown sum + 40 = 440
Unknown sum = 400
> there are 10 numbers now that have to be equal to or less than 40
> if all 10 numbers equaled 40, then sum = 400
> any other combo would result in a sum less than 400 and is invalid

*When any of these scenarios is NOT true, then the standard deviation > 0

> range > 0
> Max value does not equal mean
> Min value does not equal mean

21
Q

As a backup, what is the standard deviation formula?

A

Sqrt[(sum of squared differences from the mean)/n]
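The formula written out as a Python sketch. It divides by n, which makes it the population standard deviation; the function name and sample sets are illustrative.

```python
from math import sqrt


def population_sd(values):
    """Sqrt of (sum of squared differences from the mean) / n."""
    n = len(values)
    m = sum(values) / n
    squared_differences = [(x - m) ** 2 for x in values]
    return sqrt(sum(squared_differences) / n)


print(population_sd([2, 2, 2]))   # all data points equal
print(population_sd([1, 3]))
print(population_sd([1, 2, 3]))
```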

22
Q

Notation for inclusive vs exclusive ranges?

A

Inclusive ranges use square brackets (to include end points)

Exclusive ranges use round brackets (to exclude end points) e.g., (2,10) = [3, 9] if only considering integers

23
Q

How many integers from 250 to 450, inclusive, leave a remainder of 2 when divided by 9?

A

Evenly spaced sets
> can be expressed as 9k + 2 (go up by 9)
> you need to determine the START AND END POINTS of the set

[254, 443]

# of integers = (last - first)/increment + 1
= (443 - 254)/9 + 1
= 21 + 1
= 22

ALTERNATIVELY:
> figure out # of multiples of 9 [250,450], then shift up by 2 and exclude numbers outside of that range
> ans 23 - 1 answer that moves outside the range = 22
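A Python sketch of the first approach: find the adjusted end points, then apply the # of terms formula. The function name is hypothetical, and a brute-force count is included as a cross-check.

```python
def count_with_remainder(low, high, divisor, remainder):
    """Return (first, last, count) for integers in [low, high] with the given remainder."""
    first = low + (remainder - low) % divisor     # smallest qualifying integer >= low
    last = high - (high - remainder) % divisor    # largest qualifying integer <= high
    if first > last:
        return None, None, 0
    return first, last, (last - first) // divisor + 1


result = count_with_remainder(250, 450, 9, 2)
brute_force = sum(1 for n in range(250, 451) if n % 9 == 2)
print(result, brute_force)
```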

24
Q

Tip when solving Weighted

A