DECK 3: UNIT 1 part B (descriptive stats) Flashcards

1
Q

When can you round?

A

AT THE VERY END!!! (keep at least 3 digits until end!)

2
Q

What is a standard deviation?

A

Roughly the average (typical) distance from the mean. It is about how far you expect a random value to be from the middle.

3
Q

What is a Z score?

A

The number of standard deviations a value is away from the mean
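
As a quick illustration, a z score can be worked out from a value, a mean and a standard deviation. The function name `z_score` and the numbers used here are made up for this sketch.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / sd

# Hypothetical numbers: a value of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)
print(z)  # 1.5
```

A positive z score means the value is above the mean; a negative one means it is below.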

4
Q

For information purposes, which gives LEAST… stem-leaf, histogram or box-whisker?

A

Box/Whisker. BE CAREFUL: you really don’t know how the values are distributed inside the box or along the whiskers. The box and whisker and fish tank give a very GENERAL look.

5
Q

What is the mode?

A

the peaks of a histogram (the humps). or with categorical data, the most popular category

6
Q

What are the percentiles for Q1, med, and Q3?

A

25, 50 and 75

7
Q

How do students often mix up IQR and St. Dev?

A

They INCORRECTLY think that Q1 is 1 SD below the mean and Q3 is 1 SD above the mean. THIS IS NOT TRUE!!! In a normal distribution, Q1 is only about .67 SD below the mean and Q3 is only about .67 SD above it.

8
Q

Does the IQR capture 68% of the data?

A

NO. It captures the middle 50%.

9
Q

What percentile is the median (aka Q2)?

A

50th

10
Q

What percent of the data is between Q1 and Q3?

A

50%

11
Q

If the mean is above the median, the distribution may be

A

skewed right… the mean follows the tail

12
Q

Another name for “skewed right” is

A

positively skewed

13
Q

How many SD wide is the IQR in a normal distribution?

A

NOT 2!!!! Think about it. The middle 68% is 2 SD wide. Since the IQR is only the middle 50%, it must be less than 2 SD wide. Try invnorm(.75) x 2. You find that it is only about 1.35 SD wide if the distribution is nearly normal.
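
The same check can be done off the calculator. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` as a stand-in for invnorm; the variable names are made up for the example.

```python
from statistics import NormalDist

# Standard normal model: mean 0, SD 1, so results are already in SD units.
standard_normal = NormalDist(mu=0, sigma=1)

q1_z = standard_normal.inv_cdf(0.25)  # z score of Q1
q3_z = standard_normal.inv_cdf(0.75)  # z score of Q3, same idea as invnorm(.75)
iqr_in_sd = q3_z - q1_z               # width of the IQR in SD units

print(round(q1_z, 2), round(q3_z, 2), round(iqr_in_sd, 2))  # -0.67 0.67 1.35
```

Because the normal model is symmetric, the IQR width is just twice the z score of Q3.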

14
Q

What symbols do we use for population mean and sample mean?

A

Mu for population mean, xbar for sample mean.

15
Q

What symbols do we use for population standard deviation and sample standard deviation?

A

Sigma for population and s for sample.

16
Q

What percent of the data is above Q3?

A

25%

17
Q

What percent of the data is below the median?

A

50%

18
Q

What is the difference between categorical VARIABLES and categorical DATA?

A

The Variable is the overall category. Like “EYE COLOR”. The data is the actual measurement from the subjects. Like “blue, brown, blue”

19
Q

How do you find percentiles and make a boxplot from OGIVE?

A

Go across till you hit the curve and then STRAIGHT DOWN!

20
Q

Can numbers be CATEGORICAL?

A

Sure. Zip codes, sports jersey numbers, telephone numbers, social security numbers, area codes… these are categorical.

21
Q

What is the empirical rule?

A

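68-95-99.7, yeah! In a normal model, about 68% of the data is within 1 SD of the mean, about 95% is within 2 SD, and about 99.7% is within 3 SD.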
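
The three percentages can be checked against a normal model. This is a sketch that assumes Python's `statistics.NormalDist`; the dictionary name `within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Share of the model within k standard deviations of the mean.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rule's 68, 95 and 99.7 are rounded versions of these values.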

22
Q

When drawing a normal model, what are the PERCENTILES from left to right?

A

2.5, 16, 50, 84, 97.5 (at -2, -1, 0, +1 and +2 SD from the mean, using the empirical rule)

23
Q

are any populations actually normal?

A

no, nothing is normal, just normalish. The only normal thing is the model we use.

24
Q

If the distribution is skewed (or outliers/not symmetric) what would you use for center and spread statistics?

A

Median (center) and IQR (spread)

25
Q

If the distribution is bimodal or multimodal, what would you use for center and spread statistics?

A

Talk about each mode (center) and maybe use the range or IQR. You could also say “one group seems to go from __ to __ and the other from about __ to __”

26
Q

What is the variance?

A

The average squared distance to the mean. Or SD² (it is the SD before you take the square root, so it is the stuff under the radical in the formula).
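
Here is a small worked sketch of "average squared distance to the mean", treating a made-up data set as the whole population (so it divides by n).

```python
data = [1, 2, 3, 4, 5]  # made-up data, treated as a full population

mean = sum(data) / len(data)                      # 3.0
squared_distances = [(x - mean) ** 2 for x in data]  # 4, 1, 0, 1, 4
variance = sum(squared_distances) / len(data)     # average squared distance
sd = variance ** 0.5                              # square root of the variance

print(variance, round(sd, 3))  # 2.0 1.414
```

Squaring the SD gets you back to the variance.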

27
Q

mean/SD/median/IQR. How do I know which ones to use?

A

when unimodal and symmetric, mean and sd. If skewed or outliers? Median and IQR. If bimodal? Talk about the MODES

28
Q

What percentile is Q1?

A

25th

29
Q

How do you find the median from an OGIVE?

A

go halfway up the y axis, then shoot across to the curve, then straight down. It’s at the 50th percentile (halfway up)

30
Q

How do you find 5 number summary from OGIVE?

A

Split the y axis into quarters. Shoot out to the right from 0, .25, .50, .75 and 1.00 till you hit the line in the ogive, then go straight down. Those numbers on the x axis below correspond to the 5 numbers.

31
Q

If the distribution is unimodal and symmetric, what would you use for center and spread statistics?

A

Mean (center) and Standard Deviation (spread)

32
Q

What symbols do we use for population proportion (%) and sample proportion (%)?

A

p for population and p-hat for sample

33
Q

Why are there different standard deviation formulas for population and sample? Aren’t they the same thing?

A

Both equations are actually doing the same thing. They both attempt to calculate the true population standard deviation. When you have all of the data from the population you just divide by n and get the actual SD. BUT if you only have a sample, then you are using it to make a guess (inference) at what the population standard deviation is. Samples tend to have less spread, so their SD underestimates the population SD. When you divide by n-1 instead of n, it gives you a better estimate of the population standard deviation.
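
To see the two formulas side by side, here is a sketch that assumes Python's `statistics` module, where `pstdev` divides by n and `stdev` divides by n-1. The data values are made up.

```python
from statistics import pstdev, stdev

sample = [1, 2, 3, 4, 5]  # made-up values

divide_by_n = pstdev(sample)          # population formula: sqrt(10 / 5)
divide_by_n_minus_1 = stdev(sample)   # sample formula: sqrt(10 / 4)

print(round(divide_by_n, 3), round(divide_by_n_minus_1, 3))  # 1.414 1.581
```

Dividing by the smaller number (n-1) always gives the slightly larger answer, which pushes the estimate up toward the population value.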

34
Q

Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)

A

Data set: 1,2,3,4,5 Spread (range): 4, Center: 3
Add three and get new data set: 4,5,6,7,8 Spread: 4, Center: 6 (center went up, spread stayed the same). The IQR and SD will stay the same, but median and mean go up 3. Called shifting, or sliding the data.

35
Q

what happens if you multiply all of a data set by a constant? Think of an example

A

It is SCALED. Both center and spread are impacted. Mean, median, standard deviation, IQR and quartiles are all multiplied by that constant. Center, spread and all individual values are changed. Consider 1,2,3,4,5 with a mean of 3 and a range of 4. Now multiply by 3: 3,6,9,12,15 and you get a mean of 9 and a range of 12… both multiplied by three.

36
Q

what happens if you ADD a constant to each value in a data set?

A

It is SHIFTED only. This affects all of the data values, the measures of center (mean, median) and the quartiles, deciles, etc. IT DOES NOT CHANGE THE SPREAD! (IQR, St Dev and Range all stay the SAME.)
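
A short sketch of shifting, using made-up data and Python's `statistics` module. The helper `data_range` is a hypothetical name for this example.

```python
from statistics import mean, median, pstdev

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

data = [1, 2, 3, 4, 5]           # made-up data
shifted = [x + 3 for x in data]  # add the same constant to every value

mean_change = mean(shifted) - mean(data)        # center moves by the constant
median_change = median(shifted) - median(data)  # so does the median
range_change = data_range(shifted) - data_range(data)  # spread does not move
sd_change = pstdev(shifted) - pstdev(data)              # spread does not move

print(f"{mean_change:.1f} {median_change:.1f} {range_change:.1f} {sd_change:.1f}")
# 3.0 3.0 0.0 0.0
```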

37
Q

What is a clear example of the median’s resilience, and when would you use the median instead of the mean?

A

(Change just the top value.) Imagine we asked eight people how much money they had in their wallets. We found they had {1, 2, 2, 5, 5, 8, 8, 9}. The mean of this set is 5, and the median is also 5. You might say “the average person in this group had 5 bucks.” But imagine the same group the next week, when one of them just got back from the casino and the distribution was {1, 2, 2, 5, 5, 8, 8, 9000}. In this case, the median would still be 5, but the mean goes up to over 1000. Which number better describes the amount of money the average person in the group has this time? 5 bucks or 1000 bucks? I think 5 is a better description of the average person in this group, and the 9000 is simply an outlier.
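
The wallet example can be checked with a few lines. This sketch uses the two data sets from the card and Python's `statistics` module; the variable names are made up.

```python
from statistics import mean, median

before = [1, 2, 2, 5, 5, 8, 8, 9]
after = [1, 2, 2, 5, 5, 8, 8, 9000]  # only the top value changed

print(f"before: mean {mean(before):.1f}, median {median(before):.1f}")
# before: mean 5.0, median 5.0
print(f"after:  mean {mean(after):.1f}, median {median(after):.1f}")
# after:  mean 1128.9, median 5.0
```

One outlier drags the mean above every other value in the set, while the median does not move.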

38
Q

What does SHIFT and SCALE mean?

A

Shift is when you add or subtract, scale is when you multiply

39
Q

Think of the minimum value, the mean and the standard deviation. Which are impacted by shifting (adding a constant)?

A

adding a value shifts the entire histogram to the right, so the min and the mean will increase by that amount, BUT THE SD WILL NOT CHANGE.

40
Q

Think of the minimum value, the median and the IQR. Which are impacted by shifting (adding a constant)?

A

adding a value shifts the entire histogram to the right, so the min and the median will increase by that amount, BUT THE IQR WILL NOT CHANGE.

41
Q

Think of the minimum value, the mean and the standard deviation. Which are impacted by scaling (multiplying by a constant)?

A

If you multiply a data set by a number, then the min, mean and the SD will multiply by that number.

42
Q

Think of the minimum value, the median and the IQR, which is impacted by scaling? (multiplying by a constant)

A

If you multiply a data set by a number, then the min, median and the IQR will multiply by that number.

43
Q

If a distribution is skewed right, what will be greater, the mean or median? WHY?

A

Mean. The mean moves further to the right to keep balance.

44
Q

How does multiplying by a constant impact the summary statistics of a data set? (or random variable)

A

It is SCALED. Both center and spread are affected. They all (mean, median, IQR, SD, range) get multiplied by that constant. (BE CAREFUL, remember the variance is the SD squared, so the variance gets multiplied by the constant squared. Multiply the data by 3 and the variance gets multiplied by 9.)
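
A sketch of scaling with made-up data, assuming Python's `statistics` module and a constant of 3. It compares each statistic after scaling to the same statistic before.

```python
from statistics import mean, pstdev, pvariance

data = [1, 2, 3, 4, 5]           # made-up data
k = 3                            # the constant we multiply by
scaled = [k * x for x in data]   # 3, 6, 9, 12, 15

mean_ratio = mean(scaled) / mean(data)           # center is multiplied by k
sd_ratio = pstdev(scaled) / pstdev(data)         # spread is multiplied by k
var_ratio = pvariance(scaled) / pvariance(data)  # variance by k squared

print(round(mean_ratio, 6), round(sd_ratio, 6), round(var_ratio, 6))  # 3.0 3.0 9.0
```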

45
Q

How do you match OGIVES to histograms?

A

RECTANGLE DROP!!

46
Q

How are mean, median and mode positioned in a skewed left histogram?

A

From left to right they go in that order: mean, median, mode.

47
Q

Which are more sensitive to outliers and skew: mean or median? SD or IQR?

A

Mean and SD are most influenced by outliers. median and IQR are RESISTANT, RESILIENT, ROBUST!!

48
Q

what is the shortcut normcdf?

A

gives % from raw data, skips Z score. normcdf (low VALUE, high VALUE, mean, sd)
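
For comparison off the calculator, here is a sketch of the same shortcut. It assumes Python's `statistics.NormalDist`; the function `normcdf` below is a hypothetical helper that mirrors the calculator's argument order, and the mean of 100 and SD of 15 are made-up numbers.

```python
from statistics import NormalDist

def normcdf(low, high, mean=0.0, sd=1.0):
    """Area under a normal curve between two raw data values."""
    model = NormalDist(mu=mean, sigma=sd)
    return model.cdf(high) - model.cdf(low)

# Hypothetical model: mean 100, SD 15. Share of values between 85 and 115.
share = normcdf(85, 115, mean=100, sd=15)
print(round(share, 4))  # 0.6827
```

Here 85 and 115 are each 1 SD from the mean, so the answer matches the 68 in the empirical rule.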

49
Q

what is the shortcut invnorm?

A

gives data value from percentile, skips Z score. Invnorm (percentile, mean, sd)
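
A matching sketch for going from a percentile back to a data value. It assumes Python's `statistics.NormalDist`; `invnorm` is a hypothetical helper named after the calculator function, and the mean of 100 and SD of 15 are made up.

```python
from statistics import NormalDist

def invnorm(percentile, mean=0.0, sd=1.0):
    """Data value that has the given share (0 to 1) of the model below it."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile)

# Hypothetical model: mean 100, SD 15. Value at the 97.5th percentile.
value = invnorm(0.975, mean=100, sd=15)
print(round(value, 1))  # 129.4
```

That is about 2 SD above the mean, as the empirical rule suggests. Note the percentile goes in as a decimal (0.975, not 97.5).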

50
Q

Why do we plug 999 into normcdf?

A

It needs a z score, but we can’t plug in infinity. So we go down or up 999 standard deviations and that pretty much gets everything
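
A quick sketch of why 999 is "close enough" to infinity. It assumes Python's `statistics.NormalDist`, which (unlike the calculator) accepts an actual infinite bound; the z score of 1.0 is a made-up example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1
z = 1.0                         # made-up z score

# Area from "way down low" up to z, using -999 and then true negative infinity.
with_999 = standard_normal.cdf(z) - standard_normal.cdf(-999)
with_infinity = standard_normal.cdf(z) - standard_normal.cdf(float("-inf"))

print(round(with_999, 4), round(with_infinity, 4))  # 0.8413 0.8413
```

The area below -999 SD is so tiny that leaving it out does not change the answer.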

51
Q

If you want to find percentile for a value, what do you put into normcdf (? ?)

A

find z score for value, and then normcdf (-999, Zright) like going from negative infinity up to the z score

52
Q

are there any normal samples?

A

no, nothing is normal, just normalish. The only normal thing is the model we use.

53
Q

If you want to calculate the probability (%) something falls between two values in a normal model, what do you do?

A

find z scores for both values, and then normcdf (Z LOW, Z HIGH)

54
Q

the output for normcdf(Zleft, Zright) is_______

A

the area under the normal curve between the given z scores

55
Q

If you want to find % below a value, what do you put into normcdf (? ?)

A

find z score for value, and then normcdf (-999, Zright)

56
Q

What does normcdf do?

A

It gives you the area under the normal curve between any two z scores

57
Q

What does invnorm do?

A

It gives you the Z SCORE from a percentile

58
Q

What is the total area under the normal curve?

A

1 or 1.000

59
Q

Which calculator function gives you a z score?

A

invnorm(%ile)

60
Q

which calculator function gives you a percent?

A

normcdf(Z left, Z right)

61
Q

If you want to calculate % above a value, what do you put into normcdf(? ?)

A

find z score for value, and then normcdf (Z left, 999)
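
A sketch of the "% above" calculation, assuming Python's `statistics.NormalDist` in place of the calculator. The z score of 1.0 is made up, and the complement check at the end is an extra step that is not on the card.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1
z_left = 1.0                    # made-up z score for the value

# Same idea as normcdf(Z left, 999): area from the z score up to "way up high".
above = standard_normal.cdf(999) - standard_normal.cdf(z_left)

# Cross-check: total area is 1, so area above = 1 - area below.
complement = 1 - standard_normal.cdf(z_left)

print(round(above, 4), round(complement, 4))  # 0.1587 0.1587
```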