rest of UNIT ONE vocab Flashcards

1
Q

What is a contingency table?

A

A table that shows how counts are distributed across 2 categorical variables, like gender and music preference. AKA a 2-way table.

2
Q

How can you tell if variables in a contingency table are independent?

A

If the conditional distributions of one variable are the same across every category of the other variable, then it doesn’t DEPEND, so the variables are INDEPENDENT.

3
Q

When drawing a graph or chart, what do you have to remember to do?

A

LABEL THE AXES, make a KEY (if needed), AND GIVE IT A NAME!!! For example: “Figure 1: Age and Food Preference”

4
Q

marginal distribution

A

The overall distribution of a single variable in a contingency table (the totals out in the margins).

5
Q

conditional distribution?

A

The distribution of one variable within the table, along only one row or one column (so for just one category of the other variable). NOT IN THE MARGINS.
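
A small Python sketch of the difference between marginal and conditional distributions. The gender-by-music counts are made up for illustration and are not from the cards.

```python
# Hypothetical counts: rows are gender, columns are music preference.
table = {
    "boys":  {"pop": 30, "rock": 50, "rap": 20},
    "girls": {"pop": 60, "rock": 20, "rap": 20},
}
genres = ["pop", "rock", "rap"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: row totals divided by the grand total.
marginal_gender = {g: sum(row.values()) / grand_total for g, row in table.items()}

# Marginal distribution of music preference: column totals divided by the grand total.
marginal_music = {m: sum(table[g][m] for g in table) / grand_total for m in genres}

# Conditional distribution of music preference GIVEN girls:
# one row of the table, divided by that row's own total.
girls_total = sum(table["girls"].values())
music_given_girls = {m: table["girls"][m] / girls_total for m in genres}

print(marginal_gender)
print(marginal_music)
print(music_given_girls)
```

The marginals use the totals out in the margins; the conditional uses only the girls row.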

6
Q

Association and Independence: How are they related?

A

Variables are either independent or associated

7
Q

Give an example of independent variables

A

If 80% prefer cheese and only 20% prefer pepperoni IN EACH GRADE AT BHS, then every grade has the same distribution of preferences, so grade doesn’t matter. We say “school year and pizza choice are independent” (not dependent).
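
A rough Python sketch of this check: turn each grade's counts into a conditional distribution and see whether they all match. The counts and the helper names `conditional_distribution` and `looks_independent` are invented for illustration. Real survey data will rarely match exactly, so the tolerance is only a stand-in for judgment.

```python
# Hypothetical counts chosen so every grade splits 80% cheese / 20% pepperoni.
counts = {
    "freshman":  {"cheese": 80,  "pepperoni": 20},
    "sophomore": {"cheese": 120, "pepperoni": 30},
    "junior":    {"cheese": 40,  "pepperoni": 10},
    "senior":    {"cheese": 160, "pepperoni": 40},
}

def conditional_distribution(row):
    """Turn one row of counts into proportions that sum to 1."""
    total = sum(row.values())
    return {choice: n / total for choice, n in row.items()}

def looks_independent(table, tol=1e-9):
    """True if every row has the same conditional distribution (within tol)."""
    dists = [conditional_distribution(row) for row in table.values()]
    first = dists[0]
    return all(abs(d[k] - first[k]) <= tol for d in dists for k in first)

for grade, row in counts.items():
    print(grade, conditional_distribution(row))
print(looks_independent(counts))
```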

8
Q

Give a quick example of associated variables

A

A higher percentage of boys play video games than girls so we say “gender and video game playing are associated” or “gender and video game playing are not independent”

9
Q

Gender and Video Game playing are___________ because_______

A

associated (not independent) because a higher percentage of males play video games. (think.. It depends on gender)

10
Q

Year in school (F,S,J,S) and Pizza Preference (pepperoni or cheese) are __________ because _______________

A

Independent (not associated) because every grade has the same preferences. It doesn’t depend on grade: 80% of each group likes cheese better.

11
Q

What do you call things that are not independent?

A

associated

12
Q

mean/SD/median/IQR? How do I know which ones to use?

A

When unimodal and symmetric: mean and SD. Skewed or outliers? Median and IQR. BIMODAL? Talk about the MODES.

13
Q

How do you describe distributions (histograms)?

A

Shape, Center, Spread, and STRANGE (outliers and gaps). Some say GSOCS. Where’s yo GSOCS?

14
Q

If asked to compare distributions, what should you write about?

A

Compare Shapes, Centers, Spreads, and Stranges.. The GSOCS

15
Q

What does GSOCS stand for?

A

Gaps Shape Outliers Center Spread

16
Q

If a distribution is skewed right, what will be greater, the mean or median? WHY?

A

Mean. The mean moves further to the right to keep balance. (the mean chases the tail)
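
A quick Python check of "the mean chases the tail", using a made-up right-skewed data set. Mirroring it gives the left-skewed case.

```python
from statistics import mean, median

# Made-up right-skewed data: most values are small, with a long tail to the right.
skewed_right = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

# Flip the signs to get left-skewed data with a long tail to the left.
skewed_left = [-x for x in skewed_right]

print("right:", mean(skewed_right), median(skewed_right))  # mean is the larger one
print("left: ", mean(skewed_left), median(skewed_left))    # median is the larger one
```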

17
Q

If a distribution is skewed left, what will be greater, the mean or median? WHY?

A

Median. The mean moves left to keep balance. (the mean chases the tail)

18
Q

Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)

A

Data set: 1,2,3,4,5. Spread (range): 5-1=4. Center: 3.
Add three and get the new data set: 4,5,6,7,8. Spread: still 8-4=4. Center: 6 (center went up by 3, spread stayed the same). The IQR and SD will stay the same, but the median and mean each go up by 3.

19
Q

Give a simple example showing that multiplying by a constant changes both the spread and the center. (this always happens)

A

Data set: 1,2,3,4,5. Spread (range): 5-1=4. Center: 3.
Multiply by three and get the new data set: 3,6,9,12,15. Spread: 15-3=12. Center: 9 (both center and spread were multiplied by three). The IQR and SD are multiplied by 3, and so are all the values, including Q1, the median, etc.

20
Q

How do you describe center?

A

Talk about the mean (balance), median (splits area in half), mode (peaks? if bimodal, talk about both modes) or simply say: “centered around ____”

21
Q

How do you describe shape?

A

unimodal, bimodal, multimodal, uniform AND symmetric, skewed

22
Q

Spread description?

A

Range, IQR, standard deviation, variance, or simply say: “From here to about here”

23
Q

If the distribution is unimodal and symmetric, what would you use for center and spread statistics?

A

Mean (center) and Standard Deviation (spread)

24
Q

If the distribution is skewed (or outliers/not symmetric) what would you use for center and spread statistics?

A

Median (center) and IQR (spread)

25
Q

If the distribution is bimodal or multimodal, what would you use for center and spread statistics?

A

Talk about each mode (center) and maybe use the range or IQR. You could also say “one group is from __ to __ and the other from about __ to __”

26
Q

what happens if you ADD a constant to each value in a data set?

A

It is SHIFTED only. This affects all of the data values, the measures of center (mean, median), and the quartiles, deciles, etc. IT DOES NOT CHANGE THE SPREAD! (IQR, SD, and range all stay the SAME.)
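
A short Python sketch that checks the shift rule on a tiny data set. The helper `summary` is made up for this example. `statistics.quantiles` is used with its default method, which may give slightly different quartiles than a calculator, and `pstdev` is the divide-by-n SD; the shift rule holds either way.

```python
from statistics import mean, median, pstdev, quantiles

data = [1, 2, 3, 4, 5]
shifted = [x + 3 for x in data]  # add the constant 3 to every value

def summary(values):
    """Center and spread measures for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": pstdev(values),
    }

before, after = summary(data), summary(shifted)
for name in before:
    print(name, before[name], "->", after[name])
```

The mean and median each go up by 3, while the range, IQR, and SD print the same value before and after.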

27
Q

what happens if you multiply all of a data set by a constant?

A

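It is scaled. Everything is affected. Mean, median, SD, IQR, and quartiles are all multiplied by that constant (for a negative constant, the SD and IQR are multiplied by its absolute value, since spread can’t be negative). Center, spread, and all individual values are changed.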
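
A minimal Python sketch of the scaling rule, trying a positive and a negative constant. The helper `center_and_spread` is invented for this example, and `pstdev` is the divide-by-n SD. With the negative constant, the centers flip sign but the spreads grow by the size of the constant.

```python
from statistics import mean, median, pstdev

data = [1, 2, 3, 4, 5]

def center_and_spread(values):
    """Return (mean, median, range, population SD)."""
    return mean(values), median(values), max(values) - min(values), pstdev(values)

print("original", center_and_spread(data))
for k in (3, -2):
    scaled = [k * x for x in data]  # multiply every value by the constant k
    print("times", k, center_and_spread(scaled))
```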

28
Q

What is the five number summary?

A

min, Q1, Q2 (median), Q3, and max

29
Q

How do you find Q1 and Q3?

A

Q1 is the median of the bottom half (25th %ile) and Q3 is the median of the upper half (75th %ile)
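
A small Python sketch of the "median of each half" method, returning the whole five number summary. The function name `five_number_summary` is made up here. It assumes the convention that the middle value is left out of both halves when the count is odd; some textbooks and software handle that case differently.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Q1 is the median of the bottom half and Q3 is the median of the top half.
    With an odd count, the middle value is excluded from both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return min(data), median(lower), median(data), median(upper), max(data)

print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))  # (1, 3, 7, 11, 13)
```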

30
Q

How can you match boxplots to histograms?

A

USE THE FISH TANK METHOD!

31
Q

How do you match OGIVES to histograms?

A

RECTANGLE DROP!!

32
Q

How do you find percentiles and make a boxplot from OGIVE?

A

Go across from the percentile till you hit the curve and then STRAIGHT DOWN! (Q1 is @ 25%, median @50% and Q3 at 75%)

33
Q

For information purposes, which gives the most: stem-leaf, histogram, or box-whisker?

A

Stem leaf gives the actual values and the shape, histogram just the shape, and box-whisker the least amt of information but they are great for comparing multiple distributions.

34
Q

For information purposes, which gives LEAST: stem-leaf, histogram or box-whisker?

A

Box/Whisker, BE CAREFUL! you really don’t know how things are distributed. The fish tank gives a very GENERAL look.

35
Q

What percent of the data is above Q3?

A

25%

36
Q

What percent of the data is below the median?

A

50%

37
Q

What percent of the data is between Q1 and Q3?

A

50%

38
Q

What is the IQR?

A

Interquartile range: a measure of spread. IQR = Q3 - Q1, the distance from Q1 to Q3.

39
Q

What are the percentiles for Q1, med, and Q3?

A

25, 50 and 75

40
Q

What percentile is Q3?

A

75th

41
Q

What percentile is Q1?

A

25th

42
Q

What percentile is the median (aka Q2)?

A

50th

43
Q

How do you find the median from an OGIVE?

A

go halfway up the y axis (to 50%) shoot across to the curve, then straight down. It’s at the 50th percentile (halfway up)

44
Q

where are the “outlier fences?”

A

1.5 IQR above Q3 and 1.5 IQR below Q1. Just a rule of thumb..
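
The fence rule can be sketched in Python like this. The function names and the data set are made up for illustration; Q1 = 4 and Q3 = 12 are the medians of the lower and upper halves of that data.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences using the 1.5 * IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Made-up data: lower half 1,3,5,7 (Q1 = 4), upper half 9,11,13,30 (Q3 = 12).
data = [1, 3, 5, 7, 9, 11, 13, 30]

print(outlier_fences(4, 12))       # (-8.0, 24.0)
print(find_outliers(data, 4, 12))  # [30]
```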

45
Q

When comparing boxplots, what do you compare?

A

Medians and IQRs. ALSO, you might want to compare medians to quartiles if you can. For instance, if one has a median above the other’s Q3, you might say, “Half of the first group scored over 128 while less than 25% of the second did.”

46
Q

What is a standard deviation?

A

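Roughly, the typical (average) distance of the data values from the mean. More precisely, it is the square root of the variance.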

47
Q

What is the variance?

A

The average squared distance to the mean (It is the SD before you take the square root, so it is the stuff under the radical in the formula)
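
A hand-rolled Python sketch of variance and SD, to show "the stuff under the radical". The function names are made up. Which divisor to use is an assumption here: the `sample` flag switches between dividing by n - 1 (the usual sample formula) and dividing by n.

```python
from math import sqrt

def variance(values, sample=True):
    """Average squared distance from the mean.

    sample=True divides by n - 1; sample=False divides by n.
    """
    n = len(values)
    m = sum(values) / n
    squared_distances = [(x - m) ** 2 for x in values]
    return sum(squared_distances) / (n - 1 if sample else n)

def standard_deviation(values, sample=True):
    """Square root of the variance."""
    return sqrt(variance(values, sample))

data = [1, 2, 3, 4, 5]
print(variance(data), standard_deviation(data))
print(variance(data, sample=False), standard_deviation(data, sample=False))
```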

48
Q

what is a z score?

A

the number of standard deviations away from the mean
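
As a formula, z = (value - mean) / SD. A tiny Python sketch, with made-up test scores (mean 100, SD 15):

```python
def z_score(value, mean, sd):
    """Number of SDs that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(z_score(130, 100, 15))  # 2.0, two SDs above the mean
print(z_score(85, 100, 15))   # -1.0, one SD below the mean
```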

49
Q

What is the empirical rule?

A

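In a normal model, about 68% of the values are within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD. 68-95-99.7, yeahh!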

50
Q

What are the percentiles from left to right on the normal model?

A

2.5-16-50-84-97.5 (at z = -2, -1, 0, 1, 2, rounded from the 68-95-99.7 rule)
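
The 68-95-99.7 rule and these percentiles can be checked with Python's `statistics.NormalDist`. The helper names `area_within` and `percentile_at` are made up. The exact values come out a little different from the memorized ones (about 2.3 instead of 2.5 at z = -2, for example), because the rule is a rounded approximation.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def percentile_at(z):
    """Percent of the standard normal model that lies to the left of z."""
    return standard_normal.cdf(z) * 100

for k in (1, 2, 3):
    print("within", k, "SD:", round(area_within(k) * 100, 1))

for z in (-2, -1, 0, 1, 2):
    print("z =", z, "percentile:", round(percentile_at(z), 1))
```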

51
Q

what is the mean/median/mode helper diagram?

A

a skewed left distribution with mean/median/mode labeled in order from L to R

52
Q

When can you round?

A

AT THE VERY END!!! (keep 3 digits until end!)

53
Q

Which statistics are more sensitive to outliers and skew?

A

Mean and SD are more influenced by outliers. median and IQR are RESISTANT, RESILIENT, ROBUST!!

54
Q

How many SD wide is the IQR in a normal distribution?

A

NOT 2!!!! The middle 68% is 2 SD wide. Since the IQR is only the middle 50%, it is less than 2 SD wide. Try invnorm(.75) x 2: you get about 1.35 SD wide, so each quartile is about .67 SD from the mean.
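
The same calculation in Python, using `NormalDist.inv_cdf` as a stand-in for the calculator's invnorm:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

z_q3 = standard_normal.inv_cdf(0.75)  # z score of Q3, the 75th percentile
z_q1 = standard_normal.inv_cdf(0.25)  # z score of Q1, the 25th percentile
iqr_in_sds = z_q3 - z_q1              # width of the IQR, measured in SDs

print(round(z_q1, 3), round(z_q3, 3), round(iqr_in_sds, 2))
```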

55
Q

How do students often mix up IQR and St. Dev?

A

They INCORRECTLY think that Q1 has a Z score of -1 and is 1sd below the mean and Q3 is 1sd above the mean. THIS IS NOT TRUE!!! The Z score for Q1 is about -.674 (for a normalish distribution, that is)

56
Q

are any populations actually normal?

A

no, nothing is normal, just normalish. The only normal thing is the model we use.

57
Q

are there any normal samples?

A

no, nothing is normal, just normalish. The only normal thing is the model we use.

58
Q

the output for normcdf(Zleft, Zright) is_______

A

the area under the normal curve between the given z scores

59
Q

If you want to calculate % above a value, what do you put into normcdf(? ?)

A

find z score for value, and then normcdf (Z left, 999)

60
Q

If you want to calculate % between two values what do you do?

A

Find the z scores for both values, and then normcdf(Z LOW, Z HIGH)

61
Q

Which calculator function gives you a z score?

A

invnorm(%ile)… YOU MUST USE PERCENTILE (%to left)

62
Q

What does normcdf do?

A

It gives you the area under the normal curve between any two z scores
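
A Python stand-in for this, built on `statistics.NormalDist`. The function below is only a sketch that mimics the left-then-right argument order used on these cards; it is not the calculator's own code.

```python
from statistics import NormalDist

def normcdf(z_left, z_right):
    """Area under the standard normal curve between two z scores."""
    model = NormalDist(mu=0, sigma=1)
    return model.cdf(z_right) - model.cdf(z_left)

below = normcdf(-999, 1.0)    # percent below z = 1 (its percentile)
above = normcdf(1.0, 999)     # percent above z = 1
between = normcdf(-1.0, 1.0)  # percent between z = -1 and z = 1

print(round(below, 4), round(above, 4), round(between, 4))
```

The below and above areas add up to 1, the total area under the curve.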

63
Q

which calculator function gives you a percent?

A

normcdf

64
Q

What is the total area under the normal curve?

A

One, or 1.000

65
Q

If you want to find % below a value, what do you put into normcdf (? ?)

A

find z score for value, and then normcdf (-999, Zright)

66
Q

If you want to find percentile for a value, what do you put into normcdf (? ?)

A

find z score for value, and then normcdf (-999, Zright)

67
Q

Why do we plug 999 into normcdf?

A

It needs a z score, but we can’t plug in infinity. So we go down or up 999 standard deviations, and that pretty much gets everything.
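
A quick Python check, using `statistics.NormalDist`, of how little area is left out past a big z score. Python's floating-point numbers can't tell the area left of z = 999 apart from 1.

```python
from statistics import NormalDist

model = NormalDist()  # standard normal: mean 0, SD 1

# Area to the left of each z score. It is already nearly 1 by z = 6.
for z in (3, 6, 999):
    print(z, model.cdf(z))

area_missed = 1 - model.cdf(999)  # area to the right of z = 999
print(area_missed)
```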