Exam 1 Flashcards

1
Q

A Geometric model

A

good for when we’re interested in # of Bernoulli trials until next success

2
Q

Bernoulli trials

A

2 possible outcomes (success and failure), probability of success p is constant, the trials are independent

3
Q

examples of Bernoulli trials

A

tossing a coin, shooting hoops

4
Q

a Binomial model

A

when we’re interested in the number of successes in a certain number of Bernoulli trials

5
Q

a Normal model

A

to approximate a binomial when we expect at least 10 successes and at least 10 failures

6
Q

Poisson model

A

when n is large and p is small. Good approximation if n >= 20 with p <= … or n >= 100 with p <= …

7
Q

lambda

A

only parameter within Poisson model

8
Q

good model for number of occurrences over given period of time

A

Poisson (with the parameter the mean of the distribution lambda)

9
Q

Exponential model

A

can model the time between two random events

10
Q

mean time between two events

A

1 / lambda

11
Q

when p is small for a large # of cases

A

Poisson model (the Normal model is for large n when p is not necessarily small)

12
Q

when checking the probability of this many successes in a fixed number of trials

A

Binomial for Bernoulli

13
Q

when asked how many trials until this happens

A

Geometric for Bernoulli

14
Q

3 tips for sketching good Normal curve

A

(1) bell-shaped and symmetric around its mean: start at the middle and sketch outward to the left and right, (2) only draw out to 3 standard deviations on either side of the mean, (3) the point where the curve changes from curving downward to curving back up is called the inflection point, and it is one standard deviation away from the mean

15
Q

tells how many standard devs a value is from mean

A

z score

16
Q

Let y represent value corresponding to

A

outlying value indicated by a certain z score (e.g. high IQ example)

17
Q

Table of standard normal distribution

A

Use it to convert between a z score and the area (probability) below it, e.g. when you’re given a z score or are looking for a cutoff value

18
Q

Finding a percentile

A

Use the z-table to find the z score that has the given percentage of values below it, then do additional solving for y if needed (y = mean + z * s.d.)
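
A minimal sketch of that percentile lookup, assuming Python's standard-library `statistics.NormalDist` stands in for a printed z-table. The mean of 100 and s.d. of 15 are made-up illustration values, and `percentile_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def percentile_value(pct, mu, sigma):
    """Value y that has `pct` percent of a Normal(mu, sigma) model below it."""
    z = NormalDist().inv_cdf(pct / 100)  # reverse z-table lookup: area -> z
    return mu + z * sigma                # un-standardize: y = mu + z * sigma

# Assumed example: 90th percentile of a Normal model with mean 100, s.d. 15.
y90 = percentile_value(90, mu=100, sigma=15)
print(round(y90, 2))
```

The same two steps (area to z, then z to y) are what the table-based method does by hand.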

19
Q

Formula for IQR

A

Q3 - Q1

20
Q

example of random phenomenon

A

flipping a coin, two possible outcomes; one toss of coin will consist of a ‘trial’

21
Q

term for result of one ‘trial’

A

‘outcome’

22
Q

term for collection of all possible ‘outcomes’

A

‘sample space’

23
Q

definition of empirical probability

A

the single value that the long-run relative frequency of repeated independent events (with identical probs.) gets closer and closer to, as the Law of Large Numbers (LLN) says it will

24
Q

formula for empirical probability

A

P(A) = # of times A occurs / # of trials = relative frequency of occurrence of A in the long run [ex. red light green light, after many days P(green) = .35]

25
Q

When can you NOT do empirical probability assignment

A

when you cannot repeat events

26
Q

definition of theoretical probability

A

for when you cannot repeat events. Comes from a mathematical model, not from observations or repetitions. (Ex: American roulette, if you bet on red what is the prob of winning? [18/38])

27
Q

formula for theoretical probability

A

P(A) = # of outcomes in A / # of possible outcomes

28
Q

personal probability

A

subjective sense based on personal experience and guesswork

29
Q

definition of formal probability

A

based on a set of axioms (=a statement we assume to be true) on how probability works

30
Q

Rule 1

A

For any event A, 0 <= P(A) <= 1

31
Q

Rule 2

A

P(S)=1, the probability of all possible outcomes of a trial must be 1

32
Q

Rule 3: The complement rule

A

the set of outcomes that are not in the event A is called the complement of A, denoted A^C

P(A^C) = 1 - P(A)

33
Q

Addition Rule

A

P(A or B) = P(A) + P(B) - P(A and B)

34
Q

Disjoint events

A

events that have no outcomes in common (and thus cannot occur together). Also called ‘mutually exclusive’

35
Q

If events are disjoint, then P(A and B) = ??

ONLY if two events are disjoint

A

0

36
Q

Pause and do practice exercise on being in a relationship and in sports

A

Either on powerpoint or in textbook

37
Q

Conditional probability [of B given A]

A

P(B | A) = P(A and B) / P(A)
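
A small sketch of the conditional probability formula applied to a contingency table of counts. The table below is entirely made up for illustration (it is not the class exercise data), and the helper names `p_and` and `p_row` are hypothetical.

```python
# Hypothetical 2x2 contingency table of counts (made-up numbers).
counts = {
    ("sports", "relationship"): 20,
    ("sports", "single"): 30,
    ("no sports", "relationship"): 25,
    ("no sports", "single"): 25,
}
total = sum(counts.values())

def p_and(row, col):
    """Joint probability: one inner cell divided by the grand total."""
    return counts[(row, col)] / total

def p_row(row):
    """Marginal probability: a row total divided by the grand total."""
    return sum(v for (r, _), v in counts.items() if r == row) / total

# P(relationship | sports) = P(sports and relationship) / P(sports)
p_rel_given_sports = p_and("sports", "relationship") / p_row("sports")
print(p_rel_given_sports)
```

Dividing the joint probability by the marginal is the same as restricting attention to the "sports" row and asking what fraction of it is in a relationship.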

38
Q

Contingency table

A

used for conditional probability, comes up often

39
Q

Pause and do practice exercise on being a girl and popular

A

Either in book or slideshow

40
Q

Shortcut for finding probability of two independent events A and B

A

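P(A and B) = P(A) * P(B) when A and B are independent. In general P(A and B) = P(A) * P(B | A), and independence means P(B | A) = P(B)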

41
Q

If not independent P(A | B) = ??

A

not equal to P(A); in general P(A | B) = P(A and B) / P(B). It is 0 only in the special case where A and B are disjoint

42
Q

marginal probabilities on contingency table

A

the totals on edges that aren’t the TOTAL

43
Q

joint probability on contingency table

A

where probabilities of two things meet on the center part of the table

44
Q

best diagram for working with conditional probability (probs. with ‘givens’ in them)

A

tree diagrams

45
Q

conditioning event on a tree diagram

A

the earlier event (the first set of branches)

46
Q

intersectional probability

A

** check out textbook! **

47
Q

Reversing the conditioning

A

This means finding the probability of something working backwards through tree diagram; e.g. knowing something already happened, find the probability that they were this (binge drinker)

48
Q

Bayes rule

A

formula for reversing the conditioning when the tree has two branches; a bit complicated, so it is probably best to set up a tree diagram

49
Q

Reverse the condition [ctd]

A

Hard part is recognizing the regular probabilities and conditional probabilities from the problem. Basically you divide the probability of both (e.g. on tree diagram) by the probability of one (which might take some investigating)
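
A sketch of "divide the probability of both by the probability of one" for a two-branch tree. The three input probabilities are made-up numbers, not the binge-drinker figures from class, and `reverse_condition` is a hypothetical helper name.

```python
def reverse_condition(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(B | A) from a two-branch tree whose first split is B / not B."""
    p_a_and_b = p_b * p_a_given_b                # multiply along the B branch
    p_a_and_not_b = (1 - p_b) * p_a_given_not_b  # multiply along the not-B branch
    p_a = p_a_and_b + p_a_and_not_b              # add every branch that ends in A
    return p_a_and_b / p_a                       # both / one

# Assumed inputs: P(B) = 0.2, P(A | B) = 0.5, P(A | not B) = 0.1
p_b_given_a = reverse_condition(0.2, 0.5, 0.1)
print(round(p_b_given_a, 4))
```

The "investigating" step is the middle one: P(A) is usually not given directly and has to be assembled from the branch ends.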

50
Q

Pause and do a practice problem on reversing the conditioning

A

Binge drinker or from textbook

51
Q

Random variable (denoted ‘X’)

A

Two types, discrete and continuous.

52
Q

specific value (denoted ‘x’)

A

e.g. P(X=x)

53
Q

discrete random variables

A

can take one of a countable number of distinct outcomes (ex. number of credit hours)

54
Q

continuous random variables

A

can take any numeric value within a range of values (ex. cost of books this term)

55
Q

A probability model for a random variable consists of what two things?

A

(1) collection of all possible outcomes, (2) the probabilities that the values occur

56
Q

Expected value

A

E(X) of a discrete random variable can be found by summing the products of each possible value and the prob. that it occurs: E(X) = Σ x * P(x) (write this by hand)

57
Q

Variance, formula

A

Var(X) = Σ (x - μ)^2 * P(x) (write this by hand)

58
Q

Standard deviation, formula

A

SD(X) = sqrt(Var(X))

59
Q

Which formulas apply only to discrete random variables?

A

Expected value = Σ x * P(x) // variance = Σ (x - μ)^2 * P(x) // standard deviation = sqrt(variance)
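
A short worked example of these three formulas. The probability model below (values 0, 1, 2 with probabilities 0.2, 0.5, 0.3) is made up purely for illustration.

```python
from math import sqrt

# Hypothetical probability model: value -> probability (must sum to 1).
model = {0: 0.2, 1: 0.5, 2: 0.3}

mean = sum(x * p for x, p in model.items())              # sum of x * P(x)
var = sum((x - mean) ** 2 * p for x, p in model.items())  # sum of (x - mean)^2 * P(x)
sd = sqrt(var)                                            # square root of the variance

print(round(mean, 4), round(var, 4), round(sd, 4))
```

Note that the mean has to be computed first, because the variance formula measures squared distance from it.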

60
Q

Shifting random variables by a constant

A

Adding or subtracting a constant shifts the mean by that constant but doesn’t change the variance or standard deviation

61
Q

Rescaling random variables by a constant

A

Multiplies the mean by that constant and the variance by the square of that constant

62
Q

Combining random variables

A
  • the sum or the difference of two random variables is also a random variable
  • the mean of the sum of two random variables is the sum of the means
  • the mean of the difference of two random variables is the difference of the means
63
Q

If the random variables are independent,

A

the variance of their sum or difference is always the sum of the variances *** this is important

64
Q

Now, what if the two random variables are not independent?

-can you still compute the variance

A

yes, however you have one additional term here known as covariance
(in this course given definition but not asked to calculate)

65
Q

if see new random variable

A

A random variable assumes any of several different numeric values as a result of some random event. Random variables are denoted by a capital letter such as X.

66
Q

Random variable

A

assumes a value based on the outcome of a random event

67
Q

Pause and do screenshot at 1.46 pm

A

** in class worksheet #1 **

68
Q

Pause and do screenshot 4/30

A

** q2 mail-order **

69
Q

Bernoulli trials – what are 3 conditions?

A
  • there are two possible outcomes (success and failure)
  • the prob of success, p, is constant
  • the trials are independent
70
Q

Examples of Bernoulli trials?

A
  • tossing a coin
  • looking for defective products rolling off an assembly line
  • shooting free throws in a basketball game
  • and many more…
71
Q

First thing you need to identify in a problem involving Bernoulli

A

‘what is a trial’

72
Q

Geometric model – tells us what in terms of Bernoulli trials?

A

# of Bernoulli trials until the first success

73
Q

What single parameter is specified in the Geometric model

A

p (probability of success)

74
Q

Pause and do example of Geometric model [biased coin]

A

*** screenshot p433

75
Q

Helpful rules for solving problems on Geometric model

A

multiplication rule, complement rule
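
A sketch of how those two rules give the Geometric probabilities. The success probability p = 0.25 is a made-up value, and `geom_pmf` / `geom_cdf` are hypothetical helper names.

```python
def geom_pmf(k, p):
    """P(first success on trial k): k - 1 failures, then a success (multiplication rule)."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    """P(first success within k trials) = 1 - P(k failures in a row) (complement rule)."""
    return 1 - (1 - p) ** k

p = 0.25  # assumed probability of success on each trial
print(geom_pmf(3, p))  # first success exactly on trial 3
print(geom_cdf(3, p))  # first success on trial 1, 2, or 3
```

The complement version avoids adding up P(X = 1) + P(X = 2) + ... term by term.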

76
Q

Independence – the 10% condition

A

Rule that allows us to proceed if we don’t have independent trials in Bernoulli. Still OK to proceed as long as the sample is smaller than 10% of the population [ex universal blood donors]

77
Q

Pause to do blood donor practice problem

A

** screenshot at 4.36pm **

78
Q

Binomial probability model

A

No longer counts the trials but counts the number of successes within the fixed number of Bernoulli trials.

79
Q

Two parameters that define the Binomial model

A

n, number of trials // p, probability of success

80
Q

Formulas for mean and s.d. in Binomial model

A

μ = np // s.d. = sqrt(npq), where q = 1 - p
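
A sketch that checks the mean formula against the Binomial probabilities themselves. The values n = 20 and p = 0.3 are made up for illustration, and `binom_pmf` is a hypothetical helper name.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(exactly k successes in n Bernoulli trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 20, 0.3          # assumed number of trials and success probability
q = 1 - p

mean = n * p            # np
sd = sqrt(n * p * q)    # sqrt(npq)

# Same mean, computed the long way as the sum of k * P(X = k).
mean_from_pmf = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(mean, round(sd, 4), round(mean_from_pmf, 4))
```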

81
Q

Pause and do example on blood donors

A

** 4.41 screenshot **

82
Q

Poisson model

A

when n is very large and when p is small. has only one parameter, lambda

83
Q

What do we mean by n large p small

A

n >= 20 with p <= … or n >= 100 with p <= …

84
Q

Pause and do example about leukemia clusters

A

** screenshot 4.46 ***

85
Q

Steps for calculating using Poisson

A

steps:
- Verify the Bernoulli conditions
- Be aware of whether the population is finite or not
- Note whether sampling is with replacement or without
- Check the 10 percent condition
- Then recognize this is actually a binomial model setup, but because p is small and n is large you would want to use Poisson in this case
- The result will be quite close to what you would get in reality under independence
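
A sketch comparing the Poisson approximation with the exact Binomial it replaces, taking lambda = np. The values n = 1000 and p = 0.002 are made up for illustration, and the helper names are hypothetical.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Exact Binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k occurrences; note factorial(0) == 1."""
    return exp(-lam) * lam ** k / factorial(k)

n, p = 1000, 0.002   # assumed: n large, p small
lam = n * p          # Poisson parameter lambda = np

for k in range(4):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# "At least one" via the complement: the X = 0 term is the one to remember.
p_at_least_one = 1 - poisson_pmf(0, lam)
```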

86
Q

On exams, what do people often forget when adding up probabilities in Poisson problems?

A

X=0

87
Q

Be sure to know 0 factorial

A

0! = 1. Do some practice probs from hw 2

88
Q

The Normal model

A

Can also say continuous random variable, bell shaped, unimodal, symmetric

89
Q

t/f: there is not a Normal model for every mean and s.d.

A

false – there IS a normal model for every mean and standard deviation

90
Q

Why do we use Greek letters for mean and standard deviation

A

because this mean and s.d. do not come from data; they are numbers (called parameters) that specify the model

91
Q

Summaries of data, like the sample mean and s.d., are written with Latin letters. Such summaries are called,

A

statistics

92
Q

Parameters, “what connects a model to,

A

the real world”

93
Q

Pause and watch video on parameter versus statistic

A
94
Q

What does the procedure called ‘standardization’ refer to??

A

change to the standard Normal model (mean 0, s.d. 1), then go back if needed. Uses the z value: z = (y - mean) / s.d.

95
Q

68-95-99.7 rule

A

in a Normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2, and about 99.7% fall within 3
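
A quick check of the rule, assuming Python's standard-library `statistics.NormalDist` as the standard Normal model. The dictionary name `within` is just an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
```

The rule's percentages are rounded versions of these areas.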

96
Q

If z score is positive

A

tells us y is above the mean

97
Q

If z score is negative

A

tells us y is below the mean

98
Q

what does z score tell us

A

how many s.d.’s away from the mean an observation is

99
Q

The Normal probability plot is nearly straight, so

A

Normal model applies

100
Q

Histogram is unimodal and somewhat symmetric, so

A

Normal model applies

101
Q

Binomial is discrete, giving probabilities for specific counts, whereas Normal is

A

a continuous random variable that can take on any value

102
Q

How to use the Normal model to approximate P(X=10) when X has the binomial model?

A

Solution: Use the continuity correction, i.e. approximate P(X = 10) by the Normal probability P(9.5 < X < 10.5)
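
A sketch of the continuity correction next to the exact Binomial answer. The values n = 50 and p = 0.2 are made up so that np = 10 and nq = 40 satisfy the Success/Failure condition; `statistics.NormalDist` stands in for the z-table.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.2   # assumed Binomial parameters
q = 1 - p

# Normal model with the Binomial's mean and s.d.
model = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

# Continuity correction: treat the count 10 as the interval 9.5 to 10.5.
approx = model.cdf(10.5) - model.cdf(9.5)

# Exact Binomial probability for comparison.
exact = comb(n, 10) * p ** 10 * q ** (n - 10)

print(round(approx, 4), round(exact, 4))
```

Without the correction, the Normal model would assign probability 0 to the single point X = 10.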

103
Q

The Uniform distribution

A

spread evenly over the interval from a to b; we know the total area under the prob density curve is 1, and the density is 0 outside the interval

104
Q

Value of straight line at top of Uniform Distribution curve is calculated by

A

1 / (b-a)

105
Q

Lambda

A

only parameter of exponential distribution

106
Q

When your values go from zero to infinity on the non-negative real line

A

exponential distribution

107
Q

Parameters of uniform model

A

endpoints a and b

108
Q

Connection between exponential distribution and Poisson distribution

A

ex: exponential is the time between arrivals, Poisson is the number of individuals who arrive in a given period [identify which variable is discrete, which is continuous, go from there]

109
Q

What is a variable

A

One piece of information that you’re measuring for every observation and that can be different for every observation

110
Q

Pause and do some practice problems that involve using the Z table

A

** see book chapter [SAT scores] ** also could watch a video on it

111
Q

A Probability involving “in between”

A

Use z scores: subtract the area below the smaller z from the area below the bigger z

112
Q

Binomial approximations work in two cases, what are they?

A

(1) n is large, p is small: use Poisson, (2) n is large, p is not necessarily small: use Normal model [when using Normal, can ask Q which Normal model]

113
Q

Pause and do practice problem with Tennessee red cross

A

* middle of lecture notes*

114
Q

The Success/Failure condition

A

A Binomial distribution can be approximated by an appropriate Normal distribution if we expect at least 10 successes and 10 failures (i.e. np >= 10 and nq >= 10); then standardize, get z score

115
Q

Pause and redo 16.51

A

You got it wrong, idk what you were doing lol ***

116
Q

Do practice problem on continuity correction

A
117
Q

Uniform distribution and sub intervals

A

any sub interval of same length has equal probabilities
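
A sketch of that property for a Uniform model. The endpoints a = 0 and b = 10 are made up, and `uniform_prob` is a hypothetical helper name.

```python
def uniform_prob(c, d, a, b):
    """P(c < X < d) for X Uniform on [a, b]: overlap length times height 1 / (b - a)."""
    lo, hi = max(a, c), min(b, d)      # clip the sub interval to [a, b]
    return max(0.0, hi - lo) / (b - a)

a, b = 0, 10  # assumed endpoints
print(uniform_prob(2, 4, a, b))  # length 2
print(uniform_prob(6, 8, a, b))  # also length 2, so the same probability
```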

118
Q

Trend for exponential

A

small values are more favored; as values increase, the prob. of observing them decreases exponentially

119
Q

Difference between exponential and Poisson

A
  • in hw and on the exams, you could get a q like: the average # of customers arriving is five per hour
  • then the next q is: what is the prob. that the next customer will arrive within ten minutes?
  • the given information (the arrival rate) is related to a Poisson random variable
  • but the question is about the time until the next arrival, so you would want to use the exponential distribution to answer it
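
A sketch of that exact question, using the card's numbers (five customers per hour, next arrival within ten minutes). It assumes the usual exponential result P(T <= t) = 1 - e^(-lambda * t) with time measured in the same units as the rate; the variable names are illustrative.

```python
from math import exp

rate_per_hour = 5        # Poisson information: average arrivals per hour
t_hours = 10 / 60        # ten minutes, converted to hours to match the rate

# Exponential model for the waiting time T until the next arrival.
p_within_10_min = 1 - exp(-rate_per_hour * t_hours)

mean_wait_hours = 1 / rate_per_hour   # mean time between events is 1 / lambda

print(round(p_within_10_min, 4), mean_wait_hours)
```

The unit conversion is the step most likely to go wrong: the rate and the time window have to use the same time unit.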
120
Q

More on exponential and Poisson

A
  • the two models are complementary to each other
  • one models the # of occurrences; the other models the time between those occurrences

121
Q

Relative frequency

A

relates a value to the whole of the data you’re looking at: its count divided by the total number of observations

122
Q

Frequency table

A

a table whose first column displays each distinct outcome and second column displays that outcome’s frequency

123
Q

relative frequency table

A

first column displays each distinct outcome and second column displays that outcome’s relative frequency [just shows relative freq instead of counts, as opposed to freq table]

124
Q

Area principle

A

The human eye automatically compares areas, so the area taken up by each part of a display should be in proportion to the value it represents (e.g. its relative frequency)

125
Q

Bar chart

A

displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison [all bars must have the same width and stay true to the area principle]. Gaps are placed deliberately between bars

126
Q

Pie chart presents,

A

each category as a slice of a circle so that each slice has a size that is proportional to the fraction of the whole in each category

127
Q

Contingency table

A

columns refer to values of one categorical variable, rows refer to values of other categorical variable

128
Q

Cell of a contingency table

A

each cell gives the count for a combination of values of the two variables. ex: there were 528 third-class ticket holders who died

129
Q

bottom row is totals and represents the marginal distribution for one variable, whereas the right column,

A

represents marginal distribution of other variable

130
Q

column percentage

A

each cell as a percentage of its column total (conditions on the column variable)

131
Q

row percentage

A

each cell as a percentage of its row total (conditions on the row variable)

132
Q

Conditional distributions

A

help us decide if there is any ASSOCIATION between variables we’re looking for, or none

133
Q

‘Exploring the relationship between two categorical variables’

A

Pause and watch a video on this, now

134
Q

Important point on Titanic: ‘survival’ has two categories

A

“condition on the other variable” [ex. conditional distribution of variable “class” conditioned on variable “survival” = alive]

135
Q

Association is not between ‘class and alive’ and ‘class and dead’ – association is between class and variable survival

A

another note: we avoid using the word correlation, if there is association say ‘dependent’

136
Q

Pause and watch video on conditional distributions

A

yeah

137
Q

Difference between correlation and association

A

association: for categorical variables, and also for any type of relationship, including ones that are not linear // correlation: a quantity that measures only one type of association, linear

138
Q

Segmented bar chart

A

Like pie chart but in a bar

139
Q

Be careful not to confuse side by side bar charts

A

not the same; in side-by-side blue bars represent one distribution, red bars represent another

140
Q

Is conditional distribution a variable or a number?

A
  • it is neither a variable nor a number
  • what it is, is a collection of percentages
  • these three percentages together are what make up a conditional distribution

if this were asked on a test, for example, you would first state the condition [“we are looking at the male group”]; under this condition you would then state the three percentages [you would circle all three values]

141
Q

How to check if it’s a side by side or Not

A
  • to see if that is the case, can look at percentages and see if they add up to 100%
  • if not—that’s your clue that you are not looking at three separate distributions here
  • notice blue bars together add up to 100; red bars together add up to 100—so you have actually two distributions here
142
Q

Bins or classes

A

what we divide our range of numerical data into for histograms

143
Q

Choosing bin width

A

presentations can feature two; a gap means no occurrences in that range

144
Q

Relative frequency histogram

A

vertical axis is relative frequency, the freq divided by total [will not be asked to draw but yes to interpret]

145
Q

Stem and leaf displays

A

first column is leftmost digit, second shows remaining

146
Q

Uniform distribution

A

histogram where all the bins have the same frequency or close; will be flat

147
Q

Things to think about before you draw:

A
  • is the variable quantitative?
  • is the answer to the survey question or result of the experiment a number whose units are known?

148
Q

Displays for quantitative data

A

histograms, stem-and-leaf diagrams, dotplots

149
Q

Displays for categorical data

A

bar and pie charts

150
Q

Understanding the shape of a graph. What is a mode?

A

A mode is a hump or high-frequency bin. One mode = unimodal, two = bimodal, three or more = multimodal.

151
Q

If data is unimodal and symmetric,

A

a Normal model may be reasonable (can treat as roughly normally distributed)

152
Q

Once you have identified the mode, next step is,

A

to identify the symmetry of the data

153
Q

A histogram is skewed right if

A

the longer tail is on the right side of the mode

154
Q

A histogram is skewed left if

A

the longer tail is on the left side of the mode [also sometimes called negatively skewed but doesn’t mean values= negative]

155
Q

Identifying unusual features

A

might tell us something interesting or exciting about the data. Should always mention any straggler or outliers that stand away from body of the distribution

156
Q

Outliers

A

data points that are far from the remaining bulk of the data set. Will later classify as mild outliers and far (extreme) outliers

157
Q

Examples of outliers

A

Income of a CEO, temperature of a person with a high fever, elevation at Death Valley

158
Q

Median

A

the center of the data values; half of the data values are to the left of the median and half are to the right of the median [for symmetric right in the middle]

159
Q

How do we compute the median

A

Sort the data first. If the number of values n is odd, the median is the value in position (n+1)/2; if n is even, split the data in half and take the avg of the two middle values
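
A minimal sketch of that procedure; `median` here is a hand-written illustration rather than a library call, and the sample lists are made up.

```python
def median(values):
    """Median by the sort-then-pick-the-middle rule."""
    data = sorted(values)           # always sort first
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]            # odd n: position (n + 1) / 2, zero-based index n // 2
    return (data[mid - 1] + data[mid]) / 2   # even n: average the two middle values

print(median([3, 1, 2]))      # odd number of values
print(median([4, 1, 3, 2]))   # even number of values
```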

160
Q

Percentiles

A

Divide the data into one hundred groups. The n’th percentile is the data value such that n percent of the data lies below that value

161
Q

Range

A

the difference between max and minimum values [it is sensitive to outliers]

162
Q

The Interquartile Range

A

Q1 is the 25th percentile and Q3 is the 75th percentile. IQR = Q3 - Q1, the difference between the upper quartile and the lower quartile

163
Q

Middle fifty percent of the data, its spread

A

IQR

164
Q

5-number summary

A

minimum, Q1, median, Q3, and maximum

165
Q

Benefits and drawbacks of the IQR

A

not sensitive to outliers (like range is). Reasonable summary of spread. Shows where typical values are, except for the case of a bimodal distribution. Not great for a general audience since most do not know what it is.

166
Q

Anytime you use the IQR, you also want to use

A

standard deviation

167
Q

Pause and do practice problem

A

on the IQR, from hw

168
Q

Boxplot

A

a chart that displays the 5-number summary and the outliers. Shows the IQR (middle yellow box). Dashed lines are called fences; outliers lie outside the fences. Above and below the box are whiskers that extend to the most extreme data values within the fences. The line inside the box shows the median

169
Q

Finding the fences of a boxplot

A

lower fence = Q1 - 1.5 * IQR, upper fence = Q3 + 1.5 * IQR. Any data outside the fences we call outliers
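
A sketch of the fence calculation and the outlier check it supports. The quartiles (Q1 = 10, Q3 = 20) and the data list are made up, and `fences` / `find_outliers` are hypothetical helper names; the quartiles are passed in directly because different methods compute them slightly differently.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values that fall outside the fences."""
    lower, upper = fences(q1, q3)
    return [y for y in data if y < lower or y > upper]

print(fences(10, 20))
print(find_outliers([1, 12, 14, 15, 18, 40], q1=10, q3=20))
```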

170
Q

Far outliers

A

Farther than 3 IQRs from the quartiles

171
Q

If you have odd number of data points,

A

include median in both upper half and lower half of data (?)

172
Q

Five W’s for flight cancellations

A

Who? Months. What? Percentage of flights cancelled at US airports. When? 1994-2013. Where? US. How? Bureau of Transportation Statistics data

173
Q

Analyzing data for flight cancellations, steps

A

Identify the variable: percent of flight cancellations at US airports (quantitative, units are percentages) // data will be summarized in a histogram, numerical summary, and boxplot

174
Q

Pause and watch extra vid

A

on IQR and stuff

175
Q

Analyzing flight cancellations [ctd]

A

Describe shape, center, and spread of distribution. Report on symmetry, number of modes, and any gaps or outliers. Should also mention any CONCERNS about the data

176
Q

Flight cancellations: The Mean

A

what most people think of as the average. Add up all the numbers and divide by number of them. The mean is the “balancing point” [where the histogram will balance perfectly like a metronome]

177
Q

Mean vs. Median

A

For symmetric distributions, the mean and the median are equal. Balancing point is at the center. The tail “pulls” the mean towards it more than it does to the median. The Mean is more sensitive to outliers than the median.

178
Q

T/f: The mean is less sensitive to outliers than the median

A

F [the mean is MORE sensitive to outliers, so for skewed data the median is a better measure of center]

179
Q

If you choose median as measure of center

A

You also want to report mean and standard deviation; if it’s symmetric you just have to report those two though

180
Q

What is the variance a measure of?

A

How far the data is spread out from the mean. Difference from the mean is y - y-bar. To make it positive square it. Will mostly be used to find the standard deviation which is the square root of variance
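
A sketch of the calculation described above. It assumes the usual sample version that divides the summed squared differences by n - 1 (the card does not state the divisor), and the data list is made up.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared differences from the mean, divided by n - 1 (assumed divisor)."""
    n = len(data)
    ybar = sum(data) / n
    return sum((y - ybar) ** 2 for y in data) / (n - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
s2 = sample_variance(values)        # variance
s = sqrt(s2)                        # standard deviation

print(round(s2, 4), round(s, 4))
```

Squaring keeps positive and negative differences from cancelling; taking the square root at the end puts the spread back in the data's original units.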

181
Q

If your data is closely packed around the mean, the SD turns out to be,

A

small; if very spread apart, then SD turns out to be large

182
Q

Thinking about variation

A

Statistics is about variation so spread is an important fundamental concept; measures of spread tell us what we DON’T know about the data.

183
Q

Choose the right tool

A

Use histograms to compare two or three groups. Use boxplots to compare many groups.

184
Q

Pause and find an example of when,

A

it’s appropriate to use a histogram, and boxplot respectively

185
Q

Treat outliers with attention and care

A

Local or global [outliers], especially in a time series. Investigate if the outliers are errors or remarkable.

186
Q

Use a timeplot to track,

A

trends over time

187
Q

Re-express or transform data for better understanding

A

Can transform skewed data distributions to symmetric ones, can help to compare spreads of different groups

188
Q

When to use segmented bar charts

A

when writing journal articles better to use bar charts than pie

189
Q

What to tell if encounter histogram, stem-and-leaf, boxplot

A

Describe modality, symmetry, outliers

190
Q

What to tell re: center and spread

A

Median and IQR if not symmetric. Mean and standard deviation if symmetric. For unimodal symmetric data, IQR > s; if not, check for errors, skewness, outliers.

191
Q

What to tell re: unusual features

A

For multiple modes, possibly split the data into groups. When there are outliers, report the mean and s.d. with and without the outliers. Note any gaps in the data set.

192
Q

If asked to summarize data [specific example: fuel efficiency]

A

Plan: summarize the distribution of the car’s fuel efficiency. // Variable: mpg for 100 fill ups, Quantitative // Mechanics: show a histogram (described as fairly symmetric, low outlier)

193
Q

Which to report mean or median [specific example: fuel efficiency]

A

the mean and median are close. report the mean and standard deviation

194
Q

Conclusions in summary of data [specific example: fuel efficiency]

A

Distribution is unimodal and symmetric. Mean is 22.4 mpg. Low outlier may be investigated, but limited effect on the mean. s= 2.45; from one filling to the next, fuel efficiency differs from the mean by an avg of about 2.45 mpg

195
Q

What can go wrong in analyzing data

A

(1) if you have categorical data, do not make a histogram, (2) if you have categorical data, make a bar chart, but it would not be appropriate to talk about shape because someone else could just arrange the bars differently, (3) choose an appropriate bin width

196
Q

What can go wrong in analyzing data [ctd, 2]

A

(1) don’t report too many decimal places, (2) don’t round in the middle of a calculation, (3) for multiple modes think about separating groups (4) beware of outliers, the mean and standard deviation are sensitive to outliers

197
Q

What can go wrong in analyzing data [ctd, 3]

A

(1) Do a reality check, don’t blindly trust calculator, (2) Sort before finding median and percentiles, (3) don’t worry about small differences in quartile calculation, (4) don’t compute numerical summaries for a categorical variable!! [mean of a SS# is meaningless]

198
Q

What can go wrong [ctd,4]

A

Make a picture, make a picture, make a picture

199
Q

Pause and watch a video on identifying the variables

A

Like in that ch 1 homework problem

200
Q

Pause and watch a video on bewaring of outliers

A

If they are errors, remove them