Exam 1 Flashcards

Question

When can you NOT do empirical probability assignment

Answer 1

when you cannot repeat events

Answer 2

Used when you cannot repeat events. Comes from a mathematical model, not from observations or repetitions. (Ex: American roulette, if you bet on red, what is the probability of winning? [18/38])

Answer 3

P(A) = # of outcomes in A / # of possible outcomes

Answer 4

subjective sense based on personal experience and guesswork

Answer 5

based on a set of axioms (=a statement we assume to be true) on how probability works

Answer 6

For any event A, 0 <= P(A) <= 1

Answer 7

P(S)=1, the probability of all possible outcomes of a trial must be 1

Answer 8

The set of outcomes that are not in the event A is called the complement of A, denoted A^C. P(A^C) = 1 - P(A)

Answer 9

P(A or B) = P(A) + P(B) - P(A and B)
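
To check the counting formula, the complement rule, and the general addition rule by hand, here is a minimal sketch. The die and the events A and B are made up for illustration and are not from the course material.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "roll is even"
B = {4, 5, 6}   # event "roll is greater than 3"

def prob(event):
    """Classical probability: # of outcomes in the event / # of possible outcomes."""
    return Fraction(len(event & S), len(S))

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)

# Complement rule: P(A^C) = 1 - P(A)
p_complement = 1 - prob(A)

print(p_union, prob(A | B))        # the rule agrees with counting A or B directly
print(p_complement, prob(S - A))   # the rule agrees with counting the complement
```

Using Fraction keeps the arithmetic exact, so the two sides of each rule can be compared with no rounding.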

Answer 10

events that have no outcomes in common (and thus cannot occur together). Also called 'mutually exclusive'

Answer 11

Either on powerpoint or in textbook

Answer 12

P(B | A) = P(A and B) / P(A)

Answer 13

used for conditional probability, comes up often

Answer 14

Either in book or slideshow

Answer 15

P(A and B) = P(A) * P(B | A)
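
A small sketch of how the conditional probability formula and the general multiplication rule fit together, using a two-way table. The counts are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical counts for two yes/no events A and B (invented for illustration).
counts = {
    ("A", "B"): 30,
    ("A", "not B"): 20,
    ("not A", "B"): 10,
    ("not A", "not B"): 40,
}
total = sum(counts.values())

# Marginal probability of A: add across the row for A.
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)

# Joint probability: the single cell where A and B meet.
p_a_and_b = Fraction(counts[("A", "B")], total)

# Conditional probability: P(B | A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a

print(p_a, p_a_and_b, p_b_given_a)

# General multiplication rule: P(A) * P(B | A) gives back P(A and B).
print(p_a * p_b_given_a == p_a_and_b)
```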

Answer 16

0, and definitely not equal to P(A)

Answer 17

the totals on edges that aren't the TOTAL

Answer 18

where probabilities of two things meet on the center part of the table

Answer 19

tree diagrams

Answer 20

would be the sooner event

Answer 21

*** check out textbook! ***

Answer 22

This means finding the probability of something working backwards through tree diagram; e.g. knowing something already happened, find the probability that they were this (binge drinker)

Answer 23

for two branches, a bit complicated probably best to set up tree diagram

Answer 24

Hard part is recognizing the regular probabilities and conditional probabilities from the problem. Basically you divide the probability of both (e.g. on tree diagram) by the probability of one (which might take some investigating)
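
The "divide the probability of both by the probability of one" recipe can be sketched as below. The branch probabilities are hypothetical placeholders, not the numbers from the binge drinker example.

```python
from fractions import Fraction

# Hypothetical tree-diagram probabilities (invented, not from the textbook).
p_b = Fraction(1, 5)                # P(B): first-level branch
p_a_given_b = Fraction(3, 10)       # P(A | B): second-level branch
p_a_given_not_b = Fraction(1, 10)   # P(A | not B): second-level branch

# Multiply along each branch to get the joint probabilities.
p_a_and_b = p_b * p_a_given_b
p_a_and_not_b = (1 - p_b) * p_a_given_not_b

# P(A) is the part that takes investigating: add every branch that ends in A.
p_a = p_a_and_b + p_a_and_not_b

# Work backwards through the tree: P(B | A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)
```

Note that P(B | A) comes out different from P(A | B), which is the usual trap in these problems.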

Answer 25

Binge drinker or from textbook

Answer 26

Two types, discrete and continuous.

Answer 27

e.g. P(X=x)

Answer 28

can take one of a countable number of distinct outcomes (ex. number of credit hours)

Answer 29

can take any numeric value within a range of values (ex. cost of books this term)

Answer 30

(1) collection of all possible outcomes, (2) the probabilities that the values occur

Answer 31

E(X) of a discrete random variable can be found by summing the products of each possible value and the prob. that it occurs -- write this by hand, E(X) = Σ x * P(x)

Answer 32

Var(X) = Σ (x - μ)^2 * P(x), write this by hand

Answer 33

SD(X) = sqrt(Var(X))

Answer 34

Expected value = Σ x * P(x) // variance = Σ (x - μ)^2 * P(x) // standard deviation = sqrt(variance)
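
The three formulas can be tried out on a small probability model. The values and probabilities below are made up for illustration.

```python
import math

# Hypothetical probability model: value -> probability (probabilities sum to 1).
model = {0: 0.5, 1: 0.3, 2: 0.2}

# Expected value: sum of each value times its probability.
mu = sum(x * p for x, p in model.items())

# Variance: sum of squared deviations from the mean, weighted by probability.
var = sum((x - mu) ** 2 * p for x, p in model.items())

# Standard deviation: square root of the variance.
sd = math.sqrt(var)

print(mu, var, sd)
```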

Answer 35

Adding or subtracting a constant from the data shifts the mean but doesn't change the variance or standard deviation

Answer 36

Multiplies the mean by that constant and the variance by the square of that constant
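
A quick way to convince yourself of the shifting and rescaling rules is to try them on a tiny data set. The numbers and the constants 10 and 3 are arbitrary choices for this sketch.

```python
import statistics

data = [2, 4, 6, 8]                 # hypothetical data
shifted = [x + 10 for x in data]    # add a constant
scaled = [3 * x for x in data]      # multiply by a constant

for name, values in [("data", data), ("shifted", shifted), ("scaled", scaled)]:
    # pvariance treats the list as the whole population (divides by n).
    print(name, statistics.mean(values), statistics.pvariance(values))
```

The shifted list keeps the same variance, while the scaled list has 3 times the mean and 9 times the variance.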

Answer 37

- The Sum or the difference of two random variables is also a random variable - the mean of the sum of two random variables is the sum of the means - the mean of the difference of two random variables is the difference of the means

Answer 38

If the two random variables are independent, the variance of their sum or difference is always the sum of the variances *** this is important

Answer 39

yes, however you have one additional term here known as covariance (in this course given definition but not asked to calculate)

Answer 40

A random variable assumes any of several different numeric values as a result of some random event. Random variables are denoted by a capital letter such as X.

Answer 41

assumes a value based on the outcome of a random event

Answer 42

*** in class worksheet #1 ***

Answer 43

*** q2 mail-order ***

Answer 44

- there are two possible outcomes (success and failure) - the prob of success, p, is constant - the trials are independent

Answer 45

- tossing a coin - looking for defective products rolling off an assembly line - shooting free throws in a basketball game - and many more…

Answer 46

'what is a trial'

Answer 47

of Bernoulli trials until the first success

Answer 48

p (probability of success)
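
A minimal sketch of the geometric model, assuming the standard formulas P(X = k) = q^(k-1) * p and E(X) = 1/p. The value p = 0.2 is a hypothetical choice, not from the course examples.

```python
p = 0.2      # hypothetical probability of success on each Bernoulli trial
q = 1 - p    # probability of failure

def geom_pmf(k):
    """P(X = k): k - 1 failures in a row, then the first success on trial k."""
    return q ** (k - 1) * p

def geom_cdf(k):
    """P(X <= k) via the complement: 1 - P(no success in the first k trials)."""
    return 1 - q ** k

expected_trials = 1 / p   # mean number of trials until the first success

print(geom_pmf(3), geom_cdf(3), expected_trials)
```

The pmf comes from multiplying independent trials together, and the cdf from taking the complement of "all failures".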

Answer 49

*** screenshot p433

Answer 50

multiplication rule, complement rule

Answer 51

Rule that allows us to proceed if we don't have independent trials in Bernoulli. Still OK to proceed as long as the sample is smaller than 10% of the population [ex universal blood donors]

Answer 52

*** screenshot at 4.36pm ***

Answer 53

No longer counts the trials but counts the number of successes within the fixed number of Bernoulli trials.

Answer 54

n, number of trials // p, probability of success

Answer 55

μ = np // s.d. = sqrt(npq)
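
A short sketch of the binomial model with its two parameters, assuming the standard formula P(X = k) = C(n, k) * p^k * q^(n-k). The values n = 10 and p = 0.5 are hypothetical.

```python
import math

n, p = 10, 0.5   # hypothetical: number of trials, probability of success
q = 1 - p

def binom_pmf(k):
    """P(X = k): probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p**k * q ** (n - k)

mean = n * p                 # mean of the binomial model
sd = math.sqrt(n * p * q)    # standard deviation of the binomial model

print(binom_pmf(5), mean, sd)
print(sum(binom_pmf(k) for k in range(n + 1)))   # all probabilities add to 1
```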

Answer 56

*** 4.41 screenshot ***

Answer 57

when n is very large and when p is small. has only one parameter, lambda

Answer 58

n> with p100 with p

Answer 59

**** screenshot 4.46 ***

Answer 60

steps: - Verify Bernoulli - Aware finite or not - With replacement or without - Check 10 percent condition - Then recognize this is actually a binomial model set up, but because p is small n is large would want to use Poisson in this case - Will be quite close to what you should be getting in reality under independence
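
To see how close the Poisson model gets to the binomial when n is large and p is small, here is a rough comparison. It assumes the usual choice lambda = np, and the values n = 1000 and p = 0.002 are hypothetical.

```python
import math

n, p = 1000, 0.002   # hypothetical: many trials, rare success
lam = n * p          # the Poisson model's single parameter, lambda = np

def binom_pmf(k):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    """Poisson probability of k occurrences when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in range(6):
    print(k, round(binom_pmf(k), 5), round(poisson_pmf(k), 5))

# Largest disagreement between the two models over k = 0..5.
max_gap = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(6))
print(max_gap)
```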

Answer 61

do some practice probs from hw 2

Answer 62

Can also say continuous random variable, bell shaped, unimodal, symmetric

Answer 63

false -- there IS a normal model for every mean and standard deviation

Answer 64

because this mean and s.d. do not come from data, they are numbers (called parameters) that specify the model

Answer 65

statistics

Answer 66

the real world

Answer 67

change to standard normal model, then go back if needed. Uses z value

Answer 68

about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3

Answer 69

tells us y is above the mean

Answer 70

tells us y is below the mean

Answer 71

how many s.d.'s away from the mean an observation is
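
A sketch of standardizing a value and finding areas under the Normal model, using Python's built-in statistics.NormalDist. The mean of 500, standard deviation of 100, and the values 400 and 650 are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 500, 100   # hypothetical parameters of a Normal model
y = 650

# z-score: how many standard deviations y is from the mean.
z = (y - mu) / sigma
print(z)   # positive, so y is above the mean

std = NormalDist()   # standard Normal model: mean 0, sd 1

# Areas within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(std.cdf(k) - std.cdf(-k), 4))

# Area between two values: area below the bigger z minus area below the smaller z.
low, high = 400, 650
area_between = std.cdf((high - mu) / sigma) - std.cdf((low - mu) / sigma)
print(round(area_between, 4))
```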

Answer 72

Normal model applies

Answer 73

Normal model applies

Answer 74

a continuous random variable that can take on any value

Answer 75

Solution: Use continuity correction

Answer 76

spread evenly over the region a to b; we know the area under the prob density curve is 1, and the density is 0 outside the interval from a to b

Answer 77

only parameter of exponential distribution

Answer 78

exponential distribution

Answer 79

endpoints a b

Answer 80

ex: exponential is time between arrivals, other variable is individuals arrive [identify which variable is discrete, which is continuous, go from there]

Answer 81

One piece of information that you’re measuring for every observation and that can be different for every observation

Answer 82

*** see book chapter [SAT scores] *** also could watch a video on it

Answer 83

Use z-scores and subtract the area below the smaller z from the area below the bigger z

Answer 84

(1) n is large, p is small and use Poisson, (2) n is large, p is not necessarily small and use normal model [when use normal can ask Q which normal model]

Answer 85

*** middle of lecture notes***

Answer 86

A Binomial distribution can be approximated by an appropriate Normal distribution if we expect at least 10 successes and 10 failures (i.e. np >= 10 and nq >= 10) -- then standardize, get z score
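
A rough check of the Normal approximation against the exact binomial answer. The Normal model used is the one with mean np and standard deviation sqrt(npq), and the 0.5 added to the cutoff is the continuity correction. The values n = 100, p = 0.5, and the cutoff of 55 are hypothetical.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5   # hypothetical binomial parameters
q = 1 - p

# Success/failure condition: expect at least 10 successes and 10 failures.
condition_ok = n * p >= 10 and n * q >= 10

# Exact binomial probability P(X <= 55).
exact = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(0, 56))

# Normal approximation with the same mean and sd, cutoff moved to 55.5.
model = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
approx = model.cdf(55.5)

print(condition_ok, round(exact, 4), round(approx, 4))
```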

Answer 87

You got it wrong, idk what you were doing lol ***

Answer 88

any sub interval of same length has equal probabilities

Answer 89

small values more favored; as values increase, the prob. of observing them decreases exponentially

Answer 90

- so in hw and in the exams; could have q like average # of customers arriving is equal to five per hour - then next q is what is the prob. that the next customer will arrive within ten minutes - information is related to Poisson random variable - in that case you would want to use the exponential distribution to answer the probability question
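
The customer example above can be sketched as follows, assuming the standard exponential formula P(T <= t) = 1 - e^(-rate * t) with the rate and the time in matching units.

```python
import math

rate_per_hour = 5      # average number of customers arriving per hour (Poisson rate)
t_hours = 10 / 60      # ten minutes, converted to hours to match the rate

# Exponential model for the time until the next arrival.
p_within_10_min = 1 - math.exp(-rate_per_hour * t_hours)

# Mean time between arrivals is 1 / rate, converted here to minutes.
mean_wait_minutes = 60 / rate_per_hour

print(round(p_within_10_min, 4), mean_wait_minutes)
```

The key step is the unit conversion: a rate given per hour needs the time in hours.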

Answer 91

- both models are like complementary to each other | - one is modeling the #s ; other is modeling the time between those numbers

Answer 92

relate the value to the whole of the data you're looking at

Answer 93

a table whose first column displays each distinct outcome and second column displays that outcome's frequency

Answer 94

first column displays each distinct outcome and second column displays that outcome's relative frequency [just shows relative as opposed to freq table]

Answer 95

How human eye automatically compares areas. Area of distribution should be in proportion to relative frequency values

Answer 96

displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison [all must have same width and stay true to area principle] -- Gaps placed deliberately between

Answer 97

shows each category as a slice of a circle, with each slice sized in proportion to that category's fraction of the whole

Answer 98

columns refer to values of one categorical variable, rows refer to values of other categorical variable

Answer 99

each cell gives the count for a combination of values of the two variables. ex: there were 528 third-class ticket holders who died

Answer 100

represents marginal distribution of other variable

Answer 101

for one variable

Answer 102

help us decide if there is any ASSOCIATION between variables we're looking for, or none

Answer 103

Pause and watch a video on this, now

Answer 104

"condition on the other variable" [ex. conditional distribution of variable "class" conditioned on variable "survival" = alive]

Answer 105

another note: we avoid using the word correlation, if there is association say 'dependent'

Answer 106

association: for categorical variables, also for any type of association that might not be linear // correlation: quantity that measures only a certain type of association, linear

Answer 107

Like pie chart but in a bar

Answer 108

not the same; in side-by-side blue bars represent one distribution, red bars represent another

Answer 109

- it is neither a variable nor a number - what it is, is a collection of percentages - these three percentages together are what make up a conditional distribution - if this were asked in a test, for example, you would first state the condition [“we are looking at the male group”], then under this condition state the three numbers [you would circle all three values]

Answer 110

- to see if that is the case, can look at percentages and see if they add up to 100% - if not—that’s your clue that you are not looking at three separate distributions here - notice blue bars together add up to 100; red bars together add up to 100—so you have actually two distributions here

Answer 111

what we divide our range of numerical data into for histograms

Answer 112

presentations can feature two; a gap means no occurrences in that range

Answer 113

vertical axis is relative frequency, the freq divided by total [will not be asked to draw but yes to interpret]

Answer 114

first column is leftmost digit, second shows remaining

Answer 115

histogram where all the bins have the same frequency or close; will be flat

Answer 116

- is the variable quantitative? | - is the answer to the survey question or result of the experiment a number whose units are known?

Answer 117

histograms, stem-and-leaf diagrams, dotplots

Answer 118

bar and pie charts

Answer 119

A mode is a hump or high-frequency bin. One mode = unimodal, two = bimodal, three or more = multimodal.

Answer 120

can treat as normally distributed

Answer 121

to identify the symmetry of the data

Answer 122

the longer tail is on the right side of the mode

Answer 123

the longer tail is on the left side of the mode [also sometimes called negatively skewed but doesn't mean values= negative]

Answer 124

might tell us something interesting or exciting about the data. Should always mention any straggler or outliers that stand away from body of the distribution

Answer 125

data points that are further from remaining bulk of data set. Will later classify as mild outliers and far extreme outliers

Answer 126

Income of a CEO, temperature of a person with a high fever, elevation at Death Valley

Answer 127

the center of the data values; half of the data values are to the left of the median and half are to the right of the median [for symmetric right in the middle]

Answer 128

If the sample size n is odd, the median is the value at position (n+1)/2 in the sorted data ... if the sample size is even, split the data in half and take the avg of the two middle values

Answer 129

Divide the data in one hundred groups. The n'th percentile is the data value such that n percent of the data lies below that value

Answer 130

the difference between max and minimum values [it is sensitive to outliers]

Answer 131

25th percentile and 75th percentile. IQR = Q3 - Q1, the difference between upper quartile and lower

Answer 132

minimum, Q1, median, Q3, and maximum

Answer 133

not sensitive to outliers (like range is). Reasonable summary of spread. Shows where typical values are, except for the case of a bimodal distribution. Not great for a general audience since most do not know what it is.

Answer 134

standard deviation

Answer 135

on the IQR, from hw

Answer 136

a chart that displays the 5-point summary and the outliers. Shows the IQR (middle yellow box). Dashed lines are called fences, outside the fences lie the outliers. Above and below the box are whiskers that display the most extreme data values within the fences. Line inside the box shows the median

Answer 137

lower fence = Q1 - 1.5 * IQR, upper fence = Q3 + 1.5 * IQR. Any data outside the fences we call outliers
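
A sketch of the 5-number summary and the fence calculation. The data are made up, and the quartiles come from Python's statistics.quantiles with method="inclusive", which is one of several quartile conventions and may differ slightly from the textbook's or a calculator's.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data with one straggler

# Quartiles (assumption: "inclusive" method; other methods give slightly different values).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

five_number_summary = (min(data), q1, median, q3, max(data))

# Fences sit 1.5 IQRs beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Outliers lie outside the fences; far outliers are more than 3 IQRs from the quartiles.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
far_outliers = [x for x in data if x < q1 - 3 * iqr or x > q3 + 3 * iqr]

print(five_number_summary, iqr, lower_fence, upper_fence)
print(outliers, far_outliers)
```

Small differences in the quartiles between methods are normal and usually do not change which points get flagged.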

Answer 138

Farther than 3 IQRs from the quartiles

Answer 139

include median in both upper half and lower half of data (?)

Answer 140

Who? Months, What? Percentage of flights cancelled at US airports, when? 1994-2013, where? US, how? bureau of transportation statistics data

Answer 141

Identify the variable: percent of flight cancellations at US airports (quantitative, units are percentages) // data will be summarized in a histogram, numerical summary, and boxplot

Answer 142

on IQR and stuff

Answer 143

Describe shape, center, and spread of distribution. Report on symmetry, number of modes, and any gaps or outliers. Should also mention any CONCERNS about the data

Answer 144

what most people think of as the average. Add up all the numbers and divide by number of them. The mean is the "balancing point" [where the histogram will balance perfectly like a metronome]

Answer 145

For symmetric distributions, the mean and the median are equal. Balancing point is at the center. The tail "pulls" the mean towards it more than it does to the median. The Mean is more sensitive to outliers than the median.

Answer 146

F [the mean is MORE sensitive to outliers, so for skewed data the median is a better measure of center]

Answer 147

You also want to report mean and standard deviation; if it's symmetric you just have to report those two though

Answer 148

How far the data is spread out from the mean. Difference from the mean is y - y-bar. To make it positive square it. Will mostly be used to find the standard deviation which is the square root of variance

Answer 149

small; if very spread apart, then SD turns out to be large

Answer 150

Statistics is about variation so spread is an important fundamental concept; measures of spread tell us what we DON'T know about the data.

Answer 151

Use histograms to compare two or three groups. Use boxplots to compare many groups.

Answer 152

it's appropriate to use a histogram, and boxplot respectively

Answer 153

Local or global [outliers], especially in a time series. Investigate if the outliers are errors or remarkable.

Answer 154

trends over time

Answer 155

Can transform skewed data distributions to symmetric ones, can help to compare spreads of different groups

Answer 156

when writing journal articles better to use bar charts than pie

Answer 157

Describe modality, symmetry, outliers

Answer 158

Median and IQR if not symmetric. Mean and standard deviation if symmetric. Unimodal symmetric data, IQR > s. If not check for errors, skewness, outliers.

Answer 159

For multiple modes, possibly split the data into groups. When there are outliers, report the mean and s.d. with and without the outliers. Note any gaps in the data set.

Answer 160

Plan: summarize the distribution of the car's fuel efficiency. // Variable: mpg for 100 fill ups, quantitative // Mechanics: show a histogram (described as fairly symmetric, low outlier)

Answer 161

the mean and median are close. report the mean and standard deviation

Answer 162

Distribution is unimodal and symmetric. Mean is 22.4 mpg. Low outlier may be investigated, but limited effect on the mean. s= 2.45; from one filling to the next, fuel efficiency differs from the mean by an avg of about 2.45 mpg

Answer 163

(1) if you have categorical data, do not make a histogram, (2) if you have categorical data, make a bar chart, but it would not be appropriate to talk about shape because the categories can be rearranged in any order, (3) choose an appropriate bin width

Answer 164

(1) don't report too many decimal places, (2) don't round in the middle of a calculation, (3) for multiple modes think about separating groups (4) beware of outliers, the mean and standard deviation are sensitive to outliers

Answer 165

(1) Do a reality check, don't blindly trust calculator, (2) Sort before finding median and percentiles, (3) don't worry about small differences in quartile calculation, (4) don't compute numerical summaries for a categorical variable!! [mean of a SS# is meaningless]

Answer 166

Make a picture, make a picture, make a picture

Answer 167

Like in that ch 1 homework problem

Answer 168

If they are errors, remove them

(200 cards)