Midterm Flashcards

Question

box plots

Answer 1

the box indicates the middle 50% of the sample. The lower boundary of the box represents the first quartile (i.e. the point that 25% of the sample lies below) and the upper boundary of the box represents the third quartile (i.e. the point that 75% of the sample lies below). The line through the box indicates the median. The whiskers extend to the most extreme data points that are within 1.5 x IQR of the box. Points beyond the whiskers are usually plotted individually as outliers.
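A minimal sketch of the numbers behind a box plot, using only the Python standard library. The function name `box_plot_summary` and the data are made up for illustration, and the "inclusive" quartile method is just one of several conventions (others give slightly different quartiles on small samples).

```python
import statistics

def box_plot_summary(data):
    """Return the numbers a box plot draws: quartiles, whisker ends, outliers."""
    values = sorted(data)
    # "inclusive" is one quartile convention among several (an assumption here).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }

# Hypothetical sample with one obvious outlier (50).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```

Here the upper whisker ends at 9 (the largest value within 1.5 x IQR of the box) and 50 is reported as an outlier.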

Answer 2

theoretical, bell shaped, unimodal, symmetrical; the mode, mean, and median are equal

Answer 3

+/- 1 standard deviation captures 68.26% of the sample

Answer 4

+/- 2 standard deviations captures 95.44% of the sample

Answer 5

+/- 3 standard deviations captures 99.72% of the sample
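The areas on the last three cards can be checked with the standard normal CDF. This is a sketch using `statistics.NormalDist` from the standard library; the helper name `area_within` is made up. The exact areas differ from the card values in the second decimal place, presumably because the card values come from a rounded z-table.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Share of a normal distribution within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} SD: {area_within(k):.2%}")
```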

Answer 6

a z-score is a position along the normal curve: it indicates the number of standard deviations a data point falls above or below the mean. e.g. a z-score of 1 means that the data point is 1 standard deviation above the mean
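As a quick sketch, the z-score is just (value - mean) / standard deviation. The function name and the numbers (mean 100, standard deviation 15) are made up for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

print(z_score(115, 100, 15))  # one standard deviation above the mean
print(z_score(70, 100, 15))   # two standard deviations below the mean
```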

Answer 7

population and parameter are analogous with sample and statistic. in other words, statistics are characteristics of the sample, and parameters are characteristics of the population

Answer 8

equal probability of selection method

Answer 9

theoretical concept that links the sample to the population. The sampling distribution is normal in shape (for large enough samples), its mean is equal to the population mean, and its standard deviation (the standard error) is equal to the population standard deviation/sqrt(N). The sampling distribution represents the distribution of the point estimates based on samples of a fixed size from a certain population.

Answer 10

the larger the sample, the closer the sample result gets to the true population value. The law of large numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances. So if you flip 10 coins, you may get 90% heads and 10% tails, but if you flip 100 coins, you're more likely to get closer to 50% heads and 50% tails. The proportion of heads after n flips will almost surely converge to 1/2 as n approaches infinity.
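A small simulation sketch of the coin-flip example. The function name `proportion_heads` and the seed are arbitrary choices; the exact proportions printed depend on the seed, but they should drift toward 0.5 as the number of flips grows.

```python
import random

def proportion_heads(n_flips, seed=0):
    """Flip a fair coin n_flips times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n))
```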

Answer 11

The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed. This will hold true regardless of whether the source population is normal or skewed, provided the sample size is sufficiently large (usually n > 30). The average of your sample means will be the population mean.

Answer 12

standard deviation of the sampling distribution | e.g. plotting the means of 50 samples of 10 would give you a roughly normal curve, and the standard deviation of that curve is the standard error
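A simulation sketch tying together the sampling distribution, the central limit theorem, and the standard error. It assumes a deliberately skewed population (exponential, with mean 1 and standard deviation 1); the function name, sample size, number of samples, and seed are all arbitrary.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(sample_size, n_samples, seed=0):
    """Draw n_samples samples from a skewed population and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

means = simulate_sample_means(sample_size=40, n_samples=5000)

mean_of_means = statistics.fmean(means)   # should sit close to the population mean (1)
observed_se = statistics.stdev(means)     # SD of the simulated sampling distribution
theoretical_se = 1.0 / sqrt(40)           # population SD / sqrt(N)

print(mean_of_means, observed_se, theoretical_se)
```

Even though the population is skewed, a histogram of `means` would look roughly bell shaped, and its standard deviation should land near the population standard deviation/sqrt(N).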

Answer 13

a single statistic used to infer info about the population e.g. taking the mean of the heights of a sample of students and inferring the mean of the heights of all students from the sample mean

Answer 14

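- bias: an estimator is unbiased if the mean of its sampling distribution is equal to the population value of interest. - efficiency: the more tightly the sampling distribution clusters around its mean (i.e. the smaller the standard error), the more efficient the estimator.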

Answer 15

how certain do you want to be? e.g. alpha = 0.05 means a confidence level of 95%. every alpha has a z-score associated with it, e.g. alpha = 0.05 (two-tailed) has a z-score of +/- 1.96

Answer 16

(1) set the alpha (2) find the z-score associated with that alpha (3) use formula for confidence intervals with sample means
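The three steps can be sketched as follows. This assumes the interval is sample mean +/- z * (standard deviation / sqrt(n)); some textbooks divide by sqrt(n - 1) when the standard deviation comes from the sample. The function name and the numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, std_dev, n, alpha=0.05):
    """Confidence interval for a mean, following the three steps on the card."""
    # (1) alpha is set by the caller; (2) find the z-score that goes with it.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # (3) apply the interval formula: mean +/- z * standard error.
    margin = z * std_dev / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 100, standard deviation 15, n = 225.
low, high = mean_confidence_interval(100, 15, 225)
print(low, high)
```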

Answer 17

the bigger the sample the smaller the width of the confidence interval because standard error is smaller.

Answer 18

decrease the alpha, e.g. instead of alpha = 0.05 (95% CI), set alpha to 0.01 (99% CI).

Answer 19

confidence interval widens as confidence level increases.
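A short sketch of how interval width responds to sample size and confidence level, assuming width = 2 * z * (standard deviation / sqrt(n)). The function name and the inputs are made up.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(std_dev, n, alpha):
    """Full width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * std_dev / sqrt(n)

# Bigger sample, same alpha: the interval narrows (4x the n halves the width).
print(ci_width(15, 100, 0.05), ci_width(15, 400, 0.05))

# Same sample, higher confidence (smaller alpha): the interval widens.
print(ci_width(15, 100, 0.05), ci_width(15, 100, 0.01))
```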

Answer 20

null hypothesis (H0) always says there is no significant difference. alternative hypothesis (HA) says there is a significant difference. We assume that the null is true until the evidence lets us reject it.

Answer 21

- make a hypothesis - use z-score formula to determine probability of getting the observed difference: "this difference is statistically significant at the alpha = 0.05 level." - trying to identify statistically significant differences that didn't occur by chance

Answer 22

(1) make assumptions - level of measurement is interval-ratio, sampling distribution is normal (basically n > 120) (2) state null hypothesis (3) select sampling distribution and establish a critical region (4) compute the test statistic and compare it to the critical region (5) make decision and interpret the results, either rejecting the null or failing to reject the null
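The five steps can be sketched as a one-sample, two-tailed z test. This assumes the population standard deviation is known and the sample is large; the function name and the numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_std_dev, n, alpha=0.05):
    """Two-tailed one-sample z test; H0 says the sample comes from a population with pop_mean."""
    # Step 3: the critical region is everything beyond +/- z_critical.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: compute the test statistic (z obtained).
    z_obtained = (sample_mean - pop_mean) / (pop_std_dev / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obtained)))
    # Step 5: reject H0 only if z obtained falls in the critical region.
    reject_null = abs(z_obtained) > z_critical
    return z_obtained, z_critical, p_value, reject_null

# Hypothetical data: sample mean 103 vs. population mean 100, SD 15, n = 225.
z_obtained, z_critical, p_value, reject_null = one_sample_z_test(103, 100, 15, 225)
print(z_obtained, z_critical, p_value, reject_null)
```

With these numbers the standard error is 1, so z obtained is 3.0, which is beyond +/- 1.96, and the null is rejected.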

Answer 23

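one-tailed = "significantly less/more": at alpha = 0.05 the critical z is +1.65 or -1.65 (one side only). two-tailed = "significantly different": at alpha = 0.05 the critical z is +/- 1.96. one-tailed is stronger: the whole critical region sits in one tail, so a difference in the predicted direction is easier to detect.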

Answer 24

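the alpha sets the size of the critical region: the larger the alpha, the larger the critical region (and the smaller the critical z-score). e.g. alpha = 0.05, critical region begins at +/- 1.96; alpha = 0.10, critical region begins at +/- 1.65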
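A sketch for looking up the critical z-score for a given alpha, one-tailed or two-tailed. The function name `critical_z` is made up. The exact value for alpha = 0.10 two-tailed is about 1.645, which printed tables commonly round to 1.65.

```python
from statistics import NormalDist

def critical_z(alpha, tails=2):
    """Positive critical z-score for the given alpha and number of tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two-tailed tests split alpha between both tails; one-tailed tests do not.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, critical_z(alpha, tails=2), critical_z(alpha, tails=1))
```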

Answer 25

rejecting a true null hypothesis. aka alpha error. this happens when the thing occurred by random chance but you claimed that it was significantly different. you can reduce the risk of type I error by decreasing the alpha, e.g. saying you want to be 99% sure (alpha = 0.01) instead of 95% sure (alpha = 0.05) that something is statistically significant.
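A simulation sketch of type I error: the null is true by construction (the population really has mean 0 and standard deviation 1), so every rejection is a type I error. The function name, number of trials, sample size, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(alpha, trials=2000, n=50, seed=0):
    """Share of two-tailed z tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)  # same seed means the same samples for every alpha
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The population mean really is 0, so H0 (mean = 0) is true.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z_obtained = fmean(sample) / (1 / sqrt(n))
        if abs(z_obtained) > z_critical:
            rejections += 1
    return rejections / trials

rate_05 = type_one_error_rate(0.05)
rate_01 = type_one_error_rate(0.01)
print(rate_05, rate_01)
```

The rejection rate should land near the alpha that was set, so lowering alpha from 0.05 to 0.01 lowers the share of false rejections.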

Answer 26

failing to reject a false null hypothesis. aka beta error. this happens when the thing was actually significantly different but you claimed that it was not statistically significant and happened by random chance. you can reduce the risk of type II error by increasing the alpha, e.g. saying you want to be 95% sure (alpha = 0.05) instead of 99% sure (alpha = 0.01).

Answer 27

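used for smaller samples (n < 120) when the population standard deviation is unknown. the student t distribution is flatter than the z-distribution, with fatter tails (i.e. shorter and wider), and it gets closer to the z-distribution as n grows.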

Answer 28

(1) make assumptions - the samples must be independent random samples, i.e. mutually exclusive; interval-ratio measurements; sampling distribution is normal (basically n > 120) (2) state the null hypothesis (3) select sampling distribution and establish critical region (4) compute the test statistic and compare it to the critical region (5) make decision and interpret results
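A sketch of the test statistic for two large independent samples. It assumes z = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2); some textbooks use n - 1 in the denominators. The function name and the numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two-tailed z test for the difference between two independent sample means."""
    # Standard error of the difference between the two sample means.
    se_difference = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z_obtained = (mean1 - mean2) / se_difference
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z_obtained, abs(z_obtained) > z_critical

# Hypothetical samples: means 52 and 50, both SD 10, both n = 200.
z_means, reject_means = two_sample_z(52, 10, 200, 50, 10, 200)
print(z_means, reject_means)
```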

Answer 29

(1) make assumptions - the samples must be independent random samples, i.e. mutually exclusive; interval-ratio measurements; population variances are equal (as long as the 2 samples are approximately the same size, we can make this assumption); sampling distribution is normal (because we're using small samples, we have to add the previous assumption in order to make this one) (2) state the null hypothesis (3) select sampling distribution and establish critical region (4) compute the test statistic and compare it to the critical region (5) make decision and interpret results
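A sketch of the small-sample version, using the common pooled-variance t statistic with n1 + n2 - 2 degrees of freedom (textbook formulas for the pooled estimate vary slightly). The standard library has no t distribution, so the critical value is passed in from a t-table; 2.101 is the two-tailed value for alpha = 0.05 with 18 degrees of freedom. The function name and the data are made up.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2, t_critical):
    """Pooled-variance t test for two small independent samples (equal variances assumed)."""
    df = n1 + n2 - 2
    # Pooled estimate of the common population variance.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_difference = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_obtained = (mean1 - mean2) / se_difference
    return t_obtained, df, abs(t_obtained) > t_critical

# Hypothetical samples: means 25 and 20, both SD 5, both n = 10.
t_obtained, df, reject_t = pooled_t(25, 5, 10, 20, 5, 10, t_critical=2.101)
print(t_obtained, df, reject_t)
```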

Answer 30

(1) make assumptions - the samples must be independent random samples, i.e. mutually exclusive; nominal measurements; sampling distribution is normal (basically n > 120) (2) state the null hypothesis (3) select sampling distribution and establish critical region (4) compute the test statistic and compare it to the critical region (5) make decision and interpret results
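A sketch of the same test for proportions, assuming the usual pooled-proportion standard error under the null. The function name and the counts are made up.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(successes1, n1, successes2, n2, alpha=0.05):
    """Two-tailed z test for the difference between two independent sample proportions."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    # Under H0 both samples share one proportion, so pool them to estimate it.
    pooled = (successes1 + successes2) / (n1 + n2)
    se_difference = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z_obtained = (p1 - p2) / se_difference
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z_obtained, abs(z_obtained) > z_critical

# Hypothetical samples: 120 of 200 (60%) vs. 100 of 200 (50%).
z_props, reject_props = two_proportion_z(120, 200, 100, 200)
print(z_props, reject_props)
```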

Answer 31

with large samples, differences that are otherwise trivial or uninteresting may be statistically significant. Significance just states whether a difference is unlikely to be due to chance (i.e. whether the difference in our sample likely reflects a real difference in the population), but it doesn't say whether it is an important difference. The substantive importance is up for interpretation.

Answer 32

for the same observed difference, test statistics (like z or t) get larger as n gets larger, so p-values get smaller.
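A sketch of the effect of sample size on a one-sample z test, holding the observed difference fixed. The function name and the numbers (a difference of 1 with a standard deviation of 10) are made up.

```python
from math import sqrt
from statistics import NormalDist

def z_and_p(difference, std_dev, n):
    """z obtained and two-tailed p-value for a fixed observed difference."""
    z = difference / (std_dev / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Same difference every time; only the sample size changes.
for n in (25, 100, 400, 1600):
    z, p = z_and_p(difference=1.0, std_dev=10.0, n=n)
    print(n, round(z, 2), round(p, 4))
```

The same small difference is not significant at n = 25 but is at n = 400, which is why significance alone says little about importance.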

Answer 33

when you're using the two-sample test, you're taking both estimates of the means and both standard deviations into account. The standard error of the difference is smaller than the two separate standard errors added together, so there is still a possibility of the error bars overlapping but the difference still being statistically significant.
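A sketch with made-up numbers where the two 95% confidence intervals overlap but the two-sample z test still finds a significant difference. The means and standard errors are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

z_crit = NormalDist().inv_cdf(0.975)  # two-tailed, alpha = 0.05

def interval(mean, se):
    """95% confidence interval for one mean."""
    return mean - z_crit * se, mean + z_crit * se

# Hypothetical group means and standard errors.
mean_a, se_a = 10.0, 1.0
mean_b, se_b = 13.0, 1.0

ci_a = interval(mean_a, se_a)
ci_b = interval(mean_b, se_b)
intervals_overlap = ci_a[1] > ci_b[0]

# The test uses the standard error of the difference, sqrt(se_a^2 + se_b^2),
# which is smaller than se_a + se_b.
z_diff = (mean_b - mean_a) / sqrt(se_a**2 + se_b**2)
significant = abs(z_diff) > z_crit

print(ci_a, ci_b, intervals_overlap, z_diff, significant)
```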

Answer 34

sample values

Answer 35

the use of sample data to calculate a single value (known as a statistic) which is to serve as a "best guess" or "best estimate" of an unknown (fixed or random) population parameter

Answer 36

means and proportions

Answer 37

Basically sample size.

Answer 38

larger, lower

Answer 39

point estimate: we estimate that the population value is the same as the sample statistic; interval estimate: we construct a confidence interval, a range of values into which we estimate the population value falls

(67 cards)