Statistics Flashcards

technical interview study

1
Q

What is a p-value?

A

When testing a hypothesis, the p-value is the probability that we would observe results at least as extreme as our result due purely to random chance if the null hypothesis were true.

or…

A p-value is the probability that random chance generated the data, or something else that is equal or rarer.

2
Q

What does it mean when a p-value is low?

A

When a p-value is low, it is relatively rare for the observed results to be purely from random chance.

Because of this, we may decide to reject the null hypothesis.

If the p-value is below some pre-defined threshold (alpha), we say that the result is “statistically significant” and we reject the null hypothesis.

3
Q

What value is most often used to determine statistical significance?

A

A value of alpha=0.05 is most often used as a threshold for statistical significance.

4
Q

What are the five linear regression assumptions and how can you check them?

A
  1. Linearity: the target (y) and the features (xi) have a linear relationship.

Check linearity:
Plot the errors against the predicted yhat and look for the values to be symmetrically distributed around a horizontal line with constant variance.

  2. Independence: the errors are not correlated with one another.

Check independence: Plot errors over time and look for non-random patterns (in the case of time series data).

  3. Normality: the errors are normally distributed.

Check normality: histogram of the errors.

  4. Homoskedasticity: the variance of the error terms is constant across the range of predicted values.

Check homoskedasticity: Plot the errors against the predicted yhat and look for constant spread.

  5. Non-multicollinearity: the features are not highly correlated with one another.

Check non-multicollinearity: Look for pairwise correlations > 0.80.
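The multicollinearity check is easy to script. A minimal sketch with numpy, using a hypothetical feature matrix where x2 is deliberately built as a near-copy of x1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: x2 is nearly a copy of x1 (collinear), x3 is independent.
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# Check non-multicollinearity: flag pairwise correlations > 0.80.
corr = np.corrcoef(X, rowvar=False)
for i in range(3):
    for j in range(i + 1, 3):
        if abs(corr[i, j]) > 0.80:
            print(f"x{i+1} and x{j+1} are highly correlated: {corr[i, j]:.2f}")
```

Running this flags only the x1/x2 pair; a more thorough check would also look at variance inflation factors, since multicollinearity can exist without any single large pairwise correlation.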
5
Q

What are the pitfalls of using classification accuracy to assess your model?

A

Classification accuracy can be misleading in the case of imbalanced datasets.

For example, if 95% of targets are “1” and 5% are “0”, we can achieve 95% accuracy by simply predicting “1” for every observation in the dataset.

Obviously, this model isn’t useful despite having 95% accuracy.

6
Q

What are some ways to deal with imbalanced datasets?

A

Resampling is a common way to deal with imbalanced datasets. Here are two possible sampling techniques:

  1. Use all samples from your more frequently occurring event and then randomly sample (with replacement) your less frequently occurring event until you have a balanced dataset (oversampling).
  2. Use all samples from your less frequently occurring event and then randomly sample your more frequently occurring event (with or without replacement) until you have a balanced dataset (undersampling).
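Both techniques can be sketched in a few lines of numpy. A toy example with a hypothetical 95/5 class split:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced dataset: 95 majority-class rows (0) and 5 minority-class rows (1).
majority = np.zeros(95, dtype=int)
minority = np.ones(5, dtype=int)

# 1) Oversampling: keep all majority rows, sample the minority class
#    WITH replacement until the classes are balanced.
oversampled_minority = rng.choice(minority, size=len(majority), replace=True)
balanced_over = np.concatenate([majority, oversampled_minority])

# 2) Undersampling: keep all minority rows, sample the majority class
#    down to the minority count (here, without replacement).
undersampled_majority = rng.choice(majority, size=len(minority), replace=False)
balanced_under = np.concatenate([undersampled_majority, minority])

print(len(balanced_over), balanced_over.mean())    # 190 rows, 50% minority
print(len(balanced_under), balanced_under.mean())  # 10 rows, 50% minority
```

Note the tradeoff: oversampling duplicates minority rows (risking overfitting to them), while undersampling discards majority information.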
7
Q

What is a Type I error?

A

Type I error is the rejection of a true null hypothesis, or a “false positive” classification.

8
Q

What is a Type II error?

A

Type II error is the non-rejection of a false null hypothesis, or a “false negative” classification.

9
Q

What is bias of a statistic?

A

Bias is the difference between the calculated value of the parameter and the true value of the population parameter being estimated.

e.g., if we survey homeowners on the values of their homes and only the wealthiest homeowners respond, then our “home value” estimate will be biased since it will be larger than the true value of the parameter (this is an example of sampling bias causing a biased statistic).

For machine learning models, bias refers to something slightly different: it is error caused by choosing an algorithm that cannot accurately model the signal in the data. e.g., selecting a simple linear regression to model highly non-linear data would result in error due to bias.

10
Q

What is variance of a statistic?

A

Variance is the measurement of how spread out a set of values are from their mean.

More formally,

Var(X) = E[(X - mu)^2]

11
Q

What is the Central Limit Theorem?

A

When we draw samples of independent random variables (drawn from a single distribution with a finite variance), their sample mean tends toward the population mean and their distribution approaches a normal distribution as the sample size increases, regardless of the distribution from which the sample was drawn. Their variance will approach the population variance divided by the sample size.

e.g., let’s say we have a fair and balanced 6-sided die. The result of rolling the die has a uniform distribution on [1,2,3,4,5,6]. The average result from rolling the die is (1+2+3+4+5+6)/6 = 3.5.

…if we roll the die 10 times and average the values, then the resulting parameter will have a distribution that begins to look similar to a normal distribution centered around 3.5.

…if we roll the die 100 times and average the values, then the resulting parameter will have a distribution that looks/behaves even more similar to a normal distribution, again centered at 3.5, but now with decreased variance, etc.
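The die example above is easy to simulate. A quick numpy sketch (sample sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_means(n_rolls, n_samples=10_000):
    # Roll a fair die n_rolls times, average the result, repeat n_samples times.
    rolls = rng.integers(1, 7, size=(n_samples, n_rolls))
    return rolls.mean(axis=1)

means_10 = sample_means(10)
means_100 = sample_means(100)

# Both distributions of sample means center near 3.5, and the spread of the
# n=100 means is roughly sqrt(10) times smaller than that of the n=10 means.
print(round(means_10.mean(), 2), round(means_10.std(), 3))
print(round(means_100.mean(), 2), round(means_100.std(), 3))
```

Plotting histograms of `means_10` and `means_100` shows both looking bell-shaped around 3.5, with the n=100 histogram visibly narrower.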

12
Q

What is interpolation?

A

Interpolation is making predictions on data that lies inside the range of the training set.

e.g., Let’s say we have a model that predicts the value of homes based on their size. Our model was trained on a data set containing homes between the values 500 and 5000 sq ft. Using this model to predict the value of a 4200 sq ft home is interpolation.

13
Q

What is extrapolation and why can it be dangerous?

A

Extrapolation is making predictions on values outside the range of the training set.

e.g., Say we have a model that predicts the value of homes based on their size. Our model was trained on a dataset containing home prices between 500 and 5000 sq ft. Using this model to predict the value of a 6000 sq ft home is extrapolation.

Extrapolation is dangerous because we usually can’t guarantee the relationship between the target and features beyond what we’ve observed. In the example, the relationship between the square footage and home price may be “locally linear” between 500-5000 sq ft, but exponential after that, resulting in poor prediction.

14
Q

Discuss the differences between frequentist and Bayesian statistics.

A

Both attempt to estimate a population parameter on a sample of data.

Frequentists treat the data as random and the population parameter as fixed. Inferences are based on long-run repeated sampling, and estimates of the parameter come in the form of point estimates or confidence intervals.

Bayesians treat the data as fixed and the population parameter as a random variable. Bayesian statistics allows/requires you to make informed guesses about the value of a parameter in the form of prior distributions. Estimates of the parameter come in the form of posterior distributions.

15
Q

What is the multiple (hypothesis) testing problem and how can we compensate for it?

A

Multiple Hypothesis Testing occurs when we run many hypothesis tests all at once. If more than one hypothesis test is used to arrive at the same (or correlated) conclusion, our chance of making a false positive increases.

One way to compensate for this is the Bonferroni correction. Here, we recalculate each individual alpha to equal overall_alpha/k, where k is the number of tests, so that we don’t artificially increase the chance of false positives.
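The correction is a one-liner. A minimal sketch (the p-values below are made up for illustration):

```python
def bonferroni(p_values, overall_alpha=0.05):
    """Reject H0 for an individual test only if p < overall_alpha / k,
    where k is the number of tests being run together."""
    k = len(p_values)
    per_test_alpha = overall_alpha / k
    return [p < per_test_alpha for p in p_values]

# With 5 tests, each is judged against 0.05 / 5 = 0.01.
print(bonferroni([0.003, 0.02, 0.04, 0.008, 0.20]))
# → [True, False, False, True, False]
```

Note that 0.02 and 0.04 would have been "significant" at alpha = 0.05 on their own; the correction rejects them to keep the family-wise false-positive rate near 5%.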

16
Q

Name four discrete distributions and briefly provide an example for each one.

A
  1. Uniform: all outcomes are equally likely to occur. P(each event) = 1/n.

Example Uniform: the outcome of a fair die is uniform on [1,2,3,4,5,6].

  2. Bernoulli: only two possible outcomes can occur. The events are complementary. P(event 1) = p; P(event 2) = 1-p.

Example Bernoulli: the outcome of a single coin flip.

  3. Binomial: describes the count of successes in n repeated Bernoulli trials, with each trial having probability of success p.

Example Binomial: the outcome of multiple coin flips, e.g., after observing 2 coin flips we have P(2 heads) = .25, P(2 tails) = .25, P(1 tail, 1 head) = .50.

  4. Poisson: describes the probability of k events occurring in a fixed period of time, given that each event occurs at a constant rate and is independent of the time since the last event.

Example Poisson: the number of cars that will drive past your house in the next hour.
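The binomial and Poisson pmfs above can be coded directly from their formulas using only the standard library (the example values are illustrative):

```python
from math import comb, exp, factorial

# Binomial(n, p): P(k successes) = C(n, k) * p^k * (1-p)^(n-k)
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Poisson(lam): P(k events) = lam^k * e^(-lam) / k!
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

print(binom_pmf(1, 2, 0.5))          # P(1 head in 2 flips) = 0.5
print(round(poisson_pmf(0, 3), 4))   # P(0 cars pass, given a rate of 3/hour)
```

The uniform and Bernoulli cases need no function: P(each face of a die) = 1/6 and P(heads) = p directly.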

17
Q

Name 3 continuous distributions and give an example of each one.

A

Uniform: All outcomes are equally likely to occur. All equal-length intervals are equally likely to occur. Any single outcome, i.e. an interval with length 0, has probability 0.

Example Uniform: Select a random real number between 0 and 10. P(X in [0,3]) = 3/10, but P(X=1) = 0.

Normal: A “bell-shaped” symmetric distribution that is described by its average and the degree to which observations deviate from the average (standard deviation).

Example Normal: heights of humans.

Beta: A probability distribution of probabilities, i.e. a distribution that represents the likeliness of a range of probabilities being true when the true probability is unknown.

Example Beta: You create a distribution of possible 3-point shooting percentages for your favorite basketball player at the start of the season to estimate his true shooting percentage over the entire season with the knowledge that he will probably have a similar percentage as last year and that a cold or hot streak at the start of the season is not necessarily representative of his “true” underlying shooting percentage for the entire season.

18
Q

What is a long-tailed distribution?

A

A long-tailed distribution is one where there are many relatively extreme, but unique, outliers.

These distributions happen often in retail. e.g., if we looked at customers’ baskets at a grocery store over a 1-month period, we may see many thousands, or even millions, of unique baskets. This is because there are so many different combinations that a customer can select. And because foods are not consumed at the same rate (and other reasons), it is relatively rare to see repeated identical purchases.

Special techniques, such as clustering on the tail, must be used when dealing with long-tailed datasets in order to leverage them to train classification or other predictive models.

19
Q

What is an A/B test and why is it useful?

A

An A/B test is a controlled experiment where two variants are tested against each other on the same response.

e.g., a company could test two different email subject lines and then measure which one has the higher click rate. Once the superior variant has been determined (through statistical significance or some preset time period or metric), all future customers will typically receive the “winning” variant.

A/B testing is useful because it allows practitioners to rapidly test variations and learn about an audience’s preferences.

20
Q

What is multivariate testing and why is it useful?

A

Multivariate testing is similar to A/B testing, but it simultaneously tests more than 2 variants.

This can be useful when trying to optimize across a larger parameter space, e.g. 5 possible email subject lines, but it can take many more samples to achieve a statistically significant result.

Another potential drawback is that a relatively large audience (>50%) will receive a non-optimal variation during testing.

21
Q

What is multi-armed bandit testing and why is it useful?

A

Multi-armed bandit (or simply “bandit”) testing is similar to multivariate testing and A/B testing, but the sampling distribution for variants changes gradually over time as feedback is received.

e.g., with traditional A/B tests, we could test 2 email subject lines, A and B. We would initially send out emails to 200 customers, sending 100 A variations, 100 B variations. After some set period of time, say 24 hrs, we would observe which email variant was opened by more customers. We would then send that variant to all customers going forward.

With bandit testing, we would set some learning rate for the distribution of variants to change over time. Perhaps 60 customers opened variant A emails and only 50 customers opened variant B emails. We could then shift the distribution from 50/50 to 55% A, 45% B for the next round of emails.

Using this approach, we can continuously monitor the response from our audience and shift our allocations accordingly. This is particularly useful in marketing or any industry where people’s preferences and opinions may change rapidly, since it continuously tests and learns new preferences and can adapt quickly.

Note from Kyle: “I love bandit testing and prefer it over A/B testing whenever possible!”
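One common bandit strategy is epsilon-greedy: mostly send the current best variant, occasionally explore the other. A minimal sketch, with entirely hypothetical "true" open rates and an arbitrary 10% exploration rate:

```python
import random

random.seed(1)

# Hypothetical true open rates for subject lines A and B; in practice these
# are unknown and must be learned from feedback.
true_rates = {"A": 0.60, "B": 0.40}
opens = {"A": 0, "B": 0}
sends = {"A": 0, "B": 0}

def choose_variant(epsilon=0.10):
    """Epsilon-greedy bandit: usually exploit the variant with the best
    observed open rate, but explore at random 10% of the time."""
    if random.random() < epsilon or 0 in sends.values():
        return random.choice(["A", "B"])
    return max(sends, key=lambda v: opens[v] / sends[v])

for _ in range(2000):
    variant = choose_variant()
    sends[variant] += 1
    opens[variant] += random.random() < true_rates[variant]  # simulate an open

# The allocation drifts toward the better-performing variant over time.
print(sends)
```

More sophisticated schemes (Thompson sampling, UCB) replace the fixed epsilon with allocation proportional to the current uncertainty about each variant.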

22
Q

What is the bootstrap technique and what is it used for?

A

The bootstrap technique is a nonparametric method of learning the SAMPLING DISTRIBUTION of a PARAMETER.

Specifically, bootstrap involves sampling your entire dataset with replacement many times, at each pass calculating the statistic you’re interested in. A distribution is constructed by building a histogram of the statistics generated from each pass.
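The procedure above takes only a few lines. A sketch that bootstraps the sampling distribution of the mean of a hypothetical skewed sample (the data, seed, and number of resamples are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=200)  # skewed sample; true mean is 2.0

# Bootstrap: resample the dataset with replacement many times,
# computing the statistic of interest on each pass.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(5000)
])

# A 95% confidence interval from the percentiles of the bootstrap distribution:
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(round(data.mean(), 3), round(lo, 3), round(hi, 3))
```

A histogram of `boot_means` is the estimated sampling distribution; no normality assumption was needed, which is what makes the method nonparametric.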

23
Q

What is the probability of rolling two 6s in a row with a fair die?

A

P(X=6,X=6) = (1/6)(1/6) = 1/36

24
Q

We roll a fair die 10 times. What is the probability that at least one of them comes up as a 3?

A

P(at least one 3 in 10 rolls) = 1 - P(no 3 in 10 rolls) = 1 - (5/6)^10 ≈ 0.84

25
Q

We randomly draw two cards, without replacement, from a standard deck of cards. What is the probability that both cards are kings? (there are 4 kings in a standard deck of cards)

A

P(A,B) = P(A) * P(B|A) = (4/52)(3/51) = 1/221

26
Q

Explain p-value computation to a five year old.

A

Simple example: flipping two fair coins.

Recall: Proba = # outcomes of interest / total # outcomes

What is the proba of getting 2 heads in a row?
.5 * .5 = .25

build a proba tree and see that:
P(H,H) = .5*.5 = .25

P(H,T) = .5*.5 = .25
P(T,H) = .5*.5 = .25

P(T,T) = .5*.5 = .25

so P(one H, one T) = .25+.25 = .5

What is p-value of getting two heads in a row?

First, define the p-value as the proba that random CHANCE/inherent random proba generated the data/outcome, UNION the proba of any outcome that is EQUAL or RARER.

thus, there are THREE PARTS to p-value:

part 1: random chance/inherent proba– equals P(H,H)=.25 here.

part 2: …part 1 UNION with outcome T,T, which is an outcome EQUAL in proba to H,H since both outcomes have the SAME proba of occurring, i.e. P(T,T) = P(H,H) = .25

part 3: …part 1, part 2 UNION any other outcome(s) that are more rare (i.e. have inherent proba < P(H,H) ).

p-value (H,H) = P(H,H) + P(any outcome with equal proba) + P(any rarer outcome)
= .25 + .25 + 0 = 0.50

A more complicated example: flipping a coin 5 times and getting 5 H.

Proba = # outcomes of interest / # total outcomes
P(five H) = 1/32 = .03125
P(4H, 1T) = 5/32
P(3H, 2T) = 10/32 = 5/16
P(2H, 3T) = 10/32 = 5/16
P(1H, 4T) = 5/32
P(five T) = 1/32

p-value (five H)

= P(5 H) + P(some event equal # outcomes as 5H) + P(something fewer # outcomes than 5H)

= 1/32 + P(5 T) + 0

= 2/32 = 1/16 = .0625

Notice that p-value (5 H) = 0.0625 > alpha = 0.05, so it is not all that unusual to see 5 heads in a row!

What is p-value (4T, 1H)?

p-value (4T, 1H) = P(4T,1H) + P(event with equal # outcomes) + P(event fewer # outcomes)

= P(4T,1H) + P(1T, 4H) + P(5 H) + P(5 T)
= 5/32 + 5/32 + 1/32 + 1/32
= 12/32 = 3/8 = 0.375
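The "equal or rarer" definition used throughout this card can be computed mechanically: sum the probabilities of every outcome whose probability is no larger than the observed outcome's. A sketch for fair-coin flips:

```python
from math import comb

def coin_p_value(k_heads, n_flips, p=0.5):
    """p-value = total probability of every outcome whose probability is
    equal to or rarer than the observed outcome's probability."""
    observed = comb(n_flips, k_heads) * p**k_heads * (1 - p)**(n_flips - k_heads)
    total = 0.0
    for k in range(n_flips + 1):
        prob = comb(n_flips, k) * p**k * (1 - p)**(n_flips - k)
        if prob <= observed + 1e-12:  # equal or rarer outcomes
            total += prob
    return total

print(coin_p_value(2, 2))  # HH in 2 flips → .25 + .25 = 0.5
print(coin_p_value(5, 5))  # 5 heads in a row → 2/32 = 0.0625
print(coin_p_value(1, 5))  # 4T, 1H → 12/32 = 0.375
```

All three printed values match the hand calculations worked out above.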

27
Q

What is an A/B test?

A

An A/B test is an experiment with two groups to establish which of two treatments, products, procedures, or the like is superior.

Often one of the two treatments is the standard existing treatment, or no treatment (placebo). If a standard (or placebo) treatment is used, it is called the CONTROL. A typical HYPOTHESIS is that treatment is BETTER than control.

A proper A/B test has SUBJECTS that can be assigned to one treatment or another. The key is that the subject is EXPOSED to the TREATMENT. Ideally, subjects are randomized to treatments. In this way, you know that any difference between treatment groups is due to one of two things:

  • the EFFECT of different treatments
  • LUCK of the draw, such that the random assignment may have resulted in the naturally better-performing subjects being concentrated in A or B.

We must define a metric (test statistic) to compare group A to group B. Perhaps the most common metric in data science is a binary variable: click/no-click, buy/no-buy, fraud/no-fraud, etc. Binary metric results may be summed up in a 2x2 outcome table.

28
Q

Explain why a control group is necessary in A/B testing

A

Why not skip the control group and just run an experiment applying the treatment of interest to only one group, and compare the outcome to prior experience?

Without a control group, there is no assurance that “other things are equal” and that any difference is really due to the treatment (or chance).

When we have a control group, it is subject to the same conditions (except for treatment of interest) as the treatment group. If we simply make a comparison to “baseline” or prior experience, other factors, besides treatment, might differ.

Furthermore, in a standard A/B test, we need to decide on one metric ahead of time. Multiple behavior metrics might be collected and be of interest, but if the experiment is expected to lead to a decision between treatment A and treatment B, a SINGLE metric (test statistic) must be established BEFOREHAND, else we risk the potential for researcher BIAS.

29
Q

What is the Binomial Distribution?

A

A binomial experiment, B(n,p), consists of a fixed number of Bernoulli trials.

The binomial dist is the FREQUENCY dist of the NUMBER OF SUCCESSES (x) in a given number of trials (n) with SPECIFIED proba (p) of SUCCESS in each trial.

There is a family of binomial distributions, depending on the values of x, n, p. A binomial dist question would look like:

“if the proba of a click converting to a sale is .02, what is the proba of observing 0 sales in 200 clicks?”

Theorem: The proba of exactly k successes in a binomial experiment B(n,p) is given by

P(k) = P(k successes) = C(n,k) p^k q^(n-k)

The proba of >=1 successes is 1-q^n

where q = 1-p and C(n,k), “n choose k”, is the binomial coef

The mean of a binom dist is np; you can also think of this as the expected number of SUCCESSES in n trials.

The variance of a binom dist is np(1-p) = np*q.

With a large enough number of trials (particularly when p is close to .50), the binom dist is virtually indistinguishable from the normal dist! In fact, calculating a binom dist is computationally demanding, and most stat procedures use the normal dist, with matching mean and variance, as an approximation.
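The clicks-to-sales question quoted above can be answered directly from the theorem, along with the mean and variance formulas:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(k successes) = C(n, k) * p^k * q^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# "If the proba of a click converting to a sale is .02, what is the
#  proba of observing 0 sales in 200 clicks?"
p_zero_sales = binom_pmf(0, 200, 0.02)   # = 0.98 ** 200
p_at_least_one = 1 - 0.98**200           # P(>=1 success) = 1 - q^n

mean = 200 * 0.02          # np: 4 expected sales
var = 200 * 0.02 * 0.98    # np(1-p) = np*q

print(round(p_zero_sales, 4), round(p_at_least_one, 4), mean, round(var, 2))
```

Zero sales in 200 clicks turns out to have probability under 2%, so observing it would be evidence that the true conversion rate is below .02.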

30
Q

What is a Bernoulli trial?

A

A Bernoulli trial is a single run of the following experiment:

We have an experiment “e” with two outcomes, one called success (S) and the other called failure (F). Let p denote the proba of success and let q = (1-p) denote the proba of failure.

Repeated Bernoulli trials of the experiment “e” are independent of one another.

31
Q

The proba that Ann hits a target at any time is p = 1/3 and misses with proba q=2/3.

Ann fires at the target 7 times. What is the proba that she hits the target (a) exactly 3 times; (b) at least 1 time?

A

This is a binomial experiment. By the binomial proba,

P(k) = P(k successes) = C(n,k) p^k q^(n-k)

(a) the proba of k=3 successes is

P(k=3) = C(7,3) (1/3)^3 (2/3)^4 = (560/2187) = 0.26

(b) the proba of one or more (k>=1) successes is

the proba of NEVER hitting a target is

P(k=0) = q^7 = (2/3)^7 = 128/2187 ~= .06

so, P(at least one hit) = 1-q^7

= (2187/2187 - 128/2187) = (2059/2187) = .94

32
Q

What is a normal random variable?

A

A r.v. X is normal if its density function f(x) has a bell-shaped curve and is of the form

f(x) = 1/(sigma * sqrt(2*pi)) * exp[-.5 ((x - mu)/sigma)^2]

The normal dist depends on params mu,sigma and is denoted as N(mu, sigma^2)

33
Q

What is a standard normal distribution?

A

Suppose X is any normal dist N(mu, sigma^2).

The STANDARDIZED r.v. corresponding to X is defined by

Z = (X - mu) / sigma

Z is also normally dist’d with mu=0 and sigma=1, s.t.

Z ~ N(0,1)

The density function for Z, obtained by setting z = (x - mu)/sigma in the density for N(mu, sigma^2), is:

phi(z) = 1/sqrt(2*pi) * exp(-z^2 / 2)

The area under phi(z) between two z-values gives the proba of falling in that range.

The percentages under the std. norm density curve give rise to the 68-95-99.7 rule:

  1. 68.2% for -1 <= z <= 1
  2. 95.4% for -2 <= z <= 2
  3. 99.7% for -3 <= z <= 3

This rule says that, in a norm dist’d population, 68% of the pop. falls within 1 sd of the mean, 95% falls within 2 sd of the mean, and 99.7% falls within 3 sd of the mean.

34
Q

What is recall (sensitivity)?

A

Recall is the proportion of true 1s (y=1) correctly classified.

synonym: Sensitivity

Recall = TP / (TP + FN), i.e. TP divided by the total count of actual y=1

Recall is computed along the first (actual y=1) row of the confusion matrix

35
Q

What is Precision?

A

Precision is the proportion of predicted 1s (yhat=1) that are actually 1s.

Precision = TP / (TP + FP)

36
Q

What is an ROC Curve?

A

A Receiver Operating Characteristic (ROC) curve is a plot of recall (sensitivity) on the y-axis vs. the false positive rate (1 - specificity) on the x-axis.

The ROC curve shows the tradeoff between recall and specificity as one changes the decision threshold for the positive class.

37
Q

What is Specificity?

A

Specificity is the proportion of true 0s (y=0) correctly classified.

Specificity = TN / (TN + FP)

38
Q

What is a confusion matrix?

A

A confusion matrix is a tabular display of the record counts by their predicted and actual classification status.

The PREDICTED outcomes are the columns.

The TRUE outcomes are the rows.

The diagonal elements of the matrix show the CORRECT predictions.

The off-diagonal elements show the number of INCORRECT predictions.
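Recall, precision, and specificity from the preceding cards all fall out of this matrix. A sketch using the card's convention (true outcomes as rows, predicted as columns) and made-up counts:

```python
# Confusion matrix convention from the card: rows are TRUE outcomes,
# columns are PREDICTED outcomes.
#                 pred=1  pred=0
cm = [[80, 20],   # true=1 : TP=80, FN=20
      [10, 90]]   # true=0 : FP=10, TN=90

tp, fn = cm[0]
fp, tn = cm[1]

recall = tp / (tp + fn)        # sensitivity: true 1s correctly classified
precision = tp / (tp + fp)     # predicted 1s that are actually 1s
specificity = tn / (tn + fp)   # true 0s correctly classified
accuracy = (tp + tn) / (tp + tn + fp + fn)  # diagonal / total

print(recall, round(precision, 3), specificity, accuracy)
```

Note how accuracy is the diagonal sum over the total, matching the "diagonal = correct" description above.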

39
Q

What is standard deviation?

A

Standard deviation corresponds to how wide the normal pdf curve is around the mean (center of curve).

i.e. the std tells us how the data are spread around the mean.

40
Q

Explain how to use the normal distribution curve to compute probabilities

A

Consider the normal distribution pdf curve as a (continuous) approximation of a (discrete) histogram.

e.g. say center of curve (mean) is 20. If we want to know the rhs area under the normal curve with x>=30, then obtain the area under the curve for all values x>=30, and divide by total area, 1, under the normal pdf curve.

The rhs upper portion of x>=30 is a fraction of the area under the curve.

The proportional area where x>=30 divided by the total area is then the probability of observing a value with x>=30!

41
Q

How does a sample relate to a population in statistics?

A

Since we rarely have enough time/money to measure everything in an entire population, we almost always ESTIMATE the population PARAMETERS (i.e. population mu, sigma) using a relatively SMALL sample.

We might poll 5 people among NYC 8mm population to ESTIMATE the POPULATION PARAMETERS.

The reason why we want to know the POPULATION PARAMETERS is to ensure that the results drawn from our experiment are REPRODUCIBLE.

i.e. if someone else takes a separate sample of 5 people among NYC 8mm, then they will obtain DIFFERENT sample mu, sigma estimates from the SAME population.

Every time we do a new sample, we get different values for the parameters mu, sigma.

note: the fewer/greater number of obs in the samples results in worse/better estimates of the population parameters. This means the more data we have, the more CONFIDENCE we can have in the accuracy of the estimates.

One of the main goals of statisticians is to quantify how much CONFIDENCE we can have in population ESTIMATES. Confidence intervals and p-values are used to quantify the confidence in estimated params. These metrics tell us that while the pop estimates are different for each sample, they may not be SIGNIFICANTLY different from each other–so that we should be able to REPLICATE the results BETWEEN SAMPLES!

From a machine learning perspective, imagine the 5 sample obs are the TRAINING set, and the normal pdf curve that represents the population is what we want to PREDICT, and generalize well to, with our ML method.

42
Q

How do we calculate an estimate of the POPULATION variance?

A

actual POP var is:

sigma^2 = Sum((X - mu)^2) / n

where mu is the POP mean and X is a vector of xi observations.

The result of pop var, unfortunately, is in squared units, so that its value cannot be directly related to the norm dist curve. We can fix this by just taking the sqrt to get the pop std, which can be plotted on the norm curve plot.

Since we usually never have access to all of the pop data, an ESTIMATE (calculated) of the pop variance is:

s^2 = Sum((X - xbar)^2) / (n-1)

…dividing by n-1 compensates for the fact that we are calculating diffs from the sample mean INSTEAD of the pop mean; otherwise, we would consistently UNDERESTIMATE the var around the pop mean. This is because the diffs between the data and the sample mean tend to be smaller than the diffs between the data and the pop mean, i.e.

Sum((X - xbar)^2) / n < Sum((X - mu)^2) / n

Thus, the diffs around the pop mean will result in a larger average, and the larger average is what we are trying to estimate.
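The underestimate is easy to demonstrate by simulation. A sketch drawing many small samples from a hypothetical N(0, 1) population, whose true variance is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

biased, unbiased = [], []
for _ in range(20_000):
    x = rng.normal(0.0, 1.0, size=5)                 # small sample, n = 5
    biased.append(((x - x.mean())**2).mean())        # divide by n
    unbiased.append(((x - x.mean())**2).sum() / 4)   # divide by n-1

# Dividing by n averages out near (n-1)/n = 0.8, systematically below the
# true variance of 1.0; dividing by n-1 corrects the estimate back up.
print(round(np.mean(biased), 2), round(np.mean(unbiased), 2))
```

The biased average lands near 0.8 rather than 1.0, which is exactly the (n-1)/n shrinkage the n-1 divisor undoes.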

43
Q

Explain what a hypothesis test is

A

Hypothesis tests (significance tests) are ubiquitous in traditional stats analysis of published research. Their purpose is to help us learn WHETHER RANDOM CHANCE MIGHT BE RESPONSIBLE FOR AN OBSERVED EFFECT.

An A/B test is typically constructed with a hypothesis in mind. e.g., the hypothesis might be that price B produces higher profit. Why do we need a hypothesis? Why not just look at the outcome of the experiment and go with whichever does better?

The answer lies in the tendency of the human mind to UNDERESTIMATE THE SCOPE OF NATURAL RANDOM BEHAVIOR.

One manifestation of this is the failure to anticipate extreme events (black swans). Another manifestation is the tendency to MISINTERPRET RANDOM EVENTS AS HAVING PATTERNS OF SOME SIGNIFICANCE.

Statistical hypothesis testing was invented as a way to PROTECT RESEARCHERS FROM BEING FOOLED BY RANDOM CHANCE.

In a properly designed A/B test, you collect data on treatments A and B in such a way that any observed difference between A and B must be due to either:

  • random chance in assignment of subjects
  • a true difference between A and B.

A stat hypothesis test is further analysis of an A/B test, or any randomized experiment, to assess whether random chance is a reasonable explanation for the observed difference between groups A and B.

44
Q

Explain what a null hypothesis is

A

Hypothesis tests use the following logic:

“given the human tendency to react to unusual but RANDOM behavior and interpret it as something meaningful and real, in our experiments we will require PROOF that the difference between groups is more EXTREME than what CHANCE MIGHT PRODUCE.”

This involves a baseline assumption that the treatments are equivalent, and ANY DIFFERENCE BETWEEN THE TWO GROUPS IS DUE TO CHANCE.

This baseline assumption is termed the NULL HYPOTHESIS.

Our hope is then that we can, in fact, prove the null hypothesis WRONG, and show that the outcomes for groups A and B are MORE DIFFERENT THAN WHAT CHANCE MIGHT PRODUCE. One way to do this is via a RESAMPLING PERMUTATION procedure, in which we shuffle together the results from A and B, repeatedly deal out the data in groups of similar sizes, and then observe HOW OFTEN we get a difference AS EXTREME as the observed difference.
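The shuffle-and-redeal procedure described above can be sketched directly (the two groups below are hypothetical data, with an intentionally tiny true difference):

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(a, b, n_iter=5000):
    """Shuffle A and B together, repeatedly re-deal groups of the same
    sizes, and count how often the re-dealt difference in means is at
    least as extreme as the observed one."""
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_iter

a = rng.normal(10.0, 2.0, size=50)
b = rng.normal(10.2, 2.0, size=50)  # tiny true difference, easily drowned by noise
p_null = permutation_test(a, b)
print(p_null)  # fraction of shuffles at least as extreme as the observed diff
```

A large returned fraction means chance alone routinely produces differences this big, i.e. we fail to reject the null; a tiny fraction is the permutation-test p-value for rejecting it.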

45
Q

Explain what an alternative hypothesis is

A

Hypothesis tests by their nature involve not just a null hypothesis, but also an OFFSETTING ALTERNATIVE hypothesis. e.g.,

Null: “no diff xbar_a vs. xbar_b”
Alt: “A is different than B” (could be bigger or smaller)

Null: “A <= B”
Alt: : “A>B”

Null: “B is not x% greater than A”
Alt : “B is x% greater than A”

46
Q

When should we use a one-way hypothesis test?

A

In A/B testing, we test a new option (B) vs. an established option (A) and the presumption is that we will keep with A unless B proves to be significantly better.

In such a case, we want a hyp test to protect against being FOOLED BY CHANCE IN THE DIRECTION OF B. We don’t care about being fooled in the direction of A because we’d be sticking with A unless B proves significantly better. So we want a DIRECTIONAL ALTERNATIVE hypothesis (B is better than A) and we use a ONE-WAY (one tail) hyp test. This means that extreme chance results in only one direction count towards the p-value.

47
Q

When should we use a two-way hypothesis test?

A

If we want a hyp test to protect us from being fooled by chance in either direction, the alt hyp is BIDIRECTIONAL (A is different from B; either bigger or smaller). In such cases, we use a TWO-WAY (2 tail) hypothesis. This means that EXTREME CHANCE RESULTS in either direction count towards the p-value.

48
Q

What does statistical significance mean?

A

Stat significance is how statisticians measure whether an experiment yields a result MORE EXTREME THAN WHAT CHANCE MIGHT PRODUCE.

If the result is BEYOND the realm of CHANCE VARIATION, it is said to be STATISTICALLY SIGNIFICANT.

e.g. say price A converts customers almost 5% better than price B (.8425% vs. .8057%, a diff of .0368 pct pts) and we have ~46k obs.

We can test whether the diff in conversions of A vs. B is within the realm of CHANCE VARIATION, using a resampling procedure to simulate real-world events:

  1. create an urn with all sample results: 382 ones, 45945 zeros; .008246 conversion rate
  2. shuffle and draw a resample of size 23,739 (same n as price A) and record the count of 1s.
  3. Record the number of 1s in the remaining 22,588 (same n as price B).
  4. Record the diff in proportion 1s.
  5. Repeat steps 2 to 4.
  6. How often was the difference >= 0.0368?

We can plot a hist of, say, 1000 resampling outcomes from the steps above and see that the observed diff of .0368 pct pts falls well within the range of chance variation.
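The six resampling steps above translate almost line-for-line into code, using the urn counts given in the card:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: an urn with all sample results — 382 ones, 45,945 zeros.
urn = np.concatenate([np.ones(382), np.zeros(45945)])
n_a, n_b = 23_739, 22_588       # group sizes for prices A and B
observed_diff = 0.0368          # observed difference, in percentage points

diffs = []
for _ in range(1000):           # steps 2-5: shuffle, deal, record, repeat
    rng.shuffle(urn)
    rate_a = urn[:n_a].mean() * 100            # conversion rate (%) in "A"
    rate_b = urn[n_a:n_a + n_b].mean() * 100   # conversion rate (%) in "B"
    diffs.append(rate_a - rate_b)

# Step 6: how often was the chance difference >= 0.0368?
p_value = float(np.mean(np.array(diffs) >= observed_diff))
print(p_value)
```

The resulting fraction is large (roughly a third of shuffles beat the observed difference), which is why the card concludes the observed diff sits well within chance variation.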

49
Q

What is a p-value?

A

p-value is the FREQUENCY with which the CHANCE MODEL produces a result MORE EXTREME than the OBSERVED RESULT.

We can estimate the p-value from a permutation test by taking the PROPORTION OF TIMES that the permutation test produces a difference EQUAL TO OR GREATER than the OBSERVED difference.

A p-value of .308 means that we would EXPECT to achieve a result AS EXTREME, OR MORE EXTREME than this observed outcome BY RANDOM CHANCE 30.8% of the time.

Instead of a permutation test, since a binary outcome experiment is binomially distributed, we can APPROXIMATE the BINOMIAL DISTRIBUTION by the NORMAL distribution.

50
Q

Explain the difference between type I and type II errors

A

In assessing stat significance, two types of errors are possible:

Type I: we MISTAKENLY conclude an effect is REAL, when it is really DUE TO CHANCE

Type II: we MISTAKENLY conclude an effect is NOT REAL, when in fact it IS REAL

Recall the basic function of significance (hypothesis) tests is to protect against BEING FOOLED BY RANDOM CHANCE; thus these tests are typically structured to MINIMIZE Type I errors.

51
Q

What is a t-test?

A

In the 1920s when stat tests were being developed, it was INFEASIBLE to do a resampling test (1000s of shuffled iterations). Statisticians found that a good APPROXIMATION to the shuffled permutation test was the T-TEST.

t stat = (xbar - mu) / sqrt(s^2/n)

xbar is sample mean, mu is pop mean, s^2 is sample var, with n-1 df
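
A minimal sketch of the t statistic from the formula above; the sample values and hypothesized mean are made up for illustration:

```python
import math
import statistics

sample = [5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2]   # illustrative sample
mu0 = 5.0                                       # hypothesized population mean

xbar = statistics.mean(sample)
s2 = statistics.variance(sample)   # sample variance, n - 1 in the denominator
n = len(sample)

# t stat = (xbar - mu) / sqrt(s^2 / n)
t = (xbar - mu0) / math.sqrt(s2 / n)
print(round(t, 3))   # compare against a t distribution with n - 1 = 6 df
```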

52
Q

Why does dividing by n underestimate the sample variance?

A

Since the standard dev is in the same units as the original data, we can draw it on the graph of the sample observations.

Dividing the sum of squared deviations by n-1 compensates for the fact that we are calculating differences (deviations) from the SAMPLE mean instead of the population mean; otherwise, we would systematically UNDERESTIMATE the variance around the population mean.
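
A small simulation (assumed example, not from the card) showing the underestimate: with deviations taken from the sample mean, dividing by n comes out biased low, while n-1 recovers the population variance:

```python
import random

random.seed(0)
n, trials = 5, 20000   # small samples from a normal with sigma = 2 (var = 4)

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    biased_sum += ss / n           # divide by n: systematically too small
    unbiased_sum += ss / (n - 1)   # Bessel's correction

# Average estimates over many trials: ~3.2 vs ~4.0 for true var 4.0.
print(round(biased_sum / trials, 2), round(unbiased_sum / trials, 2))
```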

53
Q

What is a confidence interval and what assumptions can we make with it?

A

Given a sampling distribution, say by bootstrap 100x, we obtain 100 sample means.

A confidence interval is just an interval that COVERS 95% of the SAMPLE MEANS.

That’s IT.

What is the point of CIs?

CIs are statistical tests PERFORMED VISUALLY. Because 95% CI covers 95% of the sampling stat, we know that anything outside of the CI OCCURS < 5% of the time.

i.e. the sample region OUTSIDE the CI must have a probability of occurring < 5%.
i.e. the P-VALUE of any sample OUTSIDE of the CI is < 0.05 (and thus, significantly different than the TRUE MEAN).

Say we take weight samples between male and female mice and find their 95% CIs do NOT overlap. Then we can state that weights of male and female mice are STATISTICALLY different.
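
The bootstrap CI described above can be sketched as follows (the weight data are made up; indices 25 and 974 pick out the middle 95% of 1000 sorted bootstrap means):

```python
import random

random.seed(42)
weights = [17.5, 22.1, 19.8, 21.0, 18.3, 20.6, 23.2, 19.1, 20.0, 21.7]

boot_means = []
for _ in range(1000):
    # Resample with replacement, same size as the original sample.
    resample = [random.choice(weights) for _ in weights]
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
ci_low, ci_high = boot_means[25], boot_means[974]   # interval covering 95% of the means
print(round(ci_low, 2), round(ci_high, 2))
```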

54
Q

What does R-squared quantify?

A

Say we have data:

target y = mice weight
variable x = mouse size

Now compute target mean ybar, and compute:

SST = sum(yi - ybar)^2

And then fit regression line yhat and compute:

SSR = sum(yi - yhati)^2
SSReg = sum(yhati - ybar)^2

R2 = (Var(mean) - Var(fitted line)) / Var(mean)

R2 QUANTIFIES the DIFFERENCE between the regression LINE and the target MEAN ybar:

R2 = 1 - (SSR/SST)

Thus, a PERFECTLY FITTED regression line will MINIMIZE SSR, s.t. R-squared will be MAXIMIZED toward 1.

e.g. Var(mean) = 32, Var(fitted line) = 6
R2 = (32-6) / 32 = (26/32) = 0.81

i.e. there is 81% LESS VARIATION around the fitted line than around the NAIVE mean ybar; equivalently, the x,y RELATIONSHIP in the model ACCOUNTS for 81% of the VARIATION.
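
A minimal sketch of the R2 computation, with a least-squares line fitted by hand and made-up mouse data:

```python
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]        # x: mouse size (made up)
weights = [2.1, 3.9, 6.2, 7.8, 10.1]     # y: mouse weight (made up)

n = len(sizes)
xbar = sum(sizes) / n
ybar = sum(weights) / n

# Least-squares slope and intercept.
num = sum((x - xbar) * (y - ybar) for x, y in zip(sizes, weights))
den = sum((x - xbar) ** 2 for x in sizes)
slope = num / den
intercept = ybar - slope * xbar

sst = sum((y - ybar) ** 2 for y in weights)              # variation around the mean
ssr = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(sizes, weights))               # variation around the line

r2 = 1 - ssr / sst
print(round(r2, 3))
```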

55
Q

How can we reconcile correlation r vs R2?

A

R2 is easier to interpret than correl r.

e.g. how much better is r = .7 than r = .5?

just convert r to R2:
R2 = .7^2 = 0.49, so 49% of orig variation is explained

R2 = .5^2 = 0.25, so 25% of orig variation is explained

Thus, with R2 it is EASY to see that the first correl explains about TWICE as much variation as the second.

56
Q

What is the p value of flipping a coin 5x and getting 5H in a row?

A

A p-value is the probability that random chance COULD HAVE generated the OUTCOME in QUESTION, OR AN OUTCOME EQUALLY OR MORE EXTREME.

For flipping a coin 5x, there are 32 total outcomes.

There is ONE outcome with 5 heads: HHHHH with proba 1/32

and there is one outcome EQUALLY AS EXTREME as 5 heads, which is 5 tails TTTTT, also with proba 1/32

so P-VALUE of flipping 5H is (1/32 + 1/32) = 1/16 = .0625

Notice: even such an “extreme” event as HHHHH or TTTTT is NOT significant at alpha = .05!
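
The same p-value can be obtained by brute-force enumeration (a small sketch using only the standard library):

```python
from itertools import product

# All 2^5 = 32 equally likely outcomes of five fair flips.
outcomes = list(product("HT", repeat=5))

# Outcomes at least as extreme as 5 heads: HHHHH or TTTTT.
extreme = [o for o in outcomes if len(set(o)) == 1]

p_value = len(extreme) / len(outcomes)
print(p_value)   # 2/32 = 0.0625
```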

57
Q

What is the point of maximum likelihood?

A

Given a data distribution, the goal of max likelihood is to fit the OPTIMAL distribution to the DATA.

Why do we care to fit a dist? Because we may have a hunch that the data is say a normal dist, which would entail that the data is centered around the mean by some std. BUT WHICH is the CENTER of the data?

The normal dist says that most data pts should be NEAR the CENTER of the dist.

Say we place the normal curve so its center sits over the lhs tail of the dataset, s.t. the rhs part of the curve falls over the actual mean of the data pts. Then the norm dist says that the PROBABILITY (i.e. LIKELIHOOD) of observing the pts under its rhs tail is LOW, even though that is ACTUALLY where most of the pts sit.

We can plot the likelihood on y vs. location of dist center. We want the MAXIMUM LIKELIHOOD of that plot.

Thus, the normal dist curve centered over the actual sample mean is the MAXIMUM LIKELIHOOD ESTIMATE for the MEAN.

So a MAX LIKELIHOOD for a particular sample stat is the STATISTIC that MAXIMIZES the LIKELIHOOD (probability) that we observed the stats we observed.

Probability and LIKELIHOOD are the same idea, but in this statistical context, proba is called LIKELIHOOD.

Summary, LIKELIHOOD is how we FIT a DISTRIBUTION to DATA.
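
A sketch of the idea above: slide a normal curve along candidate centers, compute the (log-)likelihood of the data at each, and keep the maximizer, which lands on the sample mean. The data, sigma, and grid are assumptions for illustration:

```python
import math

data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 5.3, 4.6]   # made-up observations
sigma = 0.5                                        # treat the spread as fixed

def log_likelihood(mu):
    # Sum of log N(x | mu, sigma^2) over the data.
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

# Try many candidate centers and keep the one with maximum likelihood.
grid = [i / 1000 for i in range(3000, 7000)]
mle_mu = max(grid, key=log_likelihood)

print(mle_mu, sum(data) / len(data))   # the MLE lands on the sample mean
```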

58
Q

What does covariance tell us?

A

Cov = Sum[(xi - xbar)(yi - ybar)] / (n-1)

Cov can classify three types of relationships:

  1. negative trends (cov < 0)
  2. positive trends (cov >0)
  3. no trend (cov = 0)

When the Cov is positive, it tells us the SLOPE of the relationship between X,Y is POSITIVE, i.e. we CLASSIFY the TREND as POSITIVE.

The Cov does NOT tell us the STRENGTH of the relationship.

However, cov on its own is not interesting:

Covariance is a computational STEPPING STONE to something more interesting, like CORRELATION and PCA.

59
Q

Why is covariance so difficult to interpret?

A

Covariance is sensitive to the scale of the data, which makes it difficult to interpret.

The sensitivity to scale also prevents the cov value from telling us if the data are close (on) the line that represents the relationship, or scattered far from the line.

e.g. simply rescaling the data (say, grams to milligrams) makes the cov LARGER, even though the strength of the X,Y relationship is unchanged.
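
A small sketch covering both cards: covariance classifies the trend by its sign, but rescaling the data inflates the value without the relationship getting any stronger (data are made up):

```python
def cov(xs, ys):
    # Cov = Sum[(xi - xbar)(yi - ybar)] / (n - 1)
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]          # positive trend, so cov > 0

c1 = cov(x, y)
c2 = cov([10 * xi for xi in x], [10 * yi for yi in y])   # same data, new units

# c2 is 100x larger than c1, yet the trend is no stronger.
print(c1, c2)
```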

60
Q

What is a z-score (standard score)?

A

A (standard) z-score is the signed number of standard deviations by which an obs or data point lies ABOVE (positive z) or BELOW (negative z) the MEAN of what is being measured.

It is calculated by subtracting the population MEAN from an individual raw score, then dividing this difference by the POPULATION standard deviation:

z = (x - mu) / sigma

This conversion process is called STANDARDIZING or NORMALIZING.

Computing a z-score requires knowing the mean and the std of the COMPLETE POPULATION to which that data point belongs; if one only has a SAMPLE of obs from the population, then the analogous computation with SAMPLE mean and std yields the t-statistic.
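
A tiny sketch of the formula, with assumed population parameters:

```python
# z = (x - mu) / sigma, with an illustrative population (mean 100, sd 15).
mu, sigma = 100.0, 15.0
x = 130.0

z = (x - mu) / sigma
print(z)   # 2.0: the score lies two standard deviations above the mean
```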

61
Q

How do you compute a confidence interval for say proportion of web link clicks?

A

variable of interest is web link clicks among visitors to site, so a proportion p.

X = {number of users who clicked on link}
N = number of users

phat = X/N

say phat = 100/1000 = 0.1

A good rule of thumb for assuming a normal dist approximation for our sample is N*phat > 5 (and N*(1-phat) > 5).

Since N*phat = 100 and N*(1-phat) = 900, we can assume normality.

the margin m from a mean, or statistic phat, at a specified alpha/2 level is:

m = z_alpha/2 * SD = z_alpha/2 * (sigma/sqrt(n)) for known population std
m = t_alpha/2 * SE = t_alpha/2 * (s/sqrt(n)) for unknown population std

then a CI at alpha significance is:
(xbar - m, xbar +m)

e.g. an alpha .05 CI with known pop std and phat = 0.1 is:

z_alpha/2 = 1.96

var of a binom proportion is (pq)/n
var = (.10*.90)/1000 = (.09/1000)
var = .00009
std = sqrt(var) = se = .009487

m = 1.96 * se = 1.96 * .009487
m = .0186
m ~= .019
CI = (.1 - .019, .1 + .019) = (.081, .119)
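
The worked example above can be checked with a short script (same numbers as the card):

```python
import math

X, N = 100, 1000
phat = X / N                           # 0.1
z = 1.96                               # z_alpha/2 for alpha = .05

se = math.sqrt(phat * (1 - phat) / N)  # sqrt(pq/n)
m = z * se                             # margin of error

ci = (phat - m, phat + m)
print(round(ci[0], 3), round(ci[1], 3))   # (0.081, 0.119)
```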

62
Q

Compute a confidence interval for:

n=2000
X=300
confidence level = 99%

A

phat = X/N = 300/2000 = 0.15

z score at alpha_.01/2 = alpha_.005 is z = 2.576

(note: the alpha level is like a p value, where
p value = P(Z > z) = 1 - P(Z <= z))

se = sqrt(phat*(1-phat)/n) = sqrt((.15*.85)/2000) = .00798

m = 2.576 * .00798 = .0206

CI = (.15 - .0206, .15 + .0206) = (.129, .171)
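
A minimal sketch of this computation, following the same normal-approximation recipe as the previous card:

```python
import math

X, N = 300, 2000
phat = X / N                           # 0.15
z = 2.576                              # z_alpha/2 for alpha = .01

se = math.sqrt(phat * (1 - phat) / N)  # sqrt(pq/n)
m = z * se                             # margin of error

ci = (phat - m, phat + m)
print(round(ci[0], 3), round(ci[1], 3))   # (0.129, 0.171)
```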

63
Q

What is the point of a hypothesis test?

A

A hypothesis test is a QUANTITATIVE way to establish how LIKELY it is that your results OCCURRED by CHANCE.

First we establish a BASELINE NULL hypothesis that there is NO difference between baseline CONTROL and alternative EXPERIMENT.

Then establish an ALTERNATIVE hypothesis which specifies some DIFFERENCE from the NULL.

Fundamentally, by the central limit theorem, the sampling distributions of the means from the NULL control group and the experiment group will each be approximately normal.

We can then check whether the group MEANS are statistically different, applying a p value to judge whether the OBSERVED difference is MORE extreme than RANDOM CHANCE alone would suggest.

64
Q

What is a standard error?

A

The standard error (SE) of a test statistic (usually an estimate of a parameter) is the STANDARD DEVIATION of its SAMPLING DISTRIBUTION, or an estimate of that std.

A sampling distribution of a population mean is generated by repeated SAMPLING and RECORDING of means observed. This forms a distribution of DIFFERENT means and THIS DISTRIBUTION has its OWN mean, variance.

Mathematically, the var of the sampling dist obtained is equal to the VARIANCE of the population divided by the SAMPLE SIZE. This is because as the sample size increases, sample mean CLUSTERS more CLOSELY around the MEAN.

Thus, the relationship between the STANDARD ERROR and the standard deviation is such that, FOR a GIVEN SAMPLE SIZE, the standard error equals the standard dev DIVIDED by the SQUARE ROOT of the SAMPLE SIZE.

In other words, the STANDARD ERROR is a measure of DISPERSION of SAMPLE mean around the population mean.

The SE of the population mean is:
sigma_xbar = sigma/sqrt(n)

But since the population std is seldom known, the SE of the mean is usually ESTIMATED as the SAMPLE std divided by sqrt(sample size):

sigma_xbar ~= s/sqrt(n)

where s is SAMPLE std.

The std of the SAMPLE mean is equivalent to the std of the ERROR in the SAMPLE mean with respect to the true mean, since the sample mean is an UNBIASED estimator.
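
A simulation (assumed example) checking the claim: the standard deviation of many sample means approaches sigma/sqrt(n):

```python
import math
import random
import statistics

random.seed(7)
sigma, n, trials = 2.0, 25, 5000

means = []
for _ in range(trials):
    # Repeated sampling from the population, recording each sample mean.
    sample = [random.gauss(0, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

empirical_se = statistics.stdev(means)   # spread of the sampling distribution
theoretical_se = sigma / math.sqrt(n)    # 2 / 5 = 0.4
print(round(empirical_se, 3), theoretical_se)
```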

65
Q

What are the error types in hypothesis testing?

A

Type I error is the REJECTION of a TRUE NULL (conclude a FALSE POSITIVE).

Type II error is the FAILURE to REJECT a FALSE NULL (conclude a FALSE NEGATIVE).

FALSE means the test conclusion drawn is INCORRECT.

A type I error leads to the conclusion that a supposed ALTERNATIVE EFFECT EXISTS when it in fact doesn’t (e.g. like CRYING WOLF, conclude patient has cancer when she doesn’t, fire alarm sounds but there is NO FIRE).

A type II error leads to the conclusion that the ALTERNATIVE EFFECT does NOT exist, when in fact IT DOES (e.g. TEST fails to WORK, conclude patient does NOT HAVE cancer when she DOES, alarm does NOT sound but FIRE EXISTS).