Regression Flashcards

Question

Z-Score: Left-Tailed Test

Answer 1

- For a normal distribution, the probability of getting a value less than your z-score of -2.82 is calculated by taking the area under the curve to the left of the z-score. - This is called a **left-tailed test** because your**p-value is located on the left tail** of the distribution. - The area under this part of the curve is the same as your **p-value**

Answer 2

- For a normal distribution, the probability of getting a value less than your z-score of 2.82 is calculated by taking the area under the curve to the left of the z-score. - This is called a **right-tailed test** because your**p-value is located on the right tail** of the distribution. - The area under this part of the curve is the same as your **p-value**

Answer 3

0.0023 or 0.23 % This means there's a 0.23 percent probability that the difference in mean delivery time would be 2 minutes or greater if the null hypothesis is true. In other words, it's highly unlikely that their difference is due to chance.

Answer 4

- A two-sample test determines whether or not two means are equal to each other. - the **standard approach** for comparing two means.

Answer 5

- The two samples are **independent** of each other. - For each sample, the data is drawn randomly from a **normally distributed** population. - The population standard deviation is **unknown**.

Answer 6

The graph of the t-distribution has a bell shape that is similar to the standard normal distribution, but the t-distribution has **bigger tails** than the standard normal distribution does. The bigger tails indicate the **higher frequency of outliers** that come with small dataset. As the **sample size increases**, the **t-distribution** approaches the **normal distribution.**

Answer 7

Because you're interested in values in **both directions**, either less than or greater than your test statistic, your p-value is the probability of getting a value less than the T-score -1.2508 or greater than the T-score positive 1.2508. Your **p-value** corresponds to the area under the curve on both the left tail and the right tail of the distribution.

Answer 8

- In a **left-tailed** test, Ha < 30 (mean weight of the penguin population is less than 30 lbs. - In a **right-tailed test**, Ha < 30 (mean weight of the penguin population is greater than 30 lbs.) - In a **two-tailed test**, Ha ≠ 30 (mean weight of the penguin population is not equal to 30 lbs.

Answer 9

compare two versions of something to find out which version performs better.

Answer 10

Data professionals use A/B testing to help business leaders - optimize product performance, - improve customer experience, and - grow their online business.

Answer 11

1. Test design 2. Sampling 3. Hypothesis testing

Answer 12

In a **randomized controlled experiment**, test subjects are **randomly assigned** to a **control** group and a **treatment group**. - The **treatment** is the new change being tested in the experiment. - The **control group** is not exposed to the treatment. - The **treatment group** is exposed to the treatment. - The difference in metric values between the two groups measures the treatment’s effect on the test subjects.

Answer 13

Random selection helps you create a representative sample that reflects the characteristics of the overall user population. - choose a appropriate **sample size**for your A/B test. - The larger the sample size = more precise the results, and the more likely you’ll get results that are **statistically significant** when there is a **difference between** group A and group B. - However, can be **expensive and time-consuming**.

Answer 14

- **H**0: There is no difference in average revenue per user between A and B - **H**a: There is a difference in average revenue per user between A and B

Answer 15

refers to planning an experiment in order to collect data to answer your research question.

Answer 16

1. Define your variables 2. Formulate your hypothesis 3. Assign test subjects to treatment and control groups

Answer 17

defining the independent and dependent variables example - **independent variable** is the medicine—the cause you want to investigate. - *dependent variable** is recovery time—the effect you want to measure.

Answer 18

refers to the cause you’re interested in investigating. A researcher changes or controls the independent variable to determine how it affects the dependent variable.

Answer 19

refers to the effect you’re interested in measuring. “Dependent” means its value is influenced by the independent variable.

Answer 20

- H0 is that the medicine has **no effect.** - Ha is that the medicine **is effective.**

Answer 21

In a **randomized controlled experiment**, test subjects are **randomly assigned** to a **control** group and a **treatment group**. - The **treatment** is the new change being tested in the experiment. - The **control group** is not exposed to the treatment. - The **treatment group** is exposed to the treatment. - The difference in metric values between the two groups measures the treatment’s effect on the test subjects.

Answer 22

helps control the effect of other factors on the outcome of an experiment.

Answer 23

1. completely randomized design 2. randomized block design

Answer 24

test subjects are assigned to treatment and control groups using a random process.

Answer 25

1. Relatively**low accuracy** due to lack of restrictions which allows environmental variation (nuisance) to enter experimental error. 2. **Not** suited for** large numbers** of treatments because a relatively large amount of experimental material is needed which increases the variation.

Answer 26

minimize the impact of known nuisance factors. - **Blocking** is the arranging of test **subjects in groups**, or blocks, that are similar to one another. - then **randomly assign** the subjects within each block to treatment and control groups.

Answer 27

hypothesis tests are used to see **significant differences among groups**. Chi-squared tests are used to determine whether one or more observed **categorical variables** follow expected distribution(s)

Answer 28

used to determine whether one or more observed categorical variables follow expected distribution(s).

Answer 29

1. Goodness of Fit 2. Test for Independence

Answer 30

is a hypothesis test that determines whether an observed **categorical variable follows an expected distribution.** H0: The week you observed follows your boss’s expectations that the number of website visitors is equal on any given day Ha: The week you observed does not follow your boss’s expectations; therefore, the number of website visitors is not equal across the days of the week

Answer 31

1. Identify the Null and Alternative Hypotheses 2. Calculate the chi-square test statistic (X2) 3. Calculate the p-value 4. Make a conclusion

Answer 32

is a hypothesis test that determines whether or not **two categorical variables are associated**with each other. H0: The type of device a website visitor uses to visit the website is independent of the visitor’s membership status. Ha: The type of device a website visitor uses to visit the website is not independent of the visitor’s membership status.

Answer 33

statistical technique used to check if the means of two or more groups are significantly different from each other - compares **continuous** variables with **categorical** variables can be **applied to the results** we get from a **linear regression**.

Answer 34

Regression: IF and by HOW MUCH variables impact an outcome variable ANOVA: Pairwise comparisions. Understance nuance among elements that fueld regression analysis

Answer 35

Compares the means of **one continuous dependent variable** based on three or more groups of **one categorical variable**.

Answer 36

Compares the means of **one continuous dependent variable** based on three or more groups of **two categorical variables**.

Answer 37

Performs a pairwise comparison between all available groups while controlling for the error rate. - If we run **multiple hypothesis tests** all with a 95% confidence level, there is an **increasing chance** of a **false positive**. - The **post hoc test** will control for this, and allows us to run **many hypothesis tests** while **remaining confident with the accuracy** of the results.

Answer 38

Since the p-value is very small and we ANOVA tells us we can reject the null hypothesis that the mean price is the same for all diamond color grades but it doesnt specify which color and which price is different. post hoc test is useful because the one-way ANOVA does not tell us which colors are associated with different prices. The post hoc test will give us more information.

Answer 39

1. Build a simple linear regression model 2. Check the results 3. Run one-way ANOVA

Answer 40

is a statistical technique that test the difference of means between three or more groups while controlling for the effects of covariance.

Answer 41

- Covariates are the variables that are not of direct interest to the question we are trying to address. - By taking the covariates into account, we can **better isolate** the relationship between the categorical variable we are interested in and the Y variable. - This allows us to draw more accurate conclusions about the relationships among the variables.

Answer 42

- manova or multi variant analysis of variance - that compares how two, or more continuous outcome variables, vary according to categorical independent variables. The independent variable must be **categorical** and the outcome variables must be **continuous**

Regression Flashcards

(66 cards)