AB Testing Flashcards

1
Q

Sample Size Equation

A

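n = Z^2 * p * (1 - p) / E^2, where Z is the z-value for the chosen confidence level, p is the proportion of the population with the feature (1 - p is the proportion without), and E is the margin of error.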
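A minimal sketch of this formula in Python, assuming a two-sided z-value and using only the standard library. The function name `sample_size` and the input values are illustrative, not part of the original card.

```python
import math
from statistics import NormalDist

def sample_size(confidence: float, p: float, margin_of_error: float) -> int:
    """n = Z^2 * p * (1 - p) / E^2, rounded up to a whole subject."""
    # Two-sided z-value for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

# Illustrative inputs: 95% confidence, p = 0.5 (worst case), 5% margin of error.
print(sample_size(confidence=0.95, p=0.5, margin_of_error=0.05))  # 385
```

Using p = 0.5 gives the largest n, which is why it is a common default when the true proportion is unknown.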

2
Q

Type I Error

A

The null hypothesis is true but is rejected (a false positive).

3
Q

Type II Error

A

The null hypothesis is false but is not rejected (a false negative).

4
Q

Confidence Interval

A

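A range of plausible values for the true effect, built at a chosen confidence level. The confidence level (1 - alpha) is the probability of not making a Type I error when the null hypothesis is true. Common values are 90% to 95%.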

5
Q

Null Hypothesis

A

The hypothesis that control and treatment have the same impact, i.e. that the tested feature has no effect.

6
Q

Statistical Power

A

Probability of finding a statistically significant result when the null hypothesis is false.

7
Q

Rejecting the Null Hypothesis

A

Means the result for the tested feature is statistically significant: the data indicate a difference between the treatment and the control.

8
Q

Relationship between CI and Test Sample

A

If you want a higher confidence level (with the same margin of error and power), you will need a larger sample size.

9
Q

Definition of Power

A

The probability that the null hypothesis is rejected when it is in fact false. The bigger the sample size, generally the bigger the power. Power = 1 - beta, where beta is the Type II error rate.

10
Q

Alpha - Power

A

Alpha is the significance level: the Type I error rate you are willing to accept, equal to 1 - confidence level (e.g. .05 for 95%). It is the threshold the p-value is compared against: the p-value needs to be below alpha in order for us to reject the null hypothesis.

11
Q

Beta - Power

A

Beta is the Type II error rate: the probability of failing to detect an effect of the feature that really exists in the population.

12
Q

Assumptions of Power

A

Generally, beta tells us the chance that a real effect of the feature is missed by the test [beta = .2: a real effect is missed 20% of the time, so power is .8 in this case]
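A rough sketch of how power = 1 - beta can be computed, assuming a two-sided two-sample z-test with equal group sizes and a known common standard deviation. The function name `power_two_sample_z` and the numbers are illustrative assumptions, not from the original cards.

```python
from statistics import NormalDist

def power_two_sample_z(effect: float, sigma: float, n_per_group: int,
                       alpha: float = 0.05) -> float:
    """Power of a two-sided, two-sample z-test (normal approximation)."""
    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two group means.
    se = sigma * (2 / n_per_group) ** 0.5
    shift = effect / se
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power = power_two_sample_z(effect=0.5, sigma=1.0, n_per_group=64)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With these illustrative inputs the power comes out close to .8, so beta is close to .2. Setting `effect=0` returns alpha, since with no real effect every rejection is a Type I error.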

13
Q

Jacob Cohen

A

Suggested that, for most researchers, a Type I error is about 4 times as serious as a Type II error (hence the convention of alpha = .05 with beta = .20, i.e. power = .8).

14
Q

General practice for Power

A

Generally a power of .8 is enough for the experiment, because the sample size required grows steeply as power (or the confidence level) is pushed higher.

15
Q

Critical value

A

The cutoff value of the test statistic (e.g. the t-statistic) beyond which the null hypothesis is rejected. It is set by alpha, the degrees of freedom, and whether the test is one- or two-tailed.

16
Q

6 pillars of Power Analysis

A
  1. Difference of biological interest (the minimum effect size worth detecting)
  2. Variability (std) of the data
  3. Desired power of the experiment (.8), i.e. 1 - beta
  4. Significance level (.05), i.e. alpha
  5. Sample size (from the power formula)
  6. Alternative hypothesis (one-sided or two-sided test)
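The pillars above can be tied together in one calculation. This is a sketch only, assuming two equal-size groups, a known common standard deviation, and the normal approximation n = 2 * ((z_alpha + z_power) * std / difference)^2 per group. The function name `n_per_group` and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(difference: float, std: float, power: float = 0.8,
                alpha: float = 0.05, two_sided: bool = True) -> int:
    """Sample size per group for comparing two means (normal approximation)."""
    std_normal = NormalDist()
    # Pillars 4 and 6: alpha is split across two tails for a two-sided test.
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    # Pillar 3: desired power, i.e. 1 - beta.
    z_power = std_normal.inv_cdf(power)
    # Pillars 1 and 2: difference of interest relative to the variability.
    n = 2 * ((z_alpha + z_power) * std / difference) ** 2
    # Pillar 5: the sample size is what the formula solves for.
    return math.ceil(n)

print(n_per_group(difference=0.5, std=1.0))                   # 63
print(n_per_group(difference=0.5, std=1.0, two_sided=False))  # 50
```

Halving `difference` roughly quadruples the required sample size, which is why the minimum effect of interest matters so much in planning.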
17
Q

What is the difference of biological interest?

A

It is the minimum effect that is biologically meaningful, i.e. the smallest effect size the experiment should be able to detect. The smaller the effect size, the larger the sample size needed.

We determine this through previous research or pilot studies.

18
Q

What is the variability (standard deviation) of the data?

A

The variability of the data is how spread out the measurements are around their mean. The more variable the data, the larger the sample needed to detect the same effect.

To determine this, we use data from previous related research on the sample population, or the baseline of the sample population.

19
Q

Significance Level

A

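The alpha of the experiment: the Type I error rate you are willing to accept (often .05). Together with the sample size (degrees of freedom), it sets the critical value that the test statistic (e.g. the t-statistic) must exceed; equivalently, the p-value must fall below alpha.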

20
Q

One-tailed test

A

A one-tailed test puts all of alpha in one tail, so its critical value is closer to 0 and it is easier for the data to be significant in the predicted direction.

Also known as a directional test.

21
Q

Two-tailed test

A

A two-tailed test splits alpha across both tails, so its critical values are farther from 0 and it is harder for the data to be significant.

Also known as a non-directional test.
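A small sketch comparing the critical values of the two kinds of test, assuming a z-test (standard normal distribution) and alpha = .05. The helper name `critical_z` is illustrative.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_tailed: bool) -> float:
    """Positive critical z-value for the given alpha."""
    # A two-tailed test leaves alpha / 2 in each tail.
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

one_tailed = critical_z(0.05, two_tailed=False)
two_tailed = critical_z(0.05, two_tailed=True)
print(f"one-tailed: {one_tailed:.3f}, two-tailed: {two_tailed:.3f}")
```

The one-tailed cutoff (about 1.645) is closer to 0 than the two-tailed cutoff (about 1.960), which is why the same data can be significant under a one-tailed test but not under a two-tailed one.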

23
Q

Difference between directional and non-directional test and use case.

A
  1. The difference between the tests is how alpha is placed on the same sampling distribution (e.g. the t-distribution): a directional test puts all of alpha in one tail, while a non-directional test splits it across both tails.
  2. You can use the directional test if you are only interested in one quantity being greater (or less) than another, rather than in any difference.

E.g., if you are interested in whether Group A scores higher than Group B, you can use the directional test. If you are looking for any difference between the test scores of Group A and Group B, you should use the non-directional test.

With the same alpha, it is harder for a non-directional test to reach significance.

24
Q

Difference between t-statistic and z-statistic

A

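If you don't know the population standard deviation and only have sample information (sample mean, sample std, sample size), especially with a small sample, use the t-statistic.

You can use the z-statistic if the population standard deviation is known (or the sample is large enough for the normal approximation).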

25
Q

what is the p-value and how is it derived

A

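The probability of seeing results at least as extreme as yours if the null hypothesis were true, i.e. by chance alone. It is derived from the test statistic (e.g. the t-statistic) and its sampling distribution.

If the p-value is below alpha (e.g. p-value < .05), the result is unlikely to be a fluke and the effect of the tested feature is considered statistically significant.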
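A minimal sketch of turning a test statistic into a p-value. It assumes a z-statistic and the standard normal distribution so that only the standard library is needed; an exact p-value for a t-statistic would use the t-distribution with the appropriate degrees of freedom instead. The function name `two_sided_p_value` is illustrative.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) under the null hypothesis, for a standard normal Z."""
    # Area in both tails beyond the observed statistic.
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
for z in (1.0, 1.96, 3.0):
    p = two_sided_p_value(z)
    verdict = "reject null" if p < alpha else "fail to reject null"
    print(f"z = {z:.2f}, p = {p:.4f}: {verdict}")
```

A larger test statistic (in absolute value) gives a smaller p-value, so the decision "p < alpha" is the same as "statistic beyond the critical value".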

26
Q

What is one sample t-test and what is an example of it?

A

A common type of t-test. You test the mean of your sample against a known or hypothesized population mean.

E.g., compare the average SAT score of your tutoring class to the average SAT score of the broader population.
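A sketch of the one-sample t-statistic, t = (sample mean - population mean) / (sample std / sqrt(n)), using only the standard library. The function name `one_sample_t` and the SAT scores below are made-up illustration data, not real results.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample: list[float], population_mean: float) -> tuple[float, int]:
    """Return the t-statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    # stdev() uses the n - 1 denominator (sample standard deviation).
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - population_mean) / standard_error
    return t, n - 1

# Hypothetical SAT scores for a tutoring class vs. an assumed population mean.
class_scores = [1210, 1180, 1250, 1300, 1190, 1270, 1230, 1260]
t, df = one_sample_t(class_scores, population_mean=1200)
print(f"t = {t:.2f}, degrees of freedom = {df}")
```

The t-statistic is then compared with the critical value for `df` degrees of freedom, or converted to a p-value with the t-distribution.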

27
Q

what is the paired t-test and what is an example of it?

A

The paired t-test compares the means of two measurements taken on the same subjects. Good for measuring the effect of a feature on the same population over time, or across some other fixed variable.

E.g., test whether a new learning program raises the SAT scores of your students by measuring each student twice: before and after the program.