AB Testing Flashcards by Hsueh Lin Chen

Sample Size Equation

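n = Z^2 * p * (1 - p) / E^2, where Z is the z-value for the chosen confidence level, p is the proportion of the population with the feature, (1 - p) is the proportion without it, and E is the margin of error.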
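
A minimal Python sketch of the standard sample size formula for a proportion, n = Z^2 * p * (1 - p) / E^2. The function name `sample_size` and the example inputs (95% confidence, p = 0.5, 5% margin of error) are illustrative assumptions, not part of the original card.

```python
import math
from statistics import NormalDist


def sample_size(confidence: float, p: float, margin_of_error: float) -> int:
    """Return n = Z^2 * p * (1 - p) / E^2, rounded up to a whole subject."""
    # Two-sided z-value for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# p = 0.5 is the most conservative guess when the true proportion is unknown.
n = sample_size(confidence=0.95, p=0.5, margin_of_error=0.05)
print(n)
```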

Type I Error

The null hypothesis is true but is rejected (a false positive).

Type II Error

The null hypothesis is false but is not rejected (a false negative).

Confidence Interval

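The confidence level (1 - alpha) is the probability of not making a Type I error when the null hypothesis is true; common values are 90% to 95%. The confidence interval is the range of plausible values for the effect computed at that level.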

Null Hypothesis

The hypothesis that control and treatment have the same impact, i.e. that the tested feature makes no difference.

Statistical Power

Probability of finding a statistically significant result when the null hypothesis is false.

Rejecting the Null Hypothesis

Means the result for the tested feature is statistically significant: the data indicate a difference between the treatment and the control.

Relationship between CI and Test Sample

If you want a higher confidence level at the same margin of error, you will need a larger sample size.
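
A short Python sketch of why this holds: in the sample size formula the required n scales with Z^2, and Z grows with the confidence level. The helper name `z_value` and the three levels compared are assumptions chosen for illustration.

```python
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided z-value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


# Sample size is proportional to Z^2, so compare Z^2 against the 90% baseline.
baseline = z_value(0.90) ** 2
for level in (0.90, 0.95, 0.99):
    z = z_value(level)
    print(f"confidence={level:.2f}  z={z:.3f}  relative n={z ** 2 / baseline:.2f}")
```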

Definition of Power

The probability that the null hypothesis is rejected when it is in fact false. The bigger the sample size, generally the bigger the power. Power = 1 - beta, where beta is the Type II error rate.
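
A rough Python sketch of how power grows with sample size. It assumes a two-sided, two-sample z-test on means with a known common standard deviation, which is a simplification chosen for illustration; the function name and example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(n_per_group: int, effect: float, sigma: float,
                     alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided, two-sample z-test."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard errors of the difference in means.
    shift = effect / (sigma * sqrt(2 / n_per_group))
    # Probability that the test statistic lands in either rejection tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


for n in (20, 63, 200):
    print(n, round(power_two_sample(n, effect=0.5, sigma=1.0), 3))
```

With no true effect (`effect=0`) the function returns alpha, which matches the definition of the Type I error rate.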

Alpha - Power

Alpha is the significance level: the probability of a Type I error you are willing to accept, fixed before the experiment (alpha = 1 - confidence level). It is not the p-value itself; the p-value computed from the data must fall below alpha for us to reject the null hypothesis.

Beta - Power

Beta is the Type II error rate: the probability of failing to detect an effect of the feature that really exists in the population.

Assumptions of Power

Beta tells us the chance that a real effect of the feature is missed [beta = .2 means a real effect is missed 20% of the time; power is .8 in this case].

Jacob Cohen

Suggested that, for most researchers, a Type I error is about 4 times as serious as a Type II error (hence the convention of alpha = .05 with beta = .2).

General practice for Power

Generally a power of .8 is enough for an experiment, because the sample size required grows steeply as power (or the confidence level) is pushed higher.

Critical value

The threshold that the test statistic (e.g. the t-statistic) must exceed for the result to be significant. It is set by alpha, the number of tails, and the degrees of freedom; it does not measure Type II error.
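
A small Python sketch of looking up a critical value. It uses the standard normal distribution, which approximates the t critical value only for large samples; for small samples a t-table with the right degrees of freedom is needed. The function name `critical_z` is an assumption.

```python
from statistics import NormalDist


def critical_z(alpha: float = 0.05, tails: int = 2) -> float:
    """Critical z-value for a one- or two-tailed test at level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)


print(round(critical_z(0.05, tails=2), 3))
print(round(critical_z(0.05, tails=1), 3))
```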

6 pillars of Power Analysis

Difference of biological interest (effect size)
Variability (std) of the data
Desired power of the experiment (.8), or 1 - beta
Significance level (.05), or alpha
Sample size (solved for with the power formula)
Alternative hypothesis (one-sided or two-sided test)
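
The six inputs above can be combined in a short Python sketch that solves for sample size. It uses the common normal-approximation formula for comparing two group means, n per group = 2 * ((z_alpha + z_power) * sigma / delta)^2; that formula and the names `n_per_group`, `delta`, and `sigma` are assumptions for illustration, not something stated on the card.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, power: float = 0.8,
                alpha: float = 0.05, two_sided: bool = True) -> int:
    """Approximate subjects needed per group to detect a difference delta."""
    norm = NormalDist()
    # Alternative hypothesis: two-sided tests split alpha across both tails.
    z_alpha = norm.inv_cdf(1 - alpha / (2 if two_sided else 1))
    z_power = norm.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


print(n_per_group(delta=0.5, sigma=1.0))                   # two-sided
print(n_per_group(delta=0.5, sigma=1.0, two_sided=False))  # one-sided
print(n_per_group(delta=0.25, sigma=1.0))                  # smaller effect
```

Halving the effect size roughly quadruples the required sample, and a one-sided alternative needs fewer subjects than a two-sided one.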

What is the difference of biological interest?

It is the minimum meaningful effect of biological relevance, i.e. the smallest effect (effect size) the experiment should be able to detect. The smaller the effect size, the larger the sample size needed to detect it.

We determine this through previous research or pilot studies.

What is the variability or standard deviation of the data?

The variability of the data is how spread out the measurements are around their mean (the standard deviation). The more variable the data, the larger the sample needed.

To determine this, we use data from previous related research on the sample population, or a baseline measurement of the sample population.

Significant Level

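The alpha of the experiment: the p-value threshold below which a result is declared significant. The matching critical value of the test statistic (e.g. the t-statistic) depends on alpha and on the sample size (degrees of freedom).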

One-tailed test

A one-tailed test puts all of alpha in one tail, so the critical value is closer to 0 and it is easier for data in the predicted direction to be significant.

Also known as a directional test.

Two-tailed test

A two-tailed test splits alpha across both tails, so the critical value is farther from 0 and it is harder for the data to be significant.

Also known as a non-directional test.

Difference between directional and non-directional test and use case.

The difference between the tests is where the rejection region sits, not which distribution is used: both use the same test statistic and distribution (e.g. the t-distribution). A non-directional test splits alpha across both tails; a directional test puts all of it in one tail.
You can use the directional test if you are only interested in one quantity being greater (or less) than another rather than in a general difference.

E.g., if you are interested in whether Group A scores higher than Group B, you can use the directional test. If you are looking for any difference between the test scores of Group A and Group B, you should use the non-directional test.

With the same data, it is harder to reach significance with the non-directional test.
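
A brief Python sketch comparing the two p-values for the same test statistic. It assumes a z-statistic (standard normal reference distribution) for simplicity, and the value 1.8 is a made-up example.

```python
from statistics import NormalDist


def p_values(z: float) -> tuple[float, float]:
    """Return (one-tailed, two-tailed) p-values for a z-statistic."""
    norm = NormalDist()
    one_tailed = 1 - norm.cdf(z)               # directional: A scores higher than B
    two_tailed = 2 * (1 - norm.cdf(abs(z)))    # non-directional: A differs from B
    return one_tailed, two_tailed


one, two = p_values(1.8)
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
```

For this statistic the directional test falls below alpha = .05 while the non-directional test does not.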

Difference between t-statistic and z-statistic

If you don't know the population standard deviation and must estimate it from the sample (especially with a small sample), use the t-statistic.

You can use the z-statistic if the population standard deviation is known or the sample is large.
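
A minimal Python sketch of the two statistics side by side. The scores, the population mean of 1050, and the population standard deviation of 100 are made-up values for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def z_statistic(sample, pop_mean, pop_sd):
    """Use when the population standard deviation is known."""
    return (mean(sample) - pop_mean) / (pop_sd / sqrt(len(sample)))


def t_statistic(sample, pop_mean):
    """Use when the standard deviation must be estimated from the sample."""
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return (mean(sample) - pop_mean) / (stdev(sample) / sqrt(len(sample)))


scores = [1010, 1100, 1150, 1080, 1120, 1060, 1090, 1130]  # hypothetical data
z = z_statistic(scores, pop_mean=1050, pop_sd=100)
t = t_statistic(scores, pop_mean=1050)
df = len(scores) - 1  # degrees of freedom for the t-distribution
print(round(z, 3), round(t, 3), df)
```

The t-statistic is compared against a t-distribution with `df` degrees of freedom, the z-statistic against the standard normal.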

What is the p-value and how is it derived?

The probability of seeing results at least as extreme as yours if the null hypothesis were true. It is derived from the test statistic (e.g. the t-statistic) and its distribution. If the p-value is low (p-value < .05), a fluke is an unlikely explanation and the feature's effect is considered statistically significant.

What is one sample t-test and what is an example of it?

One of the most common t-tests. You test the mean of your sample against a known or hypothesized population mean. E.g., compare the SAT scores of your tutoring class to the average SAT score of the broader population.

What is the paired t-test and what is an example of it?

The paired t-test compares the means of two measurements taken on the same subjects, e.g. before and after a feature is introduced. Good for measuring the effect of the feature on the same population over time or across some other fixed variable. E.g., test whether a new learning program raises the SAT scores of your population by comparing each student's score before and after the program. Two factors are involved: time and the learning program.
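
A small Python sketch of the paired t-statistic, computed from the per-student differences. The before and after scores are made-up numbers for illustration, and the function name `paired_t` is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t-statistic, degrees of freedom) for paired measurements."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The paired test works on each subject's own change.
    diffs = [a - b for b, a in zip(before, after)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


before = [1000, 1050, 980, 1100, 1020]  # hypothetical SAT scores before the program
after = [1040, 1060, 1010, 1150, 1030]  # same students after the program
t, df = paired_t(before, after)
print(round(t, 2), df)
```

The resulting t is then compared with the critical value for `df` degrees of freedom to decide significance.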

AB Testing Flashcards

(27 cards)