A/B Testing Flashcards
Sample Size Equation
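n = Z^2 * p * (1 - p) / E^2, where Z is the z-value for the chosen confidence level, p is the proportion of the population with the feature, (1 - p) is the proportion without it, and E is the margin of error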
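A minimal sketch of the sample size formula for a proportion, assuming a two-sided confidence level. The function name and the example inputs are illustrative, not part of the original card.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(confidence, p, margin_of_error):
    """n = Z^2 * p * (1 - p) / E^2, rounded up to a whole subject."""
    # Two-sided z-value for the chosen confidence level (assumption: two-sided)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


# p = 0.5 is the most conservative guess when the true proportion is unknown
print(sample_size_for_proportion(confidence=0.95, p=0.5, margin_of_error=0.05))  # 385
```

Raising the confidence level or shrinking the margin of error both increase n.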
Type I Error
The Null Hypothesis is true but is rejected (a false positive)
Type II Error
The Null Hypothesis is false but is not rejected (a false negative)
Confidence Interval
The confidence level of the interval (1 - alpha) is the confidence of not making a Type I Error when the Null Hypothesis is true. Common values are 90% to 95%
Null Hypothesis
Hypothesis that the control and the treatment have the same impact, i.e. the tested feature makes no difference
Statistical Power
Probability of finding a statistically significant result when the Null Hypothesis is false.
Rejecting the Null Hypothesis
Means the result for the tested feature is statistically significant: the data give evidence that there is a difference between the treatment and the control.
Relationship between CI and Test Sample
If you want a higher confidence level at the same margin of error, you will need a larger sample size.
Definition of Power
The chance that the null hypothesis is rejected when it is actually false. The bigger the sample size, generally the bigger the power. Power = 1 - Beta, where beta is the type II error rate
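A minimal sketch of how power grows with sample size. It assumes a two-sided two-sample z-test with equal group sizes and a known standard deviation; the card does not name a specific test, so the function name and the numbers are illustrative.

```python
from statistics import NormalDist


def power_two_sample_z(effect, std, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (equal groups)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two group means
    se = std * (2 / n_per_group) ** 0.5
    shift = effect / se
    # Probability that the test statistic lands in either rejection region
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


for n in (20, 64, 200):
    power = power_two_sample_z(effect=0.5, std=1.0, n_per_group=n)
    print(f"n={n}: power={power:.3f}, beta={1 - power:.3f}")
```

With a true effect of zero the function returns alpha, which is the type I error rate.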
Alpha - Power
Alpha is the significance level: the type I error rate you are willing to accept, equal to 1 minus the confidence level. It is not the p-value itself. The p-value computed from your feature data needs to be below alpha in order for us to reject the null hypothesis.
Beta - Power
Beta is the type II error rate: the probability of failing to detect a real effect of the feature in the population sample.
Assumptions of Power
Generally, the Beta will tell us the chance that a real effect of the feature is missed in the sample set [beta = .2: 20% of the time the effect is missed, so Power will be .8 in this case]
Jacob Cohen
Suggested that, for most researchers, a type I error is about 4 times as serious as a type II error, which matches the common convention of alpha = .05 and beta = .2
General practice for Power
Generally a power of .8 is enough for the experiment, because the sample size required grows quickly as you push the power (or the confidence level) higher.
Critical value
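The cutoff on the distribution of the test statistic (e.g. the t-statistic) that the observed statistic must exceed for us to reject the null hypothesis. It depends on alpha, the degrees of freedom, and whether the test is one-tailed or two-tailed.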
6 pillars of Power Analysis
- Difference of biological interest (effect size)
- Variability (std) of the data
- Desired Power of the experiment (.8) or 1 - Beta
- Significance Level (.05) or alpha
- Sample Size or Power Formula
- Alternative Hypothesis (one-sided or two-sided test)
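The pillars above are the inputs to a sample size calculation. A minimal sketch, assuming a comparison of two group means with equal group sizes and a normal approximation; the function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(effect, std, power=0.8, alpha=0.05, two_sided=True):
    """Subjects needed in each group to detect `effect` (normal approximation)."""
    norm = NormalDist()
    # A two-sided test splits alpha across both tails; a one-sided test does not
    z_alpha = norm.inv_cdf(1 - alpha / 2) if two_sided else norm.inv_cdf(1 - alpha)
    z_beta = norm.inv_cdf(power)  # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * std / effect) ** 2)


print(n_per_group(effect=0.5, std=1.0))                   # 63
print(n_per_group(effect=0.5, std=1.0, two_sided=False))  # fewer subjects
print(n_per_group(effect=0.25, std=1.0))                  # roughly four times as many
```

An exact t-based calculation gives slightly larger numbers for small samples.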
What is the difference of biological interest?
It is the minimum meaningful effect of biological relevance, i.e. the smallest effect the experiment should be able to detect. The smaller the effect size, the larger the sample size needed.
We determine this through previous research or pilot studies.
What is the variability or Standard Deviation of the data?
The variability of the data is how spread out the measurements are around their mean. The more variable the data, the larger the sample size needed to detect the same effect.
To determine this, we use data from previous related research on the sample population or a baseline measurement of the sample population.
Significance Level
The alpha of the experiment: the threshold the p-value must fall below. Equivalently, it sets the critical value of the test statistic (e.g. the t-statistic), which also depends on the sample size.
One-tailed test
A one-tailed test puts all of alpha in one tail, so the critical value is closer to 0 and it is easier for the data to be significant in the tested direction.
Also known as a directional test.
Two-tailed test
A two-tailed test splits alpha across both tails, so the critical values are farther from 0 and it is harder for the data to be significant.
Also known as a non-directional test.
Difference between directional and non-directional test and use case.
- The difference between the tests is where the rejection region sits, not which distribution is used. A non-directional test splits alpha across both tails of the distribution (e.g. the t-distribution), while a directional test puts all of alpha in one tail.
- You can use the directional test if you are only interested in one quantity being larger than another rather than in a general difference.
E.g. if you are interested in Group A scoring higher than Group B, then you can use the directional test. If you are looking for any difference between the test scores of Group A and Group B, then you should use the non-directional test.
It is harder for a non-directional test to be significant.
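A minimal sketch of why the non-directional test is harder to pass, using the standard normal distribution as a stand-in for the test statistic's distribution. The observed statistic z = 1.8 is a made-up value for illustration.

```python
from statistics import NormalDist

norm = NormalDist()
alpha = 0.05

one_tailed_crit = norm.inv_cdf(1 - alpha)      # all of alpha in one tail
two_tailed_crit = norm.inv_cdf(1 - alpha / 2)  # alpha split across both tails

z = 1.8  # hypothetical observed test statistic
p_one = 1 - norm.cdf(z)             # directional: is A higher than B?
p_two = 2 * (1 - norm.cdf(abs(z)))  # non-directional: is A different from B?

print(f"critical values: one-tailed {one_tailed_crit:.3f}, two-tailed {two_tailed_crit:.3f}")
print(f"p-values: one-tailed {p_one:.4f}, two-tailed {p_two:.4f}")
```

The same statistic is significant at alpha = .05 in the directional test but not in the non-directional one.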
Difference between t-statistic and z-statistic
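If you don’t know the population standard deviation and have to estimate it from the sample (especially with a small sample size), then you use the t-statistic.
You can use the z-statistic if the population standard deviation is known, or if the sample is large enough for the normal approximation to hold.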
What is the p-value and how is it derived?
The probability of seeing results at least as extreme as yours if the null hypothesis were true, i.e. by chance alone. It is derived from the test statistic (e.g. the t-statistic) and its distribution.
If the likelihood of the result being a fluke is low (p-value < .05), then the effect of the tested feature is considered statistically significant.
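A minimal sketch of deriving a two-sided p-value from a t-statistic using only the standard library. It integrates the Student's t density numerically with Simpson's rule; in practice a statistics library would usually do this step. The function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|) under the null hypothesis; steps must be even."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps
    # Simpson's rule for the area under the density on [0, |t_stat|]
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    # The distribution is symmetric, so both tails together are 1 - 2 * area
    return 1 - 2 * central_area


print(round(two_sided_p_value(2.262, df=9), 3))  # about 0.05
```

This sketch relies on `math.gamma`, which overflows for very large degrees of freedom.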
What is one sample t-test and what is an example of it?
A common type of t-test. You are testing the mean of your sample against a known or hypothesized population mean.
E.g. compare the mean SAT score of your tutoring class to the average SAT score of a broader population.
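A minimal sketch of the one-sample t-statistic for the SAT example. The class scores and the population average of 1050 are made-up numbers for illustration.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, population_mean):
    """Return (t-statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    # Standard error uses the sample standard deviation (n - 1 denominator)
    se = stdev(sample) / math.sqrt(n)
    return (mean(sample) - population_mean) / se, n - 1


class_scores = [1100, 1180, 1210, 1090, 1250, 1160, 1130, 1200]  # hypothetical
t_stat, df = one_sample_t(class_scores, population_mean=1050)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The t-statistic is then compared against the critical value for the chosen alpha and degrees of freedom.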
What is the paired t-test and what is an example of it?
The paired t-test compares the means of two measurements taken on the same subjects. Good for measuring the effect of a feature on the same population over time or across some other fixed variable.
E.g. test whether a new learning program raises the SAT scores of your population by comparing each student's score before and after the program. Two features: time and learning program.
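A minimal sketch of the paired t-statistic for the before-and-after example. It reduces the two measurements to one difference per student and then tests whether the mean difference is zero. The scores are made-up numbers for illustration.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t-statistic, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # One difference per subject: after minus before
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Assumes the differences are not all identical (stdev would be 0)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


before = [1050, 1120, 980, 1200, 1100, 1010]  # hypothetical scores before the program
after = [1090, 1150, 1040, 1210, 1160, 1030]  # same students after the program
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

A positive t-statistic here means scores went up on average after the program.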