3. Randomized Experiments Flashcards
Run an experiment
- Randomly assign customers to different treatment groups
- Compare differences in behavior among treatment groups
A/B Tests
Divide customers into two groups:
- Control group A
- Treatment group B
Analyze whether the treatment group behaves differently from the control group
Overall Evaluation Criterion (OEC)
OEC is a quantitative measure of the experiment’s objective
Examples:
- Active days per user
- Successful sessions
- Time to success
OEC must be measurable in the short-term yet believed to causally drive long-term strategic objectives.
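As an illustration of one of the example metrics, "active days per user" could be computed from an event log as in the sketch below. The log format (a user ID plus the date of activity) and the data are hypothetical. Only users who appear in the log count toward the denominator, which is itself a choice about which users to include.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, date of activity).
events = [
    ("u1", date(2024, 1, 1)),
    ("u1", date(2024, 1, 1)),  # a second visit on the same day counts only once
    ("u1", date(2024, 1, 3)),
    ("u2", date(2024, 1, 2)),
]

def active_days_per_user(events):
    """Average number of distinct active days per user seen in the log."""
    days = defaultdict(set)
    for user_id, day in events:
        days[user_id].add(day)
    return sum(len(d) for d in days.values()) / len(days)

print(active_days_per_user(events))  # 1.5 (u1: 2 days, u2: 1 day)
```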
Parameter
A parameter is a controllable experimental variable that is thought to influence the OEC or other metrics of interest.
- In a simple A/B test there is usually a single parameter with two values (the two variants)
- Multivariable tests, which vary several parameters at once, are also possible
Randomization unit
A randomization unit is the entity to which the randomization is applied
- Units must be mapped to variants in a persistent (the same unit always gets the same variant) and independent manner
- It is common to use users as the randomization unit
- It is important to ensure that the populations in each variant are statistically similar, so that the causal effect can be determined with high probability
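One common way to get a persistent, independent mapping is to hash the unit ID together with an experiment-specific salt. The sketch below assumes string user IDs and a hypothetical experiment name `checkout-test`; `assign_variant` is an illustrative helper, not any particular platform's API. Because the hash depends only on the (experiment, user) pair, a user keeps the same variant on every visit, and different experiment salts should give assignments that are approximately independent of each other.

```python
import hashlib

def assign_variant(user_id: str, experiment_id: str, variants=("A", "B")) -> str:
    """Deterministically map a user to a variant for one experiment (hypothetical helper)."""
    # Salting with experiment_id decorrelates assignments across experiments.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]

# Persistent: the same user in the same experiment always gets the same variant.
print(assign_variant("user-42", "checkout-test") == assign_variant("user-42", "checkout-test"))

# Check that the split over many users is roughly even.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "checkout-test")] += 1
print(counts)
```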
Online Shopping Funnel
- Users may not progress linearly through a funnel; instead they may skip, repeat, or go back and forth between steps
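Because paths are not linear, one option is to count how many users reached each step at least once, in any order, rather than assuming each step is entered only from the previous one. The step names and user paths below are hypothetical. Note that the counts need not decrease from step to step, since some users skip earlier steps.

```python
# Hypothetical funnel steps and per-user paths (users may skip, repeat or go back).
FUNNEL = ["home", "product", "cart", "checkout", "purchase"]

paths = {
    "u1": ["home", "product", "cart", "product", "cart", "checkout", "purchase"],
    "u2": ["product", "cart"],      # skipped the home page
    "u3": ["home", "product", "home"],
}

def users_reaching_each_step(paths, funnel):
    """Count users who visited each step at least once, regardless of order."""
    return {step: sum(step in path for path in paths.values()) for step in funnel}

print(users_reaching_each_step(paths, FUNNEL))
# {'home': 2, 'product': 3, 'cart': 2, 'checkout': 1, 'purchase': 1}
```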
Experiment Design:
Questions to be answered
- What is the randomization unit?
- How do we measure success?
- What population of randomization units do we want to target?
- How large does our experiment need to be?
- How long do we run the experiment?
How to perform the randomization?
- Pure randomization
- Stratified randomization
  - Randomize separately within each group (stratum) of users
  - e.g. make sure that every age group is represented in the same proportion in each variant
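A minimal sketch of stratified randomization, assuming each user has a known age group: users are shuffled within each stratum and then assigned to variants in alternation, so every stratum is split as evenly as possible. This "shuffle then alternate" scheme is one simple way to do it, not the only one; the users, age groups and `stratified_assignment` helper are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assignment(users, stratum_of, variants=("A", "B"), seed=0):
    """Shuffle users within each stratum, then alternate variants inside the stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for user in users:
        by_stratum[stratum_of[user]].append(user)
    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        for i, user in enumerate(members):
            assignment[user] = variants[i % len(variants)]
    return assignment

# Hypothetical users, 4 in each of 3 age groups.
age_group = {
    f"u{i}": ("18-34" if i % 3 == 0 else "35-54" if i % 3 == 1 else "55+")
    for i in range(12)
}
assignment = stratified_assignment(list(age_group), age_group)

for group in ("18-34", "35-54", "55+"):
    members = [u for u, g in age_group.items() if g == group]
    print(group, sum(assignment[u] == "A" for u in members), "of", len(members), "in A")
# each age group: 2 of its 4 users in A
```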
Measuring the impact on the OEC
To measure the impact of the change, we need to define a goal metric -> OEC
One obvious choice might be revenue:
- Revenue per user is preferred to total revenue because it normalizes for the number of users in each variant, which may differ
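A small numeric illustration of why normalizing matters, with hypothetical totals: if variant B receives more users (e.g. through an uneven traffic split), its total revenue can be higher even though each user spends less on average.

```python
# Hypothetical totals per variant: B earns more in total only because it has more users.
results = {
    "A": {"users": 10_000, "revenue": 50_000.0},
    "B": {"users": 12_000, "revenue": 57_600.0},
}

def revenue_per_user(r):
    """Normalize total revenue by the number of users in the variant."""
    return r["revenue"] / r["users"]

for variant, r in results.items():
    print(variant, "total revenue:", r["revenue"], "revenue per user:", revenue_per_user(r))
# A total revenue: 50000.0 revenue per user: 5.0
# B total revenue: 57600.0 revenue per user: 4.8
```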
OEC: Which users to consider?
It depends on the business context:
- all users who visited the site
- only users who complete the purchase process
- only users who start the purchase process
Each of the three can be right or wrong depending on the context
Central Limit Theorem
The CLT states that the mean of a random sample of independent, identically distributed observations, when standardized, approximately follows a standard normal distribution as the sample size grows, regardless of the population distribution (provided its variance is finite).
With the CLT we know the approximate distribution of an estimator such as the sample mean, even without knowing the distribution of the original population.
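A quick simulation can illustrate the CLT. The sketch assumes a skewed population (exponential with rate 1, so its mean and standard deviation are both 1) and arbitrary choices for the sample size and number of repetitions. It standardizes each sample mean and checks that the results behave roughly like a standard normal: mean near 0, standard deviation near 1, and about 95% of values within ±1.96.

```python
import random
import statistics

rng = random.Random(42)

# Skewed population: exponential with rate 1 (mean 1, standard deviation 1).
MU, SIGMA = 1.0, 1.0
n = 50          # size of each sample
trials = 5_000  # number of repeated samples

z_scores = []
for _ in range(trials):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    # Standardize the sample mean: (mean - mu) / (sigma / sqrt(n))
    z_scores.append((mean - MU) / (SIGMA / n ** 0.5))

coverage = sum(abs(z) < 1.96 for z in z_scores) / trials
print(round(statistics.fmean(z_scores), 2), round(statistics.stdev(z_scores), 2), coverage)
```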
Hypothesis testing
Hypothesis testing is a method for drawing conclusions about a population based on a sample.
Steps:
- Formulate a hypothesis about the population
  - null hypothesis H0
  - alternative hypothesis H1
- Assess how compatible the available data are with the null hypothesis (e.g. via a p-value)
- Reject or fail to reject the null hypothesis
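The steps above can be sketched for a conversion-rate metric with a two-proportion z-test, where H0 says both variants have the same conversion rate. This assumes large samples so that the CLT-based normal approximation holds; the conversion counts and the significance level alpha = 0.05 are hypothetical choices, and `two_proportion_z_test` is an illustrative helper.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for H0: p_A == p_B against H1: p_A != p_B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate estimated under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 2.0% vs 2.5% conversion with 10,000 users per variant.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```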
Statistical Power
Statistical power is the probability of detecting a meaningful difference between the variants when there really is one:
-> i.e. the probability of rejecting the null hypothesis when a difference truly exists (power = 1 - β, where β is the Type II error rate).
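Power is what links back to "How large does our experiment need to be?". The sketch below uses a common textbook normal approximation for a two-sided test of two conversion rates with equal group sizes; the baseline rate of 2.0%, the hoped-for 2.5%, alpha = 0.05 and a target power of 0.8 are all assumed values, and both helpers are hypothetical names. The opposite rejection tail is ignored in the power formula because it contributes almost nothing.

```python
from math import ceil, sqrt
from statistics import NormalDist

def power_two_proportions(p_a, p_b, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_a * (1 - p_a) / n_per_group + p_b * (1 - p_b) / n_per_group)
    # Probability the statistic exceeds the critical value in the direction of the true effect.
    return NormalDist().cdf(abs(p_b - p_a) / se - z_crit)

def sample_size_per_group(p_a, p_b, alpha=0.05, power=0.8):
    """Users needed per variant to reach the target power under the same approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_crit + z_power) ** 2 * variance / (p_b - p_a) ** 2)

n = sample_size_per_group(0.020, 0.025)
power = power_two_proportions(0.020, 0.025, n)
print(n, round(power, 3))
```

Smaller effects or lower baseline rates drive the required sample size up quickly, since it scales with one over the squared difference.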