11: A/B Testing Flashcards
What is A/B Testing
- a simple, controlled experiment
- randomly split traffic between 2 (or more) versions
- A: control, the existing system
- B: treatment, the new version
- collect metrics of interest (dependent variable)
- analyse the data
- run a statistical test to confirm the difference is not due to chance
- best scientific way to prove causality
Give two examples of A/B testing and explain the variables
Facebook
Goal: encourage people to share information they are comfortable sharing
independent variables:
- version 1: broad set of choices with preset ‘recommended’
- version 2: settings grouped into smaller set of features, preset recommended
- version 3: similar to version 1 but with a ‘skip for now’ option
- version 4: similar to version 2 but with a ‘skip for now’ option
- version 5: similar to version 2 but without any presets
dependent variables: user preferences, number of users who open up information
result: users preferred version 5; give control to the users
Amazon
Goal: get users to buy more items
independent variables:
- version 1: old webpage
- version 2: after adding items to the basket, the user is presented with "users who bought xx also bought xx"
dependent variables: amount of sales
result: version 2 was very successful
Explain the concept of user assignment in A/B testing
- good randomisation
- consistent assignment
- independent assignment
- monotonic ramp-up
- as the experiment is ramped up, users who are exposed to a treatment must stay in that treatment
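The assignment properties above can be sketched with a deterministic hash, a common approach (the MD5 choice and bucket count here are illustrative assumptions):

```python
import hashlib

def assign(user_id: str, experiment: str, treatment_pct: float) -> str:
    """Deterministically map a user to control or treatment.

    Hashing (experiment, user_id) gives the same bucket on every visit
    (consistent assignment) and uncorrelated buckets across experiments
    (independent assignment).
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000          # uniform in [0, 1)
    return "treatment" if bucket < treatment_pct else "control"

# Ramp-up is monotonic: a user's bucket is fixed, so a bucket below the
# old threshold (e.g. 0.10) is also below any higher one (e.g. 0.50).
print(assign("user_42", "checkout_v2", 0.10))
```

Because the bucket depends only on the hash, raising `treatment_pct` only adds new users to treatment; nobody already in treatment falls back to control.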
What is the Overall Evaluation Criterion (OEC)
- a long-term metric that the company really cares about
- time on site
- visit frequency
- use short-term metrics that predict long-term value
- optimise for customer lifetime value
- determines whether to launch the treatment
- if the experiment is negative, re-examine the metrics
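The idea of an OEC built from short-term proxies for long-term value can be sketched as a weighted score; the metrics and weights below are purely illustrative assumptions, not from any real company:

```python
def oec(sessions_per_week: float, minutes_per_session: float,
        purchases: float) -> float:
    """Hypothetical OEC: short-term metrics weighted as proxies for
    customer lifetime value. Weights are illustrative only."""
    return 0.5 * sessions_per_week + 0.3 * minutes_per_session + 0.2 * purchases

def launch_decision(control, treatment) -> str:
    """The OEC determines whether the treatment launches."""
    return "launch" if oec(*treatment) > oec(*control) else "relook at metrics"

print(launch_decision((3.0, 12.0, 0.4), (3.2, 12.5, 0.5)))
```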
What is ramp-up and auto-abort
- start the experiment at 0.1% of traffic
- run a simple analysis to make sure there are no problems
- ramp up to a higher % and repeat until 50%
- detecting a big difference is easy
- detecting a 10% difference requires only a small sample
- detecting a 0.1% difference is hard: run 50/50 for a longer time
- abort the experiment if the treatment is significantly worse
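The ramp-up loop with auto-abort can be sketched as below; `run_stage` and `is_significantly_worse` are hypothetical callbacks standing in for the real experiment pipeline and guardrail check:

```python
def ramp_up(run_stage, is_significantly_worse) -> str:
    """Ramp an experiment from 0.1% of traffic up to 50%,
    auto-aborting if the treatment is significantly worse.

    run_stage(pct): runs the stage at pct traffic, returns metrics.
    is_significantly_worse(metrics): the guardrail check.
    Both are assumed callbacks, not a real API.
    """
    for pct in (0.001, 0.01, 0.05, 0.20, 0.50):     # illustrative schedule
        metrics = run_stage(pct)
        if is_significantly_worse(metrics):          # auto-abort
            return f"aborted at {pct:.1%}"
    # detecting a ~0.1% difference needs a long run at the final 50/50 split
    return "completed at 50.0%"
```

Each stage only needs enough traffic to catch a large regression; the long, expensive 50/50 run is reserved for detecting small differences.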
A/B testing advantages
- tests causal relationships, not just correlation
- reduces the effect of external factors
- ease of test design and scalability
- decide on the number of versions
- split available traffic among the versions
- test a range of alternatives
- measures users' actual behaviour
- ease of implementation
A/B testing disadvantages
- needs agreement on the OEC
- requires a clear goal
- needs to define the independent and dependent variables
- problem with quantitative metrics
- does not tell why A is better than B
- would need some subjective measures too
- primacy effect
- changing the app may degrade the user experience at first, regardless of which version is better
- users take time to get used to the change
- consistency contamination
- assignment is cookie based
- users may erase cookies or use a different machine
- multiple experiments
- statistical variance increases, making it harder to get a statistically significant result
- outlier detection
- 5-40% of traffic are bots
- bots can skew results
- ethics
- emotional manipulation