Kohavi and Thomke (2017): The Surprising Power of Online Experiments Flashcards

Question 1

Q

Key Lessons from Online Experiments
The Value of Controlled Experiments:

Answer

A

Definition: Online experiments (such as A/B tests) let businesses assess ideas by
comparing a control (the current state) with a treatment (the proposed change). This scientific
method grounds decisions in evidence rather than intuition.
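
A minimal sketch of how a control/treatment comparison might be evaluated, assuming the metric is a conversion rate and that a pooled two-proportion z-test is acceptable. The function name and the user and conversion counts are hypothetical and do not come from the article.

```python
import math

def two_proportion_z_test(conv_c, n_c, conv_t, n_t):
    """Return (absolute lift, z statistic, two-sided p-value) for treatment vs. control."""
    p_c = conv_c / n_c                      # control conversion rate
    p_t = conv_t / n_t                      # treatment conversion rate
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return p_t - p_c, z, p_value

# Hypothetical counts, for illustration only.
lift, z, p = two_proportion_z_test(conv_c=1000, n_c=20000, conv_t=1100, n_t=20000)
print(f"lift={lift:.4f} z={z:.2f} p={p:.4f}")
```

A small p-value suggests the difference between control and treatment is unlikely to be chance alone; it does not by itself say the change is worth shipping.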

Question 2

Q

A/B testing for large companies:

Answer

A

Allows companies to test many ideas concurrently at a low cost per test.

Question 3

Q

Tiny Changes Can Have a Big Impact:

Answer

A

Contrary to popular belief, progress often comes from implementing numerous small
improvements rather than disruptive changes.

Question 4

Q

The Role of Infrastructure
- Large-scale experimentation requires:

Answer

A

Instrumentation: Collecting data on clicks, interactions, and behaviors.
Data pipelines: For real-time and batch analysis.
Teams of data scientists: To ensure rigor and reliability.

Question 5

Q

Challenges with Experimentation:

Answer

A

Failure Rates: At companies like Google and Bing, only 10%-20% of experiments yield
positive results. This underscores the need for numerous tests to identify breakthroughs.
Complexity and Bugs: Introducing multiple features simultaneously increases the likelihood of
errors. Example: If each new feature independently has a 10% chance of failing, releasing 7
features together has a >50% probability that at least one fails (1 - 0.9^7 ≈ 52%).
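
The arithmetic behind the example can be checked with a short sketch, assuming each feature fails independently with the same probability. The function name is hypothetical.

```python
def prob_at_least_one_failure(p_fail, n_features):
    """Probability that at least one of n independent features fails."""
    return 1 - (1 - p_fail) ** n_features

for n in (1, 3, 7):
    print(n, round(prob_at_least_one_failure(0.10, n), 3))
# prints:
# 1 0.1
# 3 0.271
# 7 0.522
```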

Question 6

Q

Importance of Data Quality:
- Rigorous Validation:

Answer

A

A/A Tests: Testing a variant against itself verifies that the system reports no difference
when none exists.
Identify and exclude outliers (e.g., bots or outlier accounts like libraries on Amazon).
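
One way to picture an A/A check is a simulation: both arms draw from the same conversion rate, so roughly 5% of runs should look "significant" at a 0.05 threshold. The sketch below assumes a pooled two-proportion z-test and simulated users; all names and parameters are hypothetical.

```python
import math
import random

def p_value(conv_a, conv_b, n):
    """Two-sided p-value for two arms of equal size n (pooled z-test)."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both arms
    z = (conv_b - conv_a) / n / se
    return math.erfc(abs(z) / math.sqrt(2))

def aa_false_positive_rate(runs=500, n=1000, rate=0.10, alpha=0.05, seed=0):
    """Fraction of simulated A/A tests that wrongly report a difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Both arms share the same true conversion rate.
        a = sum(rng.random() < rate for _ in range(n))
        b = sum(rng.random() < rate for _ in range(n))
        if p_value(a, b, n) < alpha:
            hits += 1
    return hits / runs

fpr = aa_false_positive_rate()
print(f"A/A false-positive rate: {fpr:.3f}")
```

A rate far above alpha would point to a problem in assignment, logging, or the statistics rather than to a real effect.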

Question 7

Q

Importance of Data Quality:
- Twyman’s Law:

Answer

A

“Any figure that looks interesting or different is usually wrong.” Surprising results
should be replicated to confirm they are accurate.

Question 8

Q

Importance of Data Quality:
- Segment Variability

Answer

A

Some user segments may react differently to experiments, skewing
overall results. For example, a bug in Internet Explorer 7 significantly distorted Bing’s test
results.
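
A minimal sketch of a per-segment breakdown, showing how one broken segment can drag down an overall result. The segment names and counts are invented for illustration and are not Bing's data.

```python
# Hypothetical counts: segment -> (control_users, control_conv, treatment_users, treatment_conv)
results = {
    "browser_a": (10000, 500, 10000, 520),
    "browser_b": (8000, 400, 8000, 416),
    "browser_c": (2000, 100, 2000, 40),   # imagined segment hit by a bug
}

def segment_lifts(results):
    """Relative lift in conversion rate (treatment vs. control) per segment."""
    return {
        seg: (c_t / n_t) / (c_c / n_c) - 1
        for seg, (n_c, c_c, n_t, c_t) in results.items()
    }

def overall_lift(results):
    """Relative lift with all segments pooled together."""
    n_c = sum(r[0] for r in results.values())
    c_c = sum(r[1] for r in results.values())
    n_t = sum(r[2] for r in results.values())
    c_t = sum(r[3] for r in results.values())
    return (c_t / n_t) / (c_c / n_c) - 1

lifts = segment_lifts(results)
for seg, lift in lifts.items():
    print(f"{seg}: {lift:+.1%}")
print(f"overall: {overall_lift(results):+.1%}")
```

In this made-up data, two segments improve while the pooled result is negative, which is the kind of pattern that should prompt a check for a segment-specific bug.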

Question 9

Q

Avoiding Assumptions About Causality:

Answer

A

Correlation ≠ Causation:
Example: Observational studies in Microsoft Office falsely suggested advanced features
reduced attrition. In reality, heavy users (who use advanced features) naturally have lower
attrition rates.
Controlled Testing Is Essential: Observational studies may misrepresent the impact of
changes.
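
The confounding in the Office example can be illustrated with a simulation in which the feature has no effect at all, yet feature users still show lower attrition because heavy users both adopt the feature and churn less. All rates below are assumptions chosen for illustration.

```python
import random

def simulate(n=20000, seed=1):
    """Return churn rates: (obs_users, obs_non_users, rct_treatment, rct_control)."""
    rng = random.Random(seed)
    obs = {True: [0, 0], False: [0, 0]}  # uses_feature -> [users, churned]
    rct = {True: [0, 0], False: [0, 0]}  # randomly_assigned -> [users, churned]
    for _ in range(n):
        heavy = rng.random() < 0.3
        # Churn depends only on being a heavy user, never on the feature.
        churned = rng.random() < (0.05 if heavy else 0.30)
        uses = rng.random() < (0.8 if heavy else 0.1)  # self-selected usage
        assigned = rng.random() < 0.5                   # random assignment
        for table, key in ((obs, uses), (rct, assigned)):
            table[key][0] += 1
            table[key][1] += int(churned)

    def rate(table, key):
        return table[key][1] / table[key][0]

    return rate(obs, True), rate(obs, False), rate(rct, True), rate(rct, False)

obs_users, obs_non, rct_t, rct_c = simulate()
print(f"observational: users={obs_users:.3f} non-users={obs_non:.3f}")
print(f"randomized:    treatment={rct_t:.3f} control={rct_c:.3f}")
```

The observational comparison shows a large gap, while the randomized comparison shows roughly none, matching the true (zero) effect built into the simulation.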

Question 10

Q

Defining Success with Metrics:

Answer

A

Overall Evaluation Criterion (OEC):
Composite metrics should align with long-term strategic goals (e.g., revenue, engagement).
Example: Bing tracks metrics like tasks completed per session to gauge user satisfaction.
Continuous Refinement:
Successful experiments often result from understanding short- and long-term metric
trade-offs.
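
As a rough sketch, a composite metric could be a weighted sum of relative metric changes. The weighting scheme, metric names, weights, and changes below are all hypothetical; the article does not prescribe a formula.

```python
def oec_score(relative_changes, weights):
    """Weighted sum of relative metric changes (treatment vs. control)."""
    if set(relative_changes) != set(weights):
        raise ValueError("metrics and weights must cover the same names")
    return sum(weights[m] * relative_changes[m] for m in weights)

# Hypothetical weights favoring a long-term satisfaction proxy over short-term revenue.
weights = {"tasks_completed_per_session": 0.6, "revenue_per_user": 0.4}
# Hypothetical experiment: revenue rises 3% while task completion falls 3%.
changes = {"tasks_completed_per_session": -0.03, "revenue_per_user": 0.03}

score = oec_score(changes, weights)
print(f"OEC change: {score:+.4f}")
```

With these assumed weights the short-term revenue gain does not offset the drop in task completion, so the composite score is negative.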