lecture 4 (1) Flashcards
Main point of an experiment
Test some idea
Not to prove something
Most experiments “fail”
I.e. change does not lead to an improvement
This is a good thing
Bad ideas fail quickly
Investment is typically small
As are the sample sizes
But failed experiments are not normally what one sees
In publications or when talking to a firm
Steps to analyze experimental data
build an understanding of the data structure
Compute some descriptive statistics
Visualize the data
Run (the correct) statistical test
(use test results to inform decision making)
What descriptive statistics do you want to know
How many observations in total
How many observations per treatment
What is the mean/median number of sales per store in each treatment group
What is the standard deviation of the number of sales per store in each treatment group
Do observable characteristics of stores differ across treatments
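A minimal sketch of these descriptive checks, assuming pandas is available. The data, the column names (`promotion`, `sales`, `store_age`) and the choice of store age as the observable characteristic are all hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per store, with its assigned promotion and sales.
df = pd.DataFrame({
    "promotion": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "sales":     [40, 44, 48, 50, 55, 60, 30, 35, 40],
    "store_age": [5, 7, 9, 6, 7, 8, 4, 7, 10],
})

# Total number of observations.
n_total = len(df)

# Count, mean, median and sample standard deviation of sales per treatment group.
summary = df.groupby("promotion")["sales"].agg(["count", "mean", "median", "std"])

# Balance check: does an observable store characteristic differ across treatments?
balance = df.groupby("promotion")["store_age"].mean()

print(n_total)
print(summary)
print(balance)
```

If randomization worked, the group means in `balance` should be close to each other.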
How to best visualize differences between promotions
Histogram/bar plot
Scatterplot
Boxplot
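As an illustration of the boxplot option, here is a short sketch assuming matplotlib is installed. The sales figures and the output file name are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical sales per store, grouped by promotion.
sales_by_promotion = {
    "Promotion 1": [40, 44, 48, 43, 45],
    "Promotion 2": [50, 55, 60, 52, 58],
    "Promotion 3": [30, 35, 40, 33, 37],
}

fig, ax = plt.subplots()
# One box per treatment group, drawn at x positions 1, 2, 3.
boxes = ax.boxplot(list(sales_by_promotion.values()))
ax.set_xticklabels(list(sales_by_promotion.keys()))
ax.set_xlabel("Treatment group")
ax.set_ylabel("Sales per store")
ax.set_title("Sales by promotion (hypothetical data)")
fig.savefig("sales_by_promotion.png")
```

A boxplot shows the median and spread of each group side by side, which makes differences between promotions easy to see before any test is run.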
What is the right statistical analysis to run?
Two-sample tests of means
Limit to binary comparisons
Two sample test of proportions
ANOVA
Linear Regression
Comparing means - two alternatives
Form null and alternative hypothesis
H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0
Set a significance level α = 0.05
Test statistic (assuming unequal variances)
tstat= …
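The card leaves the formula elided. One common statistic under unequal variances is Welch's t, the difference in sample means divided by sqrt(s1²/n1 + s2²/n2); treating that as the intended statistic is an assumption. The sketch below computes it by hand and compares it with `scipy.stats.ttest_ind`, using hypothetical sales data.

```python
import math
from statistics import mean, variance
from scipy import stats

# Hypothetical sales for stores under two promotions.
promo1 = [40, 44, 48, 43, 45]
promo2 = [50, 55, 60, 52, 58]

# Welch's t statistic: difference in means over its estimated standard error,
# with each group's sample variance scaled by its own sample size.
se = math.sqrt(variance(promo1) / len(promo1) + variance(promo2) / len(promo2))
t_manual = (mean(promo1) - mean(promo2)) / se

# scipy's implementation; equal_var=False selects Welch's test.
result = stats.ttest_ind(promo1, promo2, equal_var=False)

# Reject H0 when the two-sided p-value falls below the significance level.
alpha = 0.05
reject_h0 = result.pvalue < alpha
print(t_manual, result.statistic, result.pvalue, reject_h0)
```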
Intermezzo: type 1 and type 2 errors
Type 1 error: False positive
Type 2 error: False negative
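To make the Type 1 error concrete, the simulation below draws both groups from the same distribution, so any rejection is a false positive. The sample size, distribution parameters and number of simulations are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims = 2000

false_positives = 0
for _ in range(n_sims):
    # Both groups come from the same distribution, so H0 is true by construction.
    a = rng.normal(loc=50, scale=10, size=30)
    b = rng.normal(loc=50, scale=10, size=30)
    if stats.ttest_ind(a, b, equal_var=False).pvalue < alpha:
        false_positives += 1  # rejecting a true H0 is a Type 1 error

type1_rate = false_positives / n_sims
print(type1_rate)
```

The share of false positives should land near alpha, which is what the significance level controls.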
What if you want to compare all treatments?
Need ANOVA
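A one-way ANOVA across all treatments can be sketched with `scipy.stats.f_oneway`. The three groups below are hypothetical sales figures, not data from the lecture.

```python
from scipy import stats

# Hypothetical sales per store under three promotions.
promo1 = [40, 44, 48, 43, 45]
promo2 = [50, 55, 60, 52, 58]
promo3 = [30, 35, 40, 33, 37]

# H0: all treatment means are equal. HA: at least one mean differs.
f_stat, p_value = stats.f_oneway(promo1, promo2, promo3)

alpha = 0.05
reject_h0 = p_value < alpha
print(f_stat, p_value, reject_h0)
```

A rejection only says that some means differ; it does not say which ones.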
ANOVA assumptions
Independence of errors
Constant variance
Normality of errors
Of these (2) is the most important
Homoskedasticity
Assuming errors are normally distributed, tested via Bartlett's test
Bartlett's test
Assumes normality of errors
If errors are non-normal, use the Brown-Forsythe test
If variance is non-constant, use Cochran's test
Bartlett's test explained
Null hypothesis: variances are equal across treatments
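Both variance checks can be sketched with scipy, using hypothetical data. `scipy.stats.bartlett` runs Bartlett's test, and `scipy.stats.levene` with `center="median"` gives the Brown-Forsythe variant.

```python
from scipy import stats

# Hypothetical sales per store under three promotions.
promo1 = [40, 44, 48, 43, 45]
promo2 = [50, 55, 60, 52, 58]
promo3 = [30, 35, 40, 33, 37]

# Bartlett's test. H0: variances are equal across treatments.
# It relies on normally distributed errors.
bartlett_stat, bartlett_p = stats.bartlett(promo1, promo2, promo3)

# Brown-Forsythe test: Levene's test on absolute deviations from the group
# medians, which is more robust when errors are not normal.
bf_stat, bf_p = stats.levene(promo1, promo2, promo3, center="median")

print(bartlett_stat, bartlett_p)
print(bf_stat, bf_p)
```

A large p-value means the data give no evidence against constant variance, so the ANOVA assumption is not rejected.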
Two-promotion comparison?
Use regression
Why only an estimate of promotion 2 and not also promotion 1?
How to interpret this regression
When a constant is included in the regression, one category of the categorical variable must be left out
We have two categories since we have two treatments (promotion 1 and promotion 2)
beta0 is the average revenue for stores that were in promotion 1
beta0+beta1 is the average revenue for stores that were in promotion 2
beta1 is the average difference in revenues between promotion 2 and promotion 1
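The interpretation above can be checked numerically. This sketch fits the regression of revenue on a constant and a promotion 2 dummy with `numpy.linalg.lstsq`; the revenue figures and variable names are hypothetical.

```python
import numpy as np

# Hypothetical revenue for stores in each promotion.
promo1 = np.array([40, 44, 48, 43, 45], dtype=float)
promo2 = np.array([50, 55, 60, 52, 58], dtype=float)

revenue = np.concatenate([promo1, promo2])
# Dummy equals 1 for promotion 2; promotion 1 is the omitted baseline category.
is_promo2 = np.concatenate([np.zeros(len(promo1)), np.ones(len(promo2))])

# Design matrix: a constant column plus the promotion 2 dummy.
X = np.column_stack([np.ones_like(revenue), is_promo2])
beta, *_ = np.linalg.lstsq(X, revenue, rcond=None)
beta0, beta1 = beta

print(beta0, promo1.mean())                  # intercept vs. promotion 1 mean
print(beta0 + beta1, promo2.mean())          # fitted value vs. promotion 2 mean
print(beta1, promo2.mean() - promo1.mean())  # slope vs. difference in means
```

Adding a second dummy for promotion 1 would make the columns of `X` linearly dependent, which is why one category has to be dropped when a constant is included.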
Why do regression estimates from the analysis of experiments have causal interpretations?
Counterfactual outcomes - compare to an alternative promotion
As good as random assignment to treatments - lurking variables won't trouble us
No sample selection bias… analyst picked the sample to match the group they care about
Regression estimates from experiments allow us to
Test whether treatments have effect
Same as ANOVA or a t-test
Estimate a magnitude of the effect sizes (and standard errors)
Which our t-test and ANOVA didn't
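A sketch of both points, on hypothetical data: the regression gives an effect size with a standard error, and its t statistic matches a two-sample t-test. The standard errors below are the classical OLS ones, which assume constant variance, so the match is with the pooled (equal-variance) t-test rather than the unequal-variance version.

```python
import numpy as np
from scipy import stats

# Hypothetical revenue for stores in each promotion.
promo1 = np.array([40, 44, 48, 43, 45], dtype=float)
promo2 = np.array([50, 55, 60, 52, 58], dtype=float)

y = np.concatenate([promo1, promo2])
dummy = np.concatenate([np.zeros(len(promo1)), np.ones(len(promo2))])
X = np.column_stack([np.ones_like(y), dummy])

# OLS coefficients: beta[0] is the intercept, beta[1] the promotion 2 effect.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical OLS standard errors from the residual variance.
residuals = y - X @ beta
dof = len(y) - X.shape[1]
sigma2 = residuals @ residuals / dof
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
se_beta1 = np.sqrt(cov_beta[1, 1])

# t statistic and two-sided p-value for H0: beta1 = 0.
t_reg = beta[1] / se_beta1
p_reg = 2 * stats.t.sf(abs(t_reg), dof)

# Pooled two-sample t-test on the same data for comparison.
pooled = stats.ttest_ind(promo2, promo1, equal_var=True)

print(beta[1], se_beta1)         # effect size and its standard error
print(t_reg, pooled.statistic)   # regression t vs. pooled t-test
print(p_reg, pooled.pvalue)
```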