Data Analysis IIa: ANOVA & Regression (Week 6) Flashcards
What is ANOVA?
To test more than 2 means (i.e. >2 groups)
Do groups 1, 2 & 3 have sig. different means for x̄?
What is the formula for F-test?
F = Between-grp variance/Within-grp variance
= Σnj (x̄j - x̄)^2 / (k-1) / ΣΣ (x- x̄j)^2 / (N-k)
What are the degrees of freedom for F-test?
df1 = k-1 df2 = N-k
What happens when we reject the null hypothesis for the F-test?
Null hypothesis: All groups have the same mean
Reject Ho -> Not all means are the same.
Which one differs? Conduct post-hoc.
What is an example of post-hoc tests?
To find out which means differ from each other
E.g. LSD
Comparable to a large set of t-tests
What is regression?
Calculate the distance from the observation to the fitted line
Regression MINIMISES the sum of these differences
What is sum of squares?
Sum of squares of distances from data pts to fitted line
We prefer the reg. line that gives the LOWEST sum of squares
What is the regression equation?
yi = α + β xi + εi
α: intercept/constant
β: slope
ε: disturbance/error term
How does α and β affect the graph?
Higher α -> Parallel shift of graph (affects INTERCEPT)
Higher β -> Steeper graph (affects SLOPE)
What is simple vs. multiple regression?
Simple: yi = α + β xi + εi
Multiple: yi = α + β1 x1i + β2 x2i + β3 x3i + εi
Why do we not do multiple simple regressions?
We want to test the effect of multiple variables AT THE SAME TIME
What is omitted variable bias?
Eg. Salesi = α + β1 Pricei + β2 Advertisingi + εi
If we omit price, the effect of advertising is not clean
What are the regression coefficients of the regression equation?
α, β1, β2
How do we choose which IVs to include in our regression equation?
- Use theory/intuition
- Do not just include all variables in your dataset
- For exploratory research: use stepwise regression
What are the 3 steps to interpret regression results?
- Model significance: F-test
- Model fit: R^2
- Regression coefficients: Significance, sign, size
What is the F-test and the hypotheses?
To assess the significance of our model
Null hypothesis: All coefficients are 0
i.e. β1 = 0, β2 = 0
Alt. hypothesis: At least ONE coefficient is non-zero
i.e. β1 ≠ 0 or β2 ≠ 0
What is R^2?
R^2 indicates the proportion of variance in the DV that is explained by the IVs
Varies between 0 and 1
i.e. 0% / 100% of the variance explained
What is the R^2 in a simple regression?
Simple regression: Only 1 IV & a constant
The R^2 is equal to the square of the correlation coefficient
E.g. Correlation of 0.8 b/w X & Y
R^2 =0.8^2 = 0.64
How is R^2 affected when we add a variable?
R^2 will always increase or stay the same
Not possible for R^2 to decrease by adding more variables
If β2=0, it just reduces to the original equation with β1
What is adjusted R^2 and when is it used?
- We want to keep our model as compact as possible
- Adjusted R^2 gives a penalty for additional variable
- Adjusted R^2 only increases when the better fit outweighs the cost of having an additional coefficient
- Only used when >2 alt. models with diff. no, of variables
How do we test for significance of the regression coefficients?
Look at t-test & p-value
t-value = coefficient / standard error
p-value < 0.05
What is coefficient interpretation?
ONLY interpret if SIGNIFICANT
When Y increases by 1 unit, Xi changes by β1 units
What are fitted values?
We can use our model to calculate fitted values
And compare fitted values to observed values
How can fitted values be used?
Used to run scenarios - E.g. Given a price & advertising level, what would sales be?
Forecasting - Predict in the future, other brands, other respondents etc
Can we interpret regression coefficients as causal r/s?
No. Regression is based on ASSOCIATION between variables.
For causality, all 3 conditions need to be satisfied:
Needs experimentation and manipulation.