Data Analysis IIa: ANOVA & Regression (Week 6) Flashcards

Question 1

Q

What is ANOVA?

Answer

A

A test of whether more than 2 means differ (i.e. >2 groups)

E.g. Do groups 1, 2 & 3 have significantly different means of x?

Question 2

Q

What is the formula for F-test?

Answer

A

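F = Between-grp variance / Within-grp variance

= [Σj nj (x̄j - x̄)^2 / (k-1)] / [Σj Σi (xij - x̄j)^2 / (N-k)]

where k = no. of groups, N = total no. of observations, nj = size of group j, x̄j = mean of group j, x̄ = grand mean, xij = observation i in group j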
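A minimal Python sketch of the F formula, assuming three small made-up groups (the function name `anova_f` and the data are hypothetical). It computes the between-group and within-group sums of squares, divides each by its degrees of freedom and takes the ratio.

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group variance / within-group variance."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # N, total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: sum over groups of n_j * (mean_j - grand_mean)^2
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: sum over groups and observations of (x_ij - mean_j)^2
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2

# Hypothetical data for groups 1, 2 and 3
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f, df1, df2 = anova_f(groups)
print(f, df1, df2)
```

For these made-up groups the means are 5, 7 and 10, which gives F = 19 with (2, 6) df; the p-value would then come from an F-table or a statistics package.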

Question 3

Q

What are the degrees of freedom for F-test?

Answer

A

df1 = k-1 (between groups; k = no. of groups)
df2 = N-k (within groups; N = total no. of observations)

Question 4

Q

What happens when we reject the null hypothesis for the F-test?

Answer

A

Null hypothesis: All groups have the same mean
Reject H0 -> Not all means are the same.

Which one differs? Conduct post-hoc tests.

Question 5

Q

What is an example of post-hoc tests?

Answer

A

To find out which means differ from each other

E.g. LSD (Fisher's Least Significant Difference)

Comparable to running a t-test for every pair of groups
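The LSD logic can be sketched in Python: each pair of group means is compared against a least significant difference built from the pooled within-group variance. This is a sketch under assumptions: the groups are made up, the helper `lsd_pairs` is hypothetical, and the critical t-value (2.447, two-tailed 5% with N-k = 6 df) is taken from a t-table rather than computed.

```python
from itertools import combinations
from math import sqrt

def lsd_pairs(groups, t_crit):
    """Fisher's LSD: flag pairs of groups whose means differ by more than the LSD."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Mean square within (pooled error variance), as in the ANOVA denominator
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n_total - k)

    results = {}
    for i, j in combinations(range(k), 2):
        lsd = t_crit * sqrt(msw * (1 / len(groups[i]) + 1 / len(groups[j])))
        diff = abs(means[i] - means[j])
        results[(i + 1, j + 1)] = diff > lsd  # True = significant difference
    return results

groups = [[4, 5, 6], [5, 6, 7], [9, 10, 11]]
# Assumed critical value: two-tailed t at the 5% level with N - k = 6 df (t-table)
print(lsd_pairs(groups, t_crit=2.447))
```

Here groups 1 and 2 do not differ significantly, while group 3 differs from both. Like a set of separate t-tests, LSD does not adjust for the number of comparisons, so with many groups the chance of a false positive rises.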

Question 6

Q

What is regression?

Answer

A

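Regression fits a line through the data: for each observation, calculate the (vertical) distance from the observation to the fitted line

Regression (ordinary least squares, OLS) MINIMISES the sum of the SQUARED distances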
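For a simple regression the least-squares line has a closed form: β = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and α = ȳ - β x̄. A minimal Python sketch with made-up data (the helper name `ols_simple` is hypothetical):

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx             # slope
    alpha = y_bar - beta * x_bar   # intercept: line passes through (x_bar, y_bar)
    return alpha, beta

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta = ols_simple(x, y)
print(alpha, beta)  # approximately 2.2 and 0.6
```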

Question 7

Q

What is sum of squares?

Answer

A

Sum of the squared (vertical) distances from the data points to the fitted line, i.e. the squared residuals

We prefer the regression line that gives the LOWEST sum of squares
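A minimal Python sketch comparing two candidate lines on a made-up dataset; for these points the least-squares line is y = 2.2 + 0.6x, and y = 1 + x is an arbitrary competitor (the helper name `sum_of_squares` is hypothetical).

```python
def sum_of_squares(alpha, beta, x, y):
    """Sum of squared vertical distances (residuals) from the points to a line."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ols_line = sum_of_squares(2.2, 0.6, x, y)    # least-squares line for this data
other_line = sum_of_squares(1.0, 1.0, x, y)  # an arbitrary competing line
print(ols_line, other_line)
```

The least-squares line gives a sum of squares of about 2.4 versus 4 for the competitor, so it is the preferred line.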

Question 8

Q

What is the regression equation?

Answer

A

yi = α + β xi + εi

α: intercept/constant
β: slope
ε: disturbance/error term

Question 9

Q

How do α and β affect the graph?

Answer

A

Higher α -> Parallel upward shift of the line (affects INTERCEPT)

Higher β -> Steeper line when β is positive (affects SLOPE)

Question 10

Q

What is simple vs. multiple regression?

Answer

A

Simple (1 IV): yi = α + β xi + εi

Multiple (2 or more IVs): yi = α + β1 x1i + β2 x2i + β3 x3i + εi

Question 11

Q

Why do we not run several simple regressions instead?

Answer

A

We want to test the effect of multiple variables AT THE SAME TIME, i.e. the effect of each IV while holding the other IVs constant

Question 12

Q

What is omitted variable bias?

Answer

A

E.g. Salesi = α + β1 Pricei + β2 Advertisingi + εi

If we omit price (and price is correlated with advertising), the estimated effect of advertising is biased: it partly picks up the effect of price
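A small illustration of omitted variable bias, assuming hypothetical, noise-free data in which Sales = 10 - 2*Price + 3*Advertising and the higher-priced observations also advertise more. The helpers `centered_cross` and `ols_two_ivs` are hypothetical; the two-IV slopes use the standard least-squares formulas based on centred sums of cross-products.

```python
def centered_cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))

def ols_two_ivs(x1, x2, y):
    """Least-squares slopes b1, b2 for y = alpha + b1*x1 + b2*x2."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2

# Hypothetical data: price and advertising move together
price = [1, 2, 3, 4, 5]
advertising = [1, 1, 2, 3, 3]
sales = [10 - 2 * p + 3 * a for p, a in zip(price, advertising)]

b_price, b_adv = ols_two_ivs(price, advertising, sales)  # both IVs included
b_adv_only = centered_cross(advertising, sales) / centered_cross(advertising, advertising)  # price omitted
print(b_price, b_adv)
print(b_adv_only)
```

With both IVs the slopes come back as -2 (price) and 3 (advertising). With price omitted the advertising slope drops to 0, because the negative price effect is loaded onto advertising.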

Question 13

Q

What are the regression coefficients of the regression equation?

Answer

A

α, β1, β2

Question 14

Q

How do we choose which IVs to include in our regression equation?

Answer

A

Use theory/intuition
Do not just include all variables in your dataset
For exploratory research: use stepwise regression

Question 15

Q

What are the 3 steps to interpret regression results?

Answer

A

Model significance: F-test
Model fit: R^2
Regression coefficients: Significance, sign, size

Question 16

Q

What is the F-test and the hypotheses?

Answer

A

To assess the significance of our model

Null hypothesis: All coefficients (excluding the constant α) are 0
i.e. β1 = 0, β2 = 0

Alt. hypothesis: At least ONE coefficient is non-zero
i.e. β1 ≠ 0 or β2 ≠ 0
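For a regression with a constant, the overall F statistic can also be written in terms of R^2, with p IVs and N observations: F = (R^2 / p) / ((1 - R^2) / (N - p - 1)). A minimal Python sketch with hypothetical numbers (the helper `regression_f` is made up for illustration):

```python
def regression_f(r_squared, n_obs, n_ivs):
    """Overall F statistic for H0: all slope coefficients are 0."""
    df1 = n_ivs                # numerator df: number of IVs
    df2 = n_obs - n_ivs - 1    # denominator df: observations - IVs - 1 (for the constant)
    f = (r_squared / df1) / ((1 - r_squared) / df2)
    return f, df1, df2

# Hypothetical model: R^2 = 0.5 with 2 IVs and 23 observations
f, df1, df2 = regression_f(r_squared=0.5, n_obs=23, n_ivs=2)
print(f, df1, df2)
```

Here F is about 10 with (2, 20) df; comparing it with the F-distribution gives the p-value for the model as a whole.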

Question 17

Q

What is R^2?

Answer

A

R^2 indicates the proportion of variance in the DV that is explained by the IVs

Varies between 0 and 1
i.e. from 0% (none) to 100% (all) of the variance explained
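R^2 can be computed as 1 - (residual sum of squares / total sum of squares). A minimal Python sketch, assuming a made-up dataset whose least-squares line is y = 2.2 + 0.6x (the helper `r_squared` is hypothetical):

```python
def r_squared(y, fitted):
    """Share of the variance in y explained by the model: 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                  # total variation in y
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # unexplained variation
    return 1 - ss_resid / ss_total

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * xi for xi in x]  # least-squares line for this toy data
print(r_squared(y, fitted))
```

For this data SS_residual is 2.4 and SS_total is 6, so R^2 is 0.6: the line explains 60% of the variance in y.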

Question 18

Q

What is the R^2 in a simple regression?

Answer

A

Simple regression: Only 1 IV & a constant

The R^2 is equal to the square of the correlation coefficient

E.g. Correlation of 0.8 between X & Y
R^2 = 0.8^2 = 0.64
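A quick numerical check in Python, assuming a made-up dataset chosen so that r = 0.8: fit the simple regression, compute its R^2, and compare with r^2 (the helper `simple_fit_stats` is hypothetical).

```python
from math import sqrt

def simple_fit_stats(x, y):
    """Return (r, R^2) for a simple regression of y on x with a constant."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)

    r = s_xy / sqrt(s_xx * s_yy)         # Pearson correlation
    beta = s_xy / s_xx                   # OLS slope
    alpha = y_bar - beta * x_bar         # OLS intercept
    ss_resid = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_resid / s_yy             # R^2 of the fitted line
    return r, r2

r, r2 = simple_fit_stats([1, 2, 3, 4], [1, 3, 2, 4])
print(r, r ** 2, r2)  # r^2 and R^2 agree up to floating-point rounding
```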

Question 19

Q

How is R^2 affected when we add a variable?

Answer

A

R^2 will always increase or stay the same

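It is not possible for R^2 to decrease by adding more variables

Reason: the model could always set the new coefficient to 0 (β2 = 0), which reduces it to the original equation with only β1, so the fit can never get worse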
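A minimal Python sketch of this property, assuming made-up data and an arbitrary extra variable x2 with no real meaning (the helpers are hypothetical). It uses the least-squares result that the explained sum of squares equals Σ bj · Σ(xj - x̄j)(y - ȳ).

```python
def centered_cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))

def r2_one_iv(x1, y):
    """R^2 of y = alpha + b1*x1: explained sum of squares / total sum of squares."""
    b1 = centered_cross(x1, y) / centered_cross(x1, x1)
    return b1 * centered_cross(x1, y) / centered_cross(y, y)

def r2_two_ivs(x1, x2, y):
    """R^2 of y = alpha + b1*x1 + b2*x2, using the two-IV least-squares slopes."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return (b1 * s1y + b2 * s2y) / centered_cross(y, y)

x1 = [1, 2, 3, 4, 5]
x2 = [1, 0, 0, 1, 1]   # arbitrary, made-up extra variable
y = [2, 4, 5, 4, 5]
r2_small = r2_one_iv(x1, y)
r2_big = r2_two_ivs(x1, x2, y)
print(r2_small, r2_big)
```

Even though x2 is arbitrary filler, R^2 rises from 0.6 to about 0.99 on these 5 observations, which is why adjusted R^2 is used when comparing models.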

Question 20

Q

What is adjusted R^2 and when is it used?

Answer

A

We want to keep our model as compact as possible
Adjusted R^2 gives a penalty for each additional variable
Adjusted R^2 only increases when the better fit outweighs the cost of having an additional coefficient
Used when comparing 2 or more alternative models with different numbers of variables
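A common form of adjusted R^2, with N observations and p IVs, is 1 - (1 - R^2)(N - 1) / (N - p - 1). A minimal Python sketch comparing two hypothetical alternative models (all numbers are invented):

```python
def adjusted_r2(r2, n_obs, n_ivs):
    """Adjusted R^2: penalises R^2 for the number of IVs given N observations."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_ivs - 1)

# Two hypothetical alternative models estimated on the same N = 30 observations
model_a = adjusted_r2(0.60, n_obs=30, n_ivs=2)
model_b = adjusted_r2(0.61, n_obs=30, n_ivs=5)
print(model_a, model_b)
```

Model B has the higher R^2, but its three extra IVs add too little fit, so model A has the higher adjusted R^2 (about 0.57 vs 0.53).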

Question 21

Q

How do we test for significance of the regression coefficients?

Answer

A

Look at the t-test & p-value

t-value = coefficient / standard error
Coefficient is significant if p-value < 0.05 (at the 5% level)
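A minimal Python sketch for the slope of a simple regression, using the usual OLS standard error sqrt(residual SS / (n - 2) / Σ(x - x̄)^2); the data and the helper `slope_t_value` are made up for illustration.

```python
from math import sqrt

def slope_t_value(x, y):
    """t-value of the OLS slope in a simple regression: coefficient / standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    alpha = y_bar - beta * x_bar
    ss_resid = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = sqrt(ss_resid / (n - 2) / s_xx)  # standard error of the slope
    return beta, se_beta, beta / se_beta

beta, se, t = slope_t_value([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(beta, se, t)
```

Here t is about 2.12 with n - 2 = 3 df. The two-tailed 5% critical value for 3 df is about 3.18 (from a t-table), so p > 0.05 and the slope is not significant; statistical software reports the p-value directly.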

Question 22

Q

What is coefficient interpretation?

Answer

A

ONLY interpret if SIGNIFICANT

When X1 increases by 1 unit, Y changes by β1 units (holding the other IVs constant)

Question 23

Q

What are fitted values?

Answer

A

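We can use our model to calculate fitted values: plug the estimated α and β's and the values of the IVs into the regression equation (without ε)

And compare fitted values to observed values (the difference is the residual)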

Question 24

Q

How can fitted values be used?

Answer

A

Used to run scenarios - E.g. Given a price & advertising level, what would sales be?

Forecasting - Predict future values, other brands, other respondents etc.
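A minimal scenario sketch in Python, assuming hypothetical estimated coefficients α = 100, β1 = -5 (price) and β2 = 2 (advertising); these numbers are invented, not estimates from any dataset, and `predict_sales` is a hypothetical helper.

```python
# Hypothetical estimated coefficients for Sales_i = alpha + b1*Price_i + b2*Advertising_i
ALPHA, B_PRICE, B_ADV = 100.0, -5.0, 2.0

def predict_sales(price, advertising):
    """Fitted (predicted) sales: plug IV values into the estimated equation, no error term."""
    return ALPHA + B_PRICE * price + B_ADV * advertising

# Scenario analysis: what-if combinations of price and advertising
for price, adv in [(10, 20), (12, 20), (10, 30)]:
    print(f"price={price}, advertising={adv} -> fitted sales={predict_sales(price, adv)}")

# A 1-unit price increase (advertising held fixed) changes fitted sales by B_PRICE
price_effect = predict_sales(11, 20) - predict_sales(10, 20)

# Fitted vs. observed: the difference is the residual (the observed value is made up)
observed_sales = 95.0
residual = observed_sales - predict_sales(10, 20)
print(price_effect, residual)
```

The three scenarios give fitted sales of 90, 80 and 110. The same function can be fed IV values for future periods or other brands, although such forecasts assume the estimated relationship still holds there.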

Question 25

Q

Can we interpret regression coefficients as causal r/s?

Answer

A

No. Regression is based on ASSOCIATION between variables.

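For causality, all 3 conditions need to be satisfied:
1. Association: the variables vary together
2. Temporal order: the cause occurs before the effect
3. No alternative explanations: other possible causal factors are ruled out
This needs experimentation and manipulation.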