Data Analysis IIa: ANOVA & Regression (Week 6) Flashcards

1
Q

What is ANOVA?

A

Tests whether more than 2 means differ (i.e. >2 groups)

E.g. do groups 1, 2 & 3 have sig. different means of x?

2
Q

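What is the formula for the F-test?

A

F = Between-grp variance / Within-grp variance

= [Σj nj (x̄j - x̄)^2 / (k-1)] / [Σj Σi (xij - x̄j)^2 / (N-k)]
where nj, x̄j = size & mean of group j; x̄ = grand mean; k = no. of groups; N = total no. of observations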
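
The F formula can be checked by hand with a minimal Python sketch. The function name `one_way_anova_f` and the three groups of scores are made up for illustration; only the standard library is used. For these scores the group means are 5, 7 and 10, which gives F ≈ 19 with df1 = 2 and df2 = 6.

```python
def one_way_anova_f(groups):
    """Return (F, df1, df2) for a list of groups, each a list of numbers."""
    k = len(groups)                                    # number of groups
    n_total = sum(len(g) for g in groups)              # N, total observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: sum over groups of nj * (x̄j - x̄)^2
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df1, df2 = k - 1, n_total - k
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_stat, df1, df2

# Made-up scores for three groups
print(one_way_anova_f([[4, 5, 6], [6, 7, 8], [9, 10, 11]]))
```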

3
Q

What are the degrees of freedom for the F-test?

A
df1 = k-1 (k = no. of groups)
df2 = N-k (N = total no. of observations)
4
Q

What happens when we reject the null hypothesis for the F-test?

A

Null hypothesis: All groups have the same mean
Reject H0 -> Not all means are the same

Which ones differ? Conduct post-hoc tests.

5
Q

What are post-hoc tests? Give an example.

A

Tests to find out which means differ from each other

E.g. LSD (Fisher's Least Significant Difference)

Comparable to a large set of t-tests (one for each pair of groups)
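
LSD is essentially a t-test for each pair of groups that uses the pooled within-group variance (MSW) from the ANOVA. A minimal Python sketch on made-up data follows; the function name `lsd_t_values` is hypothetical, and turning the t-values into p-values would need a t-distribution with N - k df from a statistics package, which is not shown. For these scores the pairwise t-values are about -2.45, -6.12 and -3.67 with 6 df.

```python
from itertools import combinations
from math import sqrt

def lsd_t_values(groups):
    """t-value for every pair of groups, using the pooled within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Within-group mean square (MSW) from the ANOVA, with N - k df
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n_total - k)
    t_values = {}
    for i, j in combinations(range(k), 2):
        se = sqrt(msw * (1 / len(groups[i]) + 1 / len(groups[j])))
        t_values[(i + 1, j + 1)] = (means[i] - means[j]) / se  # groups numbered from 1
    return t_values, n_total - k

t_values, df = lsd_t_values([[4, 5, 6], [6, 7, 8], [9, 10, 11]])
for pair, t in t_values.items():
    print(pair, round(t, 2))
```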

6
Q

What is regression?

A

Fit a line through the data and calculate the vertical distance from each observation to the fitted line (the residual)

Regression (OLS) MINIMISES the sum of the SQUARED distances
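
A minimal Python sketch of simple OLS using the textbook closed-form estimates (slope = Sxy / Sxx, intercept = ȳ - slope · x̄) on made-up data; the function name `fit_simple_ols` is hypothetical. It also shows why the SQUARED distances matter: positive and negative distances cancel, and for the OLS line the plain residuals sum to zero. Here α ≈ 2.2, β = 0.6 and the sum of squared residuals is about 2.4.

```python
def fit_simple_ols(x, y):
    """Return (alpha, beta) that minimise the sum of squared residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    beta = sxy / sxx                # slope
    alpha = y_bar - beta * x_bar    # intercept: the line passes through (x̄, ȳ)
    return alpha, beta

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha, beta = fit_simple_ols(x, y)
residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]  # observed - fitted
print(alpha, beta)
print(sum(residuals))                   # about 0: positive and negative distances cancel
print(sum(r ** 2 for r in residuals))   # the quantity OLS actually minimises
```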

7
Q

What is the sum of squares?

A

Sum of the squared (vertical) distances from the data points to the fitted line

We prefer the reg. line that gives the LOWEST sum of squares
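
To see that the preferred line is the one with the lowest sum of squares, a brute-force Python sketch can try many candidate lines on a grid and keep the best one. The data and the grid (α from 0 to 4, β from 0 to 1.5, in steps of 0.1) are made up; for this data the OLS formulas give α = 2.2 and β = 0.6, and the grid search lands on that same line with a sum of squares of about 2.4.

```python
def sum_of_squares(x, y, alpha, beta):
    """Sum of squared vertical distances from the points to the line alpha + beta * x."""
    return sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Candidate lines: alpha = 0.0, 0.1, ..., 4.0 and beta = 0.0, 0.1, ..., 1.5
candidates = [(a / 10, b / 10) for a in range(41) for b in range(16)]
best = min(candidates, key=lambda line: sum_of_squares(x, y, *line))
print(best, round(sum_of_squares(x, y, *best), 2))
```

A real regression package solves for the minimum directly instead of searching a grid; the grid only makes the "lowest sum of squares" criterion visible.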

8
Q

What is the regression equation?

A

yi = α + β xi + εi

α: intercept/constant
β: slope
ε: disturbance/error term

9
Q

How do α and β affect the graph?

A

Higher α -> Parallel upward shift of the line (affects INTERCEPT)

Higher β -> Line rises more steeply (affects SLOPE)

10
Q

What is simple vs. multiple regression?

A

Simple: yi = α + β xi + εi

Multiple: yi = α + β1 x1i + β2 x2i + β3 x3i + εi
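
A minimal sketch of fitting a multiple regression by solving the OLS normal equations (X'X)b = X'y in plain Python, assuming no IV is a perfect linear combination of the others (in practice a statistics package does this step). The function name `fit_ols` and the data are made up; two IVs are used for brevity, and y is generated exactly as 1 + 2·x1 - 0.5·x2 with no error term, so the estimates can be checked.

```python
def fit_ols(iv_columns, y):
    """Return [alpha, beta1, beta2, ...] by solving the normal equations (X'X) b = X'y."""
    # Design matrix: a column of 1s (for alpha) followed by one column per IV
    X = [[1.0] + [col[i] for col in iv_columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[r][p] / a[r][r] for r in range(p)]

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u - 0.5 * v for u, v in zip(x1, x2)]  # made-up data, no error term
print(fit_ols([x1, x2], y))  # approximately [1.0, 2.0, -0.5]
```

The same function accepts any number of IV columns, e.g. `fit_ols([x1, x2, x3], y)` for the three-IV equation on this card.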

11
Q

Why do we not run several simple regressions instead?

A

We want to test the effect of multiple variables AT THE SAME TIME: each coefficient then shows the effect of one IV while holding the other IVs constant

12
Q

What is omitted variable bias?

A

E.g. Salesi = α + β1 Pricei + β2 Advertisingi + εi

If we omit price (and price is correlated with advertising), the effect of advertising is not clean: its coefficient partly picks up the effect of price
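
A small made-up illustration in Python: sales are generated exactly from Sales = 100 - 5·Price + 2·Advertising (no error term), and advertising tends to be higher when price is higher. Regressing sales on advertising alone (price omitted) then gives a slope of -2 instead of the true +2, because the advertising coefficient absorbs part of the negative price effect. The helper `simple_slope` and all numbers are hypothetical.

```python
def simple_slope(x, y):
    """OLS slope of y on x in a simple regression: Sxy / Sxx."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

price = [1, 2, 3, 4, 5]
advertising = [1, 3, 2, 5, 4]  # tends to be higher when price is higher
# Made-up "true" model with no error term: Sales = 100 - 5 Price + 2 Advertising
sales = [100 - 5 * p + 2 * a for p, a in zip(price, advertising)]

biased = simple_slope(advertising, sales)  # price omitted from the regression
print(biased)  # negative, although the true advertising effect is +2
```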

13
Q

What are the regression coefficients of Salesi = α + β1 Pricei + β2 Advertisingi + εi?

A

α, β1, β2

14
Q

How do we choose which IVs to include in our regression equation?

A
  • Use theory/intuition
  • Do not just include all variables in your dataset
  • For exploratory research: use stepwise regression
15
Q

What are the 3 steps to interpret regression results?

A
  1. Model significance: F-test
  2. Model fit: R^2
  3. Regression coefficients: Significance, sign, size
16
Q

What is the F-test in regression and what are its hypotheses?

A

To assess the significance of our model as a whole

Null hypothesis: All slope coefficients are 0
i.e. β1 = β2 = 0

Alt. hypothesis: At least ONE coefficient is non-zero
i.e. β1 ≠ 0 or β2 ≠ 0
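
For a regression with an intercept, the F statistic can also be written in terms of R^2: F = (R^2 / p) / ((1 - R^2) / (N - p - 1)), with df1 = p and df2 = N - p - 1, where p is the number of IVs (written p here so it is not confused with k, the number of groups in the ANOVA cards). A minimal Python sketch with a hypothetical function name and made-up inputs: R^2 = 0.64 from 2 IVs and 50 observations gives F ≈ 41.78 with df 2 and 47.

```python
def regression_f(r_squared, n_obs, n_ivs):
    """F statistic for H0: all slope coefficients are 0 (model with an intercept)."""
    df1 = n_ivs               # numerator df: number of IVs
    df2 = n_obs - n_ivs - 1   # denominator df: N - p - 1
    f_stat = (r_squared / df1) / ((1 - r_squared) / df2)
    return f_stat, df1, df2

# Made-up example: R^2 = 0.64, 2 IVs, 50 observations
print(regression_f(0.64, 50, 2))
```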

17
Q

What is R^2?

A

R^2 indicates the proportion of variance in the DV that is explained by the IVs

Varies between 0 and 1
i.e. from 0% to 100% of the variance explained
R^2 = 1 - SSresidual / SStotal

18
Q

What is the R^2 in a simple regression?

A

Simple regression: Only 1 IV & a constant

The R^2 is equal to the square of the correlation coefficient

E.g. correlation of 0.8 b/w X & Y
R^2 = 0.8^2 = 0.64
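
A quick Python check on made-up data that, in a simple regression, R^2 computed as 1 - SSresidual / SStotal equals the squared Pearson correlation; both come out at 0.6 here.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # made-up IV
y = [2, 4, 5, 4, 5]  # made-up DV
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)  # SStotal
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

r = sxy / sqrt(sxx * syy)      # Pearson correlation
beta = sxy / sxx               # OLS slope
alpha = y_bar - beta * x_bar   # OLS intercept
ss_res = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
r_squared = 1 - ss_res / syy   # proportion of variance explained
print(round(r ** 2, 4), round(r_squared, 4))
```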

19
Q

How is R^2 affected when we add a variable?

A

R^2 will always increase or stay the same

Not possible for R^2 to decrease by adding more variables

If β2 = 0, the model just reduces to the original equation with β1, so adding a variable can never make the fit worse
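
The same argument as a short derivation for the two-IV case: the smaller model is the larger one with β2 fixed at 0, so the minimum over more options can only be lower or equal. SStotal depends only on y, so it is identical for both models.

```latex
\min_{\alpha,\beta_1,\beta_2} \sum_i (y_i - \alpha - \beta_1 x_{1i} - \beta_2 x_{2i})^2
  \;\le\; \min_{\alpha,\beta_1} \sum_i (y_i - \alpha - \beta_1 x_{1i})^2
\quad\Longrightarrow\quad
R^2_{\text{new}} = 1 - \frac{SS_{\text{res,new}}}{SS_{\text{tot}}}
  \;\ge\; 1 - \frac{SS_{\text{res,old}}}{SS_{\text{tot}}} = R^2_{\text{old}}
```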

20
Q

What is adjusted R^2 and when is it used?

A
  • We want to keep our model as compact as possible
  • Adjusted R^2 gives a penalty for each additional variable
  • Adjusted R^2 only increases when the better fit outweighs the cost of having an additional coefficient
  • Only used to compare 2 or more alternative models with different numbers of variables
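
A minimal Python sketch using the standard formula Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - p - 1), where p is the number of IVs. The function name and the two models are made up: model B has a slightly higher R^2 but two extra IVs, so its adjusted R^2 is lower (about 0.537 vs. 0.578 for model A).

```python
def adjusted_r_squared(r_squared, n_obs, n_ivs):
    """Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - p - 1), p = number of IVs."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_ivs - 1)

# Two made-up alternative models fitted on the same 20 observations
model_a = adjusted_r_squared(0.60, 20, 1)  # 1 IV
model_b = adjusted_r_squared(0.61, 20, 3)  # 3 IVs: slightly higher R^2
print(round(model_a, 3), round(model_b, 3))
```
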
21
Q

How do we test for significance of the regression coefficients?

A

Look at the t-test & p-value

t-value = coefficient / standard error
Significant if p-value < 0.05
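
A minimal Python sketch of the t-value for the slope of a simple regression on made-up data, using the standard formula SE(β) = sqrt(SSresidual / (N - 2) / Sxx). Here the slope is 0.6, its standard error about 0.283 and t ≈ 2.12. With N - 2 = 3 df the two-sided 5% critical value is about 3.18, so this slope would not be significant (p > 0.05).

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # made-up IV
y = [2, 4, 5, 4, 5]  # made-up DV
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
beta = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
alpha = y_bar - beta * x_bar

# Residual variance uses N - 2 df (two estimated coefficients: alpha and beta)
ss_res = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
se_beta = sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
t_value = beta / se_beta                # t-value = coefficient / standard error
print(round(beta, 2), round(se_beta, 3), round(t_value, 2))
```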

22
Q

How do we interpret a regression coefficient?

A

ONLY interpret if SIGNIFICANT

When Xi increases by 1 unit, Y changes by βi units (holding the other IVs constant)
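
Where this interpretation comes from, shown for a model with two IVs: raise X1 by 1 unit, keep X2 fixed, and the fitted value changes by exactly β1.

```latex
\hat{y}(x_1 + 1, x_2) - \hat{y}(x_1, x_2)
  = [\alpha + \beta_1 (x_1 + 1) + \beta_2 x_2] - [\alpha + \beta_1 x_1 + \beta_2 x_2]
  = \beta_1
```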

23
Q

What are fitted values?

A

We can use our estimated model to calculate fitted values, e.g. ŷi = α + β xi

And compare fitted values to observed values (the difference is the residual)

24
Q

How can fitted values be used?

A

Used to run scenarios - E.g. Given a price & advertising level, what would sales be?

Forecasting - Predict future values, other brands, other respondents etc.
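
A sketch of running scenarios with fitted values in Python, using hypothetical coefficients (α = 100, β1 = -5 for price, β2 = 2 for advertising) rather than estimates from real data; the function name `predict_sales` is made up.

```python
# Hypothetical coefficients (not estimated from any real data)
ALPHA, BETA_PRICE, BETA_ADV = 100.0, -5.0, 2.0

def predict_sales(price, advertising):
    """Fitted value: Sales-hat = alpha + beta1 * Price + beta2 * Advertising."""
    return ALPHA + BETA_PRICE * price + BETA_ADV * advertising

# Scenarios: (price, advertising level)
for price, adv in [(4, 3), (4, 4), (3, 3)]:
    print(price, adv, predict_sales(price, adv))
```

Raising advertising by 1 unit at the same price (first vs. second scenario) moves predicted sales by exactly β2 = 2.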

25
Q

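Can we interpret regression coefficients as causal relationships?

A

No. Regression is based on ASSOCIATION between variables.

For causality, all 3 conditions need to be satisfied:
  1. Association (concomitant variation) between cause and effect
  2. Time order: the cause occurs before (or at the same time as) the effect
  3. Absence of other possible causal factors
This needs experimentation and manipulation.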