L5 - Inferences from Predictions & Regressions Flashcards

1
Q

When presented with two or more IVs for a prediction, what are the two possible ways to approach the analysis?

A
  • Simple regression
  • Multiple regression

2
Q

Why would multiple regression be preferred over simple regression when there are multiple IVs?

A

By doing the regression as a multiple regression, the method of least squares partials out any joint overlapping effects on the DV among the set of IVs.

(Overlap can be a problem when the IVs correlate with each other.)

3
Q

What were some things that went wrong when performing separate simple regressions for multiple IVs?

A
  • R squared values from the separate simple regressions, when added together, were much higher than the single R squared value obtained from the multiple regression.
  • Regression coefficients for each IV were not the same in the multiple regression as in the corresponding separate simple regressions (both symptoms shown in the sketch below).
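
A minimal sketch of both symptoms, assuming numpy and statsmodels and purely hypothetical data: with two correlated IVs, the separate simple-regression R squared values sum to more than the single multiple-regression R squared, and each simple coefficient differs from its partial counterpart.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
# Hypothetical, deliberately correlated IVs
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

# Two separate simple regressions: their R-squareds double-count shared variance
fit1 = sm.OLS(y, sm.add_constant(x1)).fit()
fit2 = sm.OLS(y, sm.add_constant(x2)).fit()
print(fit1.rsquared + fit2.rsquared)      # sum of simple R-squareds (inflated)

# One multiple regression: least squares partials out the overlap
X = sm.add_constant(np.column_stack([x1, x2]))
fit12 = sm.OLS(y, X).fit()
print(fit12.rsquared)                     # single, smaller R-squared
print(fit1.params[1], fit12.params[1])    # simple vs partial coefficient for x1
```
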
4
Q

How does multiple regression adjust the predictive effect of each IV so that there is no overlap?

A

The method of least squares adjusts the predictive effect of each IV in multiple regression for CORRELATIONAL OVERLAP with the other IVs by PARTIALLING IT OUT.

Each partial regression coefficient indicates the optimal strength of prediction for each IV, CONTROLLING for the effects of all other IVs in the model.

5
Q

What are the two ways we can make inferences from a linear regression?

A
  • Null hypothesis testing
  • Confidence intervals

6
Q

What is the corresponding population parameter to R Squared?

A

ρ²

(rho squared).

7
Q

What does the null hypothesis test for R Squared use?

A

ANOVA table

8
Q

How is mean square calculated?

A

MS = (the relevant) SS / df

9
Q

What can be found in the ANOVA table?

A
  • SS: SSreg, SSres, SStot
  • df: dfreg, dfres, dftot
  • MS: MSreg, MSres
  • F test statistic (Tobs)
  • p value
10
Q

What can be found in a model summary statistics table?

A
  • R (Multiple R)
  • R square
  • Adjusted R square
11
Q

On what distribution is the F statistic from the ANOVA table modelled?

A

F theoretical probability distribution

12
Q

What is the formula for the F statistic (Tobs) in ANOVA?

A

F = MSreg / MSres

Looking at this, we can see that the F statistic is just the ratio of the mean square for regression to the mean square for residuals… hence it compares AVERAGE AMOUNTS OF VARIATION.
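
A minimal sketch with hypothetical ANOVA-table numbers, assuming scipy: MS is the relevant SS over its df (card 8), F is the ratio of the two mean squares, and the p value comes from the F distribution whose two parameters are dfreg and dfres (next card).

```python
from scipy import stats

# Hypothetical ANOVA-table quantities
ss_reg, ss_res = 40.0, 60.0
df_reg, df_res = 2, 97            # number of IVs, and n - IVs - 1

ms_reg = ss_reg / df_reg          # MS = (the relevant) SS / df
ms_res = ss_res / df_res
f_obs = ms_reg / ms_res           # F = MSreg / MSres

# p value from the F distribution with (df_reg, df_res) degrees of freedom
p = stats.f.sf(f_obs, df_reg, df_res)
print(f_obs, p)
```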

13
Q

What is the shape of an F distribution defined by?

A

Two parameters: the numerator df (dfreg) and the denominator df (dfres).

14
Q

What does the precision of a CI on R squared depend on?

A
  • size of sample: bigger sample = more precise
  • number of IVs: fewer IVs = more precise
  • size of observed R square: larger R square = more precise
15
Q

How do we obtain a CI for Multiple R?

A

By square-rooting the upper and lower bounds of the CI for R square, since R = the square root of R square.
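
A tiny sketch using the interval from the next card as an example: square-rooting each bound of the R square CI gives the CI for multiple R.

```python
import math

lo, hi = 0.291, 0.789                  # 95% CI for R square (next card)
print(math.sqrt(lo), math.sqrt(hi))    # ≈ 0.539 and 0.888: the 95% CI for R
```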

16
Q

Interpret a 95% CI of [0.291, 0.789] for R Square

A

We are 95% confident that between 29% and 79% of the variance in the dependent variable at the population level is explained by the independent variables.

17
Q

What does it indicate if the observed R square value is actually outside the bounds of the Confidence Interval?

A

This indicates extreme bias in the observed R squared value, and may be due to a very small sample size and too many IVs.

18
Q

Is observed R square biased?

A

Yes, it's biased, but consistent.

So with a greater sample size (and fewer IVs) it becomes more accurate.

19
Q

What distribution does the unstandardised regression coefficient divided by its standard error follow (Tobs)?

A

t distribution.

(like Pearson correlations).

20
Q

In a NHST of coefficients, what does the df for the observed t statistic equal?

A

df = n - dfreg - 1

where dfreg = number of IVs.

So basically:

df = n - IVs - 1

21
Q

How do we calculate the T obs for partial regression coefficient?

A

Tobs = (observed regression coefficient - null hypothesis value (zero)) / standard error of the regression coefficient
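
A minimal sketch with hypothetical numbers, assuming scipy: Tobs from a coefficient and its standard error, with df = n - IVs - 1 as in the previous card.

```python
from scipy import stats

b, se = 0.42, 0.15     # hypothetical partial coefficient and its standard error
n, k = 100, 3          # hypothetical sample size and number of IVs

t_obs = (b - 0) / se   # null hypothesis value is zero
df = n - k - 1         # df = n - IVs - 1
p = 2 * stats.t.sf(abs(t_obs), df)   # two-tailed p value
print(t_obs, df, p)
```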

22
Q

What can be found in a regression coefficients table?

A

  • Unstandardised coefficients and their standard errors
  • Standardised coefficients
  • t statistic and its p value
  • 95% CI on the unstandardised partial regression coefficients

23
Q

What are the assumptions of Linear Regression?

A
  • Residuals are NORMALLY DISTRIBUTED (no implication that scores on the IVs are normally distributed) - things usually still work out unless non-normality is extreme.
  • The variance of the residuals is constant for any value of the IV - HOMOSCEDASTICITY.
  • Scores on all variables are INDEPENDENTLY and IDENTICALLY observed.
  • A LINEAR RELATIONSHIP between the DV and each IV.
  • Variables are measured without error (measurement error is hard to avoid in psychology).
24
Q

What are some consequences of failure to meet assumptions in linear regression?

A

affects:

  • accuracy of the F test for R square
  • accuracy of standard errors for regression coefficients
  • coverage of CIs for R square and regression coefficients
  • unbiasedness of various sample statistics provided by the analysis
25
Q

How do we assess normality of residuals?

A

Assess a Q-Q plot or histogram of the distribution of residual values (one per case).

However, linear regression is robust to mild departures from normality.
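
A minimal sketch assuming statsmodels and matplotlib, on a hypothetical fitted model: a Q-Q plot of the residuals, where points hugging the 45-degree line indicate approximate normality.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))    # hypothetical IVs
y = X @ [1.0, 0.5, 0.3] + rng.normal(size=100)    # hypothetical DV
fit = sm.OLS(y, X).fit()

# Q-Q plot of residuals against the normal distribution
sm.qqplot(fit.resid, line="45", fit=True)
plt.show()
```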

26
Q

How do we assess homoscedasticity of residuals?

A

By creating a scatterplot of standardised residuals vs standardised predicted scores on the DV, THEN ASSESSING IF THERE IS ANY PATTERN AMONG THE DATA POINTS.

Standardised to identify outliers more easily!

(Standardised predicted scores on the x axis.)

Residual scores on the y axis - use studentised deleted residuals. This means SD = 1.

A residual outlier typically has a studentised deleted residual of 3 or more (in absolute value) in medium to large samples, or 2.5 in smaller samples.
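
A minimal sketch of that scatterplot, assuming statsmodels and matplotlib and a hypothetical fitted model: studentised deleted residuals (statsmodels calls them externally studentised) on the y axis against standardised predicted scores on the x axis.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(100, 2)))    # hypothetical IVs
y = X @ [1.0, 0.5, 0.3] + rng.normal(size=100)    # hypothetical DV
fit = sm.OLS(y, X).fit()

# Studentised deleted residuals (externally studentised in statsmodels)
resid = fit.get_influence().resid_studentized_external
# Standardise the predicted scores for the x axis
pred = fit.fittedvalues
zpred = (pred - pred.mean()) / pred.std()

plt.scatter(zpred, resid)
plt.axhline(0)
plt.xlabel("Standardised predicted scores")
plt.ylabel("Studentised deleted residuals")
plt.show()    # look for a random scatter with no fan or curve pattern
```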

27
Q

What is homoscedasticity?

A

It means that the variance of the residual values is the same either for any predicted score on the DV, or for any chosen score on the IV.

The opposite is heteroscedasticity (the two sit on a continuum).

28
Q

What is heteroscedasticity?

A

When there is systematic variability in the residual variances according to the predicted values of the DV.

29
Q

How is a residual outlier identified?

A

A residual outlier typically has a studentised deleted residual of 3 or more (in absolute value) in medium to large samples, or 2.5 in smaller samples.

30
Q

What assumptions are influential cases related to?

A

Both non-normality and heteroscedasticity.

31
Q

How do we test for influential cases?

A

Either:
1. Examine studentised deleted residuals for large absolute values.
2. Cook's d statistic: a value of 1 or more indicates excessive influence.

32
Q

What is Cook's d statistic?

A

This measures the extent to which each case’s data values on the regression variables change the model parameters when that case is removed from the analysis, compared to when it is included.

Minimum value = 0.
Large value (1+) = excessive influence.
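
A minimal sketch assuming statsmodels, on a hypothetical fitted model: Cook's d for every case, flagging any at or above the rule-of-thumb cutoff of 1.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(100, 2)))    # hypothetical IVs
y = X @ [1.0, 0.5, 0.3] + rng.normal(size=100)    # hypothetical DV
fit = sm.OLS(y, X).fit()

# Cook's d: how much the estimates change when each case is dropped
cooks_d, _ = fit.get_influence().cooks_distance
print(np.where(cooks_d >= 1.0)[0])    # indices of excessively influential cases
```
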
33
Q

How do we assess non-linearity using residuals?

A

Scatterplots: residuals vs predicted values.

We want to see a random scattering of data points, without any obvious pattern.

(This is actually the same scatterplot as used for testing homoscedasticity.)