Multiple Regressions - L1 Flashcards

1
Q

What is a correlation?

A

An association or dependency between two independently observed variables

2
Q

What diagram would we use for a correlation?

A

Scatter plot (each data point is a single subject)

3
Q

What is the score associated with the analysis of correlation called?

A

Pearson correlation coefficient (PCC, also written r).

4
Q

What would the PCC be when x and y are completely independent of each other?

A

0.0

5
Q

What would the PCC be when x and y are identical?

A

1.0

6
Q

What would the PCC be when x and y are perfectly inversely related?

A

-1.0

7
Q

What PCC score are we looking for?

A

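A value close to 1.0 (or -1.0 for an inverse relationship): the more closely the values of x and y vary together, the greater their covariance relative to their spread, and the stronger the linear relationship.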
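A minimal sketch of how the PCC is computed, using only the Python standard library. The function name `pearson_r` and the example values are made up for illustration; it simply checks the identical (1.0) and perfectly inverse (-1.0) cases from the cards above, plus a set of values with no linear trend.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products (covariance without the 1/(n - 1) factor,
    # which cancels against the same factor in the standard deviations)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, x))                 # identical values
print(pearson_r(x, [-v for v in x]))   # perfectly inverse values
print(pearson_r(x, [2, 1, 3, 1, 2]))   # values with no linear trend
```

Note that a PCC of 0 only means there is no *linear* relationship; it does not by itself prove the variables are independent.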

8
Q

What is Partial Correlation?

A

Partial correlation is a measure of the strength and direction of a linear relationship between two continuous variables whilst controlling for the effect of one or more other continuous variables
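A minimal sketch of a first-order partial correlation (controlling for one variable z), using the standard formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). The data are made up so that x and y both follow z but share nothing beyond it; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data: x and y both track z, but share nothing beyond z
z = [10, 20, 30, 40, 50, 60]
x = [11, 19, 29, 41, 50, 60]
y = [13, 23, 25, 35, 48, 66]
print(round(pearson_r(x, y), 3))     # strong raw correlation
print(round(partial_r(x, y, z), 3))  # close to zero once z is controlled for
```

The raw correlation between x and y looks strong, but it disappears once the overlap with z is removed.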

9
Q

What diagram would we use for a partial correlation?

A

A Venn diagram: when information from different variables overlaps, we can visualise the problem as overlapping circles

10
Q

What does each circle represent in a partial correlation Venn diagram?

A

The size of each circle represents the variance of that variable
The overlap between circles represents the correlation (shared variance) between the variables

11
Q

What is the main difference between multiple linear regression and correlation?

A

Correlation describes the association between two variables, whereas multiple linear regression describes the relationship between one or more predictor variables (X1, X2, etc.) and a single criterion variable (Y)

12
Q

What is the regression equation?

A

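Y = aX + b, where a is the slope and b is the y-intercept. With several predictors this extends to Y = b0 + b1X1 + b2X2 + ... + bkXk, where the b values (B) are the regression coefficients.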
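A minimal sketch of how a and b can be estimated for a single predictor by least squares: a = Sxy / Sxx and b = mean(Y) - a * mean(X). The function name `fit_line` and the data are hypothetical; the data were generated from Y = 2X + 1, so the fit should recover those values.

```python
def fit_line(x, y):
    """Least-squares slope a and intercept b for Y = aX + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x around their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept: the line passes through the means
    return a, b


x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # generated from Y = 2X + 1
a, b = fit_line(x, y)
print(f"Y = {a}X + {b}")
```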

13
Q

What is prediction error E?

A

The difference between the actual (observed) values Y and the predicted values Ŷ (Y-hat)

E = Y - Ŷ
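A small worked example of E = Y - Ŷ, assuming a hypothetical fitted model of the form Y = aX + b with a = 2 and b = 1, and made-up observations. It also sums the squared errors, which is the quantity least squares tries to minimise.

```python
# Hypothetical model of the form Y = aX + b
a, b = 2.0, 1.0
x = [1, 2, 3, 4, 5]
y_obs = [3.1, 4.9, 7.2, 8.8, 11.0]                     # made-up observations

y_pred = [a * xi + b for xi in x]                      # predicted values (Y-hat)
errors = [yo - yp for yo, yp in zip(y_obs, y_pred)]    # E = Y - Y-hat
sse = sum(e ** 2 for e in errors)                      # sum of squared errors

print([round(e, 2) for e in errors])
print(round(sse, 2))
```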

14
Q

What is our objective with a regression model?

A

Our objective is to find the best fit between the model and the observations by adjusting the values of the coefficients (B) until the prediction error is minimised (in ordinary least squares, the sum of squared errors).
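A minimal sketch of finding the B values for several predictors with ordinary least squares, by solving the normal equations (X'X)B = X'Y with Gauss-Jordan elimination in plain Python. The function name `fit_multiple` and the data are hypothetical; in practice a library routine such as `numpy.linalg.lstsq` would normally be used instead of hand-written elimination.

```python
def fit_multiple(X, y):
    """Ordinary least squares for Y = b0 + b1*X1 + ... + bk*Xk.

    X holds one row per observation with the k predictor values.
    Returns [b0, b1, ..., bk] by solving the normal equations (X'X)B = X'y
    with Gauss-Jordan elimination.
    """
    rows = [[1.0] + list(r) for r in X]      # leading 1 for the intercept b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    m = [xtx[i] + [xty[i]] for i in range(p)]   # augmented matrix [X'X | X'y]
    for col in range(p):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: some predictors are redundant")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row
        for i in range(p):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [vi - factor * vc for vi, vc in zip(m[i], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]   # made-up predictor values (X1, X2)
y = [1, 3, 4, 6, 8]                            # generated from Y = 1 + 2*X1 + 3*X2
print([round(b, 6) for b in fit_multiple(X, y)])
```

Because the data have no noise, the prediction error can be driven to zero and the fit recovers B = [1, 2, 3].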

15
Q

How would you assess the Goodness of fit of a regression model? (all three)

A

Multiple correlation coefficient (R)

Coefficient of determination (R²)

F-ratio

16
Q

How would you assess the Goodness of fit of a regression model? (R)

A

The correlation between the predicted values (Ŷ) and the observed values (Y)

17
Q

How would you assess the Goodness of fit of a regression model? (R²)

A

The proportion of variance in Y explained by the regression model
This is simply the square of the multiple correlation coefficient (R² = R × R)

18
Q

How would you assess the Goodness of fit of a regression model? (F)

A

As in ANOVA, we can derive an F-ratio contrasting the explained variance (MSm) with the residual, unexplained variance (MSr), which allows a statistical test

If a ratio this large is unlikely to arise by chance alone (e.g. p < 0.05), we speak of a significant effect
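A minimal sketch of the three goodness-of-fit measures for a fitted model, using made-up data with one predictor. R is computed as the correlation between Ŷ and Y, R² as its square, and F as MSm / MSr with SSm = Σ(Ŷ - Ȳ)² and SSr = Σ(Y - Ŷ)². The function names are hypothetical, and the R² = SSm / SSt equivalence assumes a least-squares fit with an intercept.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def goodness_of_fit(y_obs, y_pred, k):
    """R, R^2 and F for a least-squares model with an intercept and k predictors."""
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    r = pearson_r(y_pred, y_obs)                   # R: correlation of Y-hat with Y
    r_squared = r ** 2                             # R^2: proportion of variance explained
    ss_m = sum((p - mean_y) ** 2 for p in y_pred)  # improvement of Y-hat over the mean
    ss_r = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))  # residual error
    ms_m = ss_m / k                                # model mean square (MSm)
    ms_r = ss_r / (n - k - 1)                      # residual mean square (MSr)
    return r, r_squared, ms_m / ms_r


# Made-up data with one predictor, fitted by least squares (Y = aX + b)
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
mx, my = sum(x) / len(x), sum(y) / len(y)
a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b = my - a * mx
y_hat = [a * xi + b for xi in x]

R, R2, F = goodness_of_fit(y, y_hat, k=1)
print(f"R = {R:.4f}, R2 = {R2:.4f}, F(1, 3) = {F:.1f}")
```

The p-value for F would come from the F distribution with (k, n - k - 1) degrees of freedom, for example via `scipy.stats.f.sf(F, k, n - k - 1)` if SciPy is available.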

19
Q

What do higher F-ratios show?

A

Higher F-ratios indicate better models:
improved prediction by the model (Ŷ) over simply using the mean (Ȳ), indicated by a larger MSm (model mean square)

decreased prediction error, indicated by a smaller MSr (residual mean square)

20
Q

How is effect size for a multiple regression estimated?

A

By using Cohen's f², calculated from R² as f² = R² / (1 - R²)

21
Q

What is the Cohen's f² value for a SMALL effect size?

A

0.02

22
Q

What is the Cohen's f² value for a MEDIUM effect size?

A

0.15

23
Q

What is the Cohen's f² value for a LARGE effect size?

A

0.35
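A minimal sketch that turns R² into Cohen's f² and labels it against the 0.02 / 0.15 / 0.35 benchmarks from the cards above. The function names and the "below small" label for values under 0.02 are hypothetical choices for illustration.

```python
def cohens_f2(r_squared):
    """Cohen's f^2 for a regression model: R^2 / (1 - R^2)."""
    return r_squared / (1 - r_squared)


def effect_label(f2):
    """Classify f^2 against Cohen's benchmarks: 0.02 small, 0.15 medium, 0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


for r2 in (0.01, 0.05, 0.20, 0.50):
    f2 = cohens_f2(r2)
    print(f"R2 = {r2:.2f} -> f2 = {f2:.3f} ({effect_label(f2)})")
```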

24
Q

What are the Multiple Regression Approaches?

A

Simultaneous (standard)
Stepwise
Hierarchical

25
Q

What are the six factors that affect multiple linear regression?

A

Outliers
Scedasticity
Singularity & Multicollinearity
Number of observations / Number of predictors
Range of values
Distribution of values

26
Q

What measures the extremity of an outlier?

A

Cook’s distance measures the extremity of an outlier (its influence on the fitted model); values over 1 are a cause for concern
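A minimal sketch of Cook's distance for a simple linear regression, using the standard formula D_i = e_i² / (p · MSE) · h_i / (1 - h_i)², where e_i is the residual, p the number of fitted parameters (2 here), and h_i the leverage of point i. The data are made up, with one deliberate outlier at the end; the function name is hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression Y = aX + b."""
    n = len(x)
    p = 2                                    # number of fitted parameters (a and b)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b = my - a * mx
    resid = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    distances = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx     # leverage of this point
        distances.append(e ** 2 / (p * mse) * h / (1 - h) ** 2)
    return distances


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 30.0]   # made-up data, last point is an outlier
for xi, d in zip(x, cooks_distance(x, y)):
    flag = "  <- cause for concern" if d > 1 else ""
    print(f"x = {xi}: D = {d:.2f}{flag}")
```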

27
Q

What is Homoscedasticity?

A

The spread (variance) of the residuals stays relatively constant over the range of the predictor variable

28
Q

What is Heteroscedasticity?

A

The spread (variance) of the residuals changes systematically across the range of the predictor variable

29
Q

How can heteroscedasticity be evaluated?

A

Heteroscedasticity can be evaluated using a plot of the predicted values (Ŷ) against the residual errors (Y - Ŷ)
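A minimal sketch of the data behind a residuals-vs-predicted plot, using made-up data whose noise deliberately widens as X grows. Instead of drawing the plot, it lists the (Ŷ, residual) pairs and compares the mean absolute residual in the lower and upper halves of Ŷ. That comparison is only a rough numeric stand-in for eyeballing the plot, not a formal test; the names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope a and intercept b for Y = aX + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx


x = list(range(1, 11))
noise = [0.1, -0.1, 0.3, -0.3, 0.6, -0.6, 1.2, -1.2, 2.0, -2.0]  # made-up, widening spread
y = [2 * xi + e for xi, e in zip(x, noise)]

a, b = fit_line(x, y)
# (Y-hat, residual) pairs, sorted by predicted value: the points of the plot
pairs = sorted((a * xi + b, yi - (a * xi + b)) for xi, yi in zip(x, y))
for y_hat, resid in pairs:
    print(f"{y_hat:6.2f}  {resid:+.2f}")

half = len(pairs) // 2
low = sum(abs(r) for _, r in pairs[:half]) / half
high = sum(abs(r) for _, r in pairs[half:]) / (len(pairs) - half)
print(f"mean |residual|: low Y-hat = {low:.2f}, high Y-hat = {high:.2f}")
```

A fan shape (residuals spreading out as Ŷ increases) points to heteroscedasticity; a roughly even band points to homoscedasticity.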

30
Q

What is multicollinearity?

A

Refers to a high similarity (correlation) between two or more predictor variables (r > 0.9)
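A minimal sketch that screens predictor pairs for multicollinearity using the r > 0.9 rule of thumb from the card above. The predictor names and values are made up: two of them measure the same thing in different units, so only that pair should be flagged.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.9):
    """Return (name_a, name_b, r) for every predictor pair with |r| > threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


predictors = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],   # same thing, different unit
    "age": [30, 25, 40, 35, 20],
}
for name_a, name_b, r in collinear_pairs(predictors):
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```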

31
Q

What is singularity?

A

Refers to a redundant variable; typically this results when one variable is a combination of two or more other variables (e.g. a total score formed from the subscores of an intelligence scale)

32
Q

What are some issues with multicollinearity and singularity?

A

Logical: we don't want to measure the same thing twice

Statistical: the regression problem cannot be solved reliably because the system is ill-conditioned (with perfect singularity it has no unique solution at all)
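A minimal sketch of the statistical issue with singularity: if a made-up total score is entered alongside the two subscores it is built from, the matrix X'X that regression has to invert has a determinant of exactly 0, so there is no unique set of coefficients. Exact fractions are used so the zero is not blurred by rounding; all names and data are hypothetical.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no usable pivot: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det


def xtx(rows):
    """X'X for a design matrix given as a list of rows."""
    p = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]


verbal = [10, 12, 9, 15, 11, 13]
spatial = [8, 14, 10, 9, 12, 11]
total = [v + s for v, s in zip(verbal, spatial)]   # total score = combination of subscores

with_total = [[1, v, s, t] for v, s, t in zip(verbal, spatial, total)]
without_total = [[1, v, s] for v, s in zip(verbal, spatial)]
print(determinant(xtx(with_total)))      # singular: redundant predictor
print(determinant(xtx(without_total)))   # non-zero: solvable
```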

33
Q

What is Anscombe’s Quartet?

A

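Four datasets with nearly identical summary statistics (means, variances, correlation and regression line) that look completely different when plotted. It illustrates why the range and distribution of values must be checked, ideally by visualising the data.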
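A minimal sketch that reproduces the "same statistics" half of Anscombe's Quartet, using the four datasets as they are commonly published. Each dataset gives roughly mean X = 9, mean Y = 7.50, r = 0.816 and Y = 3.00 + 0.500X; the difference only shows up when the points are plotted, which this standard-library sketch does not do.

```python
import math

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Means, sample variances, Pearson r and least-squares line for one dataset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": sxx / (n - 1), "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
    }


for name, (x, y) in quartet.items():
    s = summary(x, y)
    print(name, {k: round(v, 2) for k, v in s.items()})
```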