correlation & multiple regression Flashcards

Question 1

Q

What is correlation

Answer

A

An association or dependency between two independently observed variables

Question 2

Q

Analysis of correlation and what scores mean

Answer

A

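0.0 when there is no linear relationship between X and Y (as when they are completely independent of each other)

1.0 when they are in a perfect positive linear relationship (e.g., identical to one another)

−1.0 when they are in a perfect negative linear relationship (exactly inverse to one another)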
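The three reference values can be reproduced with a small sketch. It assumes the score in question is the Pearson correlation coefficient r; the function name `pearson_r` and the data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_identical = pearson_r(x, x)                        # same variable twice
r_inverse = pearson_r(x, [-v for v in x])            # exact inverse
r_u_shape = pearson_r(x, [abs(v - 3.0) for v in x])  # Y depends on X, but not linearly
```

The last case gives an r of (numerically) zero even though Y is fully determined by X, which is why a score of 0.0 is best read as "no linear relationship".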

Question 3

Q

What is partial correlation?

Answer

A

Used when more than two variables are involved (i.e., X, Y and Z)

The correlation between two of the variables (X and Y) after removing the influence of the third (Z)
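One common way to quantify this is the first-order partial correlation between X and Y with Z held constant. The sketch below assumes that formula; the function names and the data are made up for illustration, with X and Y both constructed as Z plus unrelated noise.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation between x and y after removing the linear influence of z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Illustrative data: x and y are each z plus noise that is unrelated to z
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.0, 1.0, 3.0, 4.0, 4.0, 7.0]
y = [2.0, 3.0, 1.0, 2.0, 6.0, 7.0]

r_xy = pearson_r(x, y)             # sizeable raw correlation
r_xy_given_z = partial_r(x, y, z)  # close to zero once z is controlled for
```

Here X and Y correlate only because both depend on Z, so the partial correlation drops to about zero.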

Question 4

Q

What is multiple linear regression?

Answer

A

Multiple linear regression is a similar concept to correlation

Major difference: it describes the relationship between one or more predictor variables (X1, X2, etc.) and a single criterion variable (Y)

Question 5

Q

The higher the beta… (MR)

Answer

A

The stronger the relationship between that predictor and the criterion variable

Question 6

Q

Beta tells us… (MR)

Answer

A

How strongly a predictor (e.g., neuroticism or stress) predicts the criterion (e.g., depression)

Question 7

Q

prediction error is…

Answer

A

difference between the actual Y values and the predicted values

we aim to get this minimised

can be expressed as residual sum of squares
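As a minimal sketch, the prediction errors and the residual sum of squares can be computed as follows. The observed and predicted values are hypothetical numbers chosen for illustration.

```python
# Hypothetical observed scores and model predictions (illustrative numbers only)
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

# Prediction error (residual) for each observation: actual minus predicted
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

# Residual sum of squares: the quantity that least-squares regression minimises
ss_residual = sum(e ** 2 for e in residuals)
```

Squaring stops positive and negative errors from cancelling each other out.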

Question 8

Q

y = ax + b is the same as..

Answer

A

Y = β0 + β1X1, where β0 is the intercept (b) and β1 is the slope (a)
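For a single predictor, the two coefficients can be estimated by ordinary least squares. The following is a sketch under that assumption; `fit_line` and the data are illustrative names and values, not part of the original cards.

```python
def fit_line(x, y):
    """Least-squares estimates (beta0, beta1) for Y = beta0 + beta1 * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta1 = sxy / sxx                # slope (the "a" in y = ax + b)
    beta0 = mean_y - beta1 * mean_x  # intercept (the "b" in y = ax + b)
    return beta0, beta1

# Illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0, beta1 = fit_line(x, y)
predicted = [beta0 + beta1 * v for v in x]
```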

Question 9

Q

Multiple correlation coefficient (R)

Answer

A

Correlation between the predicted values Y^ and the observed values Y

Question 10

Q

Coefficient of determination (R^2)

Answer

A

Proportion of variance in Y explained by the regression model
This is simply the square of the multiple correlation coefficient

Question 11

Q

F-Ratio

Answer

A

As for ANOVA, we can derive an F-ratio contrasting the proportion of explained variance with the residual variance, allowing a statistical test
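The card does not give the formula, so the sketch below assumes the usual one: the model mean square divided by the residual mean square, with m and N − m − 1 degrees of freedom. The function name and the input values are illustrative.

```python
def f_ratio(ss_model, ss_residual, n_obs, n_predictors):
    """F = (SSM / m) / (SSR / (N - m - 1)) for a regression with an intercept."""
    df_model = n_predictors
    df_residual = n_obs - n_predictors - 1
    ms_model = ss_model / df_model           # explained variance per predictor
    ms_residual = ss_residual / df_residual  # unexplained (residual) variance
    return ms_model / ms_residual

# Illustrative sums of squares for one predictor and five observations
f_value = f_ratio(ss_model=3.6, ss_residual=2.4, n_obs=5, n_predictors=1)
```

The resulting F value would then be compared against the F distribution with (m, N − m − 1) degrees of freedom to obtain a p-value.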

Question 12

Q

Assessing goodness-of-fit: sums of squares

Answer

A

Total sums of squares (SST) - how far all the data points vary from the mean

Residual sums of squares (SSR) - squared differences between the actual values and the values predicted by the model

Model sums of squares (SSM) - how far the model's predictions vary from the mean (the mean being our best guess when we have no model)

Question 13

Q

Equation for coefficient of determination (R^2)

Answer

A

R^2 = SSM / SST
OR
R^2 = 1 - SSR / SST
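Both forms give the same value when the predictions come from a least-squares fit with an intercept, because then SST = SSM + SSR. A short sketch with illustrative data (the predictions are the least-squares fit of the observed scores on x = 1..5):

```python
def sums_of_squares(observed, predicted):
    """Return (SST, SSM, SSR) for observed scores and model predictions."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_model = sum((p - mean_y) ** 2 for p in predicted)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return ss_total, ss_model, ss_residual

# Illustrative data; predictions are the least-squares fit on x = 1..5
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

sst, ssm, ssr = sums_of_squares(observed, predicted)
r2_from_model = ssm / sst         # R^2 = SSM / SST
r2_from_residual = 1 - ssr / sst  # R^2 = 1 - SSR / SST
```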

Question 14

Q

Higher F-ratios indicate?

Answer

A

Better models

Question 15

Q

Effect size for MR

Answer

A

Cohen’s f^2
small = 0.02
medium = 0.15
large = 0.35
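Cohen's f^2 is conventionally computed from R^2 as f^2 = R^2 / (1 − R^2). The sketch below applies that formula and the benchmarks from this card. Treating each benchmark as a lower bound, and the label "negligible" for values below 0.02, are assumptions made for illustration.

```python
def cohens_f2(r_squared):
    """Cohen's f^2 effect size for a regression model with the given R^2."""
    return r_squared / (1.0 - r_squared)

def effect_size_label(f2):
    """Map f^2 onto the small/medium/large benchmarks (used as lower bounds)."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"  # assumed label; the card gives no name below 0.02

f2 = cohens_f2(0.6)  # illustrative R^2 of 0.6
label = effect_size_label(f2)
```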

Question 16

Q

Multiple regression approaches

Answer

A

Simultaneous
Stepwise
Hierarchical

Question 17

Q

Simultaneous (standard) approach

Answer

A

No a priori model assumed
All predictor variables are fit together

Question 18

Q

Stepwise approach

Answer

A

No a priori model
Predictor variables are added/removed one at a time, to maximize fit
Not a good approach because it capitalises on chance and tends to overfit the data

Question 19

Q

Hierarchical approach

Answer

A

Based on a priori knowledge of variables – we may know a relationship exists for some variables, but are interested in the added explanatory power of a new variable

Several subsequent regression models are analysed (adding or removing predictor variables)

We can use this to assess how much better one model explains the criterion variable than another (∆R^2). A larger ∆R^2 means the added variable contributes more explanatory power, e.g., how much better depression scores are predicted when stress score is added to a model that already contains neuroticism
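Written out, the comparison between a reduced model and a fuller model that adds predictors can be sketched as below. The subscripts "full" and "reduced" are labels introduced here; m is the number of predictors in each model and N the number of observations. The F test for the change is the standard one for nested models and is not stated on the card itself.

```latex
\[
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{reduced}}
\]
\[
F_{\text{change}} =
\frac{\Delta R^2 / \left(m_{\text{full}} - m_{\text{reduced}}\right)}
     {\left(1 - R^2_{\text{full}}\right) / \left(N - m_{\text{full}} - 1\right)}
\]
```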

Question 20

Q

Factors affecting multiple linear regression

Answer

A

Outliers
Scedasticity - how the variability of the residuals is spread across the predictor values
Singularity & Multicollinearity
Number of observations / Number of predictors
Range of values - how much variability is there
Distribution of values - normal…

Question 21

Q

Scedasticity

Answer

A

Scedasticity refers to the distribution of the residual error (i.e., relative to the predictor variable)
- Homoscedasticity: the spread of the residuals stays relatively constant over the range of the predictor variable
- Heteroscedasticity: the spread of the residuals varies systematically across the range of the predictor variable
Multiple linear regression assumes homoscedasticity

Question 22

Q

Singularity and multicollinearity

Answer

A

Multicollinearity refers to a very high correlation between two or more predictor variables (r > 0.9)
Singularity refers to a redundant variable; typically, this results when one variable is a combination of two or more other variables (e.g., subscores of an intelligence scale)
Problems with these:
- Logical: Don’t want to measure the same thing twice
- Statistical: Cannot solve regression problem because system is ill-conditioned
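A simple screen for multicollinearity is to correlate every pair of predictors and flag any pair above the r > 0.9 threshold. The sketch below does this. The predictor names (including "anxiety") and all scores are hypothetical, and a pairwise check will not catch a variable that is a combination of several others.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor scores (names and values are illustrative only)
predictors = {
    "neuroticism": [1.0, 2.0, 3.0, 4.0, 5.0],
    "anxiety": [2.0, 4.0, 6.0, 8.0, 11.0],
    "stress": [2.0, 1.0, 0.0, 1.0, 2.0],
}

THRESHOLD = 0.9  # cut-off taken from the card

# Pairs of predictors whose absolute correlation exceeds the threshold
flagged = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson_r(predictors[a], predictors[b])) > THRESHOLD
]
```

A flagged pair suggests dropping or combining one of the two predictors before fitting the regression.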

Question 23

Q

Number of observations, number of predictors

Answer

A

Number of observations (N) should be high compared to the number of predictor variables (m)
- Results become meaningless (impossible to generalise due to overfitting) as N/m decreases
Rules of thumb (medium effect size):
- N > 50 + 8 x m
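The rule of thumb is easy to apply directly. In this sketch the function names are illustrative, and the minimum is taken as the smallest whole number that strictly exceeds 50 + 8m.

```python
def meets_rule_of_thumb(n_obs, n_predictors):
    """True if N > 50 + 8 * m (rule of thumb for a medium effect size)."""
    return n_obs > 50 + 8 * n_predictors

def min_sample_size(n_predictors):
    """Smallest whole N that satisfies N > 50 + 8 * m."""
    return 50 + 8 * n_predictors + 1

required = min_sample_size(2)         # two predictors
ok = meets_rule_of_thumb(required, 2)
```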
