Week 4 Flashcards
Regression
Extends correlation by examining whether we can estimate the values of an outcome variable from one or more predictor variables
Multiple regression
Uses multiple predictor variables
Factorial ANOVA
Focuses on differences in scores on the dependent variable according to two or more independent variables
Types of multiple regression
Forced entry
Hierarchical multiple regression
Stepwise multiple regression
Forced entry
Predictors based on previous research and theory
The researcher does not specify a particular order for the variables to be entered
All variables are forced into the model at the same time
Hierarchical regression
Predictors based on previous research.
Researcher decides the order in which the predictors are entered into the model.
Enter known predictors (based on previous research) first and then enter new predictors (new/more exploratory hypotheses).
New predictors can be entered:
All at once (like in the enter method).
In a hierarchical manner.
In a stepwise manner.
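A minimal sketch of hierarchical entry in Python using statsmodels, assuming simulated data and purely illustrative variable names:

```python
import numpy as np, pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated data; all variable names here are illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)),
                  columns=["anxiety", "revision", "sleep"])
df["exam"] = 2 * df["revision"] - df["anxiety"] + rng.normal(size=100)

# Step 1: known predictors (based on previous research) entered first.
step1 = smf.ols("exam ~ anxiety + revision", data=df).fit()
# Step 2: the new, more exploratory predictor added on top.
step2 = smf.ols("exam ~ anxiety + revision + sleep", data=df).fit()

# How much extra variance does the new predictor account for?
print("R2 change:", step2.rsquared - step1.rsquared)
# F-test comparing the nested models: is the improvement significant?
print(anova_lm(step1, step2))
```

The R-squared change and the nested-model F-test here mirror the R-square-change and F-change figures that packages such as SPSS report for each block.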
Stepwise methods
The most controversial method (for psychologists), as the order in which the variables are entered is based on maths rather than on previous research/theory.
Both forward and backward methods.
In forward methods, the computer programme selects the predictor that best predicts the outcome and enters it into the model first.
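A rough analogue of forward stepwise selection using scikit-learn; note that scikit-learn picks predictors by cross-validated fit rather than the F/p criteria that SPSS-style stepwise uses, so this illustrates the idea rather than replicating it:

```python
import numpy as np, pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

# Simulated data; variable names are illustrative.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)),
                 columns=["anxiety", "revision", "sleep"])
y = 2 * X["revision"] - X["anxiety"] + rng.normal(size=100)

# Forward selection: start with no predictors, add the one that improves
# the fit most, and repeat until the requested number is reached.
selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=2,
                                     direction="forward")
selector.fit(X, y)
print(list(X.columns[selector.get_support()]))  # the predictors the maths kept
```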
Parts of a regression
The regression line (the model)
The line of best fit
Identify how well the model (the regression line) represents the data
Is it significant?
Assess this using an ANOVA (an F-test)
How much variance is accounted for by the model (effect size)
R2 value
Examine the relationship between predictor and outcome
The intercept (the value of Y when X = 0)
Betas (standardised and unstandardised): how Y changes in relation to a change in X.
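A sketch connecting these parts to a fitted model in Python (statsmodels, simulated data):

```python
import numpy as np, pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
df["y"] = 1.5 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=100)

model = smf.ols("y ~ x1 + x2", data=df).fit()
print(model.fvalue, model.f_pvalue)    # the ANOVA: is the model significant?
print(model.rsquared)                  # variance accounted for (effect size)
print(model.params["Intercept"])       # value of Y when all predictors = 0
print(model.params[["x1", "x2"]])      # unstandardised b's

# Standardised betas: refit after z-scoring every variable.
z = (df - df.mean()) / df.std()
print(smf.ols("y ~ x1 + x2", data=z).fit().params[["x1", "x2"]])
```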
Sample size
The old rule –
For every predictor variable you need 10 participants
A rule of thumb with no empirical evidence to support it
More is better
Depends on the size of effect you want to find
Field (2010) suggests you use the following equations to identify an appropriate sample size (where k = the number of predictor variables):
Equation 1: 50 + 8k (for testing the overall model)
Equation 2: 104 + k (for testing the individual predictors)
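The two equations as a quick calculation (a trivial sketch; k = 4 is just an example):

```python
k = 4                 # e.g. four predictor variables
print(50 + 8 * k)     # Equation 1 -> 82 participants
print(104 + k)        # Equation 2 -> 108 participants
```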
Multicollinearity
Strong correlation between predictor variables.
Perfect collinearity when you have a correlation of 1 between predictors.
Becomes difficult to interpret the results:
Difficult to identify the predictive value of each individual predictor variable.
Untrustworthy b’s
The beta values give an indication of change in the outcome for every unit change in the predictor.
If the individual predictors are correlated, the betas will be unreliable.
Importance of predictors
Similarly, can’t identify the individual importance of each predictor.
Limits the size of R2
Difficult to identify the proportion of variance accounted for by a particular variable.
Threatens the validity of the model produced.
Identifying Multicollinearity
Collinearity statistics
VIF (Variance Inflation Factor)
If the average VIF is substantially greater than 1, the regression may be biased.
If largest VIF is greater than 10 there is definitely a problem.
Tolerance (the reciprocal of the VIF: tolerance = 1/VIF)
If tolerance is below 0.1 a serious problem.
If tolerance is below 0.2 a potential problem.
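A sketch of computing VIF and tolerance with statsmodels on simulated predictors, where one predictor is deliberately built to be nearly collinear with another:

```python
import numpy as np, pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated predictors; x3 is almost a copy of x1, so both should
# show inflated VIFs (and correspondingly low tolerances).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
X["x3"] = X["x1"] + rng.normal(scale=0.1, size=100)

Xc = sm.add_constant(X)  # diagnostics computed with the intercept included
for i, name in enumerate(Xc.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(Xc.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```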
To understand homoscedasticity, you need to understand residuals….
When we draw a regression line, there will be differences between the data points and the line
The distances between the line and the individual data points are the RESIDUALS
Some of the data points are above the line
The line underestimates the value of Y
Some of the data points are below the line
The line overestimates the value of Y
Homoscedasticity
At each level of the predictor variable, the variance of the residuals should be constant.
Not the actual residual value, but the variance in the residual values
This is what is meant by homoscedasticity.
If the variances of the residuals differ, we have heteroscedasticity.
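A common way to eyeball this is a residuals-versus-fitted plot; a minimal sketch with simulated data (assuming matplotlib is available):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid)  # residual = data point minus line
plt.axhline(0)                            # points above/below the line
plt.xlabel("fitted values")
plt.ylabel("residuals")                   # spread should look constant
plt.show()
```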
Independent Errors
For any two observations (data points), the residuals should not be correlated; they should be independent
To identify whether this is an issue in your analysis, use the Durbin-Watson Test.
This tests correlations across error terms
Durbin-Watson test
Tests whether residuals next to each other are correlated
Test Statistic varies between 0 and 4
Value of 2 means the residuals are uncorrelated
A value lower than 2 indicates a positive correlation
A value greater than 2 indicates a negative correlation
Values greater than 3 or less than 1 indicate a definite problem.
Values close to 2 suggest there is no issue.
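statsmodels ships a Durbin-Watson statistic; a minimal sketch on the residuals of a simulated fit:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
fit = sm.OLS(y, sm.add_constant(x)).fit()

dw = durbin_watson(fit.resid)
print(dw)  # ~2 = uncorrelated; < 2 positive; > 2 negative autocorrelation
```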
Normally distributed errors
Often confused with normally distributed data for predictors
That’s not what this means.
Technically, it means that the residual values in the regression model are random and normally distributed, with a mean of 0
In other words, there is an even chance of points lying above and below the best-fit line
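A sketch of checking the residuals (not the raw predictor data) for normality, using simulated data and two standard checks (Shapiro-Wilk test and a Q-Q plot):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
fit = sm.OLS(y, sm.add_constant(x)).fit()

print(fit.resid.mean())          # ~0 by construction (with an intercept)
print(stats.shapiro(fit.resid))  # formal normality test on the residuals
sm.qqplot(fit.resid, line="s")   # points should hug the line if normal
plt.show()
```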