Correlation And Multiple Regression Flashcards
3 types of multiple regression
–simultaneous
–stepwise
–hierarchical
What are correlation and regression for?
study of the relationship between two or more variables
Regression
allows prediction of Y on the basis of knowledge of X
Correlation
measures strength of relationship between X and Y
Scatter plot
–2-D diagram
–1 point for each participant
–coordinates are scores on variables: e.g. (X1,Y) or (X2,Y)
Correlation and scatter plot
–linked to degree to which points cluster around regression line
–value between -1 and +1
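The two ideas above (correlation as strength of relationship, regression as prediction of Y from X) can be sketched in a few lines. This is a minimal illustration using NumPy; the five (X, Y) scores are made up for the example and are not from the course material.

```python
import numpy as np

# Hypothetical scores for five participants (invented for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Correlation: strength of the linear relationship, always between -1 and +1.
r = np.corrcoef(x, y)[0, 1]

# Regression: least-squares line used to predict Y from knowledge of X.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

print(f"r = {r:.3f}")
print(f"Y-hat = {intercept:.2f} + {slope:.2f} * X")
```

The closer the points cluster around the fitted line, the closer r gets to -1 or +1.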
Venn diagram
size of each circle represents the variance of a variable
overlapping circles denote correlated variables
What is the relationship between 2 variables once the effect of the other variables has been removed?
Partial correlation: measures the strength of dependence between 2 variables that is not accounted for by the way in which they both change in response to variations in a selected subset of the other variables
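One common way to compute this quantity is to regress each of the two variables on the control variables and then correlate the residuals. The sketch below assumes linear relationships and uses simulated data; the function name `partial_corr` and the variables `x`, `y`, `z` are illustrative, not part of the original cards.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    z may be one control variable, shape (n,), or several, shape (n, k).
    """
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    design = np.column_stack([np.ones(len(x)), z])  # intercept + controls
    # Residuals: the part of x and y that the controls cannot explain.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

# Simulated example: x and y are related only because both depend on z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)

raw_r = np.corrcoef(x, y)[0, 1]      # inflated by the shared influence of z
partial_r = partial_corr(x, y, z)    # relationship once z is removed
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```

In this simulation the raw correlation is sizeable while the partial correlation is close to zero, because z is the only link between x and y.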
What is multiple regression for?
learn about relationship between several independent variables (predictors) and one dependent variable (criterion)
predictive tool
•examples
–estate agent analyzes selling price: for each house, he records size, number of bedrooms, average income in neighbourhood, subjective appeal, etc.
how do these relate to the selling price?
–psychologist studies depression: for each participant, he records age, gender, stress, measure of neuroticism, etc.
how do these relate to depression?
Assessing goodness of fit
–multiple correlation coefficient
correlation between the criterion Y and the best linear combination of the predictors, Ŷ
–coefficient of determination (R²)
•proportion of variability in data set accounted for by statistical model
•square of multiple correlation coefficient
–F-ratio
ratio of the improvement in prediction of the criterion due to the model to the inaccuracy that remains in the model (residual error)
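The three goodness-of-fit measures can be computed directly from a fitted model. A minimal sketch with NumPy on simulated data follows; the sample size, the three predictors and their weights are assumptions made for the example. The F-ratio uses the usual form (R²/k) / ((1 - R²)/(n - k - 1)) for k predictors and n cases.

```python
import numpy as np

# Simulated data: n cases, k predictors (values invented for illustration).
rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3, 0.0]) + rng.normal(size=n)

# Fit by least squares; Y-hat is the best linear combination of the predictors.
design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ b

# Multiple correlation coefficient: correlation between Y and Y-hat.
R = np.corrcoef(y, y_hat)[0, 1]

# Coefficient of determination: proportion of variability accounted for.
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)
R2 = 1 - ss_resid / ss_total

# F-ratio: variance explained per predictor relative to residual variance.
F = (R2 / k) / ((1 - R2) / (n - k - 1))

print(f"R = {R:.3f}, R^2 = {R2:.3f}, F({k}, {n - k - 1}) = {F:.2f}")
```

With an intercept in the model, the printed R² equals the square of R, matching the card above.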
Multiple regression
Simultaneous (standard)
–no a priori model
–enter all IVs at once
Multiple regression
Stepwise
–no a priori model
–computer chooses, on statistical grounds, an a posteriori model (best sub-set of IVs)
–capitalises on chance effects
Multiple regression
Hierarchical (sequential)
–theoretically sound
–a-priori sequence of entry
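In a hierarchical analysis the interest is usually in how much R² increases when a theoretically motivated block of predictors is added to an earlier block. The sketch below is one way to compute that change and its F-ratio; the variable names (age, stress, neuroticism, depression) echo the psychologist example but the data are simulated, and the order of entry is an assumption made for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares regression of y on the columns of X (plus intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Simulated participants (all values invented for illustration).
rng = np.random.default_rng(2)
n = 200
age = rng.normal(size=n)
stress = rng.normal(size=n)
neuroticism = rng.normal(size=n)
depression = 0.3 * age + 0.5 * stress + 0.4 * neuroticism + rng.normal(size=n)

# A priori sequence of entry: step 1 = age, step 2 adds stress and neuroticism.
step1 = np.column_stack([age])
step2 = np.column_stack([age, stress, neuroticism])
r2_step1 = r_squared(step1, depression)
r2_step2 = r_squared(step2, depression)
delta_r2 = r2_step2 - r2_step1

# F-ratio for the change: m predictors added, k predictors in the larger model.
m = step2.shape[1] - step1.shape[1]
k = step2.shape[1]
f_change = (delta_r2 / m) / ((1 - r2_step2) / (n - k - 1))

print(f"R^2 step 1 = {r2_step1:.3f}, step 2 = {r2_step2:.3f}")
print(f"change in R^2 = {delta_r2:.3f}, F change({m}, {n - k - 1}) = {f_change:.2f}")
```

A simultaneous analysis corresponds to fitting only the step 2 model with all IVs entered at once.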
Factors affecting regression
- outliers & influential points
- homo/hetero-scedasticity
- singularity & multi-collinearity
- number of cases vs number of predictors
- range
- distribution
Outliers and influential points
- points which deviate markedly from others in sample
* a point is usually treated as influential when its Cook’s distance is 1 or greater
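Cook's distance combines how badly a point is fitted (its residual) with how much leverage it has on the line. The sketch below computes it from the hat matrix for a small made-up data set in which the last point lies far from the others; the data and variable names are assumptions for illustration.

```python
import numpy as np

# Nine points on the line Y = X, plus one point far from the rest (invented data).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20], dtype=float)
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 2], dtype=float)

design = np.column_stack([np.ones(len(x)), x])  # intercept + predictor
n, p = design.shape                             # p = number of fitted parameters

# Hat matrix: maps observed Y to fitted Y; its diagonal is each point's leverage.
hat = design @ np.linalg.inv(design.T @ design) @ design.T
leverage = np.diag(hat)

resid = y - hat @ y
mse = resid @ resid / (n - p)

# Cook's distance for every case.
cooks_d = (resid ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2

influential = np.where(cooks_d >= 1)[0]
print("Cook's distances:", np.round(cooks_d, 3))
print("cases with D >= 1:", influential)
```

Only the last case exceeds the cut-off of 1 here; refitting without it would change the regression line substantially.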
homoscedasticity vs heteroscedasticity
•homoscedasticity
variability of scores (errors) in one continuous variable is roughly the same at all values of the second variable
uniform scatter or dispersion of data points about the regression line.
(homogeneity of variance)
•heteroscedasticity
variability of scores differs across values of the other variable; typically arises when one variable is skewed or the relationship is non-linear
Singularity and multi-collinearity
•singularity
redundancy: one variable is a combination of 2 or more other variables
•multi-collinearity
variables are highly correlated (r > 0.90)
Singularity and multi-collinearity problems and solutions
•problems
–logical: don’t want to measure the same thing twice.
–statistical: singularity prevents matrix inversion (the matrix analogue of division) because the determinant is zero
•screening & solutions
–high bivariate correlations (> 0.9)
compute correlations amongst IVs, remove appropriate IV
–high multivariate correlations
examine SMC (squared multiple correlation) of each IV with respect to the other IVs
(tolerance = 1 – SMC; a tolerance near 0 signals multi-collinearity)
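The multivariate screening step can be sketched by regressing each IV on all the other IVs and recording the resulting R² as its SMC. The example below uses simulated predictors in which the third is almost the sum of the first two; the function name `tolerances` and the data are assumptions made for illustration.

```python
import numpy as np

def tolerances(X):
    """Tolerance (1 - SMC) of each column of X with respect to the other columns."""
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ b
        smc = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        tol[j] = 1 - smc
    return tol

# Simulated IVs: x3 is nearly x1 + x2 (multi-collinear); x4 is unrelated.
rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

tol = tolerances(X)
print("tolerance per IV:", np.round(tol, 3))
```

The first three IVs show tolerances near 0, so one of them is a candidate for removal, while the unrelated fourth IV has a tolerance near 1.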
A small range…
Restricts power of tests
Anscombe’s quartet:
four data sets with nearly identical means, variances, correlations and regression lines, but very different scatter plots
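The point of the quartet can be checked numerically. The sketch below uses the commonly reproduced values of Anscombe's (1973) four data sets; they are quoted here from memory as an assumption, so check them against a trusted source before relying on them. Plotting each set is what reveals the differences that the summary statistics hide.

```python
import numpy as np

# Commonly reproduced values of Anscombe's quartet (verify against a trusted source).
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
quartet = {
    "I": (x123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                          7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                           6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                            6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (x4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                         5.25, 12.50, 5.56, 7.91, 6.89])),
}

summary = {}
for name, (x, y) in quartet.items():
    slope, intercept = np.polyfit(x, y, 1)  # least-squares regression line
    summary[name] = {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "var_x": x.var(ddof=1),
        "var_y": y.var(ddof=1),
        "r": np.corrcoef(x, y)[0, 1],
        "slope": slope,
        "intercept": intercept,
    }

for name, stats in summary.items():
    print(name, {key: round(float(value), 2) for key, value in stats.items()})
```

All four rows print essentially the same numbers, which is why a scatter plot should always accompany a correlation or regression.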
What technique is used for two sets of independent variables?
What is common between the sets?
Canonical correlation
What technique is used with many independent variables when asking: what is the relationship between 2 variables once the effect of the others is removed?
Partial correlation
One dependent variable, technique used
Predicting the DV from the IVs
Multiple regression
Relationship between the IVs and the DV, technique used:
Multiple correlation