Correlation and Multiple Regression Flashcards
3 types of multiple regression
–simultaneous
–stepwise
–hierarchical
What are correlation and regression for?
study of the relationship between two or more variables
Regression
allows prediction of Y on the basis of knowledge of X
Correlation
measures strength of relationship between X and Y
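The two cards above can be illustrated with a minimal sketch in plain Python. The scores are made up for illustration, and the helper names (pearson_r, fit_line) are hypothetical, not taken from any package.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sums_of_products(x, y):
    """Return (Sxy, Sxx, Syy): sums of cross-products and squared deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy

def pearson_r(x, y):
    """Correlation: strength of the linear relationship between X and Y."""
    sxy, sxx, syy = sums_of_products(x, y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Regression: least-squares intercept and slope for predicting Y from X."""
    sxy, sxx, _ = sums_of_products(x, y)
    slope = sxy / sxx
    intercept = mean(y) - slope * mean(x)
    return intercept, slope

# Made-up scores for five participants (assumption, illustration only).
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

r = pearson_r(x, y)
intercept, slope = fit_line(x, y)
predicted_y = intercept + slope * 6  # predicted Y for a new participant with X = 6

print(f"r = {r:.3f}")
print(f"Y-hat = {intercept:.1f} + {slope:.1f} * X, prediction at X = 6: {predicted_y:.1f}")
```

Correlation summarises how tightly the points follow a line; regression supplies the line itself, which is what makes prediction possible.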
Scatter plot
–2-D diagram
–1 point for each participant
–coordinates are scores on variables: e.g. (X1,Y) or (X2,Y)
Correlation and scatter plot
–linked to degree to which points cluster around regression line
–value between -1 and +1
Venn diagram
size of each circle represents the variance of a variable
overlapping circles denote correlated variables
Partial correlation
what is the relationship between 2 variables once the effect of the other variables has been removed?
measures the strength of dependence between 2 variables that is not accounted for by the way they both change in response to variations in a selected subset of the other variables
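A minimal sketch of this idea for one controlled variable Z, assuming made-up data and hypothetical helper names. It computes the first-order partial correlation two ways: from the standard formula based on the three pairwise correlations, and by correlating the residuals of X and Y after each has been regressed on Z. The two results should agree.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residuals(y, z):
    """What is left of y after removing its linear dependence on z."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]

def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up scores (assumption, illustration only).
z = [1, 2, 3, 4, 5, 6]
x = [2, 3, 5, 4, 6, 8]
y = [1, 3, 2, 5, 4, 6]

from_formula = partial_r(x, y, z)
from_residuals = pearson_r(residuals(x, z), residuals(y, z))

print(f"zero-order r(x, y)      = {pearson_r(x, y):.3f}")
print(f"partial r, formula      = {from_formula:.3f}")
print(f"partial r, residualised = {from_residuals:.3f}")
```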
What is multiple regression for?
learn about relationship between several independent variables (predictors) and one dependent variable (criterion)
predictive tool
•examples
–estate agent analyzes selling price: for each house, he records size, number of bedrooms, average income in neighbourhood, subjective appeal, etc.
how do these relate to the selling price?
–psychologist studies depression: for each participant, he records age, gender, stress, measure of neuroticism, etc.
how do these relate to depression?
Assessing goodness of fit
–multiple correlation coefficient
correlation between the criterion Y and the best linear combination of the predictors, Ŷ
–coefficient of determination (R²)
•proportion of variability in data set accounted for by statistical model
•square of multiple correlation coefficient
–F-ratio
ratio of the improvement in prediction of the criterion due to the model to the inaccuracy that remains in the model
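The three goodness-of-fit measures can be sketched in plain Python. The house data are invented in the spirit of the estate agent example, the helpers (solve, ols, pearson_r) are hypothetical, and the fit uses the normal equations, which is adequate for a toy data set but not what a statistics package would necessarily do.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Made-up houses: (size in square metres, bedrooms) and selling price in thousands.
rows = [(50, 1), (70, 2), (80, 2), (100, 3), (120, 3), (150, 4), (90, 3), (60, 1)]
price = [150, 200, 215, 270, 300, 370, 250, 170]

coef = ols(rows, price)
predicted = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in rows]

n, k = len(price), len(rows[0])  # number of cases, number of predictors
mean_y = sum(price) / n
ss_total = sum((t - mean_y) ** 2 for t in price)               # total variability
ss_resid = sum((t - p) ** 2 for t, p in zip(price, predicted))  # inaccuracy of the model
ss_model = ss_total - ss_resid                                  # improvement due to the model

multiple_r = pearson_r(price, predicted)          # correlation between Y and Y-hat
r2 = ss_model / ss_total                          # proportion of variability accounted for
f_ratio = (ss_model / k) / (ss_resid / (n - k - 1))

print(f"R = {multiple_r:.4f}, R squared = {r2:.4f}, F({k}, {n - k - 1}) = {f_ratio:.1f}")
```

Squaring the multiple correlation coefficient reproduces R², which is the link between the first two bullets of the card.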
Multiple regression
Simultaneous (standard)
–no a priori model
–enter all IVs at once
Multiple regression
Stepwise
–no a priori model
–computer chooses, on statistical grounds, an a posteriori model (best sub-set of IVs)
–capitalises on chance effects
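A rough sketch of forward stepwise selection, under several assumptions: the data are made up, the helper names are hypothetical, and the entry rule (a predictor enters only if it raises R² by at least min_gain) is a simplification chosen for brevity. Statistical packages typically use F-to-enter or p-value criteria instead.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(columns, y):
    """R squared for a model with an intercept plus the given predictor columns."""
    rows = list(zip(*columns))
    coef = ols(rows, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((t - mean_y) ** 2 for t in y)
    ss_resid = sum((t - coef[0] - sum(b * v for b, v in zip(coef[1:], r))) ** 2
                   for r, t in zip(rows, y))
    return 1 - ss_resid / ss_total

def forward_stepwise(predictors, y, min_gain=0.01):
    """Add, one at a time, the predictor with the largest R squared gain."""
    chosen, current_r2 = [], 0.0
    while len(chosen) < len(predictors):
        candidates = [name for name in predictors if name not in chosen]
        trial = {name: r_squared([predictors[c] for c in chosen + [name]], y)
                 for name in candidates}
        best = max(trial, key=trial.get)
        if trial[best] - current_r2 < min_gain:
            break  # no remaining predictor improves the fit enough
        chosen.append(best)
        current_r2 = trial[best]
    return chosen, current_r2

# Made-up houses; street_number is deliberately irrelevant.
predictors = {
    "size": [50, 70, 80, 100, 120, 150, 90, 60],
    "bedrooms": [1, 2, 2, 3, 3, 4, 3, 1],
    "street_number": [12, 7, 33, 5, 21, 9, 18, 40],
}
price = [150, 200, 215, 270, 300, 370, 250, 170]

chosen, final_r2 = forward_stepwise(predictors, price)
print(f"selected: {chosen}, R squared = {final_r2:.4f}")
```

Because the choice is driven entirely by this one sample, a different sample could select a different subset, which is the sense in which the procedure capitalises on chance.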
Multiple regression
Hierarchical (sequential)
–theoretically sound
–a priori sequence of entry
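A minimal sketch of a two-step hierarchical analysis, assuming made-up data in the spirit of the depression example and hypothetical helper names. Age is entered first on theoretical grounds, stress second; the quantity of interest is the change in R² at step 2, which for two predictors equals the squared semipartial correlation of stress with depression.

```python
from math import sqrt

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def r2_two_predictors(r_y1, r_y2, r_12):
    """R squared for a two-predictor model, from the three pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Made-up scores for eight participants (assumption, illustration only).
age = [20, 25, 30, 35, 40, 45, 50, 55]
stress = [3, 6, 4, 8, 5, 9, 7, 10]
depression = [5, 9, 6, 12, 8, 14, 10, 15]

r_y1 = pearson_r(depression, age)     # criterion with step-1 predictor
r_y2 = pearson_r(depression, stress)  # criterion with step-2 predictor
r_12 = pearson_r(age, stress)         # correlation between the predictors

r2_step1 = r_y1 ** 2                               # step 1: age only
r2_step2 = r2_two_predictors(r_y1, r_y2, r_12)     # step 2: age + stress
r2_change = r2_step2 - r2_step1                    # unique contribution of stress

# Semipartial correlation of stress with depression, age removed from stress only.
semipartial = (r_y2 - r_y1 * r_12) / sqrt(1 - r_12 ** 2)

print(f"step 1 R squared = {r2_step1:.3f}")
print(f"step 2 R squared = {r2_step2:.3f}, change = {r2_change:.3f}")
print(f"squared semipartial = {semipartial ** 2:.3f}")
```

Reversing the order of entry would give the same final R² but a different split between the steps, which is why the sequence needs a theoretical justification.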
Factors affecting regression
- outliers & influential points
- homoscedasticity / heteroscedasticity
- singularity & multicollinearity
- number of cases vs number of predictors
- range
- distribution
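One common way to screen for multicollinearity is the tolerance of a predictor and its reciprocal, the variance inflation factor (VIF). The sketch below covers the two-predictor case, where the R² from regressing one predictor on the other is simply their squared correlation. The data are made up and the helper name is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Made-up predictors for eight houses (assumption, illustration only).
size = [50, 70, 80, 100, 120, 150, 90, 60]
bedrooms = [1, 2, 2, 3, 3, 4, 3, 1]

r_12 = pearson_r(size, bedrooms)
tolerance = 1 - r_12 ** 2   # share of a predictor's variance not shared with the other
vif = 1 / tolerance         # grows without bound as the predictors approach singularity

print(f"r = {r_12:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.2f}")
```

A perfect correlation between predictors would make the tolerance zero and the VIF undefined, which is the singular case named in the list above.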
Outliers and influential points
- points which deviate markedly from others in sample
* rule of thumb: a Cook’s distance of 1 or greater flags an influential point
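A minimal sketch of Cook's distance for simple regression (one predictor plus an intercept), assuming made-up data in which the last case is both far from the other X values and far from the trend. The function name is hypothetical; the leverage formula used here is the closed form that applies to the one-predictor case only.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a simple regression of y on x."""
    n, p = len(x), 2  # p counts the estimated parameters: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    distances = []
    for a, e in zip(x, resid):
        leverage = 1 / n + (a - mx) ** 2 / sxx  # how unusual this case's x value is
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances

# Made-up data: nine cases on a perfect line, plus one deviant case at x = 20.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 10]

d = cooks_distances(x, y)
flagged = [i for i, value in enumerate(d) if value >= 1]  # indices of influential cases

for i, value in enumerate(d):
    print(f"case {i}: D = {value:.3f}")
print("influential cases:", flagged)
```

Only the deviant case crosses the threshold, even though it drags the fitted line far enough that the well-behaved cases also show sizeable residuals.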