Session 6: Correlation and regression Flashcards
Product moment correlation r
a measure of strength of a linear relationship between quantitative variables
- also known as Bravais-Pearson correlation
- calculated more frequently than other coefficients
Positive correlation and negative correlation
positive: higher scores on one variable tend to be associated with higher scores on the other
negative: higher scores on one variable tend to be associated with lower scores on the other
Interpretation of r scores
r = .10 (small relationship), r = .25 (medium relationship), r = .50 (large relationship)
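As a quick illustration, r can be computed from deviations about the means. The paired scores below are hypothetical (e.g., hours studied vs. exam score), not from the course material:

```python
import math

# Hypothetical paired scores (made up for illustration)
x = [2, 4, 6, 8, 10]
y = [50, 55, 65, 70, 85]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Pearson's r: sum of cross-products of deviations,
# divided by the product of the deviation norms
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
r = cov / (sx * sy)
print(round(r, 3))  # close to +1: strong positive linear relationship
```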
Linear regression
Linear regression is a statistical method for modeling linear relationships between a dependent variable and one or more independent variables
Regression equation
- linear regression is about estimating a linear regression equation
- linear regression equations have the same linear structure: Y= b0 + b1X
Errors (in a regression equation)
the differences between the observed values and the values predicted by our regression equation (these differences should be as small as possible)
How are the regression coefficients b0 and b1 determined?
The method of least squares finds b0 and b1 by minimizing the total squared error between the actual Y values and the predicted Y values
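For simple linear regression, the least-squares solution has a closed form: b1 is the sum of cross-products of deviations divided by the sum of squared X deviations, and b0 follows from the means. A sketch with hypothetical data:

```python
# Hypothetical paired scores (made up for illustration)
x = [2, 4, 6, 8, 10]
y = [50, 55, 65, 70, 85]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates for Y = b0 + b1 * X
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x
print(b0, b1)  # prints 39.5 4.25
```

No other choice of b0 and b1 gives a smaller sum of squared residuals on this data.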
Testing the regression equations (Why?)
after the regression equation has been estimated, we should check how well it fits the observed data as a model of reality
Testing can be split into two parts (of the regression equation)
- Testing the regression equation - whether and how well the DV is explained by the regression equation as a whole (goodness of fit - F-test)
- Testing the regression coefficients - whether and how well each IV of the regression equation contributes to explaining the DV (t-test)
Decomposition of variance
- we know that the optimal regression equation is found by minimizing the sum of squared residuals (SSR)
- we could use the sum of squared residuals as a measure of goodness of fit of the regression equation to the observed data (the smaller SSR, the better the fit)
Total sum of squares (SST) =
explained sum of squares (SSE) + residual sum of squares (SSR)
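The decomposition can be verified numerically. Using hypothetical data and an assumed fitted line y_hat = 39.5 + 4.25·x (both made up for illustration), the three sums of squares add up exactly:

```python
# Hypothetical data and an assumed fitted regression line
x = [2, 4, 6, 8, 10]
y = [50, 55, 65, 70, 85]
mean_y = sum(y) / len(y)
pred = [39.5 + 4.25 * xi for xi in x]  # predicted values y_hat

sst = sum((yi - mean_y) ** 2 for yi in y)             # total sum of squares
sse = sum((pi - mean_y) ** 2 for pi in pred)          # explained sum of squares
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))  # residual sum of squares
print(sst, sse, ssr)  # SST = SSE + SSR
```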
Coefficient of determination
explained sum of squares/ total sum of squares
- measure of goodness of fit of the regression equation to the observed data
- the higher the coefficient, the better the fit of the regression equation to the observed data
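Continuing the hypothetical example above (made-up data, assumed fitted line y_hat = 39.5 + 4.25·x), the coefficient of determination is explained over total sum of squares; in simple regression it equals r squared:

```python
# Hypothetical data and an assumed fitted regression line
x = [2, 4, 6, 8, 10]
y = [50, 55, 65, 70, 85]
mean_y = sum(y) / len(y)
pred = [39.5 + 4.25 * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)     # total sum of squares
sse = sum((pi - mean_y) ** 2 for pi in pred)  # explained sum of squares

# Coefficient of determination: share of Y's variation explained by the model
r_squared = sse / sst
print(round(r_squared, 3))  # prints 0.963
```

A value near 1 means the regression line accounts for almost all variation in Y; unlike the raw SSR, this ratio does not depend on the scale or number of the Y values.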
What is the problem with SSR?
SSR varies not only with the goodness of fit, but also with the number and size of the Y values
(smaller SSR - better fit!)