Week 7 Ch. 19 Bivariate Regression Hills Flashcards
What is the outcome variable?
This is the variable that we want to predict - the outcome or criterion.
E.g. knowledge of environmental issues. Often called the DV, though that is not a strictly correct term in non-experimental designs.
The other variable (e.g. age) is the predictor.
What is prediction based on in bivariate regression?
Prediction is based on finding the line of best fit, or regression line: a line that is as close as possible to all the points on a scatter plot.
Scatter plot axis
The predictor is plotted on …. Axis.
The predictor is plotted on the X axis (horizontal). E.g. age.
Scatter plot axis
The criterion or DV is plotted on the ….. Axis.
The criterion is plotted on the Y axis (vertical).
E.g. knowledge.
The line of best fit indicates what?
The line of best fit - regression line - is the line on the scatter plot that is closest on average to all observation points. It is the line that allows the best possible prediction of Y scores from knowledge of X scores.
Perfect correlation - how does it look on the scatter plot?
A perfect correlation of +/-1 has all of the points falling exactly on the regression line.
The smaller the correlation, the less accurate the prediction.
P.249 Hills
How do we make a prediction using the regression line?
A line is drawn from the person’s score on the X axis (e.g. age), perpendicular to the X axis, up to the regression line.
At the point where this line meets the regression line, another line is drawn perpendicular to the Y axis to give the best possible prediction (Ŷ, ‘Y hat’) of the person’s knowledge score.
P.249 Hills.
Residuals…. what are they?
Another annoying term for a basic concept.
Residuals are errors in prediction.
They are the difference between actual Y scores and predicted Y scores
(Y − Ŷ).
Ŷ is the predicted score: the value on the Y axis reached by drawing a line up from X to the regression line and across to the Y axis. The residual (Y − Ŷ) is the error of prediction. :)
What criterion is used when calculating the regression line?
When calculating the regression line, the LEAST SQUARES CRITERION is used.
If all the residuals are added they will sum to 0, because there is as much error above the regression line as there is below it.
The least squares regression line is calculated so that it minimises the sum of the squared residuals.
P.249
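A minimal Python sketch of the least squares criterion, using invented age/knowledge numbers (not from Hills) purely for illustration:

```python
# Minimal sketch of the least squares criterion (invented age/knowledge data).
import numpy as np

age = np.array([20, 25, 30, 35, 40, 45, 50], dtype=float)        # X (predictor)
knowledge = np.array([12, 15, 14, 18, 21, 20, 24], dtype=float)  # Y (criterion)

# np.polyfit with deg=1 fits the line that minimises the squared residuals.
b, a = np.polyfit(age, knowledge, deg=1)

y_hat = a + b * age            # predicted Y scores
residuals = knowledge - y_hat  # Y - Y hat

print(round(residuals.sum(), 10))  # ~0: errors above and below the line cancel out
print((residuals ** 2).sum())      # sum of squared residuals: the minimised quantity

# Any other line does worse, e.g. one with a slightly steeper slope:
worse = knowledge - (a + (b + 0.1) * age)
print((worse ** 2).sum() > (residuals ** 2).sum())  # True
```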
The equation for a regression line.
The equation for the line of best fit:
Ŷ = a + bX
a is the intercept (constant): the predicted Y score when X = 0.
b, the b-weight (or regression coefficient), is the slope of the line: the amount by which Ŷ increases for every 1-unit increase in X.
Linear regression is a technique for calculating the values a and b for the least squares regression line.
P.250
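As a sketch of how a and b can be computed by hand, using the standard identities b = r × SDy/SDx and a = mean(Y) − b × mean(X) (the data are the same invented numbers as above, not from Hills):

```python
# Sketch: computing a and b from the correlation and the standard deviations.
import numpy as np

age = np.array([20, 25, 30, 35, 40, 45, 50], dtype=float)        # X
knowledge = np.array([12, 15, 14, 18, 21, 20, 24], dtype=float)  # Y

r = np.corrcoef(age, knowledge)[0, 1]            # Pearson correlation
b = r * knowledge.std(ddof=1) / age.std(ddof=1)  # slope: b = r * SDy / SDx
a = knowledge.mean() - b * age.mean()            # intercept: a = mean(Y) - b * mean(X)

print(a, b)
print(a + b * 33)  # Y hat: predicted knowledge score for a 33-year-old
```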
BETA
Note that Beta in the SPSS table is the standardised b-weight,
calculated using standardised scores (z-scores), not raw scores!
With standardised scores the regression line passes through the origin (intercept = 0), and in bivariate regression Beta equals Pearson’s r.
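A quick sketch showing what ‘standardised’ means here (same invented data as above; z-scores computed with the sample standard deviation):

```python
# Sketch: Beta is the b-weight obtained after converting both variables to z-scores.
import numpy as np

age = np.array([20, 25, 30, 35, 40, 45, 50], dtype=float)
knowledge = np.array([12, 15, 14, 18, 21, 20, 24], dtype=float)

zx = (age - age.mean()) / age.std(ddof=1)                    # standardised X
zy = (knowledge - knowledge.mean()) / knowledge.std(ddof=1)  # standardised Y

beta, intercept = np.polyfit(zx, zy, deg=1)
print(beta)                               # the standardised b-weight (Beta)
print(round(intercept, 10))               # ~0: the line passes through the origin
print(np.corrcoef(age, knowledge)[0, 1])  # equals Beta in bivariate regression
```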
SPSS table headed COEFFICIENTS
The ‘Std. Error’ column under ‘Unstandardised Coefficients’
gives the standard error of the b-weight and of the constant respectively.
ASSUMPTIONS of Bivariate Regression (Linear Regression)
Similar assumptions to correlation analysis:
# the relationship between the variables must be linear
# the spread of Y scores should be equal across the range of X scores: homoscedastic, NOT heteroscedastic
# there should be no restricted range on one or both variables (see p.237 Hills)
# outliers are a serious problem as they distort the correlations; they usually need to be deleted, but this must be reported
# be aware of extreme groups, or of combining groups with different means
# participants should be randomly sampled (p.238) and independent of one another.
The STANDARD ERROR of the ESTIMATE (Standard error of prediction)
The standard error of the estimate, which is the final figure given in the SPSS Model Summary table, is similar to the standard deviation in a univariate distribution: it corresponds to the average amount of error in predicted Y scores.
When the residuals are normally distributed, about 68% of actual Y scores will fall within one standard error (±4.31 points in the Hills example) of the predicted Ŷ score.
P.252 Hills
The higher the correlation, the smaller the standard error of prediction.
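A sketch of how the standard error of the estimate is computed, assuming the usual N − 2 degrees of freedom for bivariate regression (same invented data as above):

```python
# Sketch: standard error of the estimate = sqrt(sum of squared residuals / (N - 2)).
import numpy as np

age = np.array([20, 25, 30, 35, 40, 45, 50], dtype=float)
knowledge = np.array([12, 15, 14, 18, 21, 20, 24], dtype=float)

b, a = np.polyfit(age, knowledge, deg=1)
residuals = knowledge - (a + b * age)

see = np.sqrt((residuals ** 2).sum() / (len(age) - 2))
print(see)  # roughly the average amount of error in predicted Y scores
```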
Percentage of Variance
Scores on any variable vary about the mean.
When two variables are correlated, we can EXPLAIN or ACCOUNT FOR (or, more correctly, predict) part of the variance in one from knowledge of the other, and vice versa. The proportion of variance explained is r² (r squared).
P.253 Hills.
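A sketch tying this to r²: the proportion of variance in Y explained by X is r², which equals 1 − SS(residual)/SS(total) (same invented data as above):

```python
# Sketch: proportion of variance explained = r squared = 1 - SS_residual / SS_total.
import numpy as np

age = np.array([20, 25, 30, 35, 40, 45, 50], dtype=float)
knowledge = np.array([12, 15, 14, 18, 21, 20, 24], dtype=float)

r = np.corrcoef(age, knowledge)[0, 1]

b, a = np.polyfit(age, knowledge, deg=1)
ss_res = ((knowledge - (a + b * age)) ** 2).sum()     # unexplained variation
ss_tot = ((knowledge - knowledge.mean()) ** 2).sum()  # total variation about the mean

print(r ** 2)               # proportion of variance explained
print(1 - ss_res / ss_tot)  # same value, from the sums of squares
```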