Multiple Regression Analysis Flashcards
__ (rather than correlation) is the term used when we have the specific aim of predicting values on a __ variable (or target) from a __ variable.
Regression; criterion; predictor
The square of the ___ gives us an estimate of the variance in y explained by variance in x.
correlation coefficient
Because there is a correlation between x and y, we can, to a certain extent (depending on the size of r²), predict __ from x scores.
y scores
The ___ is the line of best fit placed among the points in a scatterplot.
REGRESSION LINE
On this line will lie all our ___, symbolized as ŷ, made from our knowledge of x values.
predicted values for y
The vertical line between an actual y value & its associated ŷ value is known as the __.
PREDICTION ERROR
But it is better known as a ___ because it represents how wrong we are in making the prediction for that particular case.
RESIDUAL
The ___, then, is the line that minimizes these residuals.
regression line
__ – it is the number of units ŷ increases for every unit increase in x.
Regression coefficient
__ is a constant value. It is the value of ŷ when x is 0.
c
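The whole simple-regression picture can be sketched in a few lines of code. This is an illustrative sketch with made-up data, using numpy's polyfit as a convenience (not something the cards themselves specify):

import numpy as np

# Hypothetical data: x predicts y imperfectly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, c = np.polyfit(x, y, 1)   # b = regression coefficient, c = constant (ŷ when x is 0)
y_hat = b * x + c            # predicted values ŷ lie on the regression line
residuals = y - y_hat        # prediction errors; the line minimizes their squared sum
print(b, c, residuals)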
In regression we also deal with ___ rather than raw scores.
standard scores
When scores (x and y) are expressed in standard score form, the regression coefficient is known as the __.
STANDARDIZED REGRESSION COEFFICIENT or BETA
Where there is only one predictor, __ is in fact the correlation
coefficient of x with y.
beta
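A quick numerical check of this card (a sketch; the data are hypothetical):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

zx = (x - x.mean()) / x.std()     # x in standard score form
zy = (y - y.mean()) / y.std()     # y in standard score form
beta = np.polyfit(zx, zy, 1)[0]   # standardized regression coefficient
r = np.corrcoef(x, y)[0, 1]       # correlation coefficient of x with y
print(beta, r)                    # with a single predictor, beta equals r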
__ can be used when we have a set of variables (x₁, x₂, x₃, etc.), each of which correlates to some extent with a criterion variable (y) for which we would like to predict values.
MULTIPLE REGRESSION
__ of two variables (the shared portion of their variance) = r²
Co-variation
Because multiple regression has so much to do with
correlation, it is important that the variables used are
__.
continuous
That is, they need to incorporate measures on some kind of ___ scale.
linear
In __, variables like marital status, where codes 1–4 are given for single, married, divorced, and widowed, cannot be used.
correlation
The exception, as with correlation, is the __ variable
which is exhaustive, such as gender.
dichotomous
Even with these variables, however, it does not make sense to
carry out a __ regression analysis if almost all variables
are dichotomously categorical
multiple
In this instance a better procedure would be __.
LOGISTIC REGRESSION
__ refers to predictor variables that also correlate with one another.
COLLINEARITY
If one IV is to be a useful predictor of the DV, independently
of its relationship with another IV (collinearity), we need to
know its unique relationship with the dependent variable.
* This is found using the statistic known as the ___
SEMI-PARTIAL CORRELATION COEFFICIENT
___ is a way of partialling out the effect of a third variable (z) on the correlation between two variables, x and y.
PARTIAL CORRELATION
In __ correlation we take the residuals of only one of
the two variables involved.
semi-partial
Semi-partial correlation gives us only the ___ shared between an IV & the DV, with the variance of another IV partialled out.
common variance
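Both statistics can be computed directly from residuals. A sketch with hypothetical data, where z is the variable being partialled out:

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=100)
x = z + rng.normal(size=100)            # x shares variance with z
y = 0.5 * x + z + rng.normal(size=100)  # y depends on both

def residuals(a, b):
    # residuals of a after regressing a on b
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

# Partial correlation: z partialled out of BOTH x and y
r_partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
# Semi-partial correlation: z partialled out of x only
r_semi = np.corrcoef(residuals(x, z), y)[0, 1]
print(r_partial, r_semi)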
Remember that the explained variance is found by ___.
squaring the regression coefficient
Now, imagine that for each predictor variable a regression
coefficient is found that, when squared, gives us the unique
variance in the DV explained by that predictor on its own with
the effect of all other predictors partialled out.
In this way we can improve our prediction of the variance in a dependent variable by adding in __ in addition to any variance already explained by other predictors.
predictors that explain variance in y
In multiple regression, then, a ___ of one
variable is made using the correlations of other known
variables with it.
statistical prediction
For the set of predictor variables used, a particular combination of __ is found that maximizes the amount of variance in y that can be accounted for.
regression coefficients
In multiple regression, then, there is an ___ that predicts y, not just from x as in single-predictor regression, but from the regression coefficients of x₁, x₂, x₃ and so on, where the xs are predictor variables whose correlations with y are known.
equation
bᵢ are the __
regression coefficients for each of the predictors (xᵢ)
b₀ is the __ (c in the simple example earlier)
constant
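Putting the last few cards together, the prediction equation takes the form:
ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₚxₚ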
These b values are again the ___ increases for each unit increase in the predictor (xᵢ), if all the other predictors are held constant.
number of units ŷ
However, in this multiple predictor model, when standardized values are used, the ___ are not the same value as that predictor’s correlation with y on its own.
standardized regression coefficients
What is especially important to understand is that, although a single predictor variable might have a strong individual correlation with the criterion variable, acting among a set of predictors it might have a __
very low regression coefficient.
In this case the potential contribution of one predictor variable to explaining variance in the DV has, as it were, already been mostly used up by another predictor or IV with which it shares a lot of __
common variance
The multiple regression procedure produces a __, symbolized by R, which is the overall correlation of the predictors with the criterion variable.
* In fact, it is the simple correlation between actual y values & their estimated ŷ values.
MULTIPLE CORRELATION COEFFICIENT
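A sketch confirming that R is just the correlation of actual y with estimated ŷ (two hypothetical predictors, fitted by ordinary least squares):

import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=100)

X = np.column_stack([np.ones(100), x1, x2])  # design matrix with constant term
b = np.linalg.lstsq(X, y, rcond=None)[0]     # b0, b1, b2
y_hat = X @ b                                # estimated ŷ values

R = np.corrcoef(y, y_hat)[0, 1]              # multiple correlation coefficient
print(R, R**2)                               # R² = proportion of variance accounted for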
The higher R is, the ___ between actual y values & estimated ŷ.
better the fit
The closer R approaches +1, the ___ (the differences between actual & estimated values).
smaller the residuals
Although R behaves just like any other correlation coefficient,
we are mostly interested in R² since this gives us the
__ in the criterion variable that has been
accounted for by the predictors taken together.
proportion of variance
This is overall what we set out to do – to find the ___ to account for variance in the criterion variable.
best combination of predictors
To find an R that is significant is __
no big deal
This is the same point about __ correlations, that their
strength is usually of greater importance than their
significance, which can be misleading.
single
However, to check for significance, the R² value can be converted into an __
F value
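The standard conversion, with p predictors and N cases, is:
F = (R² / p) / ((1 − R²) / (N − p − 1)), with p and N − p − 1 degrees of freedom.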
R² has to be adjusted because with small N its value is
artificially __.
high
This is because, at the extreme, with N = number of predictor variables (p) + 1, prediction of the criterion variable values is __ and R² = 1, even though, in the population, prediction cannot be that perfect.
perfect
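The usual correction is the adjusted R² formula:
adjusted R² = 1 − (1 − R²)(N − 1) / (N − p − 1)
Note that when N = p + 1 the denominator is zero, which is why such small samples give no meaningful estimate.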
The issue boils down to one of __
sampling adequacy.
Various rules of thumb are given for the __ to produce a meaningful estimate of the relationship between predictors & criterion in the population – remember we are still estimating population parameters from samples.
minimum number of cases (N)
Although some authors recommend a very high N indeed, some suggest that the minimum should be __, and most accept this as reasonable, though the more general rule is __.
p + 50; “as many as possible”
Effect Size:
* Small = .02
* Medium = .15
* Large = .35
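These benchmarks match Cohen’s f² conventions for multiple regression; if that is the effect size intended here, it is computed from R² as:
f² = R² / (1 − R²)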
The output tables in a Multiple Regression Analysis:
- 1st table: simple descriptives for each variable.
- 2nd table: correlations between all variables.
- 3rd table: which variables have been entered into the equation.
- Model Summary: gives R, R², and adjusted R².
__ – tells whether or not the model accounts for a significant proportion of the variance in the criterion variable. It is a comparison of the variance “explained” vs. the variance “unexplained” (the residuals).
ANOVA
__ – contains information about all the individual predictors.
Coefficients
__ – are the b weights. This tells us how many units/points the criterion variable will increase for every 1 unit/point increase in a particular predictor variable.
Unstandardized Coefficients
__ – are the beta values. This tells us how many SD units the criterion variable will increase for every 1 SD unit increase in a particular predictor variable.
Standardized Coefficients
__ – found by dividing the unstandardized b value by its standard error. If t is significant, it means the predictor is making a significant contribution to the prediction of the criterion.
t-values
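In symbols: t = b / SE(b), evaluated against N − p − 1 degrees of freedom.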
__ – when predictor variables correlate too closely with one another. If tolerance values in the coefficients table are very low (e.g., under .2), then multicollinearity is present.
Collinearity
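Tolerance can be computed by hand: for each predictor it is 1 − R² from regressing that predictor on all the others. A sketch with two hypothetical, deliberately collinear predictors:

import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.3 * rng.normal(size=100)    # x2 made collinear with x1

# Regress x2 on x1 and find the R² of that regression
slope, intercept = np.polyfit(x1, x2, 1)
resid = x2 - (slope * x1 + intercept)
r2 = 1 - resid.var() / x2.var()

tolerance = 1 - r2                      # values under .2 flag multicollinearity
print(tolerance, 1 / tolerance)         # 1/tolerance is the variance inflation factor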