Multiple Regression Analysis Flashcards
__ (rather than correlation) is the term used when we have the specific aim of predicting values on a __ variable (or target) from a __ variable.
Regression; criterion; predictor
The square of the ___ gives us an estimate of the variance in y explained by variance in x.
correlation coefficient
Because there is a correlation between x and y, we can, to a certain extent (depending on the size of r²), predict __ from x scores.
y scores
The ___ is the line of best fit placed among the points in a scatterplot.
REGRESSION LINE
On this line will lie all our ___, symbolized as ŷ, made from our knowledge of x values.
predicted values for y
The vertical line between an actual y value & its associated ŷ value is known as the __.
PREDICTION ERROR
But it is better known as a ___ because it represents how wrong we are in making the prediction for that particular case.
RESIDUAL
The ___, then, is the line that minimizes these residuals.
regression line
__ – it is the number of units ŷ increases for every unit increase in x.
Regression coefficient
__ is a constant value. It is the value of ŷ when x is 0.
c
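The whole simple-regression picture can be sketched in a few lines of code. This is an illustrative sketch with made-up data, using numpy's polyfit as a convenience (not something the cards themselves specify):

import numpy as np

# Hypothetical data: x predicts y imperfectly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, c = np.polyfit(x, y, 1)   # b = regression coefficient, c = constant (ŷ when x is 0)
y_hat = b * x + c            # predicted values ŷ lie on the regression line
residuals = y - y_hat        # prediction errors; the line minimizes their squared sum
print(b, c, residuals)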
In regression we also deal with ___ rather than raw scores.
standard scores
When scores (x and y) are expressed in standard score form, the regression coefficient is known as the __.
STANDARDIZED REGRESSION COEFFICIENT or BETA
Where there is only one predictor, __ is in fact the correlation
coefficient of x with y.
beta
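A quick numerical check of this card (a sketch; the data are hypothetical):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

zx = (x - x.mean()) / x.std()     # x in standard score form
zy = (y - y.mean()) / y.std()     # y in standard score form
beta = np.polyfit(zx, zy, 1)[0]   # standardized regression coefficient
r = np.corrcoef(x, y)[0, 1]       # correlation coefficient of x with y
print(beta, r)                    # with a single predictor, beta equals r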
__ can be used when we have a set of variables (x₁, x₂, x₃, etc.), each of which correlates to some extent with a criterion variable (y) for which we would like to predict values.
MULTIPLE REGRESSION
__ of two variables (the shared portion of their variance) = r²
Co-variation
Because multiple regression has so much to do with
correlation, it is important that the variables used are
__.
continuous
That is, they need to incorporate measures on some kind of ___ scale.
linear
In __, variables like marital status, where codes 1–4 are given for single, married, divorced, and widowed, cannot be used.
correlation
The exception, as with correlation, is the __ variable
which is exhaustive, such as gender.
dichotomous
Even with these variables, however, it does not make sense to
carry out a __ regression analysis if almost all variables
are dichotomously categorical
multiple
In this instance a better procedure would be __.
LOGISTIC REGRESSION
__ refers to predictor variables that also correlate with one another.
COLLINEARITY
If one IV is to be a useful predictor of the DV, independently
of its relationship with another IV (collinearity), we need to
know its unique relationship with the dependent variable.
* This is found using the statistic known as the ___
SEMI-PARTIAL CORRELATION COEFFICIENT
___ is a way of partialling out the effect of a third variable (z) on the correlation between two variables, x and y.
PARTIAL CORRELATION
In __ correlation we take the residuals of only one of
the two variables involved.
semi-partial
Semi-partial correlation gives us only the ___ shared between an IV & the DV, with the variance of another IV partialled out.
common variance
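Both statistics can be computed directly from residuals. A sketch with hypothetical data, where z is the variable being partialled out:

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=100)
x = z + rng.normal(size=100)            # x shares variance with z
y = 0.5 * x + z + rng.normal(size=100)  # y depends on both

def residuals(a, b):
    # residuals of a after regressing a on b
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

# Partial correlation: z partialled out of BOTH x and y
r_partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
# Semi-partial correlation: z partialled out of x only
r_semi = np.corrcoef(residuals(x, z), y)[0, 1]
print(r_partial, r_semi)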
Remember that the explained variance is found by ___.
squaring the regression coefficient
Now, imagine that for each predictor variable a regression
coefficient is found that, when squared, gives us the unique
variance in the DV explained by that predictor on its own with
the effect of all other predictors partialled out.
In this way we can improve our prediction of the variance in a dependent variable by adding in __ in addition to any variance already explained by other predictors.
predictors that explain variance in y
In multiple regression, then, a ___ of one
variable is made using the correlations of other known
variables with it.
statistical prediction
For the set of predictor variables used, a particular combination of __ is found that maximizes the amount of variance in y that can be accounted for.
regression coefficients
In multiple regression, then, there is an ___ that predicts y, not just from x as in single-predictor regression, but from the regression coefficients of x₁, x₂, x₃ and so on, where the xs are predictor variables whose correlations with y are known.
equation
bᵢ are the __
regression coefficients for each of the predictors (xᵢ)
b₀ is the __ (c in the simple example earlier)
constant
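Putting the last few cards together, the prediction equation takes the form:
ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₚxₚ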
These b values are again the ___ increases for each unit increase in the predictor (xᵢ), if all the other predictors are held constant.
number of units ŷ
However, in this multiple predictor model, when standardized values are used, the ___ are not the same value as that predictor’s correlation with y on its own.
standardized regression coefficients
What is especially important to understand is that, although a single predictor variable might have a strong individual correlation with the criterion variable, acting among a set of predictors it might have a __
very low regression coefficient.
In this case the potential contribution of one predictor variable to explaining variance in the DV has, as it were, already been mostly used up by another predictor or IV with which it shares a lot of __
common variance
The multiple regression procedure produces a __, symbolized by R, which is the overall correlation of the predictors with the criterion variable.
* In fact, it is the simple correlation between actual y values & their estimated ŷ values.
MULTIPLE CORRELATION COEFFICIENT
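A sketch confirming that R is just the correlation of actual y with estimated ŷ (two hypothetical predictors, fitted by ordinary least squares):

import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=100)

X = np.column_stack([np.ones(100), x1, x2])  # design matrix with constant term
b = np.linalg.lstsq(X, y, rcond=None)[0]     # b0, b1, b2
y_hat = X @ b                                # estimated ŷ values

R = np.corrcoef(y, y_hat)[0, 1]              # multiple correlation coefficient
print(R, R**2)                               # R² = proportion of variance accounted for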
The higher R is, the ___ between actual y values & estimated ŷ.
better the fit
The closer R approaches +1, the ___ (the differences between actual & estimated values).
smaller the residuals
Although R behaves just like any other correlation coefficient,
we are mostly interested in R² since this gives us the
__ in the criterion variable that has been
accounted for by the predictors taken together.
proportion of variance
This is overall what we set out to do – to find the ___ to account for variance in the criterion variable.
best combination of predictors
To find an R that is significant is __
no big deal
This is the same point about __ correlations, that their
strength is usually of greater importance than their
significance, which can be misleading.
single
However, to check for significance, the R² value can be converted into an __
F value
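The standard conversion, with p predictors and N cases, is:
F = (R² / p) / ((1 − R²) / (N − p − 1)), with p and N − p − 1 degrees of freedom.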
R² has to be adjusted because with small N its value is
artificially __.
high
This is because, at the extreme, with N = number of predictor variables (p) + 1, prediction of the criterion variable values is __ and R² = 1, even though, in the population, prediction cannot be that perfect.
perfect
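The usual correction is the adjusted R² formula:
adjusted R² = 1 − (1 − R²)(N − 1) / (N − p − 1)
Note that when N = p + 1 the denominator is zero, which is why such small samples give no meaningful estimate.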
The issue boils down to one of __
sampling adequacy.
Various rules of thumb are given for the __ to produce a meaningful estimate of the relationship between predictors & criterion in the population – remember we are still estimating population parameters from samples.
minimum number of cases (N)
Although some authors recommend a very high N indeed, some suggest that the minimum should be __, and most accept this as reasonable, though the more general rule is __.
p + 50; “as many as possible”
Effect Size:
* Small = .02
* Medium = .15
* Large = .35
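These benchmarks match Cohen’s f² conventions for multiple regression; if that is the effect size intended here, it is computed from R² as:
f² = R² / (1 − R²)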
The output tables in a Multiple Regression Analysis:
- 1st table: simple descriptives for each variable.
- 2nd table: correlations between all variables.
- 3rd table: which variables have been entered into the equation.
- Model Summary: gives R, R², and adjusted R².
__ – tells whether or not the model accounts for a significant proportion of the variance in the criterion variable. It is a comparison of the variance “explained” vs. the variance “unexplained” (the residuals).
ANOVA
__ – contains information about all the individual predictors.
Coefficients
__ – are the b weights. This tells us how many units/points the criterion variable will increase for every 1 unit/point increase in a particular predictor variable.
Unstandardized Coefficients
__ – are the beta values. This tells us how many SD units the criterion variable will increase for every 1 SD unit increase in a particular predictor variable.
Standardized Coefficients
__ – found by dividing the unstandardized b value by its standard error. If t is significant, it means the predictor is making a significant contribution to the prediction of the criterion.
t-values
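In symbols: t = b / SE(b), evaluated against N − p − 1 degrees of freedom.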
__ – when predictor variables correlate too closely with one another. If tolerance values in the coefficients table are very low (e.g., under .2), then multicollinearity is present.
Collinearity
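Tolerance can be computed by hand: for each predictor it is 1 − R² from regressing that predictor on all the others. A sketch with two hypothetical, deliberately collinear predictors:

import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.3 * rng.normal(size=100)    # x2 made collinear with x1

# Regress x2 on x1 and find the R² of that regression
slope, intercept = np.polyfit(x1, x2, 1)
resid = x2 - (slope * x1 + intercept)
r2 = 1 - resid.var() / x2.var()

tolerance = 1 - r2                      # values under .2 flag multicollinearity
print(tolerance, 1 / tolerance)         # 1/tolerance is the variance inflation factor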