Week 4: Multiple Regression Flashcards
What is the decision tree for multiple regression? - (4)
- Continuous
- Two or more predictors that are continuous
- Multiple regression
- Meets assumptions of parametric tests
simple linear regression
the outcome variable Y is
predicted using the equation of a straight line
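In symbols, the standard form of that straight-line model (the usual textbook notation, not copied from the card itself):

```latex
Y_i = b_0 + b_1 X_i + \varepsilon_i
```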
Multiple regression still uses the same basic equation of …, but the model is more complex
Multiple regression is the same as simple linear regression except that - (2)
for every extra predictor you include, you have to add a coefficient;
so, each predictor variable has its own coefficient, and the outcome variable is predicted from a combination of all the variables multiplied by their respective coefficients plus a residual term
Multiple regression equation
In the multiple regression equation, list all the terms - (5) (the full equation is written out after this list)
- Y is the outcome variable,
- b1 is the coefficient of the first predictor (X1),
- b2 is the coefficient of the second predictor (X2),
- bn is the coefficient of the nth predictor (Xn),
- εi is the difference between the predicted and the observed value of Y for the ith participant.
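Written out in full (standard textbook form; note it also includes the intercept b0, which the term list above does not mention explicitly):

```latex
Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni} + \varepsilon_i
```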
Multiple regression uses the same principle as linear regression in that
we seek to find the linear combination of predictors that correlates maximally with the outcome variable.
Regression is a way of predicting things that you have not measured by predicting
an outcome variable from one or more predictor variables
Regression can be used to produce a
linear model of the relationship between 2 variables
Record company interested in creating a model predicting record sales from advertising budget and plays on the radio per week (airplay)
- Example of its MR plotted: how many variables are measured, and what the vertical, horizontal, and third axes show - (4)
It is a three-dimensional scatterplot, which means there are three axes measuring the values of the three variables.
The vertical axis measures the outcome, which in this case is the number of album sales.
The horizontal axis measures how often the album is played on the radio per week.
The third axis, which we can think of as being directed into the page, measures the advertising budget.
Can’t plot a 3D plot of MR as shown here
for more than 2 predictor (X) variables
The overlap in the diagram is the shared variance, which we call the
covariance
covariance is also referred to as the variance
shared between the predictor and outcome variable.
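For reference, the usual sample formula for the covariance between a predictor X and the outcome Y (standard definition, not taken from the card):

```latex
\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{N}\bigl(X_i - \bar{X}\bigr)\bigl(Y_i - \bar{Y}\bigr)}{N - 1}
```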
What is shown in E?
The variance in Album Sales not shared by the predictors
What is shown in D?
Unique variance shared between Ad Budget and Plays
What is shown in C?
The variance in Album Sales shared by Ad Budget and Plays
What is shown in B?
Unique variance shared between Plays and Album Sales
What is shown in A?
Unique variance shared between Ad Budget and Album Sales
If you have two predictors that overlap and correlate a lot, then it is a … model
bad model that can't uniquely explain the outcome
In Hierarchical regression, we are seeing whether
one model explains significantly more variance than the other
In hierarchical regression predictors are selected based on
past work and the experimenter
decides in which order to enter the predictors into the model
As a general rule for hierarchical regression, - (3)
- known predictors (from other research) should be entered into the model first, in order of their importance in predicting the outcome
- after the known predictors have been entered, the experimenter can add any new predictors into the model
- new predictors can be entered either all in one go, in a stepwise manner, or hierarchically (such that the new predictor suspected to be the most important is entered first)
Example of hierarchical regression in terms of album sales - (2)
The first model allows all the shared variance between Ad budget and Album sales to be accounted for.
The second model then only has the option to explain more variance by the unique contribution from the added predictor Plays on the radio.
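A minimal Python sketch of this two-step hierarchy, using statsmodels rather than SPSS; the file name album_sales.csv and the column names sales, adverts and airplay are hypothetical stand-ins for the album-sales data:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data file with the album-sales variables
df = pd.read_csv("album_sales.csv")  # assumed columns: sales, adverts, airplay

# Model 1: the known predictor only (advertising budget)
m1 = smf.ols("sales ~ adverts", data=df).fit()

# Model 2: add the new predictor (plays on the radio)
m2 = smf.ols("sales ~ adverts + airplay", data=df).fit()

# Does Model 2 explain significantly more variance than Model 1?
print(m1.rsquared, m2.rsquared)  # R squared at each step
print(anova_lm(m1, m2))          # F-test on the change between the nested models
```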
What is forced entry MR?
method in which all predictors are forced
into the model simultaneously.
Like HR, forced entry MR relies on
good theoretical reasons for including the chosen predictors.
Different from HR, forced entry MR
makes no decision about the order in which variables are entered.
Some researchers believe that about forced entry MR that
this method is the only appropriate method for theory testing because stepwise techniques are influenced by random variation in the data and so rarely give replicable results if the model is retested.
How to do forced entry MR in SPSS? - (4)
Analyse –> Regression –> Linear
Put the outcome in the Dependent box and the predictors (IVs, X) in the Independent(s) box
Can select a range of statistics in the Statistics box and press OK to check the collinearity assumption
Can also click Plots to check the assumptions of homoscedasticity and linearity (see the Python sketch below)
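For comparison, a forced-entry fit (all predictors entered simultaneously) can be sketched in Python with statsmodels; again the file and column names (album_sales.csv, sales, adverts, airplay, attract) are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("album_sales.csv")  # hypothetical data: sales, adverts, airplay, attract

# Forced entry: all predictors go into the model at once
model = smf.ols("sales ~ adverts + airplay + attract", data=df).fit()
print(model.summary())  # coefficients, R squared, F-test, etc.
```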
Why select collinearity diagnostics in the Statistics box for multiple regression? - (2)
This option is for obtaining collinearity statistics such as the
VIF and tolerance
Checking the assumption of no multicollinearity
Multicollinearity exists when there is a
strong correlation between two or more predictors in a regression model.
Multicollinearity poses a problem only for multiple regression because
simple regression requires only one predictor.
Perfect collinearity exists in multiple regression when at least
two predictors are perfectly correlated, i.e., have a correlation coefficient of 1
If there is perfect collinearity in multiple regression between predictors it
becomes impossible
to obtain unique estimates of the regression coefficients because there are an infinite number of combinations of coefficients that would work equally well.
Good news is that perfect collinearity in multiple regression is rare in
real-life data
If two predictors are perfectly correlated in multiple regression then the values of b for each variable are
interchangeable
The bad news is that less than perfect collinearity is virtually
unavoidable
As collinearity increases in multiple regression, there are 3 problems that arise - (3)
- Untrustworthy bs
- Limits the size of R
- Importance of predictors
As collinearity increases, one of the 3 problems that arises is the importance of predictors - (3)
Multicollinearity between predictors makes it difficult
to assess the individual importance of a predictor.
If the predictors are highly correlated, and each accounts for similar variance in the outcome, then how can we know
which of the two variables is important?
Quite simply we can’t tell which variable is important – the model could include either one, interchangeably.
One way of identifying multicollinearity in multiple regression is to scan a
correlation matrix of all of the predictor variables and see if any correlate very highly (by very highly I mean correlations of above .80 or .90)
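A quick way to do this scan outside SPSS is with pandas (the file and column names are hypothetical):

```python
import pandas as pd

df = pd.read_csv("album_sales.csv")                 # hypothetical data file
predictors = df[["adverts", "airplay", "attract"]]  # hypothetical predictor columns

# Correlations among the predictors; values above about .80-.90 flag possible multicollinearity
print(predictors.corr().round(2))
```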
SPSS produces collinearity diagnostics in multiple regression, which are - (2)
the variance inflation factor (VIF) and tolerance
The VIF indicates in multiple regression whether a
predictor has a strong linear relationship with the other predictor(s).
If the VIF statistic is above 10 in multiple regression, there is good reason to worry about
a potential problem of multicollinearity
If the VIF statistic is above 10 or approaching 10 in multiple regression, then what you would want to do is - (2)
look at your variables to see whether all of them need to go into the model
if there is a high correlation between 2 predictors (measuring the same thing), decide whether it is important to include both variables or to take one out and simplify the regression model
Related to the VIF in multiple regression is the tolerance
statistic, which is its
reciprocal (1/VIF) = the inverse of the VIF
For tolerance in multiple regression, a value below 0.2 indicates
an issue with multicollinearity
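A small Python sketch of computing both diagnostics with statsmodels (the data file and column names are hypothetical); tolerance is simply 1/VIF:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("album_sales.csv")  # hypothetical data file
X = sm.add_constant(df[["adverts", "airplay", "attract"]])  # predictors plus intercept

for i, name in enumerate(X.columns):
    if name == "const":
        continue  # skip the intercept column
    vif = variance_inflation_factor(X.values, i)
    # VIF above (or approaching) 10 and tolerance below 0.2 are the usual warning signs
    print(name, "VIF =", round(vif, 2), "tolerance =", round(1 / vif, 3))
```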
In Plots in SPSS for multiple regression, you put - (2)
ZRESID on Y and ZPRED on X
This plots residuals against predicted values to assess homoscedasticity
What is ZPRED in MR? - (2)
(the standardized predicted values of the dependent variable based on the model).
These values are standardized forms of the values predicted by the model.
What is ZRESID in MR? - (2)
(the standardized residuals, or errors).
These values are the standardized differences between the observed data and the values that the model predicts.
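The same diagnostic plot can be sketched in Python by standardizing the fitted values and residuals by hand (file and column names hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("album_sales.csv")  # hypothetical data file
model = smf.ols("sales ~ adverts + airplay + attract", data=df).fit()

# Rough analogues of SPSS's ZPRED and ZRESID
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
zresid = (model.resid - model.resid.mean()) / model.resid.std()

plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardized predicted values (ZPRED)")
plt.ylabel("Standardized residuals (ZRESID)")
plt.show()  # a random, even spread of points suggests homoscedasticity and linearity hold
```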
SPSS in multiple linear regression gives descriptive output, which is - (2)
- basic means and also a table of correlations between variables
- This is a first opportunity to determine whether there is high correlation between predictors, otherwise known as multicollinearity
The model summary in SPSS captures how much the model or models explain in MR
in terms of variance (R squared), and more importantly how R squared changes between models and whether those changes are significant.
Diagram of model summary
What does R^2 measure in multiple regression?
It is a measure of how much of the variability in the outcome is accounted for by the predictors
The adjusted R^2 in multiple regression gives us an estimate of
the model's fit in the general population
The Durbin-Watson statistic, if specified in multiple regression, tells us whether the - (2)
assumption of independent errors is tenable (values less than 1 or greater than 3 raise alarm bells)
the closer the value is to 2, the better = assumption met
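Outside SPSS, the same statistic can be obtained from the residuals of a fitted model with statsmodels (file and column names hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("album_sales.csv")  # hypothetical data file
model = smf.ols("sales ~ adverts + airplay + attract", data=df).fit()

dw = durbin_watson(model.resid)
print(dw)  # close to 2 = independent errors; below 1 or above 3 raises alarm bells
```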
SPSS output for MR = ANOVA table which performs
F-tests for each model
SPSS output for MR contains ANOVA that tests whether the model is
significantly better at predicting the outcome than using the mean as a 'best guess'
The F-ratio represents the ratio of
improvement in prediction that results from fitting the model, relative to the inaccuracy that still exists in the model
We are told the sum of squares for model (SSM) - MR regression line in output which represents
improvement in prediction resulting from fitting a regression line to the data rather than using the mean as an estimate of the outcome
We are told residual sum of squares (Residual line) in this MR output which represents
total difference between
the model and the observed data
DF for Sum of squares Model for MR regression line is equal to
number of predictors (e.g., 1 for first model, 3 for second)
DF for Sum of Squares Residual for MR is - (2)
Number of observations (N) minus the number of coefficients in the regression model
(e.g., M1 has 2 coefficients - one for the predictor and one for the constant; M2 has 4 - one for each of the 3 predictors and one for the constant)
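As a worked example (assuming, purely for illustration, N = 200 observations), for the second model with 3 predictors:

```latex
\mathrm{df}_M = k = 3, \qquad \mathrm{df}_R = N - (k + 1) = 200 - 4 = 196
```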
The average (mean) sum of squares in the ANOVA table is calculated
for each term (SSM, SSR) by dividing the SS by its df.
How is the F ratio calculated in this ANOVA table?
F-ratio is calculated by dividing the average improvement in prediction by the model (MSM) by the average
difference between the model and the observed data (MSR)
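In symbols (using the SS and df terms defined above):

```latex
MS_M = \frac{SS_M}{\mathrm{df}_M}, \qquad MS_R = \frac{SS_R}{\mathrm{df}_R}, \qquad F = \frac{MS_M}{MS_R}
```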
If the improvement due to fitting the regression model is much greater than the inaccuracy within the model then value of F will be
greater than 1, and SPSS calculates the exact probability (p-value) of obtaining that value of F by chance
What happens if b values are positive in multiple regression?
there is a positive relationship between the predictor and the outcome,
What happens if the b value is negative in multiple regression?
it represents a negative relationship between the predictor and the outcome variable.
What do the b values in this table tell us about the relationships between the predictors and the outcome variable in multiple regression? (3)
They indicate positive relationships, so as advertising budget increases, record sales (the outcome) increase
as plays on the radio increase, so do record sales
as attractiveness of the band increases, so do record sales
The b-values also tell us, in addition to the direction of the relationship (pos/neg), to what degree each
predictor affects the outcome if the effects of all other predictors are held constant.