Week 4: Multiple Regression Flashcards

1
Q

What is the decision tree for multiple regression? - (4)

A
  • Continuous outcome
  • Two or more predictors that are continuous
  • Multiple regression
  • Meets assumptions of parametric tests
2
Q

In simple linear regression, the outcome variable Y is

A

predicted using the equation of a straight line

3
Q

Multiple regression still uses the same basic equation of … but the model is more complex

A
A
4
Q

Multiple regression is the same as simple linear regression except that - (2)

A

for every extra predictor you include, you have to add a coefficient;

so, each predictor variable has its own coefficient, and the outcome variable is predicted from a combination of all the variables multiplied by their respective coefficients plus a residual term

5
Q

Multiple regression equation

A
A
6
Q

In multiple regression equation, list all the terms - (5)

A
  • Y is the outcome variable,
  • b1 is the coefficient of the first predictor (X1),
  • b2 is the coefficient of the second predictor (X2),
  • bn is the coefficient of the nth predictor (Xn),
  • εi is the difference between the predicted and the observed value of Y for the ith participant.
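
For reference (the equation itself sits in the card image for flashcard 5), the usual textbook form of the model, including the intercept b0 that the list above omits, is:

    Yi = b0 + b1X1i + b2X2i + … + bnXni + εi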
7
Q

Multiple regression uses the same principle as linear regression in a way that

A

we seek to find the linear combination of predictors that correlate maximally with the outcome variable.

8
Q

Regression is a way of predicting things that you have not measured by predicting

A

an outcome variable from one or more predictor variables

9
Q

Regression can be used to produce a

A

linear model of the relationship between 2 variables

10
Q

Record company interested in creating a model predicting record sales from advertising budget and plays on radio per week (airplay)

  • For the plotted example of this MR: how many variables are measured, and what do the vertical, horizontal, and third axes show? - (4)
A

It is a three-dimensional scatterplot, which means there are three axes measuring the values of the three variables.

The vertical axis measures the outcome, which in this case is the number of album sales.

The horizontal axis measures how often the album is played on the radio per week.

The third axis, which you can think of as being directed into the page, measures the advertising budget.

11
Q

Can’t plot a 3D plot of MR as shown here

A

for more than 2 predictor (X) variables

12
Q

The overlap in the diagram is the shared variance, which we call the

A

covariance

13
Q

covariance is also referred to as the variance

A

shared between the predictor and outcome variable.

14
Q

What is shown in E?

A

The variance in Album Sales not shared by the predictors

15
Q

What is shown in D?

A

Unique variance shared between Ad Budget and Plays

16
Q

What is shown in C?

A

The variance in Album Sales shared by Ad Budget and Plays

17
Q

What is shown in B?

A

Unique variance shared between Plays and Album Sales

18
Q

What is shown in A?

A

Unique variance shared between Ad Budget and Album Sales

19
Q

If you have two predictors that overlap and correlate a lot, then it is a … model

A

bad model; the predictors can’t uniquely explain the outcome

20
Q

In Hierarchical regression, we are seeing whether

A

one model explains significantly more variance than the other

21
Q

In hierarchical regression predictors are selected based on

A

past work and the experimenter
decides in which order to enter the predictors into the model

22
Q

As a general rule for hierarchical regression, - (3)

A

known predictors (from other research) should be entered into the model first in order of their importance in predicting the outcome.

After known predictors have been entered, the
experimenter can add any new predictors into the model.

New predictors can be entered either all in one go, in a stepwise manner, or hierarchically (such that the new predictor
suspected to be the most important is entered first).

23
Q

Example of hierarchical regression in terms of album sales - (2)

A

The first model allows all the shared variance between Ad budget and Album sales to be accounted for.

The second model then only has the option to explain more variance by the unique contribution from the added predictor Plays on the radio.

24
Q

What is forced entry MR?

A

method in which all predictors are forced
into the model simultaneously.

25
Q

Like HR, forced entry MR relies on

A

good theoretical reasons for including the chosen predictors,

26
Q

Different from HR, forced entry MR

A

makes no decision about the order in which variables are entered.

27
Q

Some researchers believe about forced entry MR that

A

this method is the only appropriate method for theory testing because stepwise techniques are influenced by random variation in the data and so rarely give replicable results if the model is retested.

28
Q

How to do forced entry MR in SPSS? - (4)

A

Analyse –> Regression –> Linear
Put the outcome in the DV box and the IVs (predictors, x) in the IV box
Can select a range of statistics in the Statistics box and press OK to check the collinearity assumption
Can also click Plots to check the assumptions of homoscedasticity and linearity
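
The cards describe the SPSS route; purely as a cross-check, here is a minimal sketch of the same forced-entry model in Python with statsmodels, using made-up album data (all variable names and values here are hypothetical, not from the module):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    adverts = rng.uniform(0, 2000, n)   # advertising budget (hypothetical units)
    airplay = rng.integers(0, 60, n)    # plays on the radio per week
    attract = rng.integers(1, 11, n)    # attractiveness rating (1-10)
    sales = 0.09*adverts + 3.6*airplay + 11*attract + rng.normal(0, 50, n)

    # Forced entry: all predictors go into the model simultaneously
    X = sm.add_constant(np.column_stack([adverts, airplay, attract]))
    model = sm.OLS(sales, X).fit()
    print(model.summary())   # b-values, t-tests, R^2, F-ratio, Durbin-Watson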

29
Q

Why select collinearity diagnostics in the Statistics box for multiple regression? - (2)

A

This option is for obtaining collinearity statistics such as the
VIF and tolerance,

checking the assumption of no multicollinearity

30
Q

Multicollinearity exists when there is a

A

strong correlation between two or more predictors in a regression model.

31
Q

Multicollinearity poses a problem only for multiple regression because

A

simple regression requires only one predictor.

32
Q

Perfect collinearity exists in multiple regression when at least

A

e.g., two predictors are perfectly correlated, i.e., have a correlation coefficient of 1

33
Q

If there is perfect collinearity in multiple regression between predictors it
becomes impossible

A

to obtain unique estimates of the regression coefficients because there are an infinite number of combinations of coefficients that would work equally well.

34
Q

Good news is perfect collinearity in multiple regression is rare in

A

real-life data

35
Q

If two predictors are perfectly correlated in multiple regression then the values of b for each variable are

A

interchangeable

36
Q

The bad news is that less than perfect collinearity is virtually

A

unavoidable

37
Q

As collinearity increases in multiple regression, there are 3 problems that arise - (3)

A
  • Untrustworthy bs
  • Limits the size of R
  • Importance of predictors
38
Q

As collinearity increases, there are 3 problems that arise - (3)

importance of predictors - (3)

A

Multicollinearity between predictors makes it difficult
to assess the individual importance of a predictor.

If the predictors are highly correlated, and each accounts for similar variance in the outcome, then how can we know
which of the two variables is important?

Quite simply we can’t tell which variable is important – the model could include either one, interchangeably.

39
Q

One way of identifying multicollinearity in multiple regression is to scan a

A

correlation matrix of all of the predictor variables and see if any correlate very highly (very highly meaning correlations above .80 or .90)

40
Q

SPSS produces collinearity diagnostics in multiple regression, which are - (2)

A

variance inflation factor (VIF) and tolerance

41
Q

The VIF indicates in multiple regression whether a

A

predictor has a strong linear relationship with the other predictor(s).

42
Q

If VIF statistic is above 10 in multiple regression there is a good reason to worry about

A

a potential problem of multicollinearity

43
Q

If VIF statistic above 10 or approaching 10 in multiple regression then what you would want to do is have a - (2)

A

look at your variables to see whether all of them need to go into the model

if there is a high correlation between 2 predictors (measuring the same thing), then decide whether it is important to include both variables or to take one out and simplify the regression model

44
Q

Related to the VIF in multiple regression is the tolerance
statistic, which is its

A

reciprocal (1/VIF) = the inverse of the VIF
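
Outside SPSS, VIF and tolerance can be computed directly; a sketch reusing the design matrix X from the earlier hypothetical statsmodels example:

    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Column 0 of X is the constant, so start at 1
    for i in range(1, X.shape[1]):
        vif = variance_inflation_factor(X, i)
        print(f"predictor {i}: VIF = {vif:.2f}, tolerance = {1/vif:.3f}")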

45
Q

For tolerance, a value below 0.2 in multiple regression shows

A

a potential issue with multicollinearity

46
Q

In Plots in SPSS, you put in multiple regression - (2)

A

ZRESID on Y and ZPRED on X

Plot of residuals against predicted values to assess homoscedasticity
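
The same plot can be reproduced in the hypothetical Python example (matplotlib assumed):

    import matplotlib.pyplot as plt

    # Standardize the predicted values and residuals from the fitted model
    zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()
    zresid = model.resid / model.resid.std()

    plt.scatter(zpred, zresid)
    plt.axhline(0, color="grey")   # want a random cloud around this line
    plt.xlabel("ZPRED")
    plt.ylabel("ZRESID")
    plt.show()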

47
Q

What is ZPRED in MR? - (2)

A

(the standardized predicted values of the dependent variable based on the model).

These values are standardized forms of the values predicted by the model.

48
Q

What is ZRESID in MR? - (2)

A

(the standardized residuals, or errors).

These values are the standardized differences between the observed data and the values that the model predicts.

49
Q

SPSS in multiple linear regression gives descriptive outputs, which are - (2)

A
  • basic means and also a table of correlations between variables
  • This is a first opportunity to determine whether there is high correlation between predictors, otherwise known as multicollinearity
50
Q

In model summary of SPSS, it captures how the model or models explain in MR

A

variance in terms of R squared, and more importantly how R squared changes between models and whether those changes are significant.

51
Q

Diagram of model summary

A
52
Q

What is the measure of R^2 in multiple regression

A

measure of how much of the variability in the outcome is accounted for
by the predictors

53
Q

The adjusted R^2 gives us an estimate of in multiple regression

A

fit in the general population

54
Q

The Durbin-Watson statistic, if specified in multiple regression, tells us whether the - (2)

A

assumption of independent errors is tenable (values less than 1 or greater than 3 raise alarm bells)

the closer the value is to 2, the better = assumption met
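
The same statistic is available outside SPSS; continuing the hypothetical statsmodels example:

    from statsmodels.stats.stattools import durbin_watson

    dw = durbin_watson(model.resid)   # ~2 = independent errors; <1 or >3 = alarm bells
    print(dw)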

55
Q

SPSS output for MR = ANOVA table which performs

A

F-tests for each model

56
Q

SPSS output for MR contains ANOVA that tests whether the model is

A

significantly better at predicting the outcome than using the mean as a ‘best guess’

57
Q

The F-ratio represents the ratio of

A

improvement in prediction that results from fitting the model, relative to the inaccuracy that still exists in the model

58
Q

We are told the sum of squares for the model (SSM; the Regression line in the MR output), which represents

A

improvement in prediction resulting from fitting a regression line to the data rather than using the mean as an estimate of the outcome

59
Q

We are told residual sum of squares (Residual line) in this MR output which represents

A

total difference between
the model and the observed data

60
Q

DF for Sum of squares Model for MR regression line is equal to

A

number of predictors (e.g., 1 for first model, 3 for second)

61
Q

DF for Sum of Squares Residual for MR is - (2)

A

Number of observations (N) minus number of coefficients in regression model

(e.g., M1 has 2 coefficients: one for the predictor and one for the constant; M2 has 4: one for each of the 3 predictors and one for the constant)

62
Q

The average sum of squares in ANOVA table is calculated by

A

calculated for each term (SSM, SSR) by dividing the SS by its df.

63
Q

How is the F ratio calculated in this ANOVA table?

A

F-ratio is calculated by dividing the average improvement in prediction by the model (MSM) by the average
difference between the model and the observed data (MSR)
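
Sketching that arithmetic with the quantities statsmodels exposes (attribute names are statsmodels’, not SPSS’s; continuing the hypothetical model):

    ms_m = model.ess / model.df_model   # mean square for the model (MSM)
    ms_r = model.ssr / model.df_resid   # mean square for the residual (MSR)
    f_ratio = ms_m / ms_r               # matches model.fvalue
    print(f_ratio, model.fvalue)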

64
Q

If the improvement due to fitting the regression model is much greater than the inaccuracy within the model then value of F will be

A

greater than 1, and SPSS calculates the exact probability (p-value) of obtaining that value of F by chance

65
Q

What happens if b values are positive in multiple regression?

A

there is a positive relationship between the predictor and the outcome,

66
Q

What happens if the b value is negative in multiple regression?

A

represents a negative relationship between predictor and outcome variable

67
Q

What do the b values in this table tell us about the relationships between predictors and the outcome variable in multiple regression? - (3)

A

All indicate positive relationships: as advertising budget increases, record sales (the outcome) increase

as plays on the radio increase, so do record sales

as attractiveness of the band increases, record sales increase

68
Q

The b-values also tell us, in addition to direction of relationship (pos/neg) , to what degree each in multiple regression

A

predictor affects the outcome if the effects of all other predictors are held constant:

69
Q

B-values tell us to what degree each predictor affects the outcome if the effects of all other predictors held constant in multiple regression

e.g., advertising budget - (3)

A

(b = 0.085):

This value indicates that as advertising budget (x)
increases by one unit, record sales (outcome, y) increase by 0.085 units.

This interpretation is true only if the
effects of attractiveness of the band and airplay are held constant.

70
Q

Standardised versions of b-values are much easier to interpret in multiple regression as they are

A

not dependent on the units of measurement of the variables

71
Q

The standardised beta values tell us that in multiple regression

A

the number of standard deviations that the outcome will change as a result of one standard deviation change
in the predictor.

72
Q

The standardized beta values are all measured in standard deviation
units and so are directly comparable: therefore, they provide, in MR,

A

a better insight into the ‘importance’ of a predictor in the model
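
One way to obtain standardized betas in the hypothetical Python example is to z-score everything and refit (an illustration; SPSS simply reports them in the Beta column):

    def zscore(a):
        return (a - a.mean()) / a.std(ddof=1)

    Xz = sm.add_constant(np.column_stack(
        [zscore(adverts), zscore(airplay), zscore(attract)]))
    std_betas = sm.OLS(zscore(sales), Xz).fit().params[1:]  # drop the ~0 intercept
    print(std_betas)   # in SD units, so directly comparable across predictors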

73
Q

If two predictor variables (e.g., advertising budget and airplay) have virtually identical standardised beta values (0.512, and 0.511) it shows that in MR

A

both variables have a comparable degree of importance in the model

74
Q

Advertising budget standardised beta value of 0.511 (with SD of £485,655) shows us - (2) in MR

A

as advertising budget increases by one standard deviation (£485,655), record sales increase by 0.511 standard deviations.

This interpretation is true only
if the effects of attractiveness of the band and airplay are held constant

75
Q

The confidence intervals of unstandardised beta values are boundaries constructed such that

A

in 95% of these samples the boundaries contain the true value of b

76
Q

If we collected 100 samples and in MR calculated CI for b, we are saying that 95% of these CIs of samples would contain the

A

true (pop) value of b
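
In the hypothetical statsmodels example, the same 95% boundaries come from conf_int:

    ci = model.conf_int(alpha=0.05)   # one [lower, upper] row per b, constant first
    print(ci)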

77
Q

A good regression model will have a narrow, small CI, indicating in MR that the

A

value of b in this sample is close to the true value of b in the population

78
Q

A bad regression model will have CIs that cross zero, indicating that in MR

A

in some samples the predictor has a negative
relationship to the outcome whereas in others it has a positive relationship

79
Q

In the image below, which are the two best predictors based on CIs, and which one is less good? - (2) in MR

A

two best predictors (advertising and airplay) have very tight confidence intervals indicating that the estimates for the current model are likely to be representative of the true population values

interval for attractiveness is wider (but still does not cross zero) indicating that the parameter for this variable is less representative, but nevertheless significant.

80
Q

If you select part and partial correlations in the Statistics box, there will be another coefficients table, which looks like this in MR:

A
81
Q

The zero-order correlations are the simple in MR

A

Pearson’s correlation coefficients

82
Q

The partial correlations represent the in MR

A

relationships between each predictor and the outcome variable, controlling for the effects of the other two predictors.

83
Q

The part correlations in MR - (2)

A

represent the relationship between each predictor and the outcome, controlling for the effect that the other two variables have on the outcome.

representing the unique relationship each predictor has with the outcome

84
Q

In this table, the zero-order correlation is calculated by - (2)

A

variance of outcome explained by predictors divided by total

(A+C)/(A+B+C+E)

85
Q

Partial correlations in this example are calculated in MR by - (2)

A

unique variance in outcome (ignore all other predictors) explained by predictor divided by variance in outcome not explained by all other predictors

A / (A + E)

86
Q

Part correlations are calculated by - (2) in MR

A

unique variance in outcome explained by predictor divided by total variance in outcome

A / (A + B + C + E)

87
Q

At each stage of regression SPSS gives summary of any variables that have not yet been

A

entered into the model.

88
Q

If the average VIF is substantially greater than 1 then the MR regression

A

may be biased

89
Q

MR Tolerance below 0.1 indicates a

A

serious problem.

90
Q

Tolerance below 0.2 indicates a in MR

A

a potential problem

91
Q

How to interpret this image in terms of collinearity - VIF and tolerance in MR

A

For our current model the VIF values are all well below 10 and the tolerance statistics all well above 0.2;

therefore, we can safely conclude that there is no collinearity within our data.

92
Q

We can produce casewise diagnostics in MR to see - (2)

A

summary of residuals statistics to be examined of extreme cases

To see whether individual scores (cases) influence the modelling of data too much

93
Q

SPSS casewise diagnostics shows cases that have standardised residuals that are, in MR - (2)

A

less than -2 or greater than 2

(We expect about 5% of our cases to do that and 95% to have standardised residuals within about +/- 2.)

94
Q

If we have a sample of 200 then expect about .. to have standardised residuals outside limits in MR

A

10 cases (5% of 200)
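
A quick count of such cases in the hypothetical Python example, using the standardized residuals computed for the ZRESID plot earlier:

    extreme = np.abs(zresid) > 2
    print(extreme.sum(), "cases outside +/-2; expect about", int(0.05 * n))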

95
Q

What does this casewise diagnostic show? - (2) MR

A
  • 99% of cases should lie within ±2.5, so we expect 1% of cases to lie outside the limits
  • From the cases listed, it is clear two cases (1%) lie outside the limits (cases 164 [residual above 3, investigate further] and 179) - 1%, which conforms to an accurate model
96
Q

If there are many more cases than we would expect (more than 5% of the sample size) in the casewise diagnostics, then in MR we have

A

broken the assumptions of the regression

97
Q

If cases are a large number of standard deviations from the mean, we may want to in casewise diagnostics in MR

A

investigate and potentially remove them because they are ‘outliers’

98
Q

Assumptions we need to check for MR - (8)

A
  • Continuous outcome variable and continuous or dichotomous predictor variables
  • Independence = all values of the outcome variable should come from a different participant
  • Non-zero variance as predictors should have some variation in value e.g., variance ≠ 0
  • No outliers
  • No perfect or high collinearity
  • Histogram to check for normality of errors
  • Scatterplot of ZRES against ZPRED to check for linearity and homoscedasticity = looking for random scatter
  • Independent errors (Durbin-Watson)
99
Q

Diagram of assumption of homoscedasticity and linearity of ZRESID against ZPRED in MR

A
100
Q

Obvious outliers on a partial plot represent cases that might have in MR

A

undue influence on a predictor’s b coefficient

101
Q

Non-linear relationships and heteroscedasticity can be detected using

A

partial plots as well

102
Q

What does this partial plot show? - (2) in MR

A

the partial plot shows the strong positive relationship to album sales.

There are no obvious outliers and the cloud of dots is evenly spaced out around the line, indicating homoscedasticity.

103
Q

What does this plot show in MR? - (2)

A

the plot again shows a positive relationship to album sales, but the dots show funnelling.

There are no obvious outliers on this plot, but the funnel-shaped cloud indicates a violation of the assumption of homoscedasticity.

104
Q

P-plot and histogram of a normal distribution in MR

A
105
Q

P-plot and histogram for a skewed distribution in MR

A
106
Q

What if the assumptions for regression are violated in MR?

A

you cannot generalize your findings beyond your sample

107
Q

If residuals show problems
with heteroscedasticity or non-normality then, in MR, try

A

transforming the raw data – but
this won’t necessarily affect the residuals!

108
Q

If you have a violation of the linearity assumption then you could see whether, in MR, you can do

A

logistic regression instead

109
Q

If R^2 is 0.374 (outcome variable is productivity and there are 3 predictors) then it shows that in MR

A

37.4% of the variance in productivity scores was accounted for by 3 predictor variables

110
Q
  • In the ANOVA table, it tells whether the model is significantly improved from the baseline model, which in MR is
A

the model where we assume no relation between the predictor variables and the outcome variable (a flat regression line, no association between these variables)

111
Q

This table tells us in terms of standardised beta values that (outcome is productivity in MR)

A

holidays had a standardized beta coefficient of 0.031 whereas cake had a much higher standardized beta coefficient of 0.499, which tells us that the amount of cake given out is a much better predictor of productivity than the amount of holidays taken

For pay we have a beta coefficient of 0.323 which tells us that pay was also a pretty good predictor in the model of productivity but slightly less than cake

112
Q

What does this table tell us in terms of significance? - (3) in MR

A
  • P value for holidays is 0.891 which is not significant
  • P value for cake is 0.032 is significant
  • P value for pay is 0.012 is significant
113
Q

What does this image show in terms of VIF in MR?

A

  • All below 10 here, showing we are unlikely to have a problem with multicollinearity, so we need not worry about that for these data

114
Q

For hierarchical regression you press Next to add

A

another predictor - block 2 of 2

115
Q

In the ANOVA table, M2 with all its predictor variables is compared with, in MR,

A

the baseline, not M1

116
Q

To see if M2 is an improvement on M1 in HR, we need to look at the … in the model summary in MR

A

change statistics

117
Q

What does this change statistic show in terms of M2 and M1 in MR

A

M2 explains an extra 7.5% of the variance, which is significant
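
The R-squared change test can be sketched by hand in Python (continuing the hypothetical data, with model as the full three-predictor fit from earlier):

    # Model 1: adverts only; Model 2: the full model
    m1 = sm.OLS(sales, sm.add_constant(adverts)).fit()
    r2_change = model.rsquared - m1.rsquared
    f_change = ((m1.ssr - model.ssr) / (m1.df_resid - model.df_resid)) \
               / (model.ssr / model.df_resid)
    print(r2_change, f_change)   # significant F change = M2 improves on M1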

118
Q

Each of these b-values shown in the table has an associated standard error indicating to what extent … and used to determine … - (2)

A

values would vary across different samples,

and these standard errors are used to determine
whether or not the b-value differs significantly from zero

119
Q

A t-statistic can be derived that tests whether a b-value is

A

significantly different from 0.

120
Q

In simple regression, a significant value of t indicates that the … but in multiple regression … - (2)

A

slope of the regression line is significantly different from horizontal,

but in multiple regression, it is not so easy to visualize what the value tells us.

121
Q

The t-test in MR is conceptualised as a measure of whether the

A

predictor is making a significant contribution to the model

122
Q

In MR, if the t-test associated with a b-value is significant (if the value in the column labelled Sig. is less
than .05) then the

A

predictor is making a significant contribution to the model.

123
Q

In MR, the smaller the value of Sig. and the larger the value of t, the greater the

A

contribution of that predictor.

124
Q

For this output, interpret whether the predictors are significant predictors of record sales, and what the magnitude of the t-statistics says about their impact on record sales in MR - (2)

A

For this model, the advertising budget (t(196) = 12.26, p < .001), the amount of radio play prior to release (t(196) = 12.12, p < .001) and attractiveness of the band (t(196) =4.55, p < .001) are all significant predictors of record sales.

From the magnitude of the t-statistics we can see that the advertising budget and radio play had a similar impact,
whereas the attractiveness of the band had less impact.

125
Q

In regression it determines the strength and character of the relationship between

A

one DV (usually denoted as Y) and a series of other variables (known as IVs)

126
Q

What is an example of a continuous variable?

A

we are talking about a variable with an infinite number of possible real values within a given interval, so something like height or age

127
Q

What is an example of a dichotomous variable?

A

a variable that can only hold two distinct values, like male and female

128
Q

If outliers are present in the data then they impact the

A

line of best fit in MR

129
Q

Diagram of outliers

A
130
Q

You would expect 1% of cases to lie outside the limits, so in a large sample in MR, if you have

A

one or two outliers then it could be okay

131
Q

Rule of thumb to check for outliers is to check if there are any data points that in MR

A

are over 3 SD from the mean

132
Q

All residuals should lie within … SDs for no outliers / a normal amount of outliers in MR

A

-3 and 3 SD

133
Q

Which variables (if any) are highly correlated in MR?

A

Weight, Activity, and the interaction between them are statistically significant

134
Q

What do homoscedasticity and heteroscedasticity mean in MR? - (2)

A

Homoscedasticity: similar variance of residuals (errors) across the variable continuum, e.g. equally accurate.

Heteroscedasticity: variance of residuals (errors) differs across the variable continuum, e.g. not equally accurate

135
Q

P plot plots a normal distribution against

A

your distribution

136
Q

A P-plot can check for

A

normally distributed errors/residuals

137
Q

Diagram of normal, skewed to left (pos) and skewed to right (neg) of p-plots in MR

A
138
Q

Durbin-Watson test values of 0, 2, and 4 show that … in MR - (3)

A
  • 0 = errors between pairs of observations are positively correlated
  • 2 = independent errors
  • 4 = errors between pairs of observations are negatively correlated
139
Q

A Durbin-Watson statistic between … and … is considered to indicate that the data is not cause for concern = independent errors in MR

A

1.5 and 2.5

140
Q

If R2 and adjusted R2 are similar, it means that your regression model

A

‘generalizes’ to the entire population.

141
Q

If R2 and adjusted R2 are similar, it means that your regression model ‘generalizes’ to the entire population.
Particularly for MR

A

for small N and where results are to be generalized use the adjusted R2

142
Q

3 types of multiple regression - (3)

A
  1. Standard: To assess impact of all predictor variables simultaneously
  2. Hierarchical: To test predictor variables in a specific order based on hypotheses derived from theory
  3. Stepwise: If the goal is accurate statistical prediction from a large number of predictor variables – computer driven
143
Q

Diagram of excluded variables table in SPSS - (3) in MR

A
  • Tells us that OCD interpretation of intrusions would not have a significant impact on the model’s ability to predict social anxiety

Beta value of Interpretation of Intrusions is very small, indicating small influence on outcome variable

Beta is the degree of change in the outcome variable for every 1 unit of change in the predictor variable.

144
Q

What is multicollinearity in MR?

A

When predictor variables correlate very highly with each other

145
Q

When checking the assumptions of regression, what does this graph tell you in MR?

A

Normality of residuals

146
Q

Which of the following statements about the t-statistic in regression is not true?

The t-statistic is equal to the regression coefficient divided by its standard deviation

The t-statistic tests whether the regression coefficient, b, is significantly different from 0

The t-statistic provides some idea of how well a predictor predicts the outcome variable

The t-statistic can be used to see whether a predictor variable makes a statistically significant contribution to the regression model

A

The t-statistic is equal to the regression coefficient divided by its standard deviation

147
Q

A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender and how much a person is prone to believe in things that are not real (fantasy proneness). Fear responses were measured too. In this table, what does the value 847.685 represent in MR

A

The residual error in the prediction of fear scores when both gender and fantasy proneness are included as predictors in the model.

148
Q

A psychologist was interested in whether the amount of news people watch predicts how depressed they are. In this table, what does the value 3.030 represent in MR

A

The improvement in the prediction of depression by fitting the model

149
Q

When checking the assumption of the regression, the following graph shows (hint look at axis titles)

A

Regression assumptions that have been met

150
Q

A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender (0 = female, 1 = male) and how much a person is prone to believe in things that are not real (fantasy proneness) on a scale from 0 to 4 (0 = not at all fantasy prone, 4 = very fantasy prone). Fear responses were measured on a scale from 0 (not at all scared) to 15 (the most scared I have ever felt).

Based on the information from model 2 in the table, what is the likely population value of the parameter describing the relationship between gender and fear in MR

A

Somewhere between −3.369 and −0.517

151
Q

A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender (0 = female, 1 = male) and how much a person is prone to believe in things that are not real (fantasy proneness) on a scale from 0 to 4 (0 = not at all fantasy prone, 4 = very fantasy prone). Fear responses were measured on a scale from 0 (not at all scared) to 15 (the most scared I have ever felt).

How much variance (as a percentage) in fear is shared by gender and fantasy proneness in the population in MR

A

13.5%

152
Q

Recent research has shown that lecturers are among the most stressed workers. A researcher wanted to know exactly what it was about being a lecturer that created this stress and subsequent burnout. She recruited 75 lecturers and administered several questionnaires that measured: Burnout (high score = burnt out), Perceived Control (high score = low perceived control), Coping Ability (high score = low ability to cope with stress), Stress from Teaching (high score = teaching creates a lot of stress for the person), Stress from Research (high score = research creates a lot of stress for the person), and Stress from Providing Pastoral Care (high score = providing pastoral care creates a lot of stress for the person). The outcome of interest was burnout, and Cooper’s (1988) model of stress indicates that perceived control and coping style are important predictors of this variable. The remaining predictors were measured to see the unique contribution of different aspects of a lecturer’s work to their burnout.

Which of the predictor variables does not predict burnout in MR

A

Stress from research

153
Q

Using the information from model 3, how would you interpret the beta value for ‘stress from teaching’ in MR

A

As stress from teaching increases by one unit, burnout decreases by 0.36 of a unit.

154
Q

How much variance in burnout does the final model explain for the sample in MR

A

80.3%

155
Q

A psychologist was interested in predicting how depressed people are from the amount of news they watch. Based on the output, do you think the psychologist will end up with a model that can be generalized beyond the sample?

A

No, because the errors show heteroscedasticity.

158
Q

Diagram of no outliers for one assumption of MR

A

Note that you expect 1% of cases to lie outside this area so in a large sample, if you have one or two, that could be ok

159
Q

Example of multiple regression - (3)

A

A record company boss was interested in predicting album sales from advertising.
Data
200 different album releases

Outcome variable:
Sales (CDs and Downloads) in the week after release

Predictor variables
The amount (in £s) spent promoting the album before release
Number of plays on the radio

160
Q

R is the correlation between

A

observed values of the outcome, and the values predicted by the model.

161
Q

Output diagram: what does the output show in MR? - (2)

A

Difference between no predictors and model 1 (a).
Difference between model 1 (a) and model 2 (b).

Our model 2 is significantly better at predicting the value of the outcome variable than the null model and model 1 (F(2, 197) = 167.2, p < .001) and explains 66% of the variance in our data (R2 = .66)

162
Q

What does this output show in terms of regression model in MR? - (3)

A

y = 0.09x1 + 3.59x2 + 41.12

For every £1,000 increase in advertising budget there is an increase of 87 record sales (B = 0.09, t = 11.99, p<.001).

For every number of plays on Radio 1 per week there is an increase of 3,589 record sales (B = 3.59, t = 12.51, p<.001).
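
Plugging hypothetical values into that equation (units as in the cards: advertising in £1,000s, sales in 1,000s of copies):

    # A hypothetical album: £100,000 advert budget, 30 plays per week
    adverts_k = 100.0
    plays = 30
    predicted = 0.09 * adverts_k + 3.59 * plays + 41.12
    print(predicted)   # 157.82, i.e., roughly 158,000 sales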

163
Q

Report R^2, F statistic and p-value to 2DP for overall model - (3)

A

  • R squared = 0.09
  • F statistic = 22.54
  • P value = p < 0.001

164
Q

Report beta and b values for video games, restrictions and parental aggression to 2DP and the p-value in MR

A
165
Q

Which of the following statements about homoscedasticity and linearity is correct?

A. There is non-linearity in the data

B. There is heteroscedasticity in the data

C. There is both heteroscedasticity and non-linearity in the data

D. There are no problems with either heteroscedasticity or non-linearity

A

D - data points show a random pattern

166
Q

Determine the proportion of variance in salary that the number of years spent modelling uniquely explains once the models’ age was taken into account:

Hierarchical regression

A. 2.0%

B. 17.8%

C. 39.7%

D. 42.2%

A

A –> The R square change in step 2 was .020, i.e., 2.0%

167
Q

Test for multicollinearity (select tolerance and VIF statistics).

Based on this information, what can you conclude about the suitability of your regression model?

A. The VIF statistic is above 10 and the tolerance statistic is below 0.2, indicating that there is no multicollinearity.

B. The VIF statistic is above 10 and the tolerance statistic is below 0.2, indicating that there is a potential problem with multicollinearity.

C. The VIF statistic is below 10 and the tolerance statistic is above 0.2, indicating that there is no multicollinearity.

D. The VIF statistic is below 10 and the tolerance statistic is above 0.2, indicating that there is a potential problem with multicollinearity.

A

B

168
Q

Example of question using hierarchical regression - (2)

A

A fashion student was interested in factors that predicted the salaries of catwalk models. He collected data from 231 models. For each model he asked how much they earned per day (salary), their age (age), and how many years they had worked as a model (years_modelling).

The student wanted to know if the number of years spent modelling predicted the models’ salary after the models’ age was taken into account.

169
Q

The following graph shows:

A. Regression assumptions met

B. Non-linearity = could indicate a curve

C. Heteroscedasticity + non-linearity

D. Heteroscedasticity

A

A

170
Q

A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender (0 = female, 1 = male) and how much a person is prone to believe in things that are not real (fantasy proneness) on a scale from 0 to 4 (0 = not at all fantasy prone, 4 = very fantasy prone). Fear responses were measured on a scale from 0 (not at all scared) to 15 (the most scared I have ever felt). What is the likely population value of the parameter describing the relationship between gender and fear?

A

Somewhere between 3.369 and 0.517
