Week 4: Multiple Regression Flashcards

Question

Like HR, forced entry MR relies on

Answer 1

good theoretical reasons for including the chosen predictors,

Answer 2

makes no decision about the order in which variables are entered.

Answer 3

this method is the only appropriate method for theory testing because stepwise techniques are influenced by random variation in the data and so rarely give replicable results if the model is retested.

Answer 4

Analyse --> Linear --> Regression Put outcome in DV and IVs (predictors, x) in IV box Can select a range of statistics in statistics box and press okay to check colinearity assumption Can also click plots to check assumptions of homoscedasticity and lineartiy

Answer 5

This option is for obtaining collinearity statistics such as the VIF, tolerance, Checking assumption of no multicolinearity

Answer 6

strong correlation between two or more predictors in a regression model.

Answer 7

simple regression requires only one predictor.

Answer 8

e.g., two predictors are perfectly correlated , have a correlation coefficient of 1

Answer 9

to obtain unique estimates of the regression coefficients because there are an infinite number of combinations of coefficients that would work equally well.

Answer 10

real-life data

Answer 11

interchangable

Answer 12

unavoidable

Answer 13

* Untrustory bs * Limit size of R * Importance of predictors

Answer 14

As collinearity increases so do the standard errors of the b coefficients. big standard errors for b coefficients means that these bs are more variable across samples b coefficient in our sample is less likely to represent the population.

Answer 15

two predictors are highly correlated - the second predictor accounts for same variance accounted for first variable --> the second predictor accounts for a very little unique variance If two predictors completely uncorrelated then second predictor likely to account for different varinace in outcome then acocunted for first predictor

Answer 16

Multicollinearity between predictors makes it difficult to assess the individual importance of a predictor. If the predictors are highly correlated, and each accounts for similar variance in the outcome, then how can we know which of the two variables is important? Quite simply we can’t tell which variable is important – the model could include either one, interchangeably.

Answer 17

a correlation matrix of all of the predictor variables and see if any correlate very highly (by very highly I mean correlations of above .80 or .90)

Answer 18

variance inflation factor (VIF) and tolerance

Answer 19

predictor has a strong linear relationship with the other predictor(s).

Answer 20

potential problem of multicolinearity

Answer 21

look at your variables to see if you need to include all variables whether all need to go in model if high correlation between 2 predictors (measuring same thing) then decide whether its important to include both vars or take one out and simplify regression model

Answer 22

reciporal (1/VIF) = inverse of VIF

Answer 23

issue with multicolinerity

Answer 24

ZRESID on Y and ZPRED on X Plot of residuals against predicted to asses homoscedasticity

Answer 25

(the standardized predicted values of the dependent variable based on the model). These values are standardized forms of the values predicted by the model.

Answer 26

(the standardized residuals, or errors). These values are the standardized differences between the observed data and the values that the model predicts).

Answer 27

heteroscedasticity also

Answer 28

* basics means and also a table of correlations between variables. * This is a first opportunity to determine whether there is high correlation between predictors, otherwise known as multi-collinearity

Answer 29

record sales

Answer 30

variance in terms of R squared, and more importantly how R squared changes between models and whether those changes are significant.

Answer 31

measure of how much of the variability in the outcome is accounted for by the predictors

Answer 32

fit in the general population

Answer 33

assumption of independent errors is tenable (value less than 1 or greater than 3 raise alarm bells) value closer to 2 the better = assumption met

Answer 34

F-tests for each model

Answer 35

significantly beter at predicting the outcome than using the mean as a 'best guess'

Answer 36

improvement in prediction that results from fitting the model, relative to the inaccuracy that still exists in the model

Answer 37

improvement in prediction resulting from fitting a regression line to the data rather than using the mean as an estimate of the outcome

Answer 38

total difference between the model and the observed data

Answer 39

number of predictors (e.g., 1 for first model, 3 for second)

Answer 40

Number of observations (N) minus number of coefficients in regression model (e.g., M1 has 2 coefficents - one for predictor and one for constant, M2 has 4 - one for each 3 predictor and one for constant)

Answer 41

calculated for each term (SSM, SSR) by dividing the SS by the df. T

Answer 42

F-ratio is calculated by dividing the average improvement in prediction by the model (MSM) by the average difference between the model and the observed data (MSR)

Answer 43

greater than 1 and SPSS calculates exact prob (p-value) of obtaining value of F by change

Answer 44

there is a positive relationship between the predictor and the outcome,

Answer 45

represents a negative relationship between predictor and outcome variable?

Answer 46

Indicating positive relationships so as advertising budget increases, record sales increases (outcome) plays on ratio increase as do record sales attractiveness of band increases record sales

Answer 47

predictor affects the outcome if the effects of all other predictors are held constant:

Answer 48

(b = 0.085): This value indicates that as advertising budget (x) increases by one unit, record sales (outcome, y) increase by 0.085 units. This interpretation is true only if the effects of attractiveness of the band and airplay are held constant.

Answer 49

not dependent on the units of measurements of variables

Answer 50

the number of standard deviations that the outcome will change as a result of one standard deviation change in the predictor.

Answer 51

a better insight into the ‘importance’ of a predictor in the mode

Answer 52

both variables have a comparable degree of importance in the model

Answer 53

advertising budget increases by one standard deviation (£485,655), record sales increase by 0.511 standard deviations. This interpretation is true only if the effects of attractiveness of the band and airplay are held constant

Answer 54

95% of these sampels these boundaries containn true value of b

Answer 55

true (pop) value of b

Answer 56

value of b in this sample is close to the true value of b in the populatio

Answer 57

in some samples the predictor has a negative relationship to the outcome whereas in others it has a positive relationship

Answer 58

two best predictors (advertising and airplay) have very tight confidence intervals indicating that the estimates for the current model are likely to be representative of the true population values interval for attractiveness is wider (but still does not cross zero) indicating that the parameter for this variable is less representative, but nevertheless significant.

Answer 59

Pearson's correlation coefficients

Answer 60

represent the relationships between each predictor and the outcome variable, controlling for the effects of the other two predictors.

Answer 61

represent the relationship between each predictor and the outcome, controlling for the effect that the other two variables have on the outcome. representing the unique relationship each predictor has with otucome

Answer 62

variance of outcome explained by predictors divided by total (A+C)/(A+B+C+E)

Answer 63

unique variance in outcome (ignore all other predictors) explained by predictor divided by variance in outcome not explained by all other predictors A/A+E

Answer 64

unique variance in outcome explained by predictor divided by total variance in outcome A/A+B+C+E

Answer 65

entered into the model.

Answer 66

may be biased

Answer 67

serious problem.

Answer 68

a potential problem

Answer 69

For our current model the VIF values are all well below 10 and the tolerance statistics all well above 0.2; therefore, we can safely conclude that there is no collinearity within our data.

Answer 70

summary of residuals statistics to be examined of extreme cases To see whether individual scores (cases) influence the modelling of data too much

Answer 71

less than -2 or greater than 2 (We expect about 5% of our cases to do tha and 95% to have standardised residuals within about +/- 2.)

Answer 72

10 cases (5% of 200)

Answer 73

* 99% of cases should lie within ±2.5 so expect 1% of cases lie outside limits * From cases listed, clear two cases (1%) lie outside of limits (case, 164 [investigate further has residual 3] and 179) - 1% which isconform to accurate model

Answer 74

broken the assumptions of the regression

Answer 75

investigate and potentially remove them because they are ‘outliers’

Answer 76

* Continous outcome variable and continous or dichotomous predictor variables * Independence = all values of outcome variable should come from different participant * Non-zero variance as predictors should have some variation in value e.g., variance ≠ 0 * No outliers * No perfect or high collinearity * Histogram to check for normality of errors * Scatterplot of ZRES against ZPRED to check for linearity and homoscedasticity = looking for random scatter * Independent errors (Durbin-Watson)

Answer 77

undue influence on a predictor’s b coefficient

Answer 78

partial plots as well

Answer 79

the partial plot shows the strong positive relationship to album sales. There are no obvious outliers and the cloud of dots is evenly spaced out around the line, indicating homoscedasticity.

Answer 80

the plot again shows a positive relationship to album sales, but the dots show funnelling, There are no obvious outliers on this plot, but the funnel-shaped cloud indicates a violation of the assumption of homoscedasticity.

Answer 81

you cannot generalize your findings beyond your sample

Answer 82

transforming the raw data – but this won’t necessarily affect the residuals!

Answer 83

logistic regression instead

Answer 84

37.4% of the variance in productivity scores was accounted for by 3 predictor variables

Answer 85

if we assumed no relation between predictor variables and outcome variable – flat regression line no association between these variables)

Answer 86

holidays had standardized beta coefficient of 0.031 whereas cake had a much higher standardized beta coefficient of 0.499 which tells us that amount of cake given out much better predictor of productivity than the amount of holidays taken For pay we have a beta coefficient of 0.323 which tells us that pay was also a pretty good predictor in the model of productivity but slightly less than cake

Answer 87

- P value for holidays is 0.891 which is not significant - P value for cake is 0.032 is significant - P value for pay is 0.012 is significant

Answer 88

o All below 10 here showing we are unlikely to have a problem with multicollinearity so we can not worry about that for this data

Answer 89

another predictor - block 2 of 2

Answer 90

baseline not M1

Answer 91

change statistics

Answer 92

M2 explains an extra 7.5% which is sig

Answer 93

values would vary across different samples, and these standard errors are used to determine whether or not the b-value differs significantly from zero

Answer 94

significantly different from 0. I

Answer 95

slope of the regression line is significantly different from horizontal, but in multiple regression, it is not so easy to visualize what the value tells us.

Answer 96

predictor is making a sig contribution to model

Answer 97

predictor is making a significant contribution to the model.

Answer 98

contribution of that predictor.

Answer 99

For this model, the advertising budget (t(196) = 12.26, p < .001), the amount of radio play prior to release (t(196) = 12.12, p < .001) and attractiveness of the band (t(196) =4.55, p < .001) are all significant predictors of record sales. From the magnitude of the t-statistics we can see that the advertising budget and radio play had a similar impact, whereas the attractiveness of the band had less impact.

Answer 100

one DV (usually denoted as Y) and a series of other variable (known as IV)

Answer 101

we are talking about a variable with a infinante number of real numbers within a given interval so something like height or age

Answer 102

variable that can only hold two distinct values like male and female

Answer 103

line of best fit

Answer 104

one or two outliers then could be okay

Answer 105

are over 3 SD from the mean

Answer 106

-3 and 3 SD

Answer 107

Weight, Activity, and the interaction between them are statistically significant

Answer 108

Homoscedasticity: similar variance of residuals (errors) across the variable continuum, e.g. equally accurate. Heteroscedasticity: variance of residuals (errors) differs across the variable continuum, e.g. not equally accurate

Answer 109

your distribution

Answer 110

normally distributed errors/residuals

Answer 111

* 0 = errors between pairs of obsers are pos correl * 2 = independent error * 4 = errors between pairs of observs are neg correl

Answer 112

1.5 and 2.5

Answer 113

‘generalizes’ to the entire population.

Answer 114

for small N and where results are to be generalized use the adjusted R2

Answer 115

1. Standard: To assess impact of all predictor variables simultaneously 2. Hierarchical: To test predictor variables in a specific order based on hypotheses derived from theory 3. Stepwise: If the goal is accurate statistical prediction from a large number of predictor variables – computer driven

Answer 116

* Tells that OCD interpretiotn of intrustrions would have not have a significant impact on model's ability to predict social axniety

Answer 117

When predictor variables correlate very highly with each other

Answer 118

Normality of residuals

Answer 119

The t-statistic is equal to the regression coefficient divided by its standard deviation

Answer 120

The residual error in the prediction of fear scores when both gender and fantasy proneness are included as predictors in the model.

Answer 121

The improvement in the prediction of depression by fitting the model

Answer 122

Regression assumptions that have been met

Answer 123

Somewhere between −3.369 and −0.517

Answer 124

Stress from research

Answer 125

As stress from teaching increases by one unit, burnout decreases by 0.36 of a unit.

Answer 126

No, because the errors show heteroscedasticity.

Answer 127

A. Regression assumptions have been met

Week 4: Multiple Regression Flashcards

(162 cards)