6. Regression - Categorical variables and BEYOND!!! Flashcards
For a variable with g levels, how many dummy variables do we need to capture all the levels?
g-1
With dummy coded variables, what does the constant represent (assuming nothing else in model)?
The mean of the reference group - the score for someone who gets 0 on all variables.
Do you have to control for type I error when using dummy coded variables in regression?
Nope.
How must dummy variables be interpreted in conclusions?
As a mean difference.
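A minimal sketch of the dummy-coding cards above, using made-up scores for three hypothetical groups and a small least-squares helper written only for illustration (none of these names come from the course). With g = 3 levels there are g - 1 = 2 dummy variables. The constant should come out as the reference-group mean and each b as a mean difference from it.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: three groups, so g - 1 = 2 dummy variables.
scores = {"control": [4, 6, 5], "drug_a": [8, 9, 10], "drug_b": [1, 2, 3]}

X, y = [], []
for group, values in scores.items():
    for value in values:
        # Columns: constant, dummy for drug_a, dummy for drug_b.
        # The control group scores 0 on both dummies (reference group).
        X.append([1.0, float(group == "drug_a"), float(group == "drug_b")])
        y.append(float(value))

constant, b_drug_a, b_drug_b = ols(X, y)
print(constant)   # mean of the control (reference) group
print(b_drug_a)   # drug_a mean minus control mean
print(b_drug_b)   # drug_b mean minus control mean
```

Here the group means are 5, 9 and 2, so the coefficients are a mean of 5 and differences of +4 and -3 (up to floating-point rounding).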
Is an interaction effect moderation or mediation?
Moderation. The relationship between X1 and Y depends on the level of (or is moderated by) X2.
What are the three steps in testing interactions?
- Calculate an interaction variable
- Run sequential MR with three predictors: the two original variables and the interaction
- Interpret either delta R-squared or b for the interaction
What is the regression formula for an interaction?
Y = a + b1X1 + b2X2 + b3X1X2 + e
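The three steps for testing an interaction can be sketched roughly as follows. This is an illustration under assumptions: the data are invented (the DV is built with an interaction weight of 0.5 plus a little noise), and the `ols` and `r_squared` helpers are hypothetical stand-ins for whatever your stats package does.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def r_squared(X, y, b):
    """Proportion of variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
                   for row, yi in zip(X, y))
    return 1 - ss_resid / ss_total

# Invented predictors and noise (assumptions for illustration only).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 7, 1, 8, 2, 6, 4, 5]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]

# Step 1: calculate the interaction variable (mean-centre, then multiply).
m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
c1 = [v - m1 for v in x1]
c2 = [v - m2 for v in x2]
inter = [a * b for a, b in zip(c1, c2)]

# Invented DV with a true interaction weight of 0.5.
y = [10 + 2 * a + 1 * b + 0.5 * ab + e
     for a, b, ab, e in zip(c1, c2, inter, noise)]

# Step 2: sequential MR. Model 1 = original variables, Model 2 adds the interaction.
X_model1 = [[1.0, a, b] for a, b in zip(c1, c2)]
X_model2 = [[1.0, a, b, ab] for a, b, ab in zip(c1, c2, inter)]
b_model1 = ols(X_model1, y)
b_model2 = ols(X_model2, y)
r2_model1 = r_squared(X_model1, y, b_model1)
r2_model2 = r_squared(X_model2, y, b_model2)

# Step 3: interpret delta R-squared or b for the interaction.
delta_r2 = r2_model2 - r2_model1
b3 = b_model2[3]
print(f"delta R-squared = {delta_r2:.3f}, b3 = {b3:.3f}")
```

A real analysis would also need the significance test for delta R-squared or b3, which this sketch leaves out.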
What is mean-centring when it comes to continuous variables?
Subtracting the mean from each score.
What are the two steps in calculating an interaction variable?
- Mean-centre continuous variables
- Multiply the two variables
Why do you mean-centre?
It reduces multicollinearity.
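To see why, here is a small sketch with invented numbers: it compares how strongly a predictor correlates with the product term before and after mean-centring. The `pearson` helper and the data are assumptions made for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)

# Invented predictor scores, both far from zero.
x1 = [10, 11, 12, 13, 14]
x2 = [20, 22, 21, 23, 24]

# Product term from the raw scores.
raw_product = [a * b for a, b in zip(x1, x2)]

# Product term from mean-centred scores.
m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
c1 = [v - m1 for v in x1]
c2 = [v - m2 for v in x2]
centred_product = [a * b for a, b in zip(c1, c2)]

r_raw = pearson(x1, raw_product)          # predictor vs raw product
r_centred = pearson(c1, centred_product)  # centred predictor vs centred product
print(f"raw: r = {r_raw:.2f}, centred: r = {r_centred:.2f}")
```

With these numbers the raw product is almost a copy of X1 (r close to 1), while the centred product is nearly uncorrelated with it.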
How do you interpret the constant with mean-centred variables?
With mean-centred variables, the constant is the predicted score for someone at the mean of every predictor (a centred score of zero). The constant is easier to interpret when variables are not mean-centred. Mean-centring only matters for calculating the interaction, so it is best to use it just for that.
Do you need to mean-centre z-scores to calculate an interaction?
Nope, they’re already mean-centred.
How do you interpret an interaction between two continuous variables?
Well, you can say there is an interaction and it’s significant.
One other option is to break one of the variables into groups, e.g. high, medium and low ability.
Which model should be interpreted, with or sans interaction?
Depends. If the interaction is significant, interpret that model (Model 2). If it’s not, interpret Model 1 and say you tested an interaction, but it was not significant.
What is polynomial regression?
Interaction between the continuous IV and itself. Effect of variable X on DV depends on level of variable X.
How do you transform a variable to test polynomials? Three steps.
- Mean-centre the variable
- Square it
- Enter it as the last block in equation
What’s the quadratic regression equation?
Y = a + b1X1 + b2X1² + e
If the quadratic regression coefficient is positive what shape does that indicate in the data?
The curve is U-shaped.
If the quadratic regression coefficient is negative what shape does that indicate in the data?
The curve is an inverted U.
If quadratic regression coefficient is very small, what shape does that indicate?
Almost no curve, i.e. close to a straight line.
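The quadratic cards above can be sketched like this: mean-centre, square, fit, then read the sign of b2. The data are invented so that the DV is exactly U-shaped, and `ols` is a hypothetical helper written for this example.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Invented data: y falls and then rises as x increases (U-shape).
x = [1, 2, 3, 4, 5, 6, 7]
y_u = [11.0, 6.0, 3.0, 2.0, 3.0, 6.0, 11.0]
y_inverted = [-v for v in y_u]  # same data flipped upside down

# Step 1: mean-centre. Step 2: square.
mean_x = sum(x) / len(x)
centred = [v - mean_x for v in x]
squared = [c ** 2 for c in centred]

# Step 3: the squared term goes in last: Y = a + b1X1 + b2X1^2 + e.
X = [[1.0, c, c2] for c, c2 in zip(centred, squared)]
a, b1, b2 = ols(X, y_u)
_, _, b2_inverted = ols(X, y_inverted)

print(b2)           # positive, so U-shaped
print(b2_inverted)  # negative, so inverted U
```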
What does -3.8E-02 mean?
-3.8 x 10 to the power of -2, which is -0.038
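A quick way to check E-notation is to let Python parse it. This is just a convenience sketch, not part of the course material.

```python
from math import isclose

# "E-02" means "times 10 to the power of -2".
value = float("-3.8E-02")
print(value)  # -0.038

# The same number written out as a multiplication.
assert isclose(value, -3.8 * 10 ** -2)
```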
What’s the difference between moderation and mediation?
Moderation = interaction
Mediation = indirect effect via a mediating variable
What’s the difference between partial and total mediation?
Partial mediation is when an introduced mediating variable accounts for part of the effect of the IV on the DV, so some direct effect remains.
Total mediation is when an introduced mediating variable accounts for all of the effect of the IV on the DV. The effect of the IV is completely explained by the new variable.
What are the four assumptions underlying linear regression?
- Dependent variable is a linear function of the predictors
- Each observation is drawn independently
- Homogeneity of variance
- Errors are normally distributed with a mean of 0
How do you test the assumption of independence?
Look at the variability in box-plots, broken down by clusters.
What is the risk of violating the assumption of independence?
You may underestimate the standard error, which increases risk of type I error.
How do you test the assumption of linearity?
Plot the residuals against each predictor and ask for a Loess line. The line should be roughly straight.
What is the assumption of homoscedasticity?
That variance of errors is not a function of the predictors, i.e. the variance of errors is constant at all values of X.
How do you test the assumption of normality of errors?
Use a Q-Q plot:
X-axis - observed values of the residual
Y-axis - expected values of the residual if the residuals are normally distributed
If the scatter is close to the ideal line, residuals are normally distributed.
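The coordinates of a Q-Q plot can be worked out by hand, as in this rough sketch. The residuals are invented, and the plotting positions (i - 0.5) / n are one common convention assumed here; statistics packages may use a slightly different one.

```python
from statistics import NormalDist, mean, stdev

# Invented residuals from some fitted regression.
residuals = [0.4, -1.5, 0.1, 1.1, -0.6, 0.7, -0.2]

# X-axis: observed residuals, sorted from smallest to largest.
observed = sorted(residuals)
n = len(observed)

# Y-axis: values expected at the same ranks if the residuals were normal,
# using a normal distribution with the residuals' own mean and SD.
fitted_normal = NormalDist(mean(residuals), stdev(residuals))
expected = [fitted_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for obs, exp in zip(observed, expected):
    print(f"observed {obs:6.2f}   expected {exp:6.2f}")
```

If the printed pairs are close to each other, the points would sit near the ideal line on the plot.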
How do you test for multicollinearity?
Test tolerance: ranges from 0 to 1, and you want a score closer to 1.
Test VIF: 1 or more, and you want it closer to 1. Keith says 10 is large. (If there are lots of predictors, you can look for one VIF that stands out from the others.)
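A sketch of how the two numbers relate, assuming the standard definitions tolerance = 1 - R² (from regressing one predictor on the others) and VIF = 1 / tolerance. To keep it short this covers only the two-predictor case, where that R² is just the squared correlation. The data and the `pearson` helper are invented.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)

# Invented scores on two correlated predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

# With only two predictors, R-squared for x1 predicted from x2 is r squared.
r = pearson(x1, x2)
tolerance = 1 - r ** 2  # share of x1's variance NOT shared with x2
vif = 1 / tolerance     # variance inflation factor

print(f"r = {r:.2f}, tolerance = {tolerance:.2f}, VIF = {vif:.2f}")
```

With more than two predictors each tolerance would need its own multiple regression, which this sketch does not attempt.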
What are the two sources of error in classical test theory?
Method error (e.g. warm/cold testing room)
Trait error (due to characteristics of the individual)
What is the conceptual formula for reliability?
True score variance / (True score variance + error variance)
The proportion of observed score variance that is accounted for by variance in true score.
What is the result in regression of assuming that error-laden predictors are free of error?
You underestimate the true effects of the predictors on the dependent variable. R² is lower because there is more scatter of scores around the regression line.
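A small simulation can illustrate this. Everything here is an assumption made for the sketch: the true slope of 2, the amount of measurement error, the sample size and the seed.

```python
import random
from statistics import variance

def slope(x, y):
    """Slope of the simple regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# True scores, and a DV that depends on them with a slope of 2.
true_x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * t + rng.gauss(0, 0.5) for t in true_x]

# Observed predictor = true score + measurement error.
observed_x = [t + rng.gauss(0, 1) for t in true_x]

# Reliability: true score variance / observed score variance.
reliability = variance(true_x) / variance(observed_x)

slope_true = slope(true_x, y)          # what we would get without error
slope_observed = slope(observed_x, y)  # what we get with the noisy predictor

print(f"reliability    = {reliability:.2f}")
print(f"slope (true x) = {slope_true:.2f}")
print(f"slope (obs x)  = {slope_observed:.2f}")
```

In this setup the observed predictor is about half error, and the estimated slope shrinks to roughly half of the true one.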
Regression coefficients in simultaneous regression tell us about __________ effects.
Regression coefficients in simultaneous regression tell us about direct effects.
Regression coefficient entered last into a sequential model tells us about the __________ effect of that variable.
Regression coefficient entered last into a sequential model tells us about the total effect of that variable.
What effects does the variable entered last in the last model of sequential regression tell us?
About the total AND direct effect of that variable. It assumes the variable has no indirect effect (all shared variance already taken by other variables)
What are rectangles and ellipses used to signify in structural equation modelling?
Rectangles - observed (manifest) variables
Ellipses - unobserved (latent) variables
What are exogenous/endogenous variables?
Exogenous: only have arrows coming out (causes)
Endogenous: have an arrow coming in (effects)
If a variable has one arrow coming out and another coming in, it is still endogenous.
What is the name for residual in structural equation modelling?
Disturbance
What is the formula for disturbance?
Square root of (1 - R square)
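As a worked example with a made-up R² of 0.36:

```python
from math import sqrt

r_squared = 0.36  # hypothetical R-squared for an endogenous variable

# Disturbance = square root of the unexplained proportion of variance.
disturbance = sqrt(1 - r_squared)
print(round(disturbance, 2))  # 0.8
```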
What kind of regression do you run in SEM?
Simultaneous
How do you calculate the indirect effect of a variable on a DV?
Multiply the variable’s effect on mediating variable BY the mediating variable’s effect on the DV.
How do you calculate the total effect of a variable?
Add its direct effect and indirect effect.
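The two calculations above, sketched with made-up path coefficients for a single mediator. The values and variable names are hypothetical.

```python
# Hypothetical standardised path coefficients.
iv_to_mediator = 0.5  # effect of the variable on the mediating variable
mediator_to_dv = 0.4  # effect of the mediating variable on the DV
direct_effect = 0.3   # effect of the variable on the DV, mediator in the model

# Indirect effect: multiply the two paths that run through the mediator.
indirect_effect = iv_to_mediator * mediator_to_dv

# Total effect: direct plus indirect.
total_effect = direct_effect + indirect_effect

print(round(indirect_effect, 2))  # 0.2
print(round(total_effect, 2))     # 0.5
```

If the direct effect dropped to zero once the mediator was added, the total effect would equal the indirect effect, which is the total mediation case.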
How do you know which model to report in sequential regression?
Report the final statistically significant model (based on the Model Summary table). So if the last model is not statistically significant, report the penultimate one, assuming it is statistically significant.
What is a recursive model?
A recursive model is one in which causal flow travels in only one direction - i.e. no feedback loops or reciprocal causes.