L13 - Linear regression models I Flashcards
What is linear regression?
Simple linear regression: Y = a*x + b
It answers the question: "How does the mean of the dependent variable (DV) change as a function of the independent variables (IVs)?"
- It is a search for correlation rather than causality
- The independent variables are not shown to cause the dependent variable; we only infer that they predict it.
What is multiple linear regression?
Multiple linear regression
Yi = β0 + β1X1i + β2X2i + … + βnXni + ei
In multiple linear regression we have multiple independent variables (IVs)
Intercept (β0): the value of the DV when all of the IVs (the x's) have the value 0.
IVs (x): they describe how the situation looks at the moment we look at the dependent variable and influence what the dependent variable is at that moment. They influence it in a specific manner, which is shown by the beta coefficients.
Coefficients (beta): the regression weights; each corresponds to one of the IVs.
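The intercept and beta coefficients above can be estimated with ordinary least squares. A minimal sketch with made-up data (two IVs, five observations); the numbers are purely for illustration:

```python
import numpy as np

# Hypothetical data: two IVs and one DV
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# Design matrix: the column of 1s yields the intercept b0
X = np.column_stack([np.ones_like(X1), X1, X2])

# Solve for [b0, b1, b2], minimizing the sum of squared residuals
coefs, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
b0, b1, b2 = coefs
print(f"intercept={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")
```

In practice SPSS/JASP does this for you; the point is that each beta is the weight attached to one IV, and b0 is the predicted DV when every IV is 0.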
What does regression not do?
Establish causality
- There is no statistical technique that can tell us something about causal relationships.
- Causality needs to be inferred; it cannot be observed.
- Causality is a function of research design, not analyses.
Important: this should be reflected in the wording of the hypothesis.
Significance of coefficients
Significance testing - is our coefficient significantly different from 0?
- We want to test H0: β1 = 0.
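The test of H0: β1 = 0 compares the estimated slope to its standard error, giving a t-statistic with n−2 degrees of freedom. A sketch with hypothetical data for a simple regression:

```python
import math

# Made-up data; y is roughly 2*x plus noise
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                  # slope estimate
b0 = ybar - b1 * xbar           # intercept
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(r ** 2 for r in residuals) / (n - 2)   # residual variance
se_b1 = math.sqrt(mse / sxx)    # standard error of the slope

t = b1 / se_b1                  # compare against a t distribution with n-2 df
print(f"b1={b1:.3f}, SE={se_b1:.3f}, t={t:.2f}")
```

If |t| exceeds the critical value for the chosen significance level, we reject H0 and call the coefficient significant; software reports the corresponding p-value directly.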
What is the regression line?
The line of best fit = Y-hat
It is the line for which the deviations of the actual observations from the line are, overall, as small as possible.
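"As small as possible" means the regression line minimizes the sum of squared deviations; any other line fits worse. A small demonstration with made-up numbers:

```python
# Hypothetical observations
xs = [0, 1, 2, 3, 4]
ys = [1.2, 2.9, 5.1, 7.2, 8.8]

def sse(slope, intercept):
    """Sum of squared deviations of the observations from y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# OLS estimates computed from the data
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
slope_ols = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
intercept_ols = ybar - slope_ols * xbar

sse_ols = sse(slope_ols, intercept_ols)
# Compare against two arbitrary alternative lines
print(sse_ols, sse(2.0, 1.0), sse(1.8, 1.5))
```

Whatever alternative slope and intercept you try, its sum of squared deviations is at least as large as the OLS line's.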
F-statistics
A measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy in the model; it is also used to assess model fit.
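That ratio can be computed by hand: the improvement (model sum of squares, SSM) divided by its degrees of freedom, over the inaccuracy (residual sum of squares, SSR) divided by its degrees of freedom. A sketch with invented data and one predictor:

```python
# Hypothetical data, one predictor (k = 1)
x = [1, 2, 3, 4, 5, 6]
y = [2.2, 3.8, 6.1, 8.2, 9.9, 12.1]

n, k = len(x), 1
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

sst = sum((yi - ybar) ** 2 for yi in y)                          # total variation
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))    # residual (inaccuracy)
ssm = sst - ssr                                                  # model improvement

f_stat = (ssm / k) / (ssr / (n - k - 1))
print(f"F = {f_stat:.1f}")
```

A large F relative to the F distribution with (k, n−k−1) degrees of freedom means the model predicts the outcome substantially better than the mean alone.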
What are control variables?
Becker et al. (2016)
Same as any other IV, but the researcher is not really interested in their effects.
Adding control variables in the model means that the variables of interest are explaining the “left over” variance, especially if they are correlated with the control variables themselves.
Using statistical control
Becker et al. (2016)
Statistical control is widely used in correlational studies with the intent of providing more accurate estimates of relationships among variables, more conservative tests of hypotheses, or ruling out alternative explanations for empirical findings.
Selecting control variables
- When in doubt, leave them out
- Select conceptually meaningful CVs and avoid proxies
- When feasible, include CVs in hypotheses and models
- Clearly justify the measures of CVs and the methods of control
- Subject CVs to the same standards of reliability and validity as are applied to other variables
- If the hypotheses do not include CVs, do not include CVs in the analysis
- Conduct comparative tests of relationships between IVs and CVs
- Run results with and without the CVs and contrast the findings
- Report standard descriptive statistics and correlations for CVs, and the correlations between the measured predictors and their partialled counterparts
- Be cautious when generalizing results involving residual variables.
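The "run results with and without the CVs" recommendation can be sketched with simulated data. Everything below is invented (variable names, effect sizes, the seed); the point is only that when the IV is correlated with a CV that also affects the DV, omitting the CV changes the IV's estimated slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
cv = rng.normal(size=n)                # control variable
x = 0.5 * cv + rng.normal(size=n)      # IV, correlated with the CV
y = 1.0 * x + 0.8 * cv + rng.normal(size=n)  # DV depends on both

def fit(design, outcome):
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

ones = np.ones(n)
b_without = fit(np.column_stack([ones, x]), y)       # Y ~ X
b_with = fit(np.column_stack([ones, x, cv]), y)      # Y ~ X + CV

print("slope of X without CV:", round(b_without[1], 2))
print("slope of X with CV:   ", round(b_with[1], 2))
```

With the CV in the model, the IV explains only the "left over" variance, so its slope is closer to the true effect of X alone; contrasting the two runs shows how much the CV matters.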
First up in regression, prepare your data - but how?
From items to scale
- Check reliability using Cronbach's alpha
- If relevant: EFA and/or CFA
- Calculate the mean to get one score for every scale (variable) -> it doesn't matter if it is the DV or an IV, we do it for all of them
- Make sure your variables are either continuous (high to low/low to high) or dummy variables scored (0 / 1).
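Both steps (reliability check, then averaging items into one scale score) are simple to compute. A sketch with hypothetical Likert responses; Cronbach's alpha is k/(k−1) times one minus the ratio of summed item variances to the variance of the scale total:

```python
import numpy as np

# Hypothetical responses: 5 respondents x 3 items of one scale (1-5 Likert)
items = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 2],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)        # variance of each item
total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scale_score = items.mean(axis=1)             # one mean score per respondent
print(f"Cronbach's alpha = {alpha:.2f}")
print("scale scores:", scale_score)
```

If alpha is acceptable (commonly ≥ .70), the mean score per respondent becomes the single variable used in the regression.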
What are the two options when transforming your variables?
Centering - subtracting the mean of the variable (the mean becomes zero)
- Meaningful intercept
- Easier interpretation of interactions
- Don't do this for binary variables
Standardization
Subtracting the mean (it becomes zero) and dividing by the standard deviation
- Standardized variables: mean = 0, std. dev. = 1
- Comparable if on different scales (the interpretation of the coefficients changes!)
- SPSS/JASP calculates standardized coefficients for you - no reason to change the raw data
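The two transformations differ only in the division step; a sketch with invented raw scores:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical raw scores

centered = x - x.mean()                         # mean becomes 0, spread unchanged
standardized = (x - x.mean()) / x.std(ddof=1)   # mean 0, std. dev. 1 (z-scores)

print("centered:    ", centered)
print("standardized:", standardized)
```

After centering, the intercept is the predicted DV at the mean of the IV rather than at 0; after standardizing, coefficients are read in standard-deviation units, which is what makes them comparable across scales.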