13. Linear regression models Flashcards

1
Q

What does multiple regression do?

A

Tells us how the mean of the DV changes as a function of the IVs- we can partial out the effect of each IV
* Ceteris paribus = other things equal
* What is the effect of education on salary, keeping gender, region, industry…. equal?
* Whatever is part of the regression is controlled for and held constant
* Flexibility in terms of functional form
* Popularity- macros, shortcuts, post-hoc fixes and hacks exist for regression like for no other method
* Building on these principles for other more complicated versions of regression

2
Q

What can multiple regression test?

A

REGRESSION LANGUAGE- NO CAUSALITY
* Remember causality is inferred not tested
* Causality is established through research design
* Very often, we have a cross-sectional design when utilizing regression; in that case…
* Adjust the wording of the hypothesis!
Hypothesis: Employees who feel more engaged experience the symptoms of burnout less frequently.
Hypothesis: For employees with temporary contracts, job satisfaction is less strongly related to turnover intentions
than for employees with permanent contracts.
…we expect positive/negative relationship, ….will increase/decrease with…
Regression supports/doesn’t support hypothesis (at specific significance level)

=> Regression can tell you whether there is a significant (non-zero) relationship between an IV and the DV, approximately how
big that relationship is (magnitude), and what its direction is (positive/negative)

3
Q

What does multiple regression technically do?

A

General model
Y = β0 + β1X1 + β2X2 + … + βkXk + u
Y = dependent variable
β0 = intercept
β1…βk = coefficients on the independent variables; βj is the change in Y for a one-unit change in Xj, ceteris paribus
u = error term: random error and factors other than the Xs that influence Y
k = number of independent variables
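For the single-IV case (k = 1) the OLS estimates have a simple closed form, which makes the model concrete; a minimal sketch with made-up toy data (the k-IV case needs matrix algebra):

```python
# OLS sketch for Y = b0 + b1*X + u with one IV (toy data)
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
# slope: covariance of X and Y divided by the variance of X
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx  # intercept: the fitted line passes through (mean X, mean Y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # estimates of the error u
```

With these numbers the slope comes out to 0.6 and the intercept to about 2.2; the residuals sum to zero, as OLS guarantees.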

4
Q

What is centering?

A

Centering- subtracting the mean of the variable
* Meaningful intercept
* Easier interpretation of interactions
* Don’t do for binary variables
* Interpretation of coefficients the same
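The operation itself is one subtraction per value; a sketch with hypothetical toy data:

```python
# Centering (toy data): subtract the mean so the intercept becomes the
# predicted DV at the *average* X rather than at X = 0.
from statistics import mean

age = [25, 30, 35, 40, 50]               # mean is 36
age_centered = [a - mean(age) for a in age]
# slope coefficients are unchanged; only the intercept's meaning shifts
```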

5
Q

What is standardization?

A

Standardization- subtracting the mean and dividing by the std. dev.
* Standard variables- mean zero, std. dev. 1
* Comparable if on different scales
* Interpretation changes
* SPSS calculates standardized coefficients for you- no reason to change raw scores
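Even though SPSS does this for you, the computation is simple; a sketch with toy scores:

```python
# Standardization (z-scores, toy data): subtract the mean, divide by the SD.
from statistics import mean, pstdev

scores = [10, 20, 30, 40, 50]
m, s = mean(scores), pstdev(scores)   # pstdev = population SD; stdev would also work
z = [(x - m) / s for x in scores]     # z now has mean 0 and SD 1
```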

6
Q

What does significance testing do?

A

Significance testing – is our coefficient significantly different from 0?
* So if we want to test that H0: β1=0
* t-statistic of β̂1 = β̂1 / standard error(β̂1)
* Standard error- depends on the sample size
* Our trust in the results should be influenced by how much information we have about the population = sample size
* Standard error- reflects how precisely the coefficient is estimated; it gets smaller with a larger sample
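The ratio itself is trivial to compute; a sketch with a hypothetical estimate and standard error:

```python
# t-statistic for H0: beta1 = 0 (hypothetical estimate and standard error)
beta_hat = 0.6
se = 0.25
t = beta_hat / se   # compare to a critical value (~1.96 for large n at the 5% level)
```

Here t = 2.4, which would exceed the conventional large-sample 5% cutoff of about 1.96.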

7
Q

How do I know my t-statistic is big enough?

A

H0- coefficient is 0- we usually want to reject
* P-value- given the observed t-statistics what is the smallest significance level at which H0 would be rejected?
* *** p-value < 0.001, ** p-value < 0.01, * p-value < 0.05
* Significance level tells me about the probability of being wrong- probability of being wrong should be low
* Type I error- we reject H0 although it is true- we should only do this 0.1%, 1%, or 5% of the time => we pick our significance level
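For large samples, the two-sided p-value for a t-statistic can be approximated from the normal distribution; a sketch using the standard-library error function (an assumption for illustration, not how SPSS computes it exactly):

```python
# Two-sided p-value via the normal approximation to the t distribution
# (reasonable for large samples); math.erf gives the normal CDF.
import math

def p_two_sided(t):
    phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))  # P(Z <= |t|)
    return 2 * (1 - phi)

p = p_two_sided(2.4)   # significant at the 5% level but not at 1%
```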

8
Q

Methods of choosing predictors

A

Hierarchical entry- you can choose to enter and remove entire block of variables in gradual steps
* Your first block would be variables found in previous research (controls)
* Either built up or down (enter or remove)
Forced entry
* All variables are entered simultaneously- you select and decide which one stays
Stepwise methods- SPSS chooses for you
* Forward- starting with intercept only, variable with the highest simple correlation with DV is chosen
* Variable that could best explain the remaining variance is chosen next
* Stepwise- same as forward, but also removes variables that are “least useful” in explaining variance at the same time as adding new variable
* Backward- the opposite logic: start with all variables entered and remove them gradually based on p-values or t-statistics
* SPSS is not smarter than you- you should always make the decisions yourself
* SPSS has no idea about theory, previous research and which hypotheses you are testing
* Your strategy- controls and basic effects first- then variables in hypotheses- then interactions, mediation tests
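The first Forward step described above can be sketched in a few lines; toy data, and a simplification of what SPSS actually does (which also checks entry significance):

```python
# Sketch of the first Forward step: pick the IV with the highest
# absolute simple correlation with the DV (toy data).
from statistics import mean, pstdev

def corr(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))

dv  = [2, 4, 5, 4, 5]
ivs = {"x1": [1, 2, 3, 4, 5], "x2": [5, 3, 4, 1, 2]}
first_entered = max(ivs, key=lambda name: abs(corr(ivs[name], dv)))
```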

9
Q

What is variance?

A

Goodness of fit- we judge how good our “model” is by seeing how much actual variance in dependent
variable it explains
* Total sum of squares SST- total variance of dependent variable
* Explained sum of squares SSE- total variance explained due to our model
* Sum of squared residuals SSR- left over variance that is unexplained
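The three sums of squares obey SST = SSE + SSR, and R² = SSE / SST; a sketch with toy data, where y_hat are fitted values from an OLS fit to these observations:

```python
# Variance decomposition (toy data): SST = SSE + SSR, R^2 = SSE / SST
from statistics import mean

y     = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]   # fitted values from an OLS fit

my  = mean(y)
sst = sum((yi - my) ** 2 for yi in y)                  # total variance of the DV
sse = sum((fi - my) ** 2 for fi in y_hat)              # explained by the model
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # left over, unexplained
r2  = sse / sst
```

With these numbers SST = 6, SSE = 3.6, SSR = 2.4, so the model explains R² = 0.6 of the variance.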

10
Q

What are control variables?

A

Control variables= same as any other IV but researcher is not really interested in their effects

11
Q

What happens when adding control variables?

A

Adding control variables in the model means that the variables of interest are explaining the “left over” variance, especially
if they are correlated with the control variables themselves
* GOOD- ceteris paribus
* GOOD- making sure we are not omitting an important variable that has an effect
* BAD- continuity of research- can we compare to research with other or no control variables?
* BAD- multicollinearity can decrease significance of variables
* BAD- because we are usually more lenient about their precision (age proxy for career stage?)
* UGLY- we can find a combination that suits our research agenda (do I want effects of interest to be significant or not?)

12
Q

How do we test significance?

A

Statistical tests in a nutshell help us decide whether our “model” is good
We are trying to model reality as closely as possible- e.g., model the real relationship between the IVs and the DV

Test statistic- a number that summarizes how good our model is: how close it is to reality, how much of the variance in the DV
our model is explaining
* F-test- variance explained by the model (between-group variance in ANOVA) / error variance
* Chi-squared- the result of the fitting function; if our CFA modeled reality perfectly, the fitting function would be 0
* The t-statistic in a regression is based on the variance of the coefficient (beta) estimate
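The F-test ratio can be sketched with toy numbers (degrees of freedom: k for the model, n − k − 1 for the error):

```python
# F-test sketch: explained variance per model df over error variance per
# residual df. Toy numbers: n = 5 observations, k = 1 predictor.
sse, ssr, n, k = 3.6, 2.4, 5, 1
f = (sse / k) / (ssr / (n - k - 1))   # compare to the F(k, n-k-1) critical value
```

Here F = 3.6 / 0.8 = 4.5, which would then be compared against the F distribution's critical value at the chosen significance level.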
