Multiple Regression Flashcards

1
Q

What does the linear regression model describe?

A

It describes a linear function and is very general. The model is ŷ = a + bX, where a is the intercept and b is the slope.
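
A minimal sketch of how a and b could be estimated by least squares. The data are made up, and the names fit_line, hours and score are illustrative assumptions, not part of the original card.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Made-up data, for illustration only.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

a, b = fit_line(hours, score)
predicted = a + b * 6  # predicted score for someone with 6 hours
print(f"y_hat = {a:.1f} + {b:.1f} * X")
```

With these numbers the slope is 4.1, so each extra hour predicts about 4 more points.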

2
Q

For what kind of analysis is the Multiple Linear Regression used?

A

It can analyse multiple kinds of data: continuous, categorical, and Likert-scale. It’s also used to figure out which IVs are the most important.

3
Q

What is a Multiple Regression?

A

Like a simple regression, but now you have multiple predictors of one outcome variable.

4
Q

What is standard/simultaneous linear regression?

A

All variables are entered into the equation at the same time, in no specific order. Each variable is assessed as if it were the last variable entered, which controls for the other IVs.
Taken together, the variables explain a certain amount of variance in the data (R^2), and each IV gets its own slope, estimated with the other IVs held constant.

5
Q

What kind of problem arises with standard/simultaneous linear regression?

A

If two IVs are highly correlated, they share most of the variance they explain in Y. The one with the bigger semi-partial correlation gets the credit, so the other IV has very little variance associated with it and looks unimportant, even if it is strongly related to Y on its own.

6
Q

What is a semi-partial correlation or sr?

A

Squared (sr^2), it is the variance in Y explained only by that IV over the total variance in Y. It tells you how much of the overall variance that variable accounts for: the unique contribution of that variable to R^2, i.e., the increase in the proportion of Y explained when that variable is added to the equation.
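
A small sketch of this idea for the two-IV case, using the standard formulas that express sr and R^2 in terms of the three pairwise correlations. The correlation values are hypothetical, chosen so that the two IVs are highly correlated; the function names are illustrative.

```python
import math


def semipartial_r(r_y1, r_y2, r_12):
    """sr for IV 1 when the model has two IVs (IV 2 is removed from IV 1 only)."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)


def r_squared_two_ivs(r_y1, r_y2, r_12):
    """R^2 for a regression of Y on two IVs, from the pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical correlations: both IVs relate to Y, and strongly to each other.
r_y1, r_y2, r_12 = 0.6, 0.5, 0.8

sr1 = semipartial_r(r_y1, r_y2, r_12)
sr2 = semipartial_r(r_y2, r_y1, r_12)  # swap roles to get sr for IV 2
r2_full = r_squared_two_ivs(r_y1, r_y2, r_12)

# sr^2 equals the gain in R^2 when that IV is added last.
gain_from_iv1 = r2_full - r_y2 ** 2
print(f"sr1^2 = {sr1 ** 2:.3f}, sr2^2 = {sr2 ** 2:.3f}, R^2 = {r2_full:.3f}")
```

Here IV 2 correlates .5 with Y on its own, yet its unique contribution is close to zero because IV 1 already covers almost everything it explains.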

7
Q

What is Hierarchical/sequential linear regression?

A

Predictor variables are entered in sets or steps. At each step, variance is assigned to the variables entered first, so later variables only get what is left. You can assign the order according to theoretical importance, and you can control for nuisance variables by entering them in the first step.

8
Q

What are the steps in testing the IVs in the hierarchical/sequential linear regression?

A
  1. The first IV is basically tested against r or sr; since there’s nothing else in the equation, it gets all the variance.
  2. The next IVs are tested against pr; they only get the leftover variance.
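
A rough sketch of why entry order matters, assuming just two IVs entered one per step. It tracks how much variance in Y each step is credited with (r^2 at step 1, the increase in R^2 at step 2). The correlations are hypothetical and the function names are illustrative.

```python
def r_squared_two_ivs(r_y1, r_y2, r_12):
    """R^2 for a regression of Y on two IVs, from the pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def hierarchical_credit(r_first, r_second, r_between):
    """Variance credited to each IV when they are entered one per step."""
    step1 = r_first ** 2  # nothing else in the equation: gets all it can explain
    total = r_squared_two_ivs(r_first, r_second, r_between)
    step2 = total - step1  # the second IV only gets the leftover variance
    return step1, step2


# Hypothetical correlations: r(Y, IV1) = .6, r(Y, IV2) = .5, r(IV1, IV2) = .8
iv1_first = hierarchical_credit(0.6, 0.5, 0.8)
iv2_first = hierarchical_credit(0.5, 0.6, 0.8)

print("IV1 entered first:", [round(v, 3) for v in iv1_first])
print("IV2 entered first:", [round(v, 3) for v in iv2_first])
```

The total R^2 is the same either way; only the split between the two IVs changes with the order of entry.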
9
Q

What is partial correlation (pr)?

A

Squared (pr^2), it is the variance in Y explained only by that IV over the variance in Y not accounted for by the other IVs. It tells you how much variance that IV accounts for when you only look at the variance that is left over: the proportion of variance in Y not explained by the other predictors.
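
A sketch comparing pr with sr in the two-IV case, using the standard correlation-based formulas. The only difference is the denominator: pr also removes the other IV from Y. The correlation values are hypothetical and the function names are illustrative.

```python
import math


def semipartial_r(r_y1, r_y2, r_12):
    """sr for IV 1: unique variance over the TOTAL variance in Y."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)


def partial_r(r_y1, r_y2, r_12):
    """pr for IV 1: unique variance over the variance in Y that IV 2 leaves over."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))


# Hypothetical correlations: r(Y, IV1) = .6, r(Y, IV2) = .5, r(IV1, IV2) = .8
r_y1, r_y2, r_12 = 0.6, 0.5, 0.8

sr1 = semipartial_r(r_y1, r_y2, r_12)
pr1 = partial_r(r_y1, r_y2, r_12)

# pr^2 is sr^2 rescaled by the share of Y that IV 2 does not explain.
rescaled = sr1 ** 2 / (1 - r_y2 ** 2)
print(f"sr1 = {sr1:.3f}, pr1 = {pr1:.3f}")
```

Because its denominator is smaller, pr is never smaller in magnitude than sr for the same IV.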

10
Q

If you have a group of IVs that are super highly correlated but you don’t know how to combine them or want to eliminate them, how can you use the hierarchical regression?

A

You can enter them together in one step as a SET and evaluate the set as a whole, so you don’t have to care about each individual predictor.

11
Q

What is statistical/stepwise linear regression?

A

Predictor variables are entered in steps, based on some statistical cutoff. Entry into the equation is based solely on the statistical relationship and has nothing to do with theory or your experiment.

12
Q

What are the different types of stepwise regression?

A
  1. Forward: the IV with the strongest relationship to the DV is added first, then each further IV is added as long as it accounts for enough variance. You need to set a cutoff for "enough".
  2. Backward: all IVs are entered in the equation at first, and then each one is removed if it doesn’t account for enough variance. It starts out like simultaneous LR.
  3. Stepwise: a mix between the two (it adds IVs but may later delete them if they are no longer important).
13
Q

What is particular about stepwise regression?

A

It’s the most common regression used, but it’s a hot mess because of the constant back-and-forth approach. It is also not theory-driven, and it is not to be used in psych.

14
Q

What interpretations can we use with multiple linear regression?

A

How good the equation is: whether we can predict people’s scores better than chance. We can use the adjusted R-squared as an effect size, and the coefficient statistics (t values) to see which IV is the most important, i.e., which one contributes the most to prediction. Keep in mind that adding predictors never decreases the explained variance (R-squared), even when the new predictor is useless.
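
A short sketch of the usual adjusted R-squared formula, which penalises R-squared for the number of predictors. The numbers plugged in are hypothetical and the function name is illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n people."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R^2 = .30 with N = 50.
with_3_ivs = adjusted_r_squared(0.30, n=50, k=3)
# Same R^2 but many more predictors: the adjustment bites harder.
with_10_ivs = adjusted_r_squared(0.30, n=50, k=10)

print(f"K = 3:  adjusted R^2 = {with_3_ivs:.3f}")
print(f"K = 10: adjusted R^2 = {with_10_ivs:.3f}")
```

The adjusted value is always at or below the raw R-squared, and the gap grows as K gets large relative to N.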

15
Q

What is predictor level MLR?

A

It includes statistical control, mediation and moderation.

16
Q

What is Mediation?

A

Analyzing whether the relationship between X and Y changes when another variable, M, is included.

17
Q

What is Moderation?

A

Analyzing an interaction between two X variables in the prediction of Y.
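
A minimal sketch of what the interaction means in the equation, assuming the common approach of adding the product X1*X2 as an extra predictor. The coefficients are hypothetical (not estimated from data) and the function names are illustrative.

```python
def predict(x1, x2, a, b1, b2, b3):
    """y_hat = a + b1*x1 + b2*x2 + b3*(x1*x2); the product term is the interaction."""
    return a + b1 * x1 + b2 * x2 + b3 * (x1 * x2)


def slope_of_x1(x2, b1, b3):
    """Slope of x1 at a given value of x2; it depends on x2 whenever b3 != 0."""
    return b1 + b3 * x2


# Hypothetical coefficients, chosen only to illustrate.
a, b1, b2, b3 = 10.0, 2.0, 1.0, 0.5

slope_low = slope_of_x1(0, b1, b3)   # effect of x1 when x2 = 0
slope_high = slope_of_x1(4, b1, b3)  # effect of x1 when x2 = 4

# Check against the equation: a one-unit change in x1 at x2 = 4.
change = predict(1, 4, a, b1, b2, b3) - predict(0, 4, a, b1, b2, b3)
print(slope_low, slope_high, change)
```

If b3 were 0, the slope of X1 would be the same at every value of X2, which is the no-moderation case.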

18
Q

What is Leverage?

A

How much potential influence over the slope a person has (how much their score could change the b values).

19
Q

What is the cut off rule of thumb for Leverage?

A

(2K+2)/N
K is the number of IVs/predictors
N is your sample size
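
A sketch of applying this cutoff. It assumes the leverage values have already been produced by your stats software; the values below are hypothetical and the function names are illustrative.

```python
def leverage_cutoff(k, n):
    """Rule-of-thumb cutoff (2K + 2) / N."""
    return (2 * k + 2) / n


def flag_high_leverage(leverages, k):
    """Indices of people whose leverage exceeds the cutoff."""
    cutoff = leverage_cutoff(k, len(leverages))
    return [i for i, h in enumerate(leverages) if h > cutoff]


# Hypothetical leverage values for N = 10 people and K = 1 predictor.
leverages = [0.10, 0.12, 0.11, 0.15, 0.10, 0.13, 0.70, 0.12, 0.14, 0.33]

cutoff = leverage_cutoff(k=1, n=len(leverages))  # (2*1 + 2) / 10
flagged = flag_high_leverage(leverages, k=1)
print(f"cutoff = {cutoff}, flagged people: {flagged}")
```

Only the person at index 6 exceeds the cutoff here; index 9 is high but stays under it.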

20
Q

What is discrepancy?

A

How far a point is from the other data points (by itself, this has no influence on the slope).

21
Q

What is Cook’s distance?

A

It’s a measure of influence, a combination of both leverage and discrepancy.

22
Q

What is the cut off rule of thumb for Cook’s Distance?

A

4/(N-K-1)
K is the number of IVs/predictors
N is your sample size
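
A sketch of applying this cutoff, assuming the Cook's distance values come from your stats software. The values below are hypothetical and the function names are illustrative.

```python
def cooks_cutoff(n, k):
    """Rule-of-thumb cutoff 4 / (N - K - 1)."""
    return 4 / (n - k - 1)


def flag_influential(cooks_values, k):
    """Indices of people whose Cook's distance exceeds the cutoff."""
    cutoff = cooks_cutoff(len(cooks_values), k)
    return [i for i, d in enumerate(cooks_values) if d > cutoff]


# Hypothetical Cook's distances for N = 10 people and K = 1 predictor.
cooks_values = [0.02, 0.05, 0.01, 0.08, 0.03, 0.04, 1.20, 0.02, 0.06, 0.30]

cutoff = cooks_cutoff(n=len(cooks_values), k=1)  # 4 / (10 - 1 - 1)
flagged = flag_influential(cooks_values, k=1)
print(f"cutoff = {cutoff}, flagged people: {flagged}")
```

The cutoff shrinks as N grows, so in large samples even small Cook's distances get flagged for a closer look.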

23
Q

What are Cook’s Distance, discrepancy and leverage used to detect?

A

Outliers

24
Q

What is an important thing to remember about IVs in regression?

A

If the IVs are categorical, you must make sure to change the variable into a factor. Otherwise, the software will interpret that variable as continuous, which doesn’t make a whole lot of sense. Whatever group is coded first in your factored variable becomes the comparison group. You will automatically get every group compared against that one. What if you want all pairwise comparisons? Recode so a different group comes first and run again.
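
A rough sketch of what turning a variable into a factor does behind the scenes: each non-reference group gets its own 0/1 dummy variable, and the reference group is the one coded 0 on all of them. The group labels and the function name dummy_code are illustrative assumptions.

```python
def dummy_code(values, reference):
    """Turn a categorical variable into 0/1 dummies, one per non-reference group."""
    levels = sorted(set(values) - {reference})
    rows = [{f"is_{level}": int(v == level) for level in levels} for v in values]
    return levels, rows


# Hypothetical grouping variable with three groups.
group = ["control", "drug_a", "drug_b", "control", "drug_b"]

# "control" is the comparison group: each dummy's slope compares one group to it.
levels, rows = dummy_code(group, reference="control")
for label, row in zip(group, rows):
    print(label, row)

# To compare drug_b against drug_a, recode with a new reference and rerun.
levels_b, rows_b = dummy_code(group, reference="drug_a")
```

Three groups need only two dummies; the comparison group is identified by zeros on both.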

25
Q

What is Structural equation Modeling (SEM)?

A

A statistical technique that quantifies how well sample data “fit” a theoretical model that hypothesizes a set of relations among multiple variables (including multiple DVs). SEM encourages researchers to think of variables as a series of connections.

26
Q

What is the first assumption in MLR?

A

Multiple linear regression requires the relationship between the independent and dependent variables to be linear.
The linearity assumption can best be tested with scatterplots.

27
Q

What is the second assumption in MLR?

A

Multiple linear regression analysis requires that the errors between observed and predicted values (i.e., the residuals of the regression) should be normally distributed.
This assumption may be checked by looking at a Q-Q plot.

28
Q

What is the third assumption in MLR?

A

• Multiple linear regression assumes that there is no multicollinearity in the data.
• Multicollinearity occurs when the independent variables are too highly correlated with each other (r > .9).
• Multicollinearity may be checked multiple ways:
  1. Correlation matrix – When computing a matrix of Pearson’s bivariate correlations among all independent variables, the magnitude of the correlation coefficients should be less than .90.
  2. Variance Inflation Factor (VIF) – The VIFs of the linear regression indicate the degree that the variances in the regression estimates are increased due to multicollinearity. VIF values higher than 10 indicate that multicollinearity is a problem.
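
A small sketch of how a VIF is computed, using the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one IV on all the other IVs. With only two IVs that R^2 is simply their squared correlation, which is the case shown here. The function name is illustrative.

```python
def vif(r_squared_j):
    """VIF for predictor j, given the R^2 from regressing j on the other IVs."""
    return 1.0 / (1.0 - r_squared_j)


# Two-IV case: the R^2 between the predictors is their squared correlation.
vif_at_50 = vif(0.50 ** 2)
vif_at_90 = vif(0.90 ** 2)
vif_at_95 = vif(0.95 ** 2)

for r, value in ((0.50, vif_at_50), (0.90, vif_at_90), (0.95, vif_at_95)):
    print(f"r = {r:.2f} -> VIF = {value:.2f}")
```

With two IVs, r = .90 gives a VIF of only about 5.3, and VIF does not pass 10 until r is roughly .95, so the two rules of thumb on this card do not flag exactly the same cases.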

29
Q

What is the fourth assumption in MLR?

A

• The last assumption of multiple linear regression is homoscedasticity.
• A scatterplot of residuals versus predicted values is a good way to check for homoscedasticity.
• There should be no clear pattern in the distribution; if there is a cone-shaped pattern, the data are heteroscedastic.