week 4 Flashcards

1
Q

How many predictor variables in simple linear regression?

A

In simple linear regression, we only have one predictor variable to predict the criterion variable.

2
Q

How many predictor variables in multiple linear regression?

A

In multiple linear regression, we can have two or more predictor variables to predict the criterion variable.

3
Q

Application of Multiple Linear Regression

A

The application of multiple linear regression is ubiquitous.
§ Psychology research:
§ use different personality traits as predictor variables to predict a life outcome (e.g., GPA).
§ use couple interaction measures to predict relationship satisfaction.
§ Marketing research:
§ use expenditure, pricing, and market conditions to predict sales.
§ Finance and economics:
§ use interest rates, trading volumes, and market sentiment measures to predict stock prices.
§ Natural language processing:
§ use average sentence length, vocabulary richness, and frequency of complex words to predict the overall readability of a text.

4
Q

Almost all advanced statistical methods are extensions of _________________________________

A

multiple linear regression.

§ e.g., structural equation modelling, multilevel modelling

5
Q

What is a bivariate regression?

A

A multiple regression with exactly two predictor variables (x1 and x2) used to predict the criterion variable y.

6
Q

The population bivariate regression model is denoted as:

A

µyi|xi = β0 + β1x1 + β2x2

x1 and x2 are scores on the predictor variables.

β0 is the population intercept.

β1 and β2 are population regression coefficients for x1 and x2, respectively.

µyi|xi is the predicted score on the criterion variable for participant i using the population regression model.

7
Q

The sample bivariate regression model is denoted as:

A

yˆi = βˆ0 + βˆ1x1 + βˆ2x2

x1 and x2 are scores on the predictor variables.

§ βˆ0 is the estimate of the population intercept β0.

§ βˆ1 and βˆ2 are the estimates of the population regression coefficients β1 and β2, respectively.

§ yˆi is the predicted score on the criterion variable for participant i using the sample regression model.
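As an illustration (all data values made up), the sample model can be estimated with ordinary least squares; a minimal sketch in Python using NumPy:

```python
import numpy as np

# Hypothetical data: stress (x1), loneliness (x2), illness (y)
x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y = np.array([3.0, 6.0, 7.0, 11.0, 12.0])

# Design matrix with a leading column of 1s for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of beta0_hat, beta1_hat, beta2_hat
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Predicted scores: y_hat_i = b0 + b1*x1 + b2*x2
y_hat = b0 + b1 * x1 + b2 * x2
```

A least-squares fit leaves residuals orthogonal to every column of the design matrix, which is a quick sanity check on the estimates.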

8
Q

In a simple linear regression, the regression equation represents a line in what dimension?

A

2D (the equation represents a line in two-dimensional space).

9
Q

Can a regression with more than two predictors be represented using graphs?

A

No. With more than two predictors, the regression equation represents a hyperplane in higher dimensions, which cannot be visualized.

10
Q

In a bivariate linear regression, the regression equation represents a plane in what dimension?

A

3D (the equation represents a plane in three-dimensional space).

11
Q

Least Square Estimation Method in Bivariate Regression
- What does it involve
- What does the residual represent?

A

The least-squares method in bivariate regression also involves minimizing the sum of squared residuals (SSresidual).

The residual represents the vertical distance between the regression plane and the data points.

Minimizing SSresidual is minimizing the sum of the squared vertical distances between the regression plane and the data points.
Obtain the regression plane such that the sum of the squared vertical distances is the minimum
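A small numerical sketch of this idea (simulated, made-up data): the least-squares plane yields a smaller SSresidual than any perturbed plane.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 0.4 * x1 - 0.2 * x2 + rng.normal(size=n)

# Least-squares plane
X = np.column_stack([np.ones(n), x1, x2])
b_ls = np.linalg.lstsq(X, y, rcond=None)[0]

def ss_resid(b):
    """Sum of squared vertical distances between the plane and the data points."""
    r = y - X @ b
    return r @ r

# Perturbing the least-squares coefficients increases SSresidual
perturbed = b_ls + np.array([0.1, -0.05, 0.02])
print(ss_resid(b_ls), ss_resid(perturbed))
```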

12
Q

The intercept is the predicted value of y when…

A

x1 and x2 are both at 0

13
Q

The regression coefficient βˆ1 represents….

A

the slope between x1 and yˆ

14
Q

What does βˆ1 = 0.4 really mean?

A

While holding x2 (loneliness) constant, for one unit increase in x1 (stress), there is a 0.4 unit increase in yˆ (predicted illness).

βˆ1 represents the effect of x1 on yˆ while controlling for x2.

15
Q

the regression coefficient in the bivariate regression
is a __________ coefficient

A

CONDITIONAL coefficient, partialling out the effect of the other predictor in the model.

In multiple regression, the regression coefficient is also called the “partial regression coefficient”.

16
Q

To further demonstrate what it means by partialling out the effect of the other predictor on the criterion variable, we will compute βˆ 1 in the bivariate regression using a two-stage method.

A

Step 1: Find the part of x1 that is uncorrelated with x2, which we will call e1.
§ 1.1 Run a simple regression model using x2 to predict x1.
§ 1.2 Then find the residual vector, which we will call e1.
§ The residuals are the part of x1 that is uncorrelated with x2.

Step 2: Use e1 to predict y.
§ In other words, we use the part of x1 that is uncorrelated with x2 to predict y, hence partialling out the effect of x2 on y.
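The two stages can be sketched with simulated data (values made up); by the Frisch–Waugh–Lovell result, the slope from regressing y on e1 reproduces βˆ1 from the full bivariate regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)            # x1 correlated with x2
y = 1.0 + 0.4 * x1 + 0.7 * x2 + rng.normal(size=n)

# Stage 1: regress x1 on x2 and keep the residuals e1
S1 = np.column_stack([np.ones(n), x2])
e1 = x1 - S1 @ np.linalg.lstsq(S1, x1, rcond=None)[0]   # part of x1 uncorrelated with x2

# Stage 2: regress y on e1; the slope is beta1_hat
slope_e1 = np.linalg.lstsq(np.column_stack([np.ones(n), e1]), y, rcond=None)[0][1]

# Full bivariate regression for comparison
b_full = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]

print(slope_e1, b_full[1])  # the two estimates agree
```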

17
Q

How would you interpret βˆ0 in a multiple regression with many predictors?

A

When all predictors are at 0, the predicted score (yˆ) is βˆ0 units.

18
Q

How would you interpret βˆ1 in a multiple regression with many predictors?

A

Holding all other predictors constant, for one unit change in x1, there is a βˆ1 unit change in yˆ (the predicted score).

19
Q

When interpreting a standardized partial regression coefficient, the only thing you need to change is…
A

The unit is the standard deviation (SD) unit.

Holding zx2 (or x2) constant at a specific value, for a one standard deviation change in zx1 (or x1), there is a βˆ1 standard deviation change in zˆy (or yˆ).
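One way to check this interpretation (made-up simulated data): fitting the regression on z-scores gives the standardized coefficients, which also equal the unstandardized coefficients rescaled by sx/sy.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(5, 2, n)
x2 = rng.normal(10, 3, n)
y = 2 + 0.4 * x1 + 0.3 * x2 + rng.normal(size=n)

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

# Regression on z-scores yields the standardized coefficients
Z = np.column_stack([np.ones(n), zscore(x1), zscore(x2)])
b_std = np.linalg.lstsq(Z, zscore(y), rcond=None)[0]

# Equivalent conversion: standardized b1 = unstandardized b1 * s_x1 / s_y
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
b1_std = b[1] * x1.std(ddof=1) / y.std(ddof=1)

print(b_std[1], b1_std)  # identical
```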

20
Q

Why do we want to partial out (or remove) the effect of all the other predictors?

A

One of the main reasons why we want to partial out the effects of other predictors is that we want to control for confounding
variables.

Controlling for confounding variables by including them in a regression model is called statistical control.
Often you want to control for demographic variables: “statistically controlling for age, ethnicity, gender, etc.”

Other times, you may want to control for a substantive variable due to research interest.
§ e.g., study the effect of anxiety on performance controlling for depression.
§ i.e., interested in the part of the anxiety that is not related to depression.

21
Q

What are other ways of partialling out the effect of other variables?

A

Another way of controlling for other variables is through random assignment in an experiment.
§ By randomly assigning participants to different conditions, we are automatically holding all other variables constant across conditions.

22
Q

Statistical Control adv and disad

A

Advantages:
§ Easy to include predictors as long as you measure them.

Disadvantages:
§ Cannot infer causation.
§ Need to measure the predictors accurately.
§ There is an unlimited number of variables you may need to control for, and you cannot measure them all.

23
Q

Experimental Control ad and disadv

A

Advantages:
§ Can infer causation in an experimental study.
§ Can control for all other variables.

Disadvantages:
§ Can’t randomly assign some variables due to ethical issues.
§ Demand characteristics:
participants change their behaviour because they know they are being manipulated

24
Q

Simple regression: minimize the sum of squared vertical distances in a

A

2D line.

25
Q

Bivariate regression: minimize the sum of squared vertical distances in a

A

3D plane.

26
Q

Multiple regression: minimize the sum of squared vertical distances in a

A

a hyperplane in higher dimensions.

27
Q

Does the regression plane always go through the mean of x1,x2,y?

A

Yes. Being able to use x¯1, x¯2, y¯ to solve for βˆ0 shows that the regression plane always goes through the point (x¯1, x¯2, y¯).

28
Q

ryx1

A

is the correlation between y and x1

29
Q

ryx2

A

is the correlation between y and x2

30
Q

rx1x2

A

is the correlation between x1 and x2

31
Q

sy

A

is the standard deviation of y.

32
Q

sx1

A

is the standard deviation of x1.

33
Q

sx2

A

is the standard deviation of x2.

34
Q

When will the βˆ1 in the bivariate equation be equal to that in the simple linear regression?

A

When the two predictors are uncorrelated with each other (rx1x2 = 0), the regression coefficient in the simple linear regression equals the partial regression coefficient in the bivariate regression.
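A quick numerical check of this claim, using made-up data constructed so the predictors are exactly uncorrelated:

```python
import numpy as np

# Constructed so that r(x1, x2) = 0 exactly
x1 = np.array([-2., -1., 0., 1., 2.])
x2 = np.array([1., -1., 0., -1., 1.])
y = np.array([1., 2., 2., 4., 5.])

# Simple regression slope of y on x1
b_simple = np.linalg.lstsq(np.column_stack([np.ones(5), x1]), y, rcond=None)[0][1]

# Partial regression coefficient for x1 in the bivariate regression
b_partial = np.linalg.lstsq(np.column_stack([np.ones(5), x1, x2]), y, rcond=None)[0][1]

print(b_simple, b_partial)  # equal because the predictors are uncorrelated
```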

35
Q

Multicollinearity

A

Multicollinearity means that the predictors are highly correlated with each other. When the predictors are multicollinear, their coefficients can be quite different in a bivariate versus a simple regression.

36
Q

t-test for linear regression

A

In regression, the goal of the t-tests is to use the sample intercept (βˆ0) and regression coefficients (βˆ1 and βˆ2) to draw inferences about the population intercept (β0) and regression coefficients (β1 and β2).

The t-test for the intercept tests whether the population intercept is 0:

H0 : β0 = 0
H1 : β0 ≠ 0

37
Q

The H0 : β1 = 0 means that

A

in the population, y cannot be explained or predicted by x1 while we control for x2.

38
Q

H1 : β1 ≠ 0 means that

A

§ If we reject H0, we can conclude that y can be significantly explained or predicted by x1 while we control for x2.

39
Q

In a bivariate regression, the t-statistic for the partial regression coefficient is

A

t = βˆ1 / SE(βˆ1).

Over repeated samples, t ~ t(n − p − 1), where p is the number of predictors. In the bivariate case (p = 2), t ~ t(n − 3).

40
Q

What happens to the SE and the t-statistic as multicollinearity increases?

A

As multicollinearity increases, SE(βˆ1) increases and the t-statistic decreases, making H0 less likely to be rejected (i.e., the p-value increases).

In other words, if the two predictors are highly correlated, one of them is likely to be non-significant.
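This pattern can be demonstrated by simulation (all values made up): the same model is fit twice, once with nearly uncorrelated predictors and once with highly correlated ones, and SE(βˆ1) is compared.

```python
import numpy as np

def se_beta1(r12, n=200, seed=1):
    """Standard error of beta1_hat when corr(x1, x2) is approximately r12."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 2))
    x1 = z[:, 0]
    x2 = r12 * z[:, 0] + np.sqrt(1 - r12**2) * z[:, 1]
    y = 1 + 0.4 * x1 + 0.4 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)              # df = n - p - 1 with p = 2
    cov = s2 * np.linalg.inv(X.T @ X)         # estimated covariance of the coefficients
    return np.sqrt(cov[1, 1])

print(se_beta1(0.0), se_beta1(0.9))  # SE(beta1_hat) is larger under high collinearity
```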

41
Q

The t-test also tests whether the population regression coefficient β2 is 0:

What does the null and alternative hypothesis indicate:

A

The H0 : β2 = 0 means that in the population, y cannot be explained by x2 while we control for x1.

If we reject H0, we can conclude that y can be significantly explained by x2 while we control for x1.

42
Q

In linear regression, the total variation in the criterion variable y can be broken down into two independent sources of variation.

SStotal =

What is each part equivalent to?

A

SStotal = SSregression + SSresidual

SSregression is equivalent to SSbetween in ANOVA.
- part of variation in Y that can be explained by all the predictors in the model

SSresidual is equivalent to SSwithin in ANOVA.
- part of variation in Y that cannot be explained by all the predictors in the model

43
Q

Multiple R-Squared

A

the proportion of variation in y that can be explained by the predictors in the model

SSregression/SStotal

the square of the correlation between observed (y) and predicted (yˆ) values.

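Both definitions can be verified numerically (made-up data): R² computed from the SS decomposition matches the squared correlation between observed y and predicted yˆ.

```python
import numpy as np

# Made-up data for illustration
x1 = np.array([1., 2., 3., 4., 5.])
x2 = np.array([2., 1., 4., 3., 5.])
y = np.array([2., 3., 5., 4., 7.])

X = np.column_stack([np.ones(5), x1, x2])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

ss_total = np.sum((y - y.mean())**2)
ss_reg = np.sum((y_hat - y.mean())**2)       # variation explained by the predictors
ss_resid = np.sum((y - y_hat)**2)            # variation left unexplained

r2_from_ss = ss_reg / ss_total
r2_from_corr = np.corrcoef(y, y_hat)[0, 1]**2

print(r2_from_ss, r2_from_corr)  # the two formulas agree
```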

44
Q

How do we interpret the multiple R-squared value of 0.3528?

A

The proportion of variation in y that is explained by the predictors in the model is 0.3528 (i.e., 35.28%).

45
Q

When thinking about the population multiple R-squared, we mean …

A

the proportion of variation in Y explained by the predictors in the population regression model built using the population data.
§ denoted as ρ2
§ ρ is a Greek letter pronounced as rho

46
Q

When thinking about the sample multiple R-squared, we mean…

A

the proportion of variation in Y explained by the predictors in the sample regression model built using the sample data.
§ denoted as r2

47
Q

F-test

A

The F-test in linear regression tests whether the population multiple R-squared is zero:

H0 : ρ2 = 0
H1 : ρ2 > 0

48
Q

H0 : ρ2 = 0 means…

A

means that in the population, the proportion of variance in y that is explained by the predictors is 0.

49
Q

H1 : ρ2 > 0 is equivalent to

A

§ H1 : at least one of βi’s is not zero.

If we reject H0 and endorse H1, we can conclude that the variation in y can be significantly explained by at least one of
predictors in the model.

50
Q

Reject null conclusion in simple linear regression

A

Therefore, if we reject H0 and endorse H1, we can conclude that the variation in y can be significantly explained by the one predictor in the model.

§ This is equivalent to the t-test for the one predictor

51
Q

What is the relationship between t and F

A

With a single predictor, the F-statistic equals the square of the t-statistic for that predictor: F = t². The two tests are equivalent and give the same p-value.
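For a simple regression with one predictor, F = t²; a quick numerical check with made-up data:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2., 2.5, 4., 4.5, 6., 6.5])
n = len(x)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
s2 = resid @ resid / (n - 2)                 # df = n - p - 1 with p = 1

# t-statistic for the slope
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t = b[1] / se_b1

# F-statistic from the SS decomposition
ss_total = np.sum((y - y.mean())**2)
ss_reg = ss_total - resid @ resid
F = (ss_reg / 1) / (resid @ resid / (n - 2))

print(t**2, F)  # F equals t squared
```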