Multiple Linear Regression Flashcards

1
Q

What is simple linear regression? Explain the dependent variable, the independent variable, and the linear relationship between them.

A

Simple linear regression models one dependent variable using one independent variable, assuming there is a linear relationship between the two.

Y = A0 + (A1 * X1)

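Y = dependent variable (the variable being explained)
X1 = independent variable (the explanatory variable), which has a linear relationship with Y
A0 = intercept; A1 = slope coefficient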
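A minimal sketch of fitting Y = A0 + (A1 * X1), assuming NumPy is available. The data are hypothetical and generated without noise, so the fit should recover the intercept and slope used to build them.

```python
import numpy as np

# Hypothetical data generated exactly from Y = 2 + 3 * X1 (no noise)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x1

# np.polyfit with degree 1 returns the coefficients as [slope, intercept]
a1, a0 = np.polyfit(x1, y, 1)
print("intercept A0:", a0, "slope A1:", a1)
```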

2
Q

What does linear relationship mean?

A

A linear relationship means the dependent variable changes at a constant rate as the independent variable changes: each one-unit change in the independent variable produces the same change in the dependent variable, so the relationship plots as a straight line.

3
Q

What is the difference between simple linear regression and multiple linear regression?

A

Simple linear regression has exactly one independent variable; multiple linear regression has two or more independent variables.

4
Q

What are three requirements of multiple linear regression?

A

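Y = a0 + a1*x1 + a2*x2 + a3*x3

Y = dependent variable
x1, x2, x3 = independent variables
a1, a2, a3 = partial regression coefficients (slopes); each gives the change in Y for a one-unit change in its independent variable, holding the other independent variables constant
a0 = intercept: the value of Y when every independent variable equals zero, i.e. where the fitted line crosses the Y axis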
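A minimal sketch of estimating a0 through a3 by least squares, assuming NumPy. The data and the "true" coefficients are hypothetical, and there is no noise, so the estimates should match them.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 50

# Hypothetical independent variables x1, x2, x3 (one column each)
X = rng.normal(size=(n, 3))

# Hypothetical coefficients in the order a0, a1, a2, a3
true_coefs = np.array([1.0, 0.5, -2.0, 3.0])

# A leading column of ones lets the intercept a0 be estimated
design = np.column_stack([np.ones(n), X])
y = design @ true_coefs  # Y = a0 + a1*x1 + a2*x2 + a3*x3, no noise

# Ordinary least squares estimate of [a0, a1, a2, a3]
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coefs)
```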

5
Q

What is the variation of Y (also known as the sum of squares total, SST), and what is its formula?

A

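The sum, over all observations, of the squared differences between each observed value of the dependent variable and the mean of the dependent variable.

SST = Σ (Yi - Y mean)^2, summed over all observations i

Yi = observed value of the dependent variable for observation i
Y mean = mean of the observed values of the dependent variable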
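A small worked example of the sum of squares total, assuming NumPy and four hypothetical observations of Y.

```python
import numpy as np

# Hypothetical observed values of the dependent variable
y = np.array([2.0, 4.0, 6.0, 8.0])

y_mean = y.mean()                # (2 + 4 + 6 + 8) / 4 = 5
sst = np.sum((y - y_mean) ** 2)  # 9 + 1 + 1 + 9 = 20
print(sst)
```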

6
Q

What are the 5 assumptions that must hold for a multiple linear regression to be valid? (Mnemonic: LHINI)

A
  1. Linearity
  2. Homoskedasticity
  3. Independence of observations
  4. Normality
  5. Independence of independent variables
7
Q

What is linearity assumption and how is linearity measured/confirmed?

A

Linearity: the dependent variable must have a linear relationship with each independent variable.

Eg. y = a0 + a1*x1 + a2*x2 + a3*x3 (y depends on x1, x2, and x3)

Each independent variable must enter the equation raised to the power of 1 (x1 = x1^1, x2 = x2^1, x3 = x3^1); otherwise the relationship is not linear.

  • Confirmed with pairwise scatterplots: plot the dependent variable against each independent variable and check for a straight-line pattern.
8
Q

What is the homoskedasticity assumption? What is the difference between homoskedasticity and heteroskedasticity?

A

Homoskedasticity: the variance of the residuals is constant across all observations.

Heteroskedasticity: the variance of the residuals is not constant; it differs across observations.

Eg. Plot the residuals on the vertical axis against the predicted values of the dependent variable on the horizontal axis.
- Homoskedasticity: the residuals form an even band around zero, with similar spread at every predicted value.
- Heteroskedasticity: the spread of the residuals changes with the predicted value (for example, a fan shape or clusters of points).
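A rough numeric stand-in for the residual plot, assuming NumPy. It is an informal check, not a formal test: the hypothetical data are built so the noise grows with x, and the sketch compares the residual variance for low and high predicted values.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Hypothetical data: the noise grows with x, so the errors are heteroskedastic
x = np.linspace(1.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(size=x.size) * x

# Fit a simple regression and compute the residuals
a1, a0 = np.polyfit(x, y, 1)
predicted = a0 + a1 * x
residuals = y - predicted

# Split the residuals at the median predicted value and compare variances
order = np.argsort(predicted)
low_half = residuals[order[:100]]
high_half = residuals[order[100:]]
variance_ratio = high_half.var() / low_half.var()
print(variance_ratio)
```

A ratio near 1 is what homoskedastic residuals would give; a ratio well above or below 1 suggests the spread changes with the predicted value.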

9
Q

What is the independence of observations assumption (also known as independence of errors)?

A

Independence of observations: each observation, and therefore each residual, should be unrelated to the one before it; the residuals should look random.

Eg. When the residuals are plotted in order, they should not follow a systematic pattern such as negative, positive, negative, positive, negative, positive; they should appear random.
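One informal way to quantify the alternating pattern in the example is the lag-1 autocorrelation of the residuals. This is a technique swapped in for illustration, not one the card names; the helper `lag1_autocorrelation` and the residual values are hypothetical, and NumPy is assumed.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one just before it."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return np.sum(r[1:] * r[:-1]) / np.sum(r ** 2)

# Hypothetical residuals that alternate positive, negative, positive, ...
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
rho = lag1_autocorrelation(alternating)
print(rho)
```

Values near zero are consistent with independent errors; values far from zero, like the strongly negative one here, point to a systematic pattern.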

10
Q

What is the normality assumption for multiple linear regression?

A

Normality: the residuals (the differences between the actual data points and the predicted values) should follow a normal distribution. The errors should be centered around zero and form a bell-shaped curve when shown on a histogram or a normal probability plot. The curve is bell-shaped because most errors fall close to zero, while extreme errors, positive or negative, are less common.

11
Q

What is a residual in multiple linear regression?

A

Residual: the difference between an observed value of the dependent variable and the value predicted by the regression; on a graph, how far the observation lies from the line of best fit.
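A minimal sketch of computing residuals as actual minus predicted, assuming NumPy and four hypothetical data points. With an intercept in the model, least-squares residuals sum to zero up to rounding.

```python
import numpy as np

# Hypothetical observations
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Line of best fit: np.polyfit returns [slope, intercept] for degree 1
a1, a0 = np.polyfit(x, y, 1)
predicted = a0 + a1 * x

# Residual = observed value minus predicted value
residuals = y - predicted
print(residuals)
```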

12
Q

What is the independence of independent variables assumption, and how do you assess independence of independent variables?

A

Independence of independent variables: the values of one independent variable do not influence or affect the values of another independent variable.

  • Use pairwise scatterplots that plot two independent variables against each other. If they are truly independent, there is no clear pattern and the points are randomly scattered.
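A numeric companion to the pairwise scatterplots, assuming NumPy: the correlation matrix of the independent variables. The data are hypothetical; x3 is deliberately built from x1, so that pair should show a correlation close to 1, while x1 and x2 should not.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
n = 500

# Hypothetical independent variables
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                   # generated independently of x1
x3 = 2.0 * x1 + 0.1 * rng.normal(size=n)  # almost a multiple of x1

# rowvar=False treats each column as one variable
X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```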
13
Q

How do you detect violations in linearity?

A
  • Use pairwise scatterplots.

Plot each independent variable on the horizontal axis and the dependent variable on the vertical axis. If the relationship is linear, the points cluster around the line of best fit; a curved pattern signals a violation.

14
Q

What is a logistic regression?

A
  • A model used when the dependent variable has only discrete outcomes (for example, bankrupt or not bankrupt).
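A minimal sketch of the idea behind logistic regression, assuming NumPy: a linear combination of the independent variables is passed through the logistic function, which turns it into a probability between 0 and 1. The coefficients `b0` and `b1`, the variable `leverage`, and the bankruptcy framing are hypothetical.

```python
import numpy as np

def logistic(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and values of one independent variable
b0, b1 = -4.0, 2.0
leverage = np.array([0.5, 2.0, 3.5])

# Modeled probability of the outcome "bankrupt" for each observation
prob_bankrupt = logistic(b0 + b1 * leverage)
print(prob_bankrupt)
```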
15
Q

Can multiple linear regressions contain both discrete and continuous independent variables?

A

Yes. The independent variables in a multiple linear regression can be both discrete and continuous.
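A minimal sketch of mixing the two kinds of independent variable, assuming NumPy. The continuous variable `size` and the 0/1 (dummy) variable `is_listed` are hypothetical, and the data are generated without noise so the fit recovers the coefficients used.

```python
import numpy as np

# Hypothetical independent variables: one continuous, one discrete (0 or 1)
size = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
is_listed = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

# Hypothetical relationship: Y = 1 + 2*size + 5*is_listed (no noise)
y = 1.0 + 2.0 * size + 5.0 * is_listed

# Design matrix: intercept column, continuous column, dummy column
design = np.column_stack([np.ones_like(size), size, is_listed])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coefs)
```

The coefficient on the dummy variable is the shift in Y between the two groups, holding `size` constant.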
