unit 3 - ch 13 - multiple linear regression (mr) Flashcards
The multivariate dependent and independent relationships
Y - price of gem
X1 - carat
X2 - cut
X3 - clarity
X4 - color
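The multiple regression equation
Simple linear regression: $\hat{Y} = b_0 + b_1X$ (the familiar y = b + mx form)
Multiple regression (MR): $\hat{Y} = b_0 + b_1X_1 + b_2X_2 + \dots + b_kX_k$
The partial regression coefficients ($b_1$ through $b_k$) are the middle part of the equation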
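As a rough sketch (not from the lecture), the MR equation can be fit by ordinary least squares with NumPy. The data are invented: Y is built exactly from Y hat = 2 + 3X1 - 1X2, so the fit should recover those b-values. The function name fit_multiple_regression is hypothetical.

```python
import numpy as np

# Hypothetical illustration: made-up numbers, not real gem data.
# y is built exactly from y = 2 + 3*x1 - 1*x2, so least squares should recover b0=2, b1=3, b2=-1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 2 + 3 * x1 - 1 * x2

def fit_multiple_regression(X, y):
    """Return [b0, b1, ..., bk] for y-hat = b0 + b1*x1 + ... + bk*xk (ordinary least squares)."""
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

b = fit_multiple_regression(np.column_stack([x1, x2]), y)
# b holds the intercept b0 followed by the partial regression coefficients b1, b2.
print(np.round(b, 6))
```

Each b-value is the expected change in Y for a one-unit change in that x-variable while the other x-variables are held constant.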
Dummy variables
Using categorical (nominal) data
Converts categorical data into binary data
Used for _____ (missed in lecture)
Gem - Y - X1 - X2
Non-numeric data = text
0-1 binary code
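A minimal sketch of 0-1 dummy coding for a categorical x-variable such as cut. The cut grades below are made up, and the function name dummy_code and its baseline parameter are hypothetical. Leaving one level out (k - 1 dummies for k levels) is a common convention assumed here, not something stated in the lecture; it keeps the dummies from being perfectly collinear with the intercept.

```python
def dummy_code(values, baseline):
    """Turn a categorical (nominal) variable into 0/1 dummy columns.

    The baseline level gets no column of its own: a row of all zeros means
    "baseline". k levels therefore produce k - 1 dummy columns.
    """
    levels = sorted(set(values) - {baseline})
    # One binary column per non-baseline level: 1 if the row has that level, else 0.
    return {level: [1 if v == level else 0 for v in values] for level in levels}

cuts = ["Ideal", "Good", "Premium", "Good", "Ideal"]  # made-up cut grades
print(dummy_code(cuts, baseline="Good"))
```

Each dummy column can then enter the regression as its own x-variable.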
r
Sign: - or +
Range: -1 to +1
Direction: indicated
X-Y relationship: =
Multi R
Sign: + only
Range: 0 to +1
Direction: not indicated
X-Y relationship: >=
Multi R is a single value representing the strength of the simultaneous relationship between all of the x-variables and Y
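One way to see why Multi R is always positive and says nothing about direction: it equals the correlation between the observed Y and the predicted Y hat, which is also the square root of R Square. This is a sketch on made-up data (NumPy least squares); one coefficient is deliberately negative, yet Multi R still comes out positive.

```python
import numpy as np

# Made-up data for illustration only.
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3))  # three x-variables
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ b

# Multiple R: the correlation between observed Y and predicted Y-hat.
multiple_r = np.corrcoef(y, y_hat)[0, 1]
# R Square from sums of squares; Multiple R should equal its square root.
r_square = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(multiple_r, 4), round(np.sqrt(r_square), 4))
```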
(Multi) collinearity
Co = share
Linear = line (slope)
(Multi) collinearity:
When 2 (or more) x-variables are highly correlated with each other
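One way to spot collinearity, assumed here rather than taken from the lecture, is to look at the pairwise correlations among the x-variables themselves. In this made-up example x2 is built as a near copy of x1 and x3 is unrelated. The helper collinear_pairs and its 0.8 cutoff are hypothetical illustrative choices, not a standard rule.

```python
import numpy as np

# Hypothetical x-variables: x2 closely tracks x1, x3 is unrelated.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1 -> collinear with it
x3 = rng.normal(size=n)

# Rows/columns of the correlation matrix follow the order x1, x2, x3.
corr = np.corrcoef(np.vstack([x1, x2, x3]))
print(np.round(corr, 2))

def collinear_pairs(corr, threshold=0.8):
    """List (i, j, r) for x-variable pairs whose |r| exceeds the chosen threshold."""
    k = corr.shape[0]
    return [(i, j, corr[i, j]) for i in range(k) for j in range(i + 1, k)
            if abs(corr[i, j]) > threshold]

print(collinear_pairs(corr))
```

Only the (x1, x2) pair should be flagged: that is an X-X relationship, separate from the X-Y relationships the regression is meant to measure.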
multi-variate dependent vs independent relationships
Multivariate dependent relationships (X and Y)
Independent relationships (X and X)
Student's car broke down on campus
Student X moves the car (Y) across campus.
The total distance the car (Y) moves is 100% due to the effort of student X = simple linear regression
The next day, students X1 and X2 move the car (Y)
We can measure the total distance the car (Y) was pushed, but it is harder to find how much each student's effort (X1 and X2) added to the total movement = multiple regression
r or multi r formula
The adverse effects of multicollinearity
When 2 or more x-variables are highly correlated
1. Cannot tell which x-variable is affecting the Y-variable (not an issue with SLR)
2. Increases the chance of a Type II error (failing to reject a null hypothesis that is actually false)
3. The signs of the partial regression coefficients may flip
As collinearity decreases, each predictor variable's unique portion of the variability in the Y-variable increases
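The last point can be illustrated with a small simulation (made-up data, NumPy least squares). Here x1's "unique portion" is measured as R Square with both predictors minus R Square with x2 alone. That is one common way to define it and is an assumption of this sketch. The helper names r_square and unique_portion_x1 are hypothetical.

```python
import numpy as np

def r_square(X, y):
    """R Square from an ordinary least-squares fit of y on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def unique_portion_x1(x1, x2, y):
    """Share of Y's variability explained by x1 beyond what x2 already explains."""
    return r_square(np.column_stack([x1, x2]), y) - r_square(x2.reshape(-1, 1), y)

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

x2_independent = rng.normal(size=n)           # low collinearity with x1
x2_collinear = x1 + 0.1 * rng.normal(size=n)  # high collinearity with x1

low = unique_portion_x1(x1, x2_independent, x1 + x2_independent + noise)
high = unique_portion_x1(x1, x2_collinear, x1 + x2_collinear + noise)
print(round(low, 3), round(high, 3))
```

In both cases x1 truly affects Y, but when x2 nearly duplicates x1 almost none of Y's variability can be credited to x1 alone (the car example again).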
Multiple regression output in Excel:
regression table
ANOVA table
collinearity table
R (regression table) = strength
A (ANOVA table) = significance
C (collinearity table) = collinearity
The strength of the relationship: summary output table (regression table)
Coefficient of determination (R Square): the percent of the variation in gem price that is explained by the variation in carat, cut, clarity, and color
N= sample size
P = number of predictors
If the general rule regarding sample size is not met, adjusted R Square is a more accurate indicator of the strength of the multiple regression relationship
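A minimal sketch of how N and P enter adjusted R Square, using the standard formula 1 - (1 - R^2)(N - 1)/(N - P - 1). The function names and example numbers are hypothetical. With the same R Square and the same 4 predictors, the smaller sample is penalized much more.

```python
def r_square_from_ss(sse, sst):
    """Coefficient of determination: share of total variation in Y explained by the x-variables."""
    return 1 - sse / sst

def adjusted_r_square(r2, n, p):
    """Adjusted R Square with N = sample size and P = number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical numbers: R Square = 0.80 with P = 4 predictors (carat, cut, clarity, color).
print(round(adjusted_r_square(0.80, 50, 4), 4))  # 0.7822 with N = 50
print(round(adjusted_r_square(0.80, 10, 4), 4))  # 0.64 with N = 10
```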
judgment call
Is the strength of the relationship ___ (missed in lecture)
Multiple R, R Square, Adjusted R Square → strength of the relationship → judgment call
ANOVA test statistic (F)
= between term / within term
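A minimal sketch of how the between and within terms become the F statistic, assuming the usual regression ANOVA layout: each term is a mean square (sum of squares divided by degrees of freedom), with P degrees of freedom for regression and N - P - 1 for residual. The function name and the numbers in the example are hypothetical.

```python
def f_statistic(ssr, sse, n, p):
    """F = between (regression) term / within (residual) term."""
    ms_regression = ssr / p          # between term: mean square for regression
    ms_residual = sse / (n - p - 1)  # within term: mean square for residual (error)
    return ms_regression / ms_residual

print(f_statistic(80.0, 20.0, 50, 4))
```

With these made-up numbers the between term is 80/4 = 20 and the within term is 20/45 (about 0.444), so F is 45 (up to floating-point rounding). A large F means the regression term dominates the error term.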
underlying theory of anova test
total variation can be divided into two distinct parts:
1 - between AND
2 - within (error)
and the two components can be compared to determine which is affecting the data to a greater degree
Total variation in the Y-variable can be divided into two distinct components:
1 - regression term
2 - residual term (error)
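The split of total variation into a regression term and a residual term can be checked numerically. This sketch uses made-up data and NumPy least squares to compute SST (total), SSR (regression), and SSE (residual), and confirms SST = SSR + SSE. The identity holds for a least-squares fit that includes an intercept.

```python
import numpy as np

# Made-up data for illustration only.
rng = np.random.default_rng(3)
n, p = 40, 2
X = rng.normal(size=(n, p))
y = 5 + X @ np.array([1.0, 2.0]) + rng.normal(size=n)

design = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ b

sst = np.sum((y - y.mean()) ** 2)      # total variation in Y
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression term (explained)
sse = np.sum((y - y_hat) ** 2)         # residual term (error)
print(np.isclose(sst, ssr + sse))
```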