unit 3 - ch 13 - multiple linear regression (mr) Flashcards

1
Q

The multivariate dependent and independent relationship

A

Y - price of gem
X1 - carat
X2 - cut
X3 - clarity
X4 - color

2
Q

The multiple regression equation

A

Y hat = b0 + b1(x1) + b2(x2) + ...

MR = Y hat = (the Y hat equation above)
The partial regression coefficients (b1, b2, ...) are the middle part: one slope per x-variable
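
A minimal Python sketch of plugging values into the multiple regression equation, Y hat = b0 + b1(x1) + b2(x2) + ... The `predict` helper and every coefficient value are made up for illustration; none of them come from the lecture or from Excel output.

```python
def predict(b0, slopes, xs):
    """Return Y hat = b0 + b1*x1 + b2*x2 + ... for one observation."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one slope per x-variable")
    return b0 + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients (assumptions, not real gem data)
b0 = 500.0                 # y-intercept
slopes = [3000.0, 200.0]   # b1 for carat, b2 for a numeric cut score

# Predicted price for a 1.5 carat gem with a cut score of 3
y_hat = predict(b0, slopes, [1.5, 3])
print(y_hat)  # 5600.0
```

Each slope is read as the change in Y hat for a one-unit change in its own x-variable while the other x-variables are held fixed.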

3
Q

Dummy variables

A

Using categorical (nominal) data
Converts categorical data into binary data
Used for _____ (missed in lecture)
Gem - Y - X1 - X2
Non-numeric data = text
0-1 binary code
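
A rough Python sketch of the 0-1 coding idea: text categories are turned into binary columns, with one category left out as the reference group. The `dummy_code` helper and the sample colors are assumptions for illustration only.

```python
def dummy_code(values, reference):
    """Build one 0/1 column per category, skipping the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical gem colors; "green" is treated as the reference group (all 0s)
colors = ["green", "pink", "pink", "green"]
dummies = dummy_code(colors, reference="green")
print(dummies)  # {'pink': [0, 1, 1, 0]}
```

With k categories this produces k - 1 columns, because the reference group is already identified by every dummy being 0.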

4
Q

r

A

Sign is - or +
Range is -1 to +1
Direction is indicated (by the sign)
X-y relationship is =
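
A small Python sketch of how r carries both sign and strength for one x-variable and one y-variable. It uses the standard Pearson formula; the data values are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between one x-variable and one y-variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: y rises with x in the first case and falls in the second
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(rising, falling)  # the sign shows direction, the size shows strength
```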

5
Q

multi r

A

Sign is +
Range is 0 to +1
Direction is not indicated
X-y relationship is >=
Multi-r is a single point-value representing the strength of a simultaneous relationship between the x-variables and Y

6
Q

(multi) collinearity

A

Share
Line (slope)
(Multi) collinearity:
When 2 (or more) x-variables are highly correlated with each other

The multivariate dependent relationships (X and Y)
Independent relationships (X and X)
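
One way to screen for collinearity is to correlate the x-variables with each other before looking at Y. The Python sketch below does that for made-up data; the `weight_mg` column and the 0.8 cutoff are assumptions for illustration, not rules from the lecture.

```python
from math import sqrt


def corr(a, b):
    """Pearson correlation between two columns of equal length."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


# Invented predictors: weight_mg is just carat rescaled, so the two are collinear
carat = [0.5, 1.0, 1.5, 2.0]
weight_mg = [c * 200 for c in carat]
cut_score = [3, 1, 4, 2]

r_carat_weight = corr(carat, weight_mg)
r_carat_cut = corr(carat, cut_score)

CUTOFF = 0.8  # assumed rule of thumb for "highly correlated"
for name, r in [("carat vs weight_mg", r_carat_weight), ("carat vs cut_score", r_carat_cut)]:
    flag = "possible collinearity" if abs(r) >= CUTOFF else "ok"
    print(name, round(r, 3), flag)
```

Here carat and weight_mg carry the same information, so keeping both in the model would make it hard to tell which one is affecting Y.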

7
Q

multi-variate dependent vs independent relationships

A

The multivariate dependent relationships (X and Y)
Independent relationships (X and X)

8
Q

student car broke down on campus

A

Student X moves car (Y) across campus.
The total distance of the movement of car (Y) is 100% due to the effort of student (X) = simple linear regression

Next day, students (X1 and X2) move car (Y)
We can measure the total distance car (Y) was pushed, but it is harder to find how much the effort of each student (X1 and X2) added to the total movement = multiple regression

r or multi r formula

9
Q

The adverse effects of multicollinearity

A

When 2 or more x-variables are highly correlated
1. Cannot decipher which x-variable is affecting the y-variable (not an issue with SLR)
2. Increases the chance of a type 2 error (FTRN when the null is really false)
3. The signs of the partial regression coefficients may flip

As collinearity decreases, there is an increase in each predictor variable's unique portion of the variability within the Y-variable

10
Q

Multiple regression excel:

regression table
anova table
collinearity table

A

r = Strength
a = Significance
c = Collinearity

11
Q

the strength of the relationship: summary output table (regression table)

A

Coefficient of determination: the percent of the variation in gem price that is explained by the variation in carat, cut, clarity, color

n = sample size
p = number of predictors
If the general rule regarding sample size is not met, adjusted R square is a more accurate indicator of the strength of the multiple regression relationship
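
A short Python sketch of the standard adjusted R square formula, which shrinks R square according to sample size (n) and number of predictors (p). The R square of 0.84 echoes the roughly 83-84% explained variation in the gem example; n = 30 is a made-up sample size.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R square = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Four predictors (carat, cut, clarity, color); n = 30 is an assumption
adj = adjusted_r_squared(0.84, n=30, p=4)
print(round(adj, 4))  # 0.8144
```

The penalty grows as p gets large relative to n, which is why adjusted R square is the safer number to read when the sample is small.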

12
Q

judgment call

A

Is the strength of the relation (missed again) :(
Multiple R, R square, Adjusted RSQ → Strength of the relationship → judgment call

13
Q

test stat =

A

= between term/within term

14
Q

underlying theory of anova test

A

total variation can be divided into two distinct parts:
1 - between AND
2 - within (error)

and the two components can be compared to determine which is affecting the data to a greater degree

15
Q

total variation in the y-variable can be divided into distinct components

A

regression term
residual term (error)

16
Q

regression term

A

1 - regression term (Y’s relationship with the X-variable)

Regression term: Y hat - Y bar

17
Q

residual term

A

2 - residual term (random factors not in the model)

Residual term: Y - Y hat

18
Q

full model

A

FM = Y hat = b + m(x)

19
Q

total variation

A

Total Variation: Y - Y bar
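
The three deviations on these cards (Y hat - Y bar, Y - Y hat, Y - Y bar) become the sums of squares in the ANOVA table once each is squared and added up over all observations. A Python sketch with invented toy data, using a single x-variable to keep the arithmetic short:

```python
# Toy data, invented for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope (m) and intercept (b) for Y hat = b + m(x)
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b = y_bar - m * x_bar
y_hats = [b + m * x for x in xs]

# Square and sum each kind of deviation
ss_total = sum((y - y_bar) ** 2 for y in ys)                     # Y - Y bar
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hats)          # Y hat - Y bar
ss_residual = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))    # Y - Y hat

print(ss_total, ss_regression, ss_residual)
r_squared = ss_regression / ss_total  # share of total variation that is explained
print(round(r_squared, 3))
```

For a least-squares fit with an intercept, the regression and residual sums of squares add up to the total, which is the split the ANOVA table reports.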

20
Q

increase of F

A

MS regression increases and/or MS residual decreases (F = MS regression / MS residual)

21
Q

decrease of F

A

MS regression decreases and/or MS residual increases (F = MS regression / MS residual)
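
A quick Python sketch of why F moves the way these two cards say. It assumes the usual degrees of freedom, p for the regression term and n - p - 1 for the residual term; the sums of squares are invented numbers.

```python
def f_statistic(ss_regression, ss_residual, n, p):
    """F = MS regression / MS residual, with df = p and n - p - 1."""
    ms_regression = ss_regression / p
    ms_residual = ss_residual / (n - p - 1)
    return ms_regression / ms_residual


# Invented sums of squares, n = 5 observations, p = 1 predictor
f_base = f_statistic(3.6, 2.4, n=5, p=1)
f_less_error = f_statistic(3.6, 1.2, n=5, p=1)   # smaller residual term
f_less_signal = f_statistic(1.8, 2.4, n=5, p=1)  # smaller regression term

print(round(f_base, 3), round(f_less_error, 3), round(f_less_signal, 3))
```

Shrinking the residual term pushes F up, and shrinking the regression term pulls F down.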

22
Q

anova table

A

1 of the 4 facets of the Null states: everything is unrelated

Ho: The model of carat, cut, clarity, and color is unrelated to gem price
H1: The model of carat, cut, clarity, and color is correlated with gem price

If FTRN: the model is not significantly correlated to the Gem price (is not a good model)
If RTN: the model is significantly correlated to Gem price (is statistically good model)

23
Q

when to FTRN or RTN

A

If FTRN: the model is not significantly correlated to the Gem price (is not a good model): Significance F > alpha

If RTN: the model is significantly correlated to Gem price (is statistically good model): Significance F < alpha

In the ANOVA table, Significance F (not the coefficients' P-value column) is compared to alpha to decide RTN or FTRN

24
Q

unexplained variation

A

Naked eye appeal - seller’s reputation, seller’s service etc…
16-17%

25
Q

explained variability

A

Carat, cut, clarity, color etc.
83-84%

26
Q

The value of the chance model is not for practical use but for

A

comparison purposes

27
Q

significance of the components

A

Y = b0 + b1(x1) + b2(x2) + b3(x3) + ... + e

b0 = y-int
b1 = (partial) regression coefficient (slope) for x1
e = residual (Y - Y hat)

1 of the 4 facets of the Null states: everything is unrelated
Ho: each x-variable is not correlated with gem price
H1: each x-variable is correlated with gem price

If FTRN: the x-variable is not a good predictor variable
If RTN: the x-variable is a good predictor variable

28
Q

p value vs alpha (FTRN vs RTN)

A

P-value < alpha: RTN (reject the null)
P-value > alpha: FTRN (fail to reject the null)
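
The decision rule as a tiny Python sketch. The 0.05 default for alpha and the handling of an exact tie (treated as FTRN) are assumptions; the card only covers the strictly-less and strictly-greater cases.

```python
def decide(p_value, alpha=0.05):
    """Return "RTN" when the p-value is below alpha, otherwise "FTRN"."""
    return "RTN" if p_value < alpha else "FTRN"


# Made-up p-values for two predictors
print(decide(0.01))  # RTN: the x-variable is a good predictor
print(decide(0.20))  # FTRN: the x-variable is not a good predictor
```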

29
Q

0 and 1 variable

A

The “0” variable:
Reference group
Represents the absence of the qualitative attribute

The “1” variable:
Dummy variable
Represents the presence of the qualitative attribute

30
Q

look at notes i guess to understand graphs :/

A

If the gem is pink, rather than green, it commands a premium
Pink gems, on average, cost (…)

If the dummy coefficient were a negative number: pink gems sell at a discount compared to green gems
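
A Python sketch of how a dummy coefficient reads as a premium or a discount. All coefficient values are hypothetical; green is the reference group (0) and pink is the dummy (1).

```python
def predicted_price(carat, is_pink, b0=1000.0, b_carat=4000.0, b_pink=750.0):
    """Y hat = b0 + b_carat * carat + b_pink * is_pink, with is_pink coded 0 or 1."""
    return b0 + b_carat * carat + b_pink * is_pink


# Same carat, different color: the gap is exactly the dummy coefficient
green = predicted_price(1.0, 0)
pink = predicted_price(1.0, 1)
premium = pink - green
print(green, pink, premium)  # 5000.0 5750.0 750.0

# A negative dummy coefficient turns the premium into a discount
discount = predicted_price(1.0, 1, b_pink=-300.0) - predicted_price(1.0, 0, b_pink=-300.0)
print(discount)  # -300.0
```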