Chapter 15 Flashcards

1
Q

How many variables, and of what type, are involved in multiple regression?

A

One dependent variable and two or more independent variables

2
Q

What is multiple regression the study of?

A

how a dependent variable y is related to 2 or more independent variables

3
Q

What is the multiple regression model

A

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon$

4
Q

What is the random variable in the regression model?

A

$\epsilon$, the error term

5
Q

What does the error term in multiple regression account for (measured by SSE)?

A

The variability in y that cannot be explained by the linear effect of the p independent variables

6
Q

What do we assume about the expected value of the error term in multiple regression?

A

The mean or expected value of $\epsilon$ is 0

7
Q

What is the multiple regression equation?

A

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

8
Q

What does E(y) stand for

A

mean or expected value of y

9
Q

In multiple regression, when we use sample data to estimate the multiple regression equation, what is the formula?

A

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$ (the estimated multiple regression equation)

10
Q

What does $\hat{y}$ (y-hat) stand for in multiple regression?

A

predicted value of the dependent variable

11
Q

What does $y_i$ stand for?

A

observed value of the dependent variable for the ith observation

12
Q

What does $\hat{y}_i$ stand for?

A

Predicted value of the dependent variable for the ith observation

13
Q

When more independent variables are added to a multiple regression, is the regression necessarily "better off"? Why or why not?

A

No. It can make things worse; this is called overfitting.

14
Q

what is multicollinearity

A

Correlation among the independent variables. Adding more independent variables creates more relationships among them
- so not only are the I.V.s potentially related to the dependent variable, they are also potentially related to each other

15
Q

If you have 4 I.V.s, how many relationships do you have?

A

4 between each I.V. and the D.V., plus 6 among the I.V.s (one for each pair of I.V.s)

so in total there are 10 relationships to consider
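
The count in this card can be checked with a short sketch. The function name `count_relationships` is hypothetical (it does not come from the flashcards); it assumes a single D.V. and counts each pair of I.V.s once.

```python
from math import comb

def count_relationships(p: int) -> tuple:
    """Return (I.V.-to-D.V. pairs, I.V.-to-I.V. pairs, total) for p I.V.s."""
    iv_dv = p            # each I.V. paired with the single D.V.
    iv_iv = comb(p, 2)   # every unordered pair of I.V.s
    return iv_dv, iv_iv, iv_dv + iv_iv

print(count_relationships(4))  # (4, 6, 10)
```

With p = 4 this gives 4 + 6 = 10, which matches the card.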

16
Q

Do all I.V.s help predict the D.V.?

A

No. Some I.V.s are better at predicting the D.V. than others, and some contribute nothing.

17
Q

With regard to multicollinearity, what is the ideal situation?

A

For all of the I.V.s to be correlated with the D.V. but NOT with each other

18
Q

In multiple regression, how is each coefficient interpreted?

A

As the estimated change in y corresponding to a one-unit change in one independent variable when all other independent variables are held constant

19
Q

What are the 6 preparation steps for multiple regression?

A
  1. Generate a list of potential variables, independent and dependent
  2. Collect data on the variables
  3. Check the relationship between each I.V. and the D.V. using scatter plots and correlations
  4. (Optional) conduct a simple linear regression for each I.V./D.V. pair
  5. Use the non-redundant I.V.s in the analysis to find the best-fitting model
  6. Use the best-fitting model to make predictions about the D.V.
20
Q

what two problems can happen in multiple regression

A
  1. Overfitting
  2. Multicollinearity
21
Q

what is overfitting

A

A problem caused by adding too many I.V.s: they account for more variance in the sample but add nothing useful to the model

22
Q

What is multicollinearity?

A

It happens when some or all of the I.V.s are correlated with each other

23
Q

In simple linear regression, how do we interpret the slope $b_1$?

A

as an estimate of the change in y for a one-unit change in the I.V.

24
Q

In multiple linear regression, how do we interpret $b_i$?

A

Each regression coefficient $b_i$ is an estimate of the change in y corresponding to a one-unit change in $x_i$ when all other independent variables are held constant

25
Q

In multiple regression, what does $b_i$ represent?

A

An estimate of the change in y corresponding to a one-unit change in $x_i$ when all other I.V.s are held constant

26
Q

What is the formula for Coefficient of Determination in simple linear regression

A

$r^2 = \mathrm{SSR}/\mathrm{SST}$

27
Q

what is the formula for Coefficient of Determination with multiple regression

A

$R^2 = \mathrm{SSR}/\mathrm{SST}$

28
Q

What does the Multiple Coefficient of Determination indicate

A

The goodness of fit of the estimated multiple regression equation
- denoted $R^2$

29
Q

How can $R^2$ (the multiple coefficient of determination) be interpreted?

A

As the proportion of variability in the dependent variable that can be explained by the estimated multiple regression equation
- when multiplied by 100, it can be interpreted as the percentage of the variability in y that can be explained by the estimated regression equation

30
Q

What would $R^2 = 0.904$ mean in the truck travel-time example?

A

90.4% of the variability in travel time y is explained by the estimated multiple regression equation

31
Q

Does $R^2$ increase or decrease as independent variables are added?

A

It increases (strictly speaking, it can never decrease)

32
Q

Why do many analysts prefer adjusting $R^2$ for the number of independent variables?

A

To avoid overestimating the impact of adding an I.V. on the amount of variability explained

33
Q

What is the formula for the adjusted multiple coefficient of determination

A

$R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$
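
A minimal sketch of the adjusted formula, for checking hand calculations. The function name and the sample size n = 10 with p = 2 I.V.s are assumptions made for illustration; only $R^2 = 0.904$ appears in the cards.

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Hypothetical n and p; R^2 = 0.904 is the value used in the cards.
print(round(adjusted_r_squared(0.904, n=10, p=2), 3))  # about 0.877

# A small R^2 with many I.V.s can push the adjusted value below zero.
print(adjusted_r_squared(0.1, n=10, p=5) < 0)  # True
```

The adjusted value is never larger than $R^2$, and the gap widens as p grows relative to n.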

34
Q

What does “p” represent in the adjusted multiple coefficient of determination

A

The number of independent variables

35
Q

What happens to SSE when you add more I.V.s

A

Prediction errors become smaller, which reduces SSE
Because SSR = SST - SSE,
- when SSE becomes smaller, SSR becomes larger, causing $R^2$ = SSR/SST to increase
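
The relationship among SST, SSR, SSE and $R^2$ can be illustrated with a small sketch. The data are made up for illustration, and the fitted values come from a one-variable least-squares line (y-hat = 0.5 + 1.4x for x = 1, 2, 3, 4) to keep the example short; the same identities apply with more I.V.s.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE), using SSR = SST - SSE as in the card."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variability
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained part
    ssr = sst - sse                                        # explained part
    return sst, ssr, sse

# Hypothetical observations and their least-squares fitted values.
y = [2.0, 3.0, 5.0, 6.0]
y_hat = [1.9, 3.3, 4.7, 6.1]

sst, ssr, sse = sums_of_squares(y, y_hat)
r_squared = ssr / sst
print(round(sst, 2), round(ssr, 2), round(sse, 2), round(r_squared, 2))
```

Any change that lowers SSE while SST stays fixed raises SSR and therefore $R^2$, which is the mechanism the card describes.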

36
Q

If a variable is added to the model, what happens to $R^2$?

A

It becomes larger, even if the variable added is not statistically significant

37
Q

If the value of $R^2$ is small and the model contains a large number of I.V.s, what can happen to the adjusted coefficient of determination?

A

It can take on a negative value

- in such cases, Minitab sets the adjusted coefficient of determination to zero

38
Q

In multiple regression, what is the variance of $\epsilon$ denoted by?

A

$\sigma^2$

39
Q

In multiple regression, what is assumed about the variance of $\epsilon$?

A

It is the same for all values of the I.V.s

40
Q

What are the assumptions about $\epsilon$ in simple or multiple linear regression?

A
  1. The error term $\epsilon$ is a random variable with an expected value of 0
  2. The variance of $\epsilon$ is the same for all values of the I.V.s
  3. The values of $\epsilon$ are independent
  4. The error term $\epsilon$ is a normally distributed random variable
41
Q

What type of graph is the multiple regression equation?

A

A plane in 3-D space (when there are two independent variables)

42
Q

What is the value of $\epsilon$ in multiple regression?

A

The difference between the actual y and the expected value of y, E(y), when $x_1 = x_1^*$ and $x_2 = x_2^*$

43
Q

Regression analysis terms: in multiple regression, what is the dependent variable now called, and what is the graph called?

A

The dependent variable is now called the response variable

The graph is called a response surface

44
Q

In simple linear regression, what tests did we use to test for significance?

A

The t test and the F test

- both lead to the same conclusion

45
Q

In multiple regression what tests do we use when testing for significance

A

F test and t test

46
Q

in multiple regression what is the F test for

A

Used to determine whether a significant relationship exists between the D.V. and the set of all the I.V.s

47
Q

in multiple regression what is the F test referred to as

A

the test for Overall significance

48
Q

in multiple regression what is the t test used for

A

Used to determine whether EACH of the I.V.s is significant

- a separate t test is conducted for EACH of the I.V.s

49
Q

in multiple regression what is the t test referred to as

A

a test for individual significance

50
Q

What are the hypotheses for the F test in multiple regression (test for overall significance)?

A

$H_0$: $\beta_1 = \beta_2 = \dots = \beta_p = 0$
$H_a$: one or more of the parameters is NOT equal to zero

51
Q

In multiple regression, with the F test, if $H_0$ is rejected, what can we say?

A

We have sufficient statistical evidence to conclude that one or more of the parameters is NOT equal to zero and
that the overall relationship between y and the set of I.V.s is significant

52
Q

In multiple regression, with the F test, if we cannot reject Ho, what can we say

A

we do not have sufficient evidence to conclude that a significant relationship is present

53
Q

in multiple regression, what is the formula for mean square

A

sum of squares / df

54
Q

In multiple regression, what is the df for Total sum of squares

A

n-1

55
Q

in multiple regression, what is the df for sum of squares due to regression (SSR)

A

p degrees of freedom

p = the number of I.V.s

56
Q

In multiple regression, what is the df for the sum of squares due to error (SSE)?

A

SSE has n-p-1 df

57
Q

In multiple regression, what is the formula for MSR (mean square due to regression)?

A

MSR = SSR / p

58
Q

In multiple regression, what is the formula for MSE (mean square due to error)?

A

MSE = SSE / (n - p - 1)

59
Q

In multiple regression, what does MSE provide

A

Provides an unbiased estimate of $\sigma^2$ (the variance of the error term $\epsilon$)

60
Q

In multiple regression, if $H_0$: $\beta_1 = \beta_2 = \dots = \beta_p = 0$ is true, what can we say about MSR?

A

MSR also provides an unbiased estimate of $\sigma^2$

and the value of MSR/MSE should be close to 1

61
Q

In multiple regression, if $H_0$: $\beta_1 = \beta_2 = \dots = \beta_p = 0$ is false, what can we say about MSR?

A

MSR overestimates $\sigma^2$

MSR/MSE becomes larger
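
The mean squares and the F ratio from the preceding cards can be put together in a short sketch. The function name and the numbers (SSR = 90, SSE = 10, n = 13, p = 2) are hypothetical and chosen only to keep the arithmetic simple.

```python
from math import sqrt

def f_statistic(ssr: float, sse: float, n: int, p: int) -> float:
    """F = MSR / MSE with MSR = SSR / p and MSE = SSE / (n - p - 1)."""
    msr = ssr / p             # mean square due to regression, p df
    mse = sse / (n - p - 1)   # mean square due to error, n - p - 1 df
    return msr / mse

# Hypothetical sums of squares for n = 13 observations and p = 2 I.V.s.
f_value = f_statistic(ssr=90.0, sse=10.0, n=13, p=2)
s = sqrt(10.0 / (13 - 2 - 1))  # standard error of the estimate, sqrt(MSE)
print(f_value, s)  # 45.0 1.0
```

An F ratio this far above 1 is the kind of value that leads to rejecting the null hypothesis; the actual decision still requires the p-value or the critical value for the F distribution with p and n - p - 1 degrees of freedom.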

62
Q

How do you determine how large the value of F must be to reject Ho

A

p-value approach: reject $H_0$ if p-value $\le \alpha$

Critical value approach: reject $H_0$ if $F \ge F_\alpha$

63
Q

What is $F_\alpha$ based on, and what are its degrees of freedom?

A

Based on the F distribution

df for numerator = p
df for denominator = n - p - 1

64
Q

What is the standard error also called?

A

The standard error of the estimate (s)

65
Q

What is the formula for the standard error of the estimate?

A

$s = \sqrt{\mathrm{MSE}}$

66
Q

what is the test statistic formula for t test in multiple regression

A

$t = b_i / s_{b_i}$

67
Q

What are the hypotheses for the t test (test for individual significance)?

A

$H_0$: $\beta_i = 0$
$H_a$: $\beta_i \neq 0$ (tested separately for each parameter $\beta_i$)

69
Q

What does the I.V in regression analysis refer to

A

any variable being used to predict or explain the value of the D.V.

70
Q

How do you determine whether multicollinearity is high enough to cause problems? What test do you use?

A

Rule of thumb test

  • a sample correlation coefficient greater than +0.7 or less than -0.7 for two I.V.s is a warning of potential problems
  • try to avoid including I.V.s that are highly correlated (in practice, this is rarely possible)
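
A minimal sketch of the rule-of-thumb check, using a hand-written Pearson correlation so it needs no extra libraries. The function names and the two sample I.V. columns are hypothetical.

```python
from math import sqrt

def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def multicollinearity_warning(a, b, threshold=0.7):
    """True when |r| between two I.V.s exceeds the rule-of-thumb cutoff."""
    return abs(pearson_r(a, b)) > threshold

# Hypothetical values of two I.V.s measured on the same five observations.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
print(round(pearson_r(x1, x2), 3), multicollinearity_warning(x1, x2))
```

Here r is roughly 0.775, so the pair would be flagged; the check is a warning sign, not a formal test.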
71
Q

In multiple regression, what must you recognize if you believe there is substantial multicollinearity?

A

That separating the effects of the individual I.V.s on the dependent variable is very difficult

72
Q

What do you use to estimate and predict

A
A confidence interval (to estimate the mean value of y) and a prediction interval (to predict an individual value of y)
73
Q

What is a confidence interval?

A

An interval estimate of the mean value of y for given values of the I.V.s, e.g. the mean travel time for all trucks that travel 100 miles and make 2 deliveries

74
Q

What is a prediction interval?

A

An interval estimate of an individual value of y for given values of the I.V.s, e.g. the travel time for one specific truck that travels 100 miles and makes 2 deliveries

75
Q

With multiple regression, how do you develop the confidence interval for the mean value of y and the prediction interval for an individual value of y?

A

Similar to simple linear regression; we use Minitab, Excel, or other software packages

77
Q

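How do you interpret $\beta_0$, $\beta_1$ and $\beta_2$ when a categorical variable is present in multiple regression?
$x_2 = 0$ for a mechanical repair, $x_2 = 1$ for an electrical repair
$E(y \mid \text{mechanical})$ = mean or expected value of repair time given a mechanical repair

A
For mechanical ($x_2 = 0$):
$E(y \mid \text{mechanical}) = \beta_0 + \beta_1 x_1 + \beta_2(0) = \beta_0 + \beta_1 x_1$
For electrical ($x_2 = 1$):
$E(y \mid \text{electrical}) = \beta_0 + \beta_1 x_1 + \beta_2(1) = (\beta_0 + \beta_2) + \beta_1 x_1$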
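
The two equations in this card can be sketched in code. The function name and all coefficient values are hypothetical; only the 0/1 coding (mechanical = 0, electrical = 1) comes from the card.

```python
def expected_repair_time(b0, b1, b2, x1, electrical):
    """E(y) = b0 + b1*x1 + b2*x2, where x2 = 1 for electrical, 0 for mechanical."""
    x2 = 1 if electrical else 0
    return b0 + b1 * x1 + b2 * x2

# Hypothetical coefficients, for illustration only.
b0, b1, b2 = 1.0, 0.5, 1.25
mech = expected_repair_time(b0, b1, b2, x1=4, electrical=False)
elec = expected_repair_time(b0, b1, b2, x1=4, electrical=True)
print(mech, elec, elec - mech)  # 3.0 4.25 1.25
```

For any fixed value of x1 the difference between the two means equals b2, so the dummy variable shifts the intercept and leaves the slope on x1 unchanged.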
79
Q

How do you interpret the parameters in multiple regression with a dummy variable, where
$E(y \mid \text{mechanical})$ uses $x_2 = 0$ (mechanical is coded 0) and
$E(y \mid \text{electrical})$ uses $x_2 = 1$ (electrical is coded 1)?

If $\beta_2$ is positive

If $\beta_2$ is negative

If $\beta_2 = 0$

A

Positive - the mean repair time for electrical will be greater than that for mechanical

Negative - the mean repair time for electrical will be less than that for mechanical

0 - there is no difference in the mean repair time between electrical and mechanical, and the type of repair is NOT related to the repair time

81
Q

Why do we use a dummy variable?

A

The use of a dummy variable provides 2 estimated regression equations that can be used to predict the repair time, depending on whether the repair is mechanical or electrical

82
Q

if a categorical variable has k levels then how many dummy variables are required

A

k-1

- each dummy variable is coded as a 1 or a zero

83
Q

what is residual analysis useful for in multiple regression

A

standardized residuals are frequently used in residual plots and in the identification of outliers

84
Q

what is the formula for the standardized residual for observation i

A

$\dfrac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$

85
Q

What does $s_{y_i - \hat{y}_i}$ stand for?

A

The standard deviation of residual i

86
Q

What is the formula for $s_{y_i - \hat{y}_i}$?

A

$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$

s = standard error of the estimate

$h_i$ = leverage of observation i

87
Q

How is the leverage of an observation determined?

A

By how far the values of the I.V.s are from their means

88
Q

In multiple regression, what can the normal probability plot be used for?

A

To determine whether the distribution of $\epsilon$ appears to be normal

  • same procedure as in simple linear regression
  • use software to compute it
89
Q

How do you determine if there is an outlier

A

if the value of the standardized residual is less than -2 or greater than +2
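
The standardized-residual cards can be tied together in a short sketch. It assumes the residual's standard deviation is s times the square root of (1 - h_i), where h_i is the leverage; the function names and all input values are hypothetical.

```python
from math import sqrt

def standardized_residual(y_i, y_hat_i, s, h_i):
    """Residual divided by its standard deviation, s * sqrt(1 - h_i)."""
    return (y_i - y_hat_i) / (s * sqrt(1 - h_i))

def is_outlier(std_resid):
    """Rule of thumb from the cards: below -2 or above +2."""
    return std_resid < -2 or std_resid > 2

# Hypothetical observation: residual of 3, s = 1.25, leverage 0.36.
r = standardized_residual(y_i=10.0, y_hat_i=7.0, s=1.25, h_i=0.36)
print(round(r, 3), is_outlier(r))
```

In this made-up case the standardized residual is about 3, so the observation would be flagged for a closer look.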

90
Q

The presence of one or more outliers in a data set tends to do what to the standard error?

A

tends to increase the standard error of the estimate

91
Q

The size of the standardized residual will decrease when what happens?

A

s increases

92
Q

What can we use when the standardized residual rule fails to identify an outlier?

A

Studentized deleted residuals

93
Q

What are studentized deleted residuals

A

Residuals computed with the observation in question deleted from the data set; they may detect outliers that standardized residuals do not detect

94
Q

What is the definition of an unstandardized residual?

A

difference b/w an observed value and the value predicted by the model

95
Q

definition of Standardized residual

A

The residual divided by an estimate of its standard deviation

- also called Pearson residuals; mean = 0, SD = 1

96
Q

Definition of Studentized Deleted residual

A
  • the deleted residual for a case divided by its standard error
97
Q

What does the difference between a studentized deleted residual and its associated studentized residual indicate?

A

how much difference eliminating a case makes on its own prediction

99
Q

How is each residual obtained for studentized deleted residuals?

A

By fitting the regression using all of the data EXCEPT the point in question

100
Q

What does $s_{(i)}$ denote?

A

The standard error of the estimate based on the data set with the ith observation removed

101
Q

If $s_{(i)}$ is less than s, what can we say?

A

The ith observation may be an outlier

- the absolute value of the ith studentized deleted residual will be larger than the absolute value of the standardized residual

102
Q

How can the t distribution be used with regard to studentized deleted residuals?

A

to determine whether the studentized deleted residuals indicate the presence of outliers

103
Q

If the value of the ith studentized deleted residual is less than $-t_{\alpha/2}$ or greater than $t_{\alpha/2}$, what can we conclude?

A

The ith observation is an outlier

104
Q

What does the leverage of an observation, $h_i$, measure?

A

How far the values of the I.V.s are from their mean values

105
Q

How do we compute leverage?

A

Use Minitab (or other statistical software)

106
Q

What is the rule of thumb for $h_i$?

A

If $h_i > 3(p + 1)/n$, the observation has high leverage and may be influential

107
Q

What is a problem that can happen using leverage to find influential observations

A

observations can be identified as having a high leverage and not necessarily be influential
- using leverage can lead to wrong conclusions

108
Q

What can we use to eliminate issues with using leverage to find influential observations

A

Cook’s distance measure

109
Q

What does Cook’s Distance measure use

A

Both the leverage of observation i, $h_i$, and the residual for observation i, $(y_i - \hat{y}_i)$

110
Q

What is the formula for Cooks distance measure

A

$D_i = \dfrac{(y_i - \hat{y}_i)^2}{(p + 1)s^2}\left[\dfrac{h_i}{(1 - h_i)^2}\right]$

111
Q

If $D_i > 1$, what can we conclude?

A

the ith observation is influential and should be studied further
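
A minimal sketch of Cook's distance as described in the cards above: the squared residual divided by (p + 1) times s squared, multiplied by h_i divided by (1 - h_i) squared. The function name and all input values are hypothetical.

```python
def cooks_distance(y_i, y_hat_i, s, h_i, p):
    """D_i = (y_i - y_hat_i)^2 / ((p + 1) * s^2) * h_i / (1 - h_i)^2."""
    residual_sq = (y_i - y_hat_i) ** 2
    return residual_sq / ((p + 1) * s ** 2) * (h_i / (1 - h_i) ** 2)

# Hypothetical observation with a large residual and high leverage.
d_big = cooks_distance(y_i=5.0, y_hat_i=3.0, s=1.0, h_i=0.5, p=1)
# Hypothetical observation with a small residual and low leverage.
d_small = cooks_distance(y_i=4.0, y_hat_i=3.0, s=2.0, h_i=0.2, p=2)
print(d_big, d_big > 1)                # 4.0 True
print(round(d_small, 4), d_small > 1)  # 0.026 False
```

Because the measure combines the residual and the leverage, an observation needs both to be sizable before it crosses the threshold of 1.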

112
Q

What is an example of logistic regression?

A

Estimating the probability that a bank will approve a request for a credit card, given a particular set of values for the chosen I.V.s

113
Q

what does logistic regression require

A

A binary dependent variable y (coded 0 or 1) and one or more I.V.s

114
Q

What is the logistic regression equation

A

$E(y) = \dfrac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p}}$
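
A minimal sketch of the logistic regression equation, showing that E(y) always stays between 0 and 1. The function name `logistic_mean` and the coefficients are hypothetical.

```python
from math import exp

def logistic_mean(betas, xs):
    """E(y) = e^z / (1 + e^z), with z = b0 + b1*x1 + ... + bp*xp."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return exp(z) / (1 + exp(z))

# Hypothetical coefficients (b0 = -2, b1 = 1) and a range of x values.
betas = [-2.0, 1.0]
probs = [logistic_mean(betas, [x]) for x in range(0, 5)]
print([round(p, 3) for p in probs])
```

With a positive b1 the probabilities rise as x increases, climbing fastest near E(y) = 0.5 and flattening toward 0 and 1 at the extremes.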

115
Q

In logistic regression, what type of graph do we have?

A

An S-shaped curve

116
Q

In logistic regression, how do the values of E(y), which represent probabilities, increase?

A

Fairly rapidly as x increases through the middle of the S-shaped curve

118
Q

What does logistic regression seek to do

A
  1. model the prob. of an event occurring depending on the values of the I.Vs, which can be categorical or numerical
  2. estimate the prob. that an event occurs for a randomly selected observation vs the prob the event does not occur
  3. predict the effect of a series of variables on a binary response variable (0 or 1)
  4. classify observations by estimating the prob that an observation is in a particular category (such as approved or not approved)

(Model, estimate, predict, and classify)

119
Q

In logistic regression, why won't simple linear regression work?

A

Because simple linear regression uses one quantitative variable to predict another quantitative variable

120
Q

In logistic regression, why won't multiple linear regression work?

A

Multiple linear regression is simple linear regression with more I.V.s; the D.V. is still quantitative

121
Q

In logistic regression, why won't nonlinear regression work?

A

It still involves 2 quantitative variables, but the relationship is curvilinear

122
Q

Running typical regressions on a binary (0 or 1) D.V. causes what problems?

A
  1. Binary data (1 or 0) do not have a normal distribution, which is a condition needed for most other types of regression
  2. Predicted values of the D.V. can fall below 0 or above 1, which violates the definition of probability
  3. Probabilities are often not linear in x, such as "U" shapes where the probability is very low or very high at the extremes of the x-values
124
Q

What is the formula for the odds?

A

odds = P(event occurring) / P(event not occurring)

= p / (1 - p)

126
Q

when is the odds ratio used

A

in logistic regression

127
Q

What does the odds ratio measure

A

The impact on the odds of a one-unit increase in only one of the I.V.s
- odds ratio = (the odds that y = 1 given that one of the I.V.s has been increased by 1 unit, odds1) / (the odds that y = 1 given no change in the values of the I.V.s, odds0)
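
The odds and odds ratio can be illustrated with a short sketch. The function names and coefficients are hypothetical. The final check relies on a standard property of the logistic model (not stated in the cards): the odds ratio for a one-unit increase in an I.V. equals e raised to that I.V.'s coefficient.

```python
from math import exp, isclose

def odds(p):
    """Odds in favor of an event with probability p."""
    return p / (1 - p)

def odds_ratio(p1, p0):
    """odds1 / odds0: odds after a one-unit increase over odds before."""
    return odds(p1) / odds(p0)

def logistic_p(b0, b1, x):
    """P(y = 1) under a one-variable logistic regression equation."""
    z = b0 + b1 * x
    return exp(z) / (1 + exp(z))

# Hypothetical coefficients; compare x = 2 with x = 3 (a one-unit increase).
b0, b1 = -1.0, 0.7
p0 = logistic_p(b0, b1, 2.0)
p1 = logistic_p(b0, b1, 3.0)
print(isclose(odds_ratio(p1, p0), exp(b1)))  # True
```

An odds ratio above 1 means the one-unit increase raises the odds that y = 1; a ratio below 1 means it lowers them.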

128
Q

What is the event of interest in the odds ratio?

A

y = 1

129
Q

in logistic regression we are estimating what

A

An unknown probability p for any given linear combination of the I.V.s (so the probability of success is p and of failure is q = 1 - p)

130
Q

When testing for significance in logistic regression, what type of test do we use?

A

G test

131
Q

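What are the hypotheses for the test of overall significance in logistic regression?

A

$H_0$: $\beta_1 = \beta_2 = 0$ (with two I.V.s; in general, all $\beta_i = 0$)
$H_a$: one or more of the parameters is not equal to zero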

132
Q

If the null hypothesis in logistic regression is true, what can we say about the sampling distribution of G

A

It follows a chi-square distribution

df = the number of I.V.s in the model

133
Q

In logistic regression, if $\alpha$ is greater than the p-value for the G test, what can we conclude?

A

We reject $H_0$ and conclude that the overall model is significant

134
Q

What does the G test show

A

overall significance

136
Q

what is the z test used for in logistic regression

A

used to determine whether each of the I.V.s is making a significant contribution to the overall model

137
Q

What are the hypotheses for the z test in logistic regression?

A

$H_0$: $\beta_1 = 0$
$H_a$: $\beta_1 \neq 0$ (and similarly for each of the other parameters)

138
Q

In a z test for logistic regression, if $H_0$ is true, what can we say?

A

The estimated coefficient divided by its standard error follows a standard normal probability distribution

139
Q

Adjusted multiple coefficient of determination - Definition

A

A measure of the goodness of fit of the estimated multiple regression equation that adjusts for the number of independent variables in the model and thus avoids overestimating the impact of adding more independent variables

140
Q

Categorical independent variable - Definition

A

An independent variable with categorical data

141
Q

Cook’s distance measure - Definition

A

A measure of the influence of an observation based on both the leverage of observation i and the residual for observation i

142
Q

Dummy variable - Definition

A

A variable used to model the effect of categorical independent variables. A dummy variable may take only the value zero or one

143
Q

Multicollinearity - Definition

A

The term used to describe the correlation among the independent variables

144
Q

Multiple coefficient of determination - Definition

A

A measure of the goodness of fit of the estimated multiple regression equation. It can be interpreted as the proportion of the variability in the dependent variable that is explained by the estimated regression equation

145
Q

Multiple regression analysis - Definition

A

Regression analysis involving two or more independent variables.

146
Q

Multiple regression equation - Definition

A

The mathematical equation relating the expected value or mean value of the dependent variable to the values of the independent variables; that is, $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$.

147
Q

Multiple regression model - Definition

A

The mathematical equation that describes how the dependent variable y is related to the independent variables $x_1, x_2, \dots, x_p$ and an error term $\epsilon$.

148
Q

Odds in favor of an event occurring - Definition

A

The probability the event will occur divided by the probability the event will not occur

149
Q

Odds Ratio - Definition

A

The odds that y = 1 given that one of the independent variables increased by one unit (odds1) divided by the odds that y = 1 given no change in the values for the independent variables (odds0); that is, Odds ratio = odds1/odds0