Chapter 15 Flashcards
How many variables and what type of variables are involved in multiple regression
one dependent variable and 2 or more independent variables
what is multiple regression the study of
how a dependent variable y is related to 2 or more independent variables
What is the multiple regression model
y = B0 + B1x1 + B2x2 + ... + Bpxp + E
what is the random variable in the regression model
E - the error term
what does the error term in multiple regression account for (SSE)
accounts for the variability in y that cannot be explained by the linear effect of the p independent variables
what are the assumptions in multiple regression
the mean or expected value of E is 0
What is the multiple regression equation
E(y) = B0 + B1x1 + B2x2 + ... + Bpxp
What does E(y) stand for
mean or expected value of y
in multiple regression, when we use sample data to estimate the multiple regression equation, what is the formula
y hat = b0 + b1x1 + b2x2 + ... + bpxp
what does y hat stand for in multiple regression
predicted value of the dependent variable
what does yi stand for
observed value of the dependent variable for the ith observation
what does y hat i stand for
predicted value of the dependent variable for the ith observation
when adding more independent variables to a multiple regression, does it mean the regression will be “better off”? why
no, it can make things worse, called overfitting
what is multicollinearity
the addition of more independent variables creates more relationships among them
- so not only are the I.V. potentially related to the Dependent variable, they are also potentially related to each other
If you have 4 I.V., how many relationships do you have
4 between each I.V. and the D.V., plus 6 more among the I.V.s themselves (one for each pair of I.V.s)
so in total there are 10 relationships to consider
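The count above can be checked with a short sketch. It assumes one relationship between each I.V. and the D.V., plus one for every pair of I.V.s; the function name is made up for illustration.

```python
from math import comb

def count_relationships(num_ivs):
    """Count pairwise relationships in a regression with num_ivs I.V.s."""
    iv_with_dv = num_ivs            # each I.V. paired with the D.V.
    iv_with_iv = comb(num_ivs, 2)   # each unordered pair of I.V.s
    return iv_with_dv, iv_with_iv, iv_with_dv + iv_with_iv

print(count_relationships(4))  # (4, 6, 10)
```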
do all I.V. help at predicting the D.V?
no, some I.V. are better at predicting the D.V. than others, some contribute nothing
in multicollinearity, what is the ideal situation
for all of the I.V.s to be correlated with the D.V. but NOT with each other
in multiple regression, how is each coefficient interpreted?
the estimated change in y corresponding to a one unit change in a variable when all other variables are held constant
What are the 6 preps for multiple regression
- generate a list of potential variables; independent and dependent
- Collect data on the variables
- check the relationship b/w each I.V and the D.V. using scatter plots and correlations
- (optional) conduct simple linear regression for each i.V./D.V pair
- use the non-redundant I.V.s in the analysis to find the best fitting model
- use the best fitting model to make predictions about the D.V.
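The correlation check and the search for non-redundant I.V.s in the list above might look like the sketch below. The data, the variable names, and the 0.7 cutoff are all made-up assumptions for illustration, not values from the chapter.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical data: x2 is nearly a multiple of x1, so the two are redundant.
ivs = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],
    "x3": [5, 1, 4, 2, 6, 3],
}
dv = [3, 5, 8, 9, 12, 16]

# Correlation of each I.V. with the D.V.
iv_dv = {name: pearson(values, dv) for name, values in ivs.items()}

# Pairs of I.V.s that are strongly correlated with each other (assumed cutoff).
CUTOFF = 0.7
redundant = [
    (a, b) for a, b in combinations(ivs, 2)
    if abs(pearson(ivs[a], ivs[b])) > CUTOFF
]

print(iv_dv)
print(redundant)
```

Here x1 and x2 are flagged as a redundant pair, so only one of them would normally be carried into the model.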
what two problems can happen in multiple regression
1. overfitting and 2. multicollinearity
what is overfitting
is caused by adding too many I.V.; they account for more variance but add nothing to the model
What is multicollinearity
happens when some / all of the i.v.s are correlated with each other
In Simple linear regression how do we interpret b1
as an estimate of the change in y for a one-unit change in the I.V.
In multiple linear regression how do we interpret bi
we interpret each regression coefficient as: bi - an estimate of the change in y corresponding to a one-unit change in xi when all other independent variables are held constant
in multiple regression, what does bi represent
an estimate of the change in y corresponding to a one-unit change in xi when all other I.Vs are held constant
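A minimal sketch of the "held constant" idea, assuming an estimated equation with two I.V.s. The coefficient values are hypothetical and are not taken from the flashcards.

```python
# Hypothetical estimated coefficients (not from the flashcards).
b0, b1, b2 = -0.87, 0.06, 0.92

def predict(x1, x2):
    """Estimated regression equation: y hat = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

# Raise x1 by one unit while holding x2 constant at 3.
change = predict(51, 3) - predict(50, 3)
print(round(change, 6))  # 0.06
```

The change in y hat equals b1 no matter which value x2 is held at, which is exactly what the coefficient is estimating.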
What is the formula for Coefficient of Determination in simple linear regression
r squared = SSR/SST
what is the formula for Coefficient of Determination with multiple regression
R squared = SSR/SST
What does the Multiple Coefficient of Determination indicate
indicates we are measuring the goodness of fit for the estimated multiple regression equation
- denoted R squared
How can R squared (multiple coefficient of determination) be interpreted
the proportion of variability in the dependent variable that can be explained by the estimated multiple regression equation - when multiplied by 100, it can be interpreted as the % of the variability in y that can be explained by the estimated regression equation
What would this mean: R squared = .904
90.4% of the variability in travel time y is explained by the estimated multiple regression equation
Does R squared always increase or decrease as independent variables are added
it increases; R squared never decreases when an I.V. is added
Why do many analysts prefer adjusting R squared for the # of independent variables
to avoid overestimating the impact of adding an I.V. on the amount of variability explained by the estimated regression equation
What is the formula for the adjusted multiple coefficient of determination
R squared a = 1 - (1 - R squared)[(n - 1)/(n - p - 1)]
What does “p” represent in the adjusted multiple coefficient of determination
the # of independent variables
What happens to SSE when you add more I.V.s
causes prediction errors to become smaller, thus reducing the SSE
SSR = SST-SSE
- SST is fixed, so when SSE becomes smaller, SSR becomes larger, causing R squared = SSR/SST to increase
IF a variable is added to the model, what happens to R squared
becomes larger even if the variable added is not statistically significant
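A rough sketch of this card, using made-up data and a small hand-written least squares solver (an assumption for illustration; a statistics package would normally do the fitting). The `noise` column is an arbitrary list of numbers with no intended link to y, yet R squared does not go down when it is added.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bp] minimizing SSE, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    k = len(design[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for row in range(k):
            if row != col:
                factor = a[row][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    return [a[i][k] for i in range(k)]

def r_squared(rows, y):
    """R squared = SSR/SST, with SSR = SST - SSE."""
    b = fit_least_squares(rows, y)
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return (sst - sse) / sst

# Hypothetical data
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [5, 1, 4, 2, 8, 3, 7, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r2_one = r_squared([[a] for a in x1], y)
r2_two = r_squared([[a, b] for a, b in zip(x1, noise)], y)
print(r2_one, r2_two)
```

The second value is at least as large as the first, because the fit with the extra column can always reproduce the smaller model by setting the new coefficient to zero.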
If the value of R squared is small and the model contains a large # of I.V. what can happen to the Adjusted Coeff. of Det.
can take on a negative value
- in such cases, Minitab sets the adjusted coeff. of det. to zero
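The adjusted coefficient of determination can be sketched as a one-line function. The sample sizes and R squared values below are hypothetical inputs, chosen only to show an ordinary case and a negative case.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R squared = 1 - (1 - R squared) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical inputs: n = 10 observations in both cases.
print(round(adjusted_r_squared(0.904, 10, 2), 4))  # 0.8766
print(round(adjusted_r_squared(0.10, 10, 5), 4))   # -1.025
```

The second call has a small R squared and many I.V.s relative to n, so the adjusted value falls below zero.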
in multiple regression, what is the variance of E denoted by
sigma squared (σ²)
in multiple regression, what is the assumption about the variance of E
same for all values of the I.vs
What are the assumptions about E in simple or multiple linear regression
- the error term E - is a random variable with an expected value of 0
- The variance of E is the same for all values of the I.Vs
- The values of E are independent
- The Error term - is a normally distributed random variable
What type of graph is the multiple regression equation with two I.V.s
a plane in 3-D space
What is the value of E in multiple regression
difference b/w the actual y and the expected value of y, E(y), when x1 = x1* and x2 = x2*
Regression Analysis Terms
Dependent variable - we now use
Graph is called a
we now use response variable
graph is called a response surface
In Simple linear regression what tests did we use to test for significance
t and F test
- both provided the same conclusion
In multiple regression what tests do we use when testing for significance
F test and t test
in multiple regression what is the F test for
used to determine whether a significant relationship exists b/w the D.V. and the set of I.V.s
in multiple regression what is the F test referred to as
the test for Overall significance
in multiple regression what is the t test used for
used to determine whether EACH of the I.V.s is Significant
- a separate t test is conducted for EACH of the I.V. s
in multiple regression what is the t test referred to as
a test for individual significance
What are the hypotheses for the F test in multiple regression (test for overall significance)
H0: B1 = B2 = ... = Bp = 0
Ha: one or more of the parameters is NOT equal to zero
In multiple regression, with the F test, if H0 is rejected what can we say
gives us sufficient statistical evidence to conclude that one or more of the parameters is NOT equal to zero and
that the overall relationship b/w y and the set of I.V s is significant
In multiple regression, with the F test, if we cannot reject H0, what can we say
we do not have sufficient evidence to conclude that a significant relationship is present
in multiple regression, what is the formula for mean square
sum of squares / df
In multiple regression, what is the df for Total sum of squares
n-1
in multiple regression, what is the df for sum of squares due to regression (SSR)
p df
p = # of I.V.s
in multiple regression, what is the df for sum of squares due to error
SSE has n-p-1 df
in multiple regression, what is the formula for MSR
MSR = SSR/p
in multiple regression, what is the formula for MSE
MSE = SSE/(n-p-1)
In multiple regression, what does MSE provide
provides an unbiased estimate of sigma squared (σ²), the variance of the error term E
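The df and mean square cards can be pulled together in a small sketch. The sums of squares, n, and p below are hypothetical inputs. The F statistic is computed as MSR/MSE, which is the usual statistic for the test of overall significance but is not spelled out on the cards, so treat it as an assumption here.

```python
def anova_quantities(sst, ssr, n, p):
    """Degrees of freedom, mean squares, and F from sums of squares."""
    sse = sst - ssr                 # SSR = SST - SSE, rearranged
    df_regression = p               # p = number of I.V.s
    df_error = n - p - 1
    df_total = n - 1
    msr = ssr / df_regression       # mean square = sum of squares / df
    mse = sse / df_error            # estimate of the variance of the error term
    return {
        "df_regression": df_regression,
        "df_error": df_error,
        "df_total": df_total,
        "SSE": sse,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,             # assumed statistic for overall significance
    }

# Hypothetical inputs
table = anova_quantities(sst=23.9, ssr=21.6, n=10, p=2)
for name, value in table.items():
    print(name, value)
```

The F value would then be compared against an F distribution with p and n-p-1 degrees of freedom, using a table or statistical software.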