Module 5 Flashcards
We use ___________ to investigate the relationship between a dependent variable and multiple independent variables.
multiple regression
The structure of the multiple regression equation is…
$\hat{y} = a + b_1x_1 + b_2x_2 + \dots + b_kx_k$.
The true relationship between multiple variables is described by $y = \alpha + \beta_1x_1 + \beta_2x_2 + \dots + \beta_kx_k + \varepsilon$, where $\varepsilon$ is the ________.
error term
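To connect the true relationship (with $\alpha$, the $\beta$s, and $\varepsilon$) to the estimated equation (with $a$ and the $b$s), here is a minimal sketch that simulates data from a hypothetical true model and estimates it by ordinary least squares with NumPy. The parameter values, sample size, and seed are illustrative assumptions, not course data.

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for illustration.
ALPHA = 2.0
BETAS = np.array([1.5, -0.7])

rng = np.random.default_rng(seed=0)
n = 500
X = rng.uniform(0, 10, size=(n, 2))   # two independent variables
eps = rng.normal(0, 1.0, size=n)      # error term: mean 0, fixed variance
y = ALPHA + X @ BETAS + eps           # true relationship: y = alpha + b1*x1 + b2*x2 + eps

# Prepend a column of ones so least squares also estimates the intercept a.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a, b = coef[0], coef[1:]
print(f"a = {a:.2f}, b = {np.round(b, 2)}")
```

The estimates $a$, $b_1$, $b_2$ land close to, but not exactly on, the true $\alpha$, $\beta_1$, $\beta_2$, because the error term adds random noise to every observation.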
_________ in multiple regression characterize relationships that are net with respect to the independent variables included in the model but gross with respect to all omitted independent variables.
Coefficients
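The "net versus gross" idea can be seen by fitting the same data twice: once omitting a variable that is correlated with $x_1$, and once including it. This is a hedged sketch with hypothetical simulated data; the helper name `fit` is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1000
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)   # x2 is correlated with x1 (hypothetical)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 1, n)

def fit(*columns):
    """Least-squares coefficients [a, b1, b2, ...] for y on the given columns."""
    design = np.column_stack([np.ones(n), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

gross_b1 = fit(x1)[1]      # x2 omitted: b1 also absorbs part of x2's effect
net_b1 = fit(x1, x2)[1]    # x2 included: b1 is net of x2
print(f"b1 without x2: {gross_b1:.2f}, b1 with x2: {net_b1:.2f}")
```

With $x_2$ omitted, the coefficient on $x_1$ is roughly $2 + 3 \times 0.8 = 4.4$, because it picks up the effect of the correlated omitted variable; once $x_2$ is included it falls back to roughly 2.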
Forecasting with a multiple regression equation is similar to forecasting with a single variable linear model. However, instead of entering only one value for a single independent variable, we input a value for ______ of the independent variables.
each
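Forecasting just means plugging one value for each independent variable into $\hat{y} = a + b_1x_1 + \dots + b_kx_k$. A minimal sketch, assuming a hypothetical fitted model with three independent variables:

```python
def forecast(a, coefficients, x_values):
    """Return y-hat = a + b1*x1 + ... + bk*xk for one new observation."""
    if len(coefficients) != len(x_values):
        raise ValueError("need one input value for each independent variable")
    return a + sum(b * x for b, x in zip(coefficients, x_values))

# Hypothetical fitted model: y-hat = 50 + 2.5*x1 - 1.2*x2 + 0.8*x3
# Inputs x1 = 10, x2 = 5, x3 = 20 give 50 + 25 - 6 + 16 = 85.
y_hat = forecast(50, [2.5, -1.2, 0.8], [10, 5, 20])
print(y_hat)
```

The length check mirrors the flashcard: leaving out a value for any independent variable makes the forecast undefined.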
As with single variable linear regression, it is important to evaluate several metrics to determine whether a multiple variable linear regression model is a ______ for our data.
good fit
For multiple regression we rely less on scatter plots and more on __________ and _________ because visualizing three or more variables can be difficult.
numerical values; residual plots
Because R² never ________ when independent variables are added to a regression, it is important to apply an ____________ when assessing and comparing the fit of a multiple regression model.
decreases; adjustment factor
This adjustment factor compensates for the ________ in R² that results solely from increasing the number of independent variables.
increase
It is particularly important to look at ________, rather than R², when comparing regression models with different numbers of independent variables.
Adjusted R²
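The adjustment factor can be made concrete with the standard formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, where $n$ is the number of observations and $k$ the number of independent variables. The sketch below compares two hypothetical models fitted to the same 30 observations; the R² values are invented for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than independent variables + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Two hypothetical models fitted to the same n = 30 observations.
small = adjusted_r2(0.80, n=30, k=2)   # about 0.785
large = adjusted_r2(0.81, n=30, k=6)   # about 0.760
print(f"2 variables: {small:.3f}, 6 variables: {large:.3f}")
```

The six-variable model has the higher R² but the lower Adjusted R², so the four extra variables do not earn their place.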
In addition to analyzing Adjusted R², we must test whether the relationship between the independent and dependent variables is ______ and _______. We do this by analyzing the regression’s residual plots and the _______ associated with each independent variable’s coefficient.
linear; significant; p-values
For multiple regression models, because it is difficult to view the data in a simple scatter plot, ________ are an indispensable tool for detecting whether the linear model is a good fit.
residual plots
There is a residual plot for each _______ variable included in the regression model.
independent
We can graph a residual plot for each independent variable to help detect patterns such as __________ and __________.
heteroskedasticity; nonlinearity
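A residual plot for each independent variable is normally a scatter chart. As a numeric stand-in, this hedged sketch compares the spread of the residuals in the lower and upper halves of each independent variable; the simulated data (error spread growing with $x_1$) and the helper `spread_by_half` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 600
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
# Hypothetical data: error spread grows with x1 (heteroskedasticity).
y = 3.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n) * x1

design = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ coef

def spread_by_half(x, resid):
    """Residual standard deviation for the lower and upper half of x."""
    low = x <= np.median(x)
    return resid[low].std(), resid[~low].std()

low_x1, high_x1 = spread_by_half(x1, residuals)   # very different: a fan shape
low_x2, high_x2 = spread_by_half(x2, residuals)   # similar: no pattern in x2
print(f"x1 halves: {low_x1:.2f} vs {high_x1:.2f}; x2 halves: {low_x2:.2f} vs {high_x2:.2f}")
```

A residual plot against $x_1$ would show a widening fan, while the plot against $x_2$ would look like a patternless band. Nonlinearity would instead show up as residuals whose average drifts above or below zero across ranges of an independent variable, which this spread check does not measure.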
As with single variable regression models, if the underlying relationship is linear, each of the residuals follows a normal distribution with a mean of _____ and _____ variance.
zero; fixed
We should also analyze the _______ of the independent variables’ coefficients to determine whether each independent variable has a significant relationship with the dependent variable.
p-values
If the p-value of each independent variable’s coefficient is less than _____, we conclude, at the 95% confidence level (5% significance level), that there is a significant linear relationship between each independent variable and the dependent variable.
0.05
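Spreadsheet regression output reports these p-values directly. To show where they come from, here is a minimal sketch that computes each coefficient's t-statistic and a two-sided p-value. To stay dependency-light it swaps in a normal approximation to the t distribution (via `math.erfc`), which is reasonable only for large samples; the data and the function name `coefficient_p_values` are hypothetical.

```python
import math
import numpy as np

def coefficient_p_values(X, y):
    """Two-sided p-values for [a, b1, ..., bk], using a normal approximation
    to the t distribution (reasonable only for large samples)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k - 1)              # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # coefficient covariance matrix
    t_stats = coef / np.sqrt(np.diag(cov))
    # Two-sided standard normal tail probability: erfc(|t| / sqrt(2)).
    return [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]

rng = np.random.default_rng(seed=3)
n = 400
X = rng.normal(size=(n, 2))
# Hypothetical data: y depends on the first independent variable only.
y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=n)

p = coefficient_p_values(X, y)
significant = [p_i < 0.05 for p_i in p[1:]]   # skip the intercept
print(p, significant)
```

The first independent variable's p-value falls far below 0.05; the second has no true effect, so its p-value is usually (though not always) above the threshold.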
Multiple regression requires us to be aware of the possibility of __________ among the independent variables.
multicollinearity
Multicollinearity occurs when there is a ______ linear relationship among two or more of the independent variables.
strong
Indications of multicollinearity include seeing an independent variable’s p-value ________ when one or more other independent variables are added to a regression model.
increase
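The rising p-value comes from a rising standard error: when a nearly redundant variable is added, the coefficient on the original variable becomes much less precisely estimated, its t-statistic shrinks, and its p-value grows. A hedged sketch with hypothetical simulated data (the helper `slope_std_errors` is illustrative):

```python
import numpy as np

def slope_std_errors(X, y):
    """Standard errors of b1..bk from an OLS fit with an intercept."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k - 1)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return np.sqrt(np.diag(cov))[1:]

rng = np.random.default_rng(seed=4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1: strong collinearity
y = 2.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

corr = np.corrcoef(x1, x2)[0, 1]
se_alone = slope_std_errors(x1.reshape(-1, 1), y)[0]
se_with_x2 = slope_std_errors(np.column_stack([x1, x2]), y)[0]
print(f"corr(x1, x2) = {corr:.3f}; SE of b1: {se_alone:.3f} alone, {se_with_x2:.3f} with x2")
```

Checking the correlation between pairs of independent variables, as in the `corr` line, is a quick first screen for the strong linear relationship described on the card.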
We may be able to _______ multicollinearity by either increasing the ________ size or removing one (or more) of the collinear variables.
reduce; sample
Multiple regression models allow us to include multiple __________ for categorical data (day of week, for example).
dummy variables
A dummy variable is equal to ___ when the variable of interest fits a certain criterion. For example, a dummy variable for “Saturday” would equal ___ for observations relating to Saturdays and ___ for observations related to all other days.
1; 1; 0
The number of dummy variables we include must always be ___ fewer than the number of categories (levels) of the categorical variable.
1
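A minimal sketch of day-of-week dummies, assuming Sunday is the omitted baseline (any one day could play that role). With all seven dummies plus an intercept, the dummies would sum to 1 in every row and be perfectly collinear with the intercept, which is why one category is left out. The names `DAYS`, `BASELINE`, and `day_dummies` are hypothetical.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
BASELINE = "Sun"   # hypothetical choice of the omitted category

def day_dummies(day):
    """Return 0/1 dummy variables for every day except the baseline."""
    if day not in DAYS:
        raise ValueError(f"unknown day: {day}")
    return {f"is_{d}": int(day == d) for d in DAYS if d != BASELINE}

saturday_row = day_dummies("Sat")   # is_Sat = 1, the other five dummies = 0
sunday_row = day_dummies("Sun")     # all six dummies = 0 (the baseline)
print(saturday_row)
```

Each dummy's coefficient is then interpreted relative to the baseline day.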
_________ are used to capture the ongoing effects of a given variable.
Lagged values
The lag period is based on managerial _____ and data _______.
insight; availability
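A lagged variable is the same series shifted by the lag period, so each row pairs this period's outcome with an earlier period's input. A hedged sketch using a hypothetical weekly advertising series; the first rows have no earlier value and would need to be dropped before fitting the regression.

```python
def lag(values, periods=1):
    """Shift a series forward by `periods`; entries with no earlier value are None."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    periods = min(periods, len(values))
    return [None] * periods + values[:len(values) - periods]

# Hypothetical weekly advertising spend
ad_spend = [100, 120, 90, 150, 130]
ad_spend_lag1 = lag(ad_spend, 1)   # [None, 100, 120, 90, 150]
print(ad_spend_lag1)
```

The lag period passed in is exactly the managerial choice the card describes; a longer lag costs more rows at the start of the data.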
If the lagged variable does not _______ the model’s explanatory power enough to offset the penalty for adding another variable, the addition of the variable decreases Adjusted R².
increase
How do you create a regression output table in Excel?
Using the Data Analysis tool (Data > Data Analysis > Regression), which requires the Analysis ToolPak add-in
How do you create dummy variables for a regression model in Excel?
=IF(logical_test,[value_if_true],[value_if_false])
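→ Returns value_if_true if the specified condition is met, and returns value_if_false if it is not. For example, if column B holds the day of the week (a hypothetical layout), =IF(B2="Saturday",1,0) creates a Saturday dummy variable.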