Multiple Regression Flashcards
When do we use single variable linear regression?
to investigate the relationship between a dependent variable and one independent variable.
When do we use multiple regression?
to investigate the relationship between a dependent variable and multiple independent variables.
Gross relationship =
Gross Relationship: A single variable linear regression model determines the gross effect of an independent variable on a dependent variable. For example, the gross effect of house size on selling price is the average change in selling price when house size increases by one square foot. Since no other independent variables are included in the model, the coefficient for house size may pick up the effect of other factors related to selling price.
Net relationship =
Net Relationship: A multiple regression model determines the net effect of an independent variable on a dependent variable. The net effect controls for all other factors (independent variables) included in the regression model. For example, in a regression model including both distance and house size as independent variables, the coefficient for house size controls for distance. That is, the regression determines the average change in selling price if a house’s size increases by one square foot but its distance from Boston does not change. Coefficients in multiple regression are net with respect to variables included in the model and gross with respect to variables that are omitted from the model.
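A minimal sketch of the gross versus net distinction, using synthetic data that is not from these cards. The prices are generated from an assumed exact rule (250 per square foot, minus 10,000 per mile), and the larger houses are placed farther from the city. Instead of fitting the full multiple regression, the sketch uses the standard partialling-out approach (the Frisch-Waugh-Lovell result): regress out distance from both house size and price, then fit a line through what is left. The helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def residualize(y, x):
    """Part of y that a straight line in x does not explain."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Synthetic, illustrative data (an assumption, not real house sales).
size = [1500, 2000, 2500, 3000, 3500, 4000]   # square feet
distance = [5, 12, 10, 20, 18, 30]            # miles; bigger houses sit farther out
price = [200000 + 250 * s - 10000 * d for s, d in zip(size, distance)]

# Gross effect: house size is the only independent variable.
_, gross = fit_line(size, price)

# Net effect: control for distance by removing it from both variables first.
_, net = fit_line(residualize(size, distance), residualize(price, distance))

print(f"gross effect of size: {gross:.2f} per square foot")
print(f"net effect of size:   {net:.2f} per square foot")
```

In this synthetic data the gross coefficient comes out below the net coefficient, because it also picks up the negative effect of distance, which is omitted from the single variable model.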
Forecasting in Excel
We can use Excel’s SUMPRODUCT function, =SUMPRODUCT(array1, [array2], [array3], …), to calculate a forecast from Excel’s regression output. The SUMPRODUCT function multiplies each value of the first array by the corresponding value in the second array and returns the sum of all those products. For a forecast, one array holds the regression coefficients and the other holds the matching input values, with a 1 paired with the intercept.
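To make the arithmetic concrete, here is a rough Python equivalent of what SUMPRODUCT computes. The function name `sumproduct` and the small input arrays are illustrative assumptions, not part of Excel or of these cards.

```python
def sumproduct(*arrays):
    """Multiply corresponding entries across the arrays and sum the products."""
    if len({len(a) for a in arrays}) != 1:
        # Excel reports an error when the arrays differ in size.
        raise ValueError("all arrays must have the same length")
    total = 0
    for values in zip(*arrays):
        product = 1
        for v in values:
            product *= v
        total += product
    return total

# Toy numbers, chosen only to show the mechanics: 1*4 + 2*5 + 3*6
print(sumproduct([1, 2, 3], [4, 5, 6]))  # 32
```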
Which model would we use to predict the price of a house that is 2,700 square feet?
SellingPrice = 13,490.45 + 255.36(HouseSize)
Since we have data about just one independent variable, house size, we should use the single variable linear regression model in which house size is the only independent variable.
Use the single variable regression model with house size as the independent variable to predict the selling price of a house that is 2,700 square feet.
The expected selling price of a 2,700 square foot home is =B2+B3*2700 = $702,972.54, where cell B2 holds the intercept and cell B3 holds the house size coefficient. In words: intercept + (house size coefficient × house size).
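The same forecast can be sketched in Python using the rounded coefficients from the equation on these cards. The variable and function names are hypothetical. Because the coefficients here are rounded to two decimals, the result differs slightly from the Excel figure above, which presumably comes from unrounded cell values.

```python
# Rounded coefficients from SellingPrice = 13,490.45 + 255.36(HouseSize)
INTERCEPT = 13490.45
HOUSE_SIZE_COEF = 255.36

def predict_price(house_size_sqft):
    """Forecast selling price from house size using the single variable model."""
    return INTERCEPT + HOUSE_SIZE_COEF * house_size_sqft

forecast = predict_price(2700)
print(f"${forecast:,.2f}")  # $702,962.45 with the rounded coefficients
```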
Suppose we want to forecast selling price based on house size and distance from Boston. Which equation should we use to forecast the price of a house that is 2,700 square feet and 15 miles from Boston?
SellingPrice = 194,986.59 + 244.54(HouseSize) – 10,840.04(DistanceFromBoston)
Since we have data about two independent variables, house size and distance from Boston, we should use the multiple regression model with those two variables.
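As a sketch, the forecast from the two-variable equation can be worked out in Python. The coefficients are the rounded values from the equation on these cards; the function and variable names are hypothetical.

```python
# Rounded coefficients from the two-variable model quoted on these cards
INTERCEPT = 194986.59
HOUSE_SIZE_COEF = 244.54
DISTANCE_COEF = -10840.04

def predict_price(house_size_sqft, distance_miles):
    """Forecast selling price from house size and distance from Boston."""
    return (INTERCEPT
            + HOUSE_SIZE_COEF * house_size_sqft
            + DISTANCE_COEF * distance_miles)

forecast = predict_price(2700, 15)
print(f"${forecast:,.2f}")  # $692,643.99 with the rounded coefficients
```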
ŷ = a + bx shows
The structure of the single variable linear regression line
A coefficient in a single variable linear regression characterizes …
the gross relationship between the dependent variable and the independent variable.
In single variable regression, to measure the predictive power of a single independent variable we use:
R²: the percentage of the variation in the dependent variable that is explained by the independent variable
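In multiple regression, to measure the predictive power of the model we use
Adjusted R²: R² adjusted for the number of independent variables in the model, which makes it suitable for comparing models with different numbers of independent variables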
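A minimal sketch of both measures, assuming the usual textbook definitions: R² as one minus the ratio of residual to total sum of squares, and Adjusted R² as R² penalized for the number of independent variables. The function names are hypothetical, and the inputs in the usage lines are illustrative values, not results from the house price data.

```python
def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the model."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_residual / ss_total

def adjusted_r_squared(r2, n_observations, n_independent_vars):
    """R-squared adjusted for the number of independent variables."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_independent_vars - 1)

# Illustrative inputs only: R-squared of 0.80, 11 observations, 2 independent variables.
print(round(adjusted_r_squared(0.80, 11, 2), 4))  # 0.75
```

Adding an independent variable never lowers R², but it lowers Adjusted R² unless the new variable improves the fit enough to offset the penalty.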
What insights do residual plots give to SINGLE vs MULTIPLE regression?
the residual plot for the single variable regression gives us insight into the gross relationship
the residual plot for multiple regression gives insight into the net relationship
How do we Test for Significance of Variables?
We analyze the p-value of each independent variable to determine whether it has a significant relationship with the dependent variable. If a variable’s p-value is less than 0.05, we can be 95% confident that there is a significant linear relationship between that variable and the dependent variable. If this holds for every independent variable, all of the variables in the model are significant at the 95% confidence level.
What are residuals?
The residuals are the differences between the historically observed values and the values predicted by the regression model (observed minus predicted).
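As a small sketch, residuals can be computed by subtracting each predicted value from the matching observed value. The function name and the numbers below are illustrative assumptions, not data from the house price example.

```python
def residuals(observed, predicted):
    """Observed value minus predicted value, one residual per data point."""
    return [o - p for o, p in zip(observed, predicted)]

# Made-up values, chosen only to show the sign convention.
observed = [10, 20, 30]
predicted = [8, 23, 30]
print(residuals(observed, predicted))  # [2, -3, 0]
```

A positive residual means the model predicted too low for that point, and a negative residual means it predicted too high.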