Lesson 6 Flashcards
A regression equation contains a constant of $17,918. What is another way of expressing this constant?
1) minimum condominium price
2) If the regression line is graphed, it will intercept the X-axis at square feet = 17,918
3) The mean difference between the regression line and all observations
4) If the regression line is graphed, it will intercept the Y-axis at $17,918
4)
The constant in the regression equation is the number at which the regression line intercepts the Y-axis
Could you still use the same regression equation to predict a sale price for an apartment that is larger than the size range in the data set?
No. It is too risky to extrapolate beyond the range of the dataset, since the relationship may no longer be linear or its slope may change outside that range.
Consider a model where the dependent variable is sale price and the independent variable is age of building. This resulted in the following regression equation: Y = 100,500 - 960X, with an R² of 0.8. What can you conclude about these results?
1) each year adds $960 to value
2) weak correlation with 64% of the variation in sale price explained by building age
3) strong negative correlation with 80% of the variation in sale price explained by building age
4) A 1 year old building is worth $101,460
3)
The negative sign of the regression coefficient indicates a negative correlation. The R² of 0.8 indicates a strong correlation. Each year of age reduces value by $960.
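As a rough check of this card, the sketch below plugs a building age into the equation and converts R² into a correlation coefficient. It assumes the $960-per-year coefficient quoted in the explanation above. The function names are illustrative only.

```python
import math

INTERCEPT = 100_500  # constant from the card's equation
SLOPE = -960         # dollars per year of age, per the explanation (assumed)
R_SQUARED = 0.8

def predicted_price(age_years):
    """Sale price predicted by the simple regression line."""
    return INTERCEPT + SLOPE * age_years

def correlation_from_r_squared(r_squared, slope):
    """In simple regression, r takes the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)

print(predicted_price(1))  # 99540
print(round(correlation_from_r_squared(R_SQUARED, SLOPE), 3))  # -0.894
```

A one-year-old building comes out below the constant, not above it, which is why option 4 is wrong.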
What is the advantage of multiple regression over simple regression?
1) helps deal with non-linear relationships
2) provides the analyst an opportunity to account for additional sources of predictive error
3) accounts for the economic reality that many variables may affect the dependent variable
4) multicollinearity becomes increasingly possible
3)
Simple regression considers only one independent variable. In reality, however, many independent variables affect the dependent variable.
Using the equation Sale Price = $5,054.05 + ($67.04 × total living area) + ($923.31 × floor #) + ($9,857.91 × baths):
How many baths would a 920 sqft condo on the 15th floor have if it sold for $105,000?
2.5 baths
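The arithmetic behind this answer can be sketched by rearranging the equation to solve for baths. The constant and coefficients are taken from the card. The function name is hypothetical.

```python
CONSTANT = 5054.05
PER_SQFT = 67.04     # dollars per square foot of living area
PER_FLOOR = 923.31   # dollars per floor level
PER_BATH = 9857.91   # dollars per bathroom

def implied_baths(sale_price, living_area_sqft, floor):
    """Solve the regression equation for the number of baths."""
    explained = CONSTANT + PER_SQFT * living_area_sqft + PER_FLOOR * floor
    return (sale_price - explained) / PER_BATH

baths = implied_baths(105_000, 920, 15)
print(round(baths, 2))  # 2.48, which rounds to 2.5 baths
```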
How would you react to regression output in which the bathroom variable had a t-statistic of 1.05, while the statistics for all other variables were unchanged?
Since the t-statistic is under the critical value of 2, we can no longer be confident that the variable's coefficient is different from 0. This means we cannot be confident that its value is correct, and the variable may be removable without affecting the model.
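A minimal sketch of this check follows, using the usual definition of a t-statistic as the coefficient divided by its standard error. The standard error below is a hypothetical value chosen only so that t lands near 1.05. The coefficient is borrowed from the condo equation for illustration.

```python
CRITICAL_T = 2.0  # rule-of-thumb critical value used in this card

def t_statistic(coefficient, standard_error):
    """t = coefficient / standard error of the coefficient."""
    return coefficient / standard_error

def is_significant(t_stat, critical=CRITICAL_T):
    """True when the coefficient is distinguishable from zero."""
    return abs(t_stat) >= critical

# Hypothetical standard error, not taken from any real output
t = t_statistic(9857.91, 9388.49)
print(round(t, 2), is_significant(t))  # 1.05 False
```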
Consider a dataset with 4 variables: rent, gross rentable area, usable area, and floor level. A regression equation has been developed to predict rent using the other three variables as independent variables. The R² value for the relationship between two of the independent variables, gross rentable area and usable area, is 0.832. Would you rely on this model?
1) No, the model is suspect since it does not contain multicollinearity
2) no, since usable area is poorly correlated with rent
3) yes, since the R² is quite high
4) No, variables which demonstrate multicollinearity should not be placed in the same model
4)
It would be necessary to exclude one of the variables and retest the model to determine whether the multicollinearity has been removed.
Consider the unique scenario in which the sale price of single family detached homes is predicted using four variables: total finished area, lot size, #fireplaces, #bathrooms. Multiple regression analysis can be used to determine the coefficients for each independent variable. Which of the following statements are TRUE?
1) the independent variable with the largest coefficient will always have the greatest effect on sale price
2) the independent variable with the smallest coefficient will always have the least effect on sale price
3) the effect of an independent variable's coefficient depends on its size, but also on the nature of the variable and its unit of measurement
4) total finished area will always have the greatest effect on sale price
3)
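To see why coefficient size alone can mislead, the sketch below restates a per-square-foot coefficient in per-square-metre terms. The $67.04 figure and the 920 sq ft area are borrowed from the condo card purely for illustration. The conversion factor is approximate.

```python
SQFT_PER_SQM = 10.7639  # approximate square feet in one square metre

coef_per_sqft = 67.04
coef_per_sqm = coef_per_sqft * SQFT_PER_SQM  # same effect, larger number

area_sqft = 920
area_sqm = area_sqft / SQFT_PER_SQM

# The dollar contribution to sale price is identical in either unit
contribution_sqft = coef_per_sqft * area_sqft
contribution_sqm = coef_per_sqm * area_sqm

print(round(coef_per_sqm, 2))  # 721.61
print(round(contribution_sqft, 2), round(contribution_sqm, 2))
```

The coefficient grows more than tenfold when the unit changes, yet the variable's effect on the predicted price does not change at all.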
After running a regression, you find that the model yields an SEE of 5,000. Is this a good fit? What are the problems with using SEE as a measure of “goodness of fit”?
SEE is an absolute measure, meaning its size alone doesn’t tell us much. In order to use SEE to analyze “goodness of fit”, we must convert it to a COV by dividing the SEE by the mean of the dependent variable. The COV tells us how well our model is doing in relative terms. A good target COV is less than 10%.
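The conversion can be sketched as follows. The two mean sale prices are hypothetical, chosen to show the same SEE of 5,000 giving very different relative fits.

```python
COV_TARGET = 0.10  # the 10% target mentioned in this card

def coefficient_of_variation(see, mean_sale_price):
    """COV = SEE divided by the mean of the dependent variable."""
    return see / mean_sale_price

SEE = 5_000
for mean_price in (40_000, 250_000):  # hypothetical mean sale prices
    cov = coefficient_of_variation(SEE, mean_price)
    verdict = "within" if cov < COV_TARGET else "outside"
    print(f"mean ${mean_price:,}: COV = {cov:.1%} ({verdict} the 10% target)")
```

With a mean of $40,000 the COV is 12.5%, and with a mean of $250,000 it is 2.0%.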
Explain the difference between simple linear regression and multiple regression.
Multiple regression
- includes two or more independent variables
- is difficult to depict graphically, since it involves three or more dimensions
- involves much more complex calculations than simple linear regression, which gives more robust outcomes but is harder for clients and other real estate professionals to understand
Simple linear regression
- includes only one independent variable
- is two-dimensional and can be readily displayed in a graph
The first step in testing for multicollinearity is conducted during data screening, where the correlations between the independent variables are determined. What other steps can be taken to ensure that multicollinearity is not present in your model?
After creating the model, check that no tolerance values are less than 0.3 and no VIF values are greater than 3.33.
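The two thresholds are linked by the standard definitions tolerance = 1 - R² and VIF = 1 / tolerance, so 1 / 0.3 is about 3.33. The sketch below applies them to the R² of 0.832 from the rent card. It assumes that pairwise R² can stand in for the R² of one independent variable regressed on all the others, which in a real model would be at least as high.

```python
TOLERANCE_FLOOR = 0.3
VIF_CEILING = 3.33

def tolerance(aux_r_squared):
    """Share of a variable's variation not explained by the other predictors."""
    return 1.0 - aux_r_squared

def vif(aux_r_squared):
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1.0 / tolerance(aux_r_squared)

r2 = 0.832  # gross rentable area vs usable area, from the earlier card
tol = tolerance(r2)
v = vif(r2)
flagged = tol < TOLERANCE_FLOOR or v > VIF_CEILING

print(round(tol, 3), round(v, 2), flagged)  # 0.168 5.95 True
```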