Lesson 7 Flashcards
Explain Explanatory Model.
Developed primarily to explain the value that each variable contributes to market value
- example: the value per square foot of living area
- rather than focusing on the outcome of the model overall, this type of model focuses on developing the most accurate possible values for the coefficients
What is a good predictive model?
A good predictive model can be used to directly estimate sale prices. The R2 should be high and the SEE should be small. The t-statistics or significance levels for individual variables are not important. It leads to a model with the lowest overall error possible, but it does not necessarily produce reliable individual variable coefficients.
Describe a predictive model.
Developed to produce the highest quality overall prediction of market value
Achieve the best possible estimate of selling price, but not necessarily the most reliable estimates for the individual coefficients
In a good explanatory model, the variable coefficients could be used as adjustments in a direct comparison approach or in a market-derived cost approach.
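The difference in focus between the two model types can be seen in which statistics you read off a fitted regression. The sketch below fits a one-variable regression of sale price on living area using made-up sales (the numbers are hypothetical, not from the lesson). R2 and SEE describe the overall prediction, while the coefficient and its t-statistic describe how reliable the value per square foot is.

```python
import math

# Hypothetical sales: living area (sq ft) and sale price ($).
area = [1000, 1200, 1500, 1800, 2000, 2400]
price = [210000, 238000, 292000, 335000, 372000, 431000]

n = len(area)
mean_x = sum(area) / n
mean_y = sum(price) / n
sxx = sum((x - mean_x) ** 2 for x in area)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(area, price))

# Least-squares coefficient and constant.
slope = sxy / sxx                  # value per square foot of living area
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(area, price)]
sse = sum(e ** 2 for e in residuals)          # unexplained variation
sst = sum((y - mean_y) ** 2 for y in price)   # total variation

# Predictive focus: how good is the overall estimate?
r2 = 1 - sse / sst
see = math.sqrt(sse / (n - 2))     # standard error of the estimate

# Explanatory focus: how reliable is the individual coefficient?
t_slope = slope / (see / math.sqrt(sxx))

print(f"R2 = {r2:.3f}, SEE = {see:,.0f}")
print(f"coefficient = {slope:.2f} per sq ft, t = {t_slope:.1f}")
```

For a predictive model you would mainly watch the first printed line; for an explanatory model the second line matters at least as much.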
What are the model building steps?
Step 1: Preliminary data screening
- duplicate sales, outliers, missing data
Step 2: Reviewing the variables
- where do they fit in the model; which are land and which are building
Step 3: Examining Variables
- data decisions,
- time adjustments
- what variables explain the dependent variable
- K-W Test
Step 4: Transformations
Step 5: Examining the Transformed variables
- make sure they are done properly
- visual, box plots, scatter plots
Step 6: List the Variables for Calibration
- a minimum of 2 sets of possible models
- run enter regression
- eliminate multicollinearity by removing variables with VIF over 3.33 (one at a time)
- split the database into model and test sets
- run stepwise regression model
Step 7: Model Calibration
- run enter regression with the model variables
- remove records with large residuals
- rerun the model to ensure the t-statistics are acceptable
- remove outliers and rerun the model until no more are produced
Step 8: Test and Evaluate the Model
- K-W
- ratio statistics - for the adjustment to be made
- repeat ratio statistics after every adjustment
Step 9: State Conclusions on Model Quality
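The VIF screen in Step 6 can be sketched as follows. The VIF for a variable is 1 / (1 - R2), where R2 comes from regressing that variable on the other independent variables, so the 3.33 cutoff corresponds to an R2 of about 0.70. To keep the sketch short it checks only pairs of variables, where that R2 is simply the squared correlation; a real model would regress each variable on all of the others. The data and variable names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_from_r2(r2):
    """VIF = 1 / (1 - R2), R2 from regressing one predictor on the others."""
    return 1.0 / (1.0 - r2)

def pair_vif(a, b):
    """VIF for a model with only two predictors (R2 = squared correlation)."""
    return vif_from_r2(pearson(a, b) ** 2)

# Hypothetical independent variables for six sales.
living_area = [1000, 1200, 1500, 1800, 2000, 2400]
bedrooms = [2, 3, 3, 4, 4, 5]
lot_size = [6000, 4500, 7000, 5000, 6500, 5500]

VIF_LIMIT = 3.33  # cutoff from the step list above

for name, other in [("bedrooms", bedrooms), ("lot_size", lot_size)]:
    vif = pair_vif(living_area, other)
    verdict = "review for removal" if vif > VIF_LIMIT else "ok"
    print(f"living_area vs {name}: VIF = {vif:.2f} ({verdict})")
```

In this made-up data, bedrooms moves almost in lockstep with living area, so one of the two would be a candidate for removal, one variable at a time.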
How do you calculate the COV?
COV = standard deviation / mean
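As a small worked example, the formula can be applied to a list of assessment-to-sale ratios. The ratios below are hypothetical, and the sketch assumes the sample standard deviation (n - 1 in the denominator); the result is often multiplied by 100 and reported as a percentage.

```python
import statistics

def cov(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    # Assumption: sample standard deviation (statistics.stdev uses n - 1).
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical ratios of estimated value to sale price.
ratios = [0.92, 0.98, 1.01, 1.05, 0.97, 1.10, 0.95, 1.02]

print(f"COV = {cov(ratios):.4f} ({cov(ratios) * 100:.1f}%)")
```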
How is model specification different from model calibration?
Specification involves testing and refining the model, while calibration involves attaching variables to a model and solving for the coefficients
The focus on specification is understanding data relationships and selecting variables for modeling, while the focus of calibration is building and testing the model.
Your valuation assignment is to develop a regression model to predict property values for August this year. You will use a sample data set of residential single family property sales in Scarborough over a 12 month period ending in October this year. Why might time adjusting these sales be important?
1) to confirm that no time adjustment is necessary
2) to remove the variance in sale price accounted for by changes in the sale date
3) to ensure that market movement is not accounted for in some other variable
4) both 2 and 3
4
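A minimal sketch of what a time adjustment does, assuming a simple linear monthly rate of market change. The rate, prices, and function name are hypothetical; in practice the rate would be derived from the sales data. Because the sales in this scenario run through October but the valuation date is August, some sales fall after the valuation date and get a negative number of months.

```python
def time_adjust(price, months_to_valuation_date, monthly_rate):
    """Restate a sale price as of the valuation date.

    months_to_valuation_date is positive for sales before the valuation
    date and negative for sales after it.
    Assumption: linear (not compounded) market change per month.
    """
    return price * (1 + monthly_rate * months_to_valuation_date)

MONTHLY_RATE = 0.005  # hypothetical 0.5% per month market increase

# Sale in February, six months before the August valuation date.
early_sale = time_adjust(400000, 6, MONTHLY_RATE)

# Sale in October, two months after the August valuation date.
late_sale = time_adjust(400000, -2, MONTHLY_RATE)

print(f"February sale adjusted to August: {early_sale:,.0f}")
print(f"October sale adjusted to August:  {late_sale:,.0f}")
```

Once prices are restated this way, the variation due to sale date is removed before calibration, so it cannot be picked up by some other variable.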
In addition to the correlation statistic, what additional measures would you consider to determine if a variable should be included in a multiple regression model?
1) t-statistic higher than 1.6
2) drop in the R2 and increase in SEE
3) sig greater than 1
4) all of the above
1)
T-statistic higher than 1.6 is desirable for an independent variable
In your role as a resort marketing team member, you need to predict the selling price of ski resort condos. You plan to build a regression equation based on similar recent sales at the resort. Others on your team have suggested what you believe to be an excessive number of variables for the model. What strategy might you use to demonstrate that the regression analysis can be simplified?
1) refer the team members to the correlation matrix and highlight the variables which have little correlation with the dependent variable, sale price
2) develop scatter plots of the dependent variable against each independent variable to illustrate relationships and strength of the R2 value
3) use stepwise regression to demonstrate the impact of adding each new variable
4) all of the above
4
Your multiple regression model results show a large F value but a low R2 value. What can you conclude about this result?
That the regression is significant but the variables only explain a small amount of the variation in the dependent variable. The F statistic measures performance of the model overall when compared to the result that would be obtained by estimating the sale price by simply using the mean sale price. With a high F, the significance will be low or 0, meaning the result is significant. But the low R2 value shows that the model is not explaining as much of the variation in sale price as would be optimal.
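One way to see how a large F and a low R2 can occur together is the standard relationship F = (R2 / k) / ((1 - R2) / (n - k - 1)), where n is the number of sales and k is the number of independent variables. The sketch below holds R2 fixed at a low value and changes only the sample size; the values of n and k are hypothetical.

```python
def f_statistic(r2, n, k):
    """Overall F statistic of a regression from its R2.

    r2: coefficient of determination
    n:  number of observations
    k:  number of independent variables
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))

R2 = 0.10  # the model explains only 10% of the variation
K = 5      # hypothetical number of independent variables

for n in (30, 200, 2000):
    print(f"n = {n:>4}: F = {f_statistic(R2, n, K):.2f}")
```

With the same low R2, F grows as the number of sales grows, which is why a large data set can produce a significant regression that still explains little of the variation in sale price.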
For a mass appraisal model, what is the importance of the variable coefficients? How can you explain these coefficients in valuation terms?
The coefficients in regression models are amounts assigned to each variable.
In valuation terms these are equivalent to adjustment factors which account for the impact of specific variables, such as additional plumbing or view. The accuracy of the coefficients will depend on how well each step in the model building process is completed.
To include an ordinal variable for property characteristics in a regression model, you may transform the variables into separate binary variables. What is a disadvantage of using binary variables versus another re-coding approach?
Transformation of ordinal variables into binary variables means that the values for each variable are limited to two discrete numbers. Therefore a binary variable is required for each characteristic to be studied, with the database possibly becoming large and complex. A way to overcome this problem is to transform the ordinal variable into a single variable and use numbers to represent the different qualities, for example for view. Rather than 3 separate binary variables for different views, it would be possible to have one view variable with three possible values.
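The two re-coding approaches can be compared side by side. This sketch uses a hypothetical view variable with three levels; the level names and column names are illustrative only.

```python
# Hypothetical ordinal view ratings for five sales, worst to best.
LEVELS = ["none", "partial", "full"]
views = ["none", "partial", "full", "partial", "none"]

def to_binary(values, levels):
    """One 0/1 column per level: the database grows by len(levels) columns."""
    return {
        f"view_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }

def to_single_code(values, levels):
    """One column holding the rank of each level (0, 1, 2, ...)."""
    rank = {level: i for i, level in enumerate(levels)}
    return [rank[v] for v in values]

binary_columns = to_binary(views, LEVELS)
single_column = to_single_code(views, LEVELS)

for name, column in binary_columns.items():
    print(f"{name}: {column}")
print(f"view_code: {single_column}")
```

The single coded column keeps the database compact, but it builds in the assumption that each step up in view quality is worth the same amount, whereas separate binary variables let each level take its own coefficient. When binary variables are used in a regression with a constant, one level is normally left out as the reference category.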
Assume you need to develop a regression model to explain the impact of view on high-rise condo sales in Burnaby. What type of model would you develop, predictive or explanatory? Why?
An explanatory model would be preferred since the aim of this model is to accurately explain the degree to which each variable contributes to the dependent variable. In this model, the accuracy of the coefficient for each variable is as important as the outcome of the overall regression. Different statistical measures are important for an explanatory model versus a predictive model.
Alex has purchased a property sales data set from an assessment organization to support his real estate appraisal business. The data includes information on a large number of variables, many of which are not very helpful in building a regression model. How could Alex possibly arrive at this conclusion?
1) data exploration
- descriptive statistics to find variables with few occurrences
- crosstabs for frequency of housing stock in sub-neighborhoods
- graphical analysis for relationships between variables as well as outliers
- correlation matrix
Variables with few occurrences or weak correlation can likely be excluded from further consideration
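A minimal sketch of that screening, assuming the data is held as columns of numbers. The sales figures, variable names, and both thresholds are hypothetical; they only illustrate flagging variables with few occurrences or weak correlation with sale price.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data set: one list per variable, one entry per sale.
data = {
    "price": [300000, 320000, 350000, 380000, 410000, 450000, 480000, 520000],
    "living_area": [1000, 1100, 1250, 1400, 1500, 1700, 1800, 2000],
    "has_pool": [0, 0, 0, 0, 0, 0, 1, 0],
    "corner_lot": [1, 0, 1, 0, 0, 1, 0, 1],
}

def screen(data, target="price", min_occurrences=2, min_abs_corr=0.2):
    """Return variables that look unhelpful, with the reason for each."""
    flagged = {}
    for name, values in data.items():
        if name == target:
            continue
        occurrences = sum(1 for v in values if v != 0)
        corr = pearson(values, data[target])
        if occurrences < min_occurrences:
            flagged[name] = "few occurrences"
        elif abs(corr) < min_abs_corr:
            flagged[name] = "weak correlation"
    return flagged

flagged = screen(data)
for name, reason in flagged.items():
    print(f"{name}: {reason}")
```

A flag here is only a prompt for further review; graphical analysis and crosstabs would still be used before a variable is dropped.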