Lesson 3 Flashcards

0
Q

You want to “smooth” the data relationship in a scatter plot. How might you do this?

A

A best-fit trend line is added to the data using an automated least squares regression, which takes a data array with wide variability and “smooths” the result so you can more clearly and quickly understand the relationship between the variables.

1
Q

What would be the best tool for analyzing the relationship between two continuous variables?

A

Scatter plots

2
Q

If the correlation between two variables is strong, is the relationship deemed to be linear?

A

Not always. Two variables may have high correlation but exhibit a nonlinear relationship.
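
As a minimal sketch of this point, the snippet below computes the Pearson correlation for made-up data where y is exactly the square of x. The data and the helper name `pearson_r` are assumptions for illustration only. The correlation comes out well above 0.95 even though the relationship is clearly curved.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is the square of x, a curved (nonlinear) relationship.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"correlation: {r:.3f}")  # high, despite the curve
```

A scatter plot of the same data would show the curve immediately, which is why the correlation figure should be read alongside a graph.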

3
Q

How can the coefficient of variation be used to determine appropriate units of comparison for real estate data?

A

COV = Standard deviation / mean

Across the various combinations of independent and dependent variables, the strongest relationships are those with the lowest COV, indicating that those variables should be tested further.
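
A small sketch of the COV comparison, using invented prices in two candidate units of comparison. The numbers, the helper name `cov`, and the use of the sample standard deviation are all assumptions for illustration.

```python
import statistics

def cov(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical sale prices expressed in two candidate units of comparison.
price_per_sqft_floor = [200, 210, 190, 205, 195]
price_per_sqft_lot = [40, 70, 25, 55, 60]

cov_floor = cov(price_per_sqft_floor)
cov_lot = cov(price_per_sqft_lot)

# The unit with the lower COV is the more consistent one to test further.
better_unit = "floor area" if cov_floor < cov_lot else "lot area"
print(f"floor: {cov_floor:.1%}  lot: {cov_lot:.1%}  lower COV: {better_unit}")
```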

4
Q

You hire a consultant to complete a statistical analysis predicting the need for seniors housing in Langley. In reviewing the results, should you focus on the reliability of the forecast in relation to other benchmark data? Or do you need to examine the consultant's interpretation of the underlying data relationships?

A

Both approaches will be necessary. Data exploration would be necessary to determine the strengths and weaknesses of the statistical analysis. The results may appear reasonable, but end up not adequately supported by the underlying data and analysis.

5
Q

How could you use visual presentation aids to help a client understand a statistical analysis?

A

Visual aids, such as graphs, provide the opportunity to simplify complex relationships between data variables so that the key messages about the data become clear.

6
Q

Real estate terminology is very specialized. A variable describing office building class would be what type of data variable? What possible problems might you experience in relying on this building class variable?

A

The Building Class variable is an example of an “Ordinal” variable. Each class is related to the other and provides an indication of which class is “better” than another, but not any objective indication of how much “better” one class is in relation to another. The problem with ordinal data variables is that they are often based on subjective interpretation.

7
Q

Do you agree with the following statement: “The goal of exploratory data analysis is to identify and account for every source of variation in data relationships?”

A

It is often impossible to account for every source of variation in the relationship between two or more variables. Procedures have been developed in statistical analysis to identify various sources of variation in the relationship between data variables, but there will usually be some random variation that cannot be explained. Data occurrences that do not follow established data relationships are often described as “outliers”.

8
Q

You are looking at a histogram with a normal distribution. If some data were removed from the dataset, resulting in the median being lower than the mean, how do you think the new histogram would look?

A

The histogram should be skewed to the right, meaning the data is clumped on the left with a long tail extending right of the median and mean. The extent of the “skewness” would depend on the amount of data removed and the resulting impact on the median and mean. The importance of this point is that the mean, by itself, is not a complete measure of central tendency.
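
The effect can be checked on a small invented sample. This is only a sketch: the prices and the choice of which values to remove are assumptions, picked so that the median ends up below the mean.

```python
import statistics

# Hypothetical, symmetric sale prices (in $000s): mean and median coincide.
symmetric = [300, 350, 400, 450, 500, 550, 600, 650, 700]

# Remove two values from just above the centre; the high values remain as a tail.
trimmed = [p for p in symmetric if p not in (550, 600)]

mean_before, median_before = statistics.mean(symmetric), statistics.median(symmetric)
mean_after, median_after = statistics.mean(trimmed), statistics.median(trimmed)

print(f"before: mean={mean_before:.1f} median={median_before:.1f}")
print(f"after:  mean={mean_after:.1f} median={median_after:.1f}")
```

After the removal, the median sits below the mean, which matches the right-skewed shape described in the answer.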

9
Q

You have a database of recreational lot sales and are forecasting sale price per front foot for a certain size of waterfront lot. How can you account for non-linear data relationships in your forecast?

A

First, identify the nonlinear data relationships and appropriate units of comparison using data exploration and revelation (graphical analysis). Then convert the data to a linear relationship using logarithms. The data can then be re-tested to determine the strength of the relationship between the logarithmic variables. The coefficient for the regression equation (independent variable) represents the exponential relationship between the two variables.
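
A sketch of the logarithmic approach, assuming the relationship follows a power curve of the form price = a × frontage^b. The data are generated from an invented curve (a = 5000, b = -0.5), and `fit_line` and `predict` are hypothetical helper names. Fitting a straight line to the logged variables recovers the exponent as the regression coefficient.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical lot frontages (feet) and price per front foot on a power curve.
frontage = [25, 50, 100, 200, 400]
price_per_ff = [5000 * f ** -0.5 for f in frontage]

# Re-express both variables as logarithms, then fit a straight line.
log_x = [math.log(f) for f in frontage]
log_y = [math.log(p) for p in price_per_ff]
exponent, log_const = fit_line(log_x, log_y)

def predict(f):
    """Convert the log-log fit back to the original units."""
    return math.exp(log_const) * f ** exponent

print(f"exponent: {exponent:.3f}, forecast at 150 ft: {predict(150):.2f}")
```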

10
Q

Consider three sample datasets drawn from the same population of high-rise condo sales in Vancouver. The datasets include a number of variables: sale price, unit size, floor height, view, and parking. The descriptive statistics for each dataset indicate similar mean and median values for sale price per front foot. What steps should you take next in comparing and analyzing the datasets? What should you be attempting to uncover?

A

Look to see if the data are indeed comparable or if they come from disparate markets. Start by graphing each variable to understand its distribution. This is important because a number of statistical measures are based on the assumption that data follow a normal distribution. Outliers and other anomalies will become apparent through scatter plots and histograms. Recoding data and re-plotting in a box plot will illustrate patterns in the data and the strength of various relationships between data variables. This initial data exploration is critical if later tasks in model building are to be successful. You may be able to combine the datasets if they appear to be in the same market.

11
Q

If two data variables, say, price per square foot and finished floor space in new housing, had a linear relationship, what would be an easy way of determining the linear regression equation?

A

Create a scatter plot of the dependent variable (price per square foot) and the independent variable (finished floor space in square feet). From the scatter plot, measure the slope of the regression line and estimate where the line would intersect the Y-axis, at X = 0. This point will be the constant in the equation. The slope will be the regression coefficient. If the regression represents an inverse relationship, the coefficient will be negative.
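
The same slope and constant can be computed rather than read off the graph. The sketch below uses invented sales data and a hypothetical helper `fit_line` that applies the ordinary least squares formulas.

```python
def fit_line(xs, ys):
    """Ordinary least squares: returns (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx  # value of y where the line crosses X = 0
    return slope, intercept

# Hypothetical new-housing sales: finished floor space (sq ft) vs price per sq ft.
floor_space = [1000, 1500, 2000, 2500, 3000]
price_per_sqft = [250, 228, 199, 176, 147]

slope, intercept = fit_line(floor_space, price_per_sqft)
print(f"price per sq ft = {intercept:.1f} + ({slope:.4f}) * floor space")
```

The negative coefficient here reflects an inverse relationship: in this made-up sample, larger homes sell for less per square foot.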

12
Q

If the data doesn’t have a linear relationship and strong correlation, what would be the risk of relying on a regression equation where most of the data occurrences were concentrated at one end of the regression with few occurrences at the other end?

A

The slope of the regression line would be very sensitive to the location of a few data occurrences because of the nature of the least squares calculations. In other words, the coefficient of the independent variable could be dramatically affected by one or two data points, resulting in predictions of the dependent variable that contain high potential error.
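
A sketch of this sensitivity with invented numbers: five observations are clustered at low values of x and a single observation sits far out at the other end. Changing only that one point moves the fitted slope substantially. `fit_slope` is a hypothetical helper.

```python
def fit_slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Hypothetical data: five clustered points plus one isolated point at x = 20.
xs = [1, 2, 3, 4, 5, 20]
ys_on_trend = [2, 4, 6, 8, 10, 40]   # isolated point follows the cluster's trend
ys_off_trend = [2, 4, 6, 8, 10, 20]  # only the isolated point changes

slope_on = fit_slope(xs, ys_on_trend)
slope_off = fit_slope(xs, ys_off_trend)
print(f"slope with point on trend:  {slope_on:.2f}")
print(f"slope with point off trend: {slope_off:.2f}")
```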

13
Q

Assume you are comparing two datasets for apartment rents in Victoria. One dataset reflects 3-story “walk up” apartment rents in the James Bay community and the other dataset includes similar property rents in another Victoria community, Fernwood. You want to determine the effect of location on rent. Assuming both datasets have a similar structure, what approach would you use to compare the datasets?

A

Step 1 - Explore the descriptive statistics for each dataset by examining the correlation for various combinations of dependent and independent variables. Can the strength of the relationships be clarified by recoding or transforming data variables? Then use graphical tools to compare the two means; assuming the data in both datasets have a similar distribution, the difference between the means will provide a measure of the locational difference.

14
Q

When exploring data for the first time, preliminary screening is important so you can:

1) seek patterns in the data
2) understand relationships within the data
3) eliminate data you do not need or identify data that seems odd or impossible
4) all of the above

A

(4) all of the above are key tasks for the initial phase of data exploration

15
Q

You have created a model to estimate office vacancies over time. You notice that the error from your prediction does not have constant variance and you believe it shows heteroskedasticity. This analysis would be an example of which of the “Four Rs”?

1) reduction
2) revelation
3) re-expression
4) residuals

A

4) looking at errors in the model is an example of residual analysis
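
A rough sketch of what such a residual check could look like, using invented vacancy figures whose scatter grows over time. Comparing the average residual size in the early and late halves of the series is an informal assumption here, not a formal heteroskedasticity test.

```python
def fit_line(xs, ys):
    """Ordinary least squares: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data: period number vs office vacancy rate (%), with growing scatter.
period = [1, 2, 3, 4, 5, 6, 7, 8]
vacancy = [2.1, 3.9, 6.2, 7.8, 11.0, 11.0, 16.0, 14.0]

slope, intercept = fit_line(period, vacancy)
residuals = [y - (intercept + slope * x) for x, y in zip(period, vacancy)]

# Compare the average absolute residual in the early and late halves.
half = len(residuals) // 2
spread_early = sum(abs(r) for r in residuals[:half]) / half
spread_late = sum(abs(r) for r in residuals[half:]) / (len(residuals) - half)
print(f"early spread: {spread_early:.2f}, late spread: {spread_late:.2f}")
```

A much larger spread in the later periods suggests the error variance is not constant.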

16
Q

How would you describe the “bedrooms” variable type?

1) nominal variable scale with continuous characteristics
2) interval variable scale with discrete characteristics
3) ordinal variable scale with subjective characteristics
4) ratio variable scale with continuous characteristics

A

2) The bedrooms variable is an interval variable, since its values are ordered and the differences between them are meaningful. It has a discrete characteristic, since the variable can only take a fixed set of values and there may be no valid observation between any two observations. In other words, it is a variable that can only assume values in a discrete set.

17
Q

Key goals of exploratory data analysis are to simplify the complexity of data available for analysis and understand the distribution and variance of the data. How would you describe the relationship between the quality and sale price variables?

1) The variables are stochastic in nature since quality is an excellent predictor of sale price.
2) the variables are deterministic in nature
3) a scatter plot of the variables reveals the data has a smooth relationship
4) the variables are stochastic in nature since quality does not explain all the variation in sale price.

A

4) stochastic relationships are those in which a random element precludes the full explanation of all variation.

18
Q

Assume you have begun studying the relationship between the test value and effective year built. You are only interested in carrying out further research if at least 70% of the variance in the test value can be explained by effective year built. If a scatter plot showed a relationship with an R2 value of 0.567, should further research be conducted?

A

No. The R2 value indicates that only 56.7% of the variance is explained, which is below the 70% threshold.

19
Q

You want to determine which has less variation: sale price per square foot of finished area, with a mean-centered COV of 19%, or sale price per square foot of lot area, with a COV of 36%?

A

Price per square foot of finished area, since its COV is approximately half the COV for lot area.

20
Q

You are involved in an assessment appeal for a property that has a fabulous view, for which the owner feels she has been over-assessed. You have analyzed the sales of all view properties in this area over the past year and coded the quality of each as Excellent, Good, Fair, or Poor. However, you find the variable’s current format means it cannot easily be used in further analysis in your statistical software program. In order to make this variable more useful, which of the “Four Rs” is necessary?

A

Re-expression. This variable needs to be transformed such that it can be used in statistical analysis.
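
One way this re-expression could be sketched is shown below. The numeric scores assigned to each view rating, the name `VIEW_SCORES`, and the sample ratings are assumptions for illustration. The 0/1 indicator columns are an alternative coding that avoids implying equal spacing between ratings.

```python
# Hypothetical scoring of the view ratings, from worst to best.
VIEW_SCORES = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}

# Hypothetical view ratings for five sales.
views = ["Good", "Excellent", "Poor", "Fair", "Good"]

# Option 1: re-express each rating as an ordinal score.
coded = [VIEW_SCORES[v] for v in views]

# Option 2: one binary (1 = yes, 0 = no) indicator column per rating.
indicators = {level: [1 if v == level else 0 for v in views] for level in VIEW_SCORES}

print(coded)
print(indicators["Excellent"])
```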

21
Q

Your analysis of two area variables confirms that the COV and R2 indicate similar statistical outcomes, but you are concerned about possible non-linearity. How would you decide which variable to select for further analysis?

A

Run scatter plots for each unit of comparison, evaluate the results, and conduct logarithmic testing to determine which unit of comparison best accounts for variation in the dependent variable.

22
Q

A colleague in your consulting firm has quit and you have inherited his files. He was in the middle of a large-scale market study for retail development sites and the data is a mess. You’re not sure what’s relevant and you definitely can’t see any patterns or draw any conclusions. Which of the following is a recommended exploratory data analysis technique?

1) reduce the uncertainty by organizing the data into a database and eliminating unneeded cases and variables
2) run summary statistics to better understand the range in the data and the averages
3) create scatter plots and box plots to get a sense of the relationship between variables
4) all of the above

A

4) Options 1, 2, and 3 all offer viable strategies for exploratory data analysis. These will help you to better understand the data you have and also begin to find patterns that can help solve your real estate problem.

23
Q

Assume you completed a non-linear analysis of rents per front foot versus store frontage for retail properties on Robson Street, an exclusive shopping precinct in downtown Vancouver. The COV analysis indicates that a logarithmic regression accounts for most of the variation in rents. However, a significant variation remains, and you suspect it is related to excess retail store depth. What could you do to verify your hypothesis?

1) nothing. Problem cannot be solved
2) set a filter to eliminate newer properties for the analysis
3) re-express rent per front foot as rent per deep foot
4) re-code depth into a new variable which can be graphed in relation to size-adjusted rents to see if a significant relationship exists

A

4) A number of steps should be taken to verify the additional impact of “excess depth” on retail rents per front foot. Once a nonlinear relationship has been confirmed, the next step is to convert all data to size-adjusted rents per front foot, using the regression coefficient, to isolate this variable. Then, the remaining variation can be explored by comparing these adjusted rents to the associated store depth to see whether this variable accounts for the remaining variation.

24
Q

What would be the best statistic to help answer the following question: If a person purchases a 1,000 square foot high-rise condo in downtown Toronto, how likely are they to purchase a flat-screen TV?

1) coefficient of variation since the statistic explains the variation of one variable in relation to another
2) standard deviation since it is a measure of dispersion for both variables
3) correlation coefficient since it measures the degree to which the values of two variables are proportional to each other
4) mean since it explains the central tendencies within the data

A

3) The correlation coefficient is commonly used to examine the relationship between the distributions of two different variables for the same sample. It is a measure of the extent to which the values of two variables are proportional to each other. This is commonly used for market research. The COD (coefficient of determination) measures the amount of common variation between two variables. The COV and standard deviation are both measures of dispersion, which are not optimal to explain this relationship. The mean is a measure of central tendency, which is not optimal to explain this relationship.

25
Q

What are ordinal variables?

A

Variables that are ranked

First, second, and third place

26
Q

What are nominal variables and give an example.

A

Variables whose categories have no order or relationship between them.

Type of construction: wood, steel, concrete

27
Q

What are interval variables and give an example.

A

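Ordered variables where the difference between values is meaningful, but there is no true zero, so ratios are not meaningful.

Year built. A house built in 2010 is 20 years newer than one built in 1990, but it is not "twice" anything.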

28
Q

What are ratio variables? Give an example

A

Variables with a true zero, so the ratio between two values is meaningful.

One warehouse is twice as big as another.

29
Q

What is a qualitative variable? Give an example

A

A variable that describes quality

Excellent landscaping

30
Q

What is a quantitative variable? Give an example

A

A variable that can be counted or measured

3 bathrooms

31
Q

What is a subjective variable? Give an example

A

A variable based on an opinion

Average view

32
Q

What is an objective variable? Give an example

A

It’s a variable based on fact

A brick wall

33
Q

What is a continuous variable? Give an example

A

Given any two observations a valid observation could be found between the two

House sizes

34
Q

What is a discrete variable? Give an example

A

Given two observations there may not be a valid observation between them
Number of bedrooms
Type of construction

35
Q

What are binary variables? Give an example

A

Discrete variables used for non-numeric variables that have only two possibilities

1 = Yes, 0 = No

36
Q

Name the “four Rs” and explain them

A

Reduction - simplify the information using descriptive statistics and graphs

Revelation - relationships between variables using box plots, scatter plots, correlation analysis, cross tabs

Re-expression - data transformation

Residuals - data testing