Stats Flashcards

1
Q

What is an index?

A

A change compared to a base value expressed in %.
Eg. Today $ / 2006 $ (x100) = 140
Equivalent to a 40% increase over 2006
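
A minimal sketch of the index arithmetic above. The prices are hypothetical, chosen only so the result matches the card's index of 140; `index_value` is an illustrative helper name.

```python
def index_value(current: float, base: float) -> float:
    """Express `current` as an index where the base period equals 100."""
    return current / base * 100


# Hypothetical prices (not from the card), picked to give an index of 140.
today_price = 350_000
base_price = 250_000

idx = index_value(today_price, base_price)
pct_change = idx - 100  # change relative to the base period, in percent
print(f"index = {idx:.0f}, change = {pct_change:+.0f}%")
```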

2
Q

Compare/contrast percentage vs. Index

A

A percentage compares a value only to its own point of reference. An index standardizes the point of reference, so different values can be compared against a common base.

3
Q

Absolute vs. Relative frequencies

A

Absolute is the exact count of observations in a value or range (Eg. the 10-20 range). Relative is that count as a proportion or % of the total number of observations.

4
Q

Line graph vs. Histogram

A

Line graph: points joined by a line, used for relative frequencies
Histogram: one bar per range, centered on the midpoint of the range, height showing the frequency, bars touching each other

5
Q

EQ: mean (what, how to calc)

A

Sum / # of entries

Gives average value

Calc: [E+] after each value, then the [x,y] key

6
Q

EQ: weighted arithmetic mean

A

Sum of (variable x weight) / sum of weights

Gives average
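
A short sketch of the weighted mean formula above. The function name and the sample numbers are illustrative assumptions, not from the card.

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical example: the value 20 carries three times the weight of 10.
print(weighted_mean([10, 20], [1, 3]))
```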

7
Q

Mean vs median

A

Mean higher than median means bulk of data is lower and there is a right side tail (high potential outliers)

Mean lower than median means bulk of data is higher and there is a left side tail (low potential outliers)
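
One way to check the rule above on a data set, as a rough sketch. `tail_side` is a hypothetical helper and the sample values are made up for illustration.

```python
from statistics import mean, median


def tail_side(values):
    """Guess which side the tail is on by comparing the mean to the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"  # a few high values pull the mean up
    if m < med:
        return "left"   # a few low values pull the mean down
    return "none"


# Hypothetical samples: one high outlier, then one low outlier.
print(tail_side([200, 210, 220, 230, 900]))
print(tail_side([10, 700, 710, 720, 730]))
```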

8
Q

When is mode useful?

A

Non-numeric data. Eg. Fav colours

9
Q

EQ: Standard deviation

A

Symbol: σ (looks like an o with a tail)

Value - mean, square it, sum them, divide by n, square root it

Gives how tightly clustered the values are. High = flat curve (loose)

Calc: enter the values with [E+], then the [σx,σy] key
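
The steps above (subtract the mean, square, sum, divide by n, square root) can be sketched as follows. This is the population version, dividing by n as the card does; the sample data are an assumption for illustration.

```python
import math


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with a mean of 5
print(population_variance(data), population_std_dev(data))
```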

10
Q

Standard deviation intervals for a normal distribution

A

1 = 68.27%
2 = 95.45%
3 = 99.73%
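
These percentages can be reproduced from the normal distribution with the error function. A minimal sketch; `share_within` is an illustrative name.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"{k} std dev: {share_within(k) * 100:.2f}%")
```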

11
Q

EQ: Variance

A

Sum of (value - mean)^2 / n
OR std dev squared

Not very useful on its own, because it is in squared units

12
Q

EQ: coefficient of variation

A

SINGLE VARIABLE
= std dev / mean x 100
Shows how big the standard deviation is compared to the mean. Larger means more dispersed or more variable

BIVARIATE
=SEE / mean (dependent variable) x 100
tells us the size of the error as compared to the mean (%)

MULTIVARIATE
=same as bivariate
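
A sketch of both forms of the coefficient of variation described above. The function names and sample numbers are assumptions; the regression form simply takes an already computed SEE as input.

```python
from statistics import mean, pstdev


def cov_single(values):
    """Single variable: population std dev / mean x 100."""
    return pstdev(values) / mean(values) * 100


def cov_regression(see, dependent_values):
    """Bivariate or multivariate: SEE / mean of the dependent variable x 100."""
    return see / mean(dependent_values) * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: std dev 2, mean 5
print(cov_single(data))
print(cov_regression(10_000, [180_000, 200_000, 220_000]))  # hypothetical SEE
```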

13
Q

When is the median preferable to the mean?

A
  1. Open-ended frequency group
  2. Extreme values
14
Q

EQ: Linear correlation coefficient (r)

A

Shows the relationship between two variables. Value from -1 to 1. Zero means no correlation. In absolute value, r > 0.8 is strong and r < 0.4 is weak.

BIVARIATE

Calc: x1 INPUT y1 [E+], [x^,r] ,SWAP

Cons: cannot describe non-linear relationships

Multivariate counterpart is coefficient of determination (R^2)
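
For reference, r can also be computed by hand rather than with calculator keys. A minimal sketch of the standard Pearson formula; the function name and data are illustrative assumptions.

```python
import math


def correlation(xs, ys):
    """Pearson linear correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]  # hypothetical paired data
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))
```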

15
Q

EQ: Sum of least squares (S)

A

BIVARIATE

(Actual - predicted)^2, then sum

Univariate would be variance before dividing by n

16
Q

How to solve for a bivariate linear equation

A

x INPUT y [E+]
0 [y^,m] gives y-intercept
SWAP gives slope
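
The same slope and y-intercept can be worked out with the ordinary least-squares formulas. A sketch under the assumption that the calculator fits a least-squares line; names and data are illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # predicted y when x = 0
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(round(slope, 2), round(intercept, 2))
```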

17
Q

EQ: Standard error of the estimates (Syx) or (SEE)

A

BIVARIATE is Syx

Equation 11.7
Suggests prediction error. Is in the same unit as the variable. Lower is better
MULTIVARIATE
also known as Root Mean Square Error
Analogous to the standard deviation of the regression errors, with the same % as for a normal distribution (68.27%, 95.45%, 99.73%). Shows how scattered the errors are. Low = close
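
A rough sketch of the SEE calculation. Equation 11.7 is not reproduced on the card, so the divisor used here (n - k - 1, the usual degrees-of-freedom form, where k is the number of independent variables) is an assumption; some texts divide by n instead.

```python
import math


def standard_error_of_estimate(actual, predicted, k=1):
    """Square root of the summed squared errors over (n - k - 1).

    k is the number of independent variables (1 for bivariate).
    The n - k - 1 divisor is an assumption, not taken from the card.
    """
    n = len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / (n - k - 1))


actual = [2, 4, 5, 4, 5]               # hypothetical observations
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # from the line y = 2.2 + 0.6x at x = 1..5
print(round(standard_error_of_estimate(actual, predicted), 4))
```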

18
Q

How do you turn a non-numerical variable into a number?

A

Turn each option into a yes(1)/no(0)
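
A small sketch of the yes(1)/no(0) idea: each option becomes its own column. The helper name and the example categories are assumptions for illustration.

```python
def to_dummies(values):
    """Turn each distinct option into its own yes(1)/no(0) variable."""
    options = sorted(set(values))
    return [{option: int(value == option) for option in options} for value in values]


# Hypothetical non-numeric variable: exterior finish.
for row in to_dummies(["brick", "wood", "brick"]):
    print(row)
```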

19
Q

What are the four “goodness of fit” variables?

A

Coefficient of determination (R^2)
Standard error of the estimate (SEE)
Coefficient of variation (COV)
F-value

20
Q

What are the two statistics that relate to the importance of individual variables?

A

Correlation coefficient (r)
t-statistic

21
Q

EQ: coefficient of determination (R^2) and adjusted R^2

A

MULTIVARIATE
=correlation coefficient (r) squared
Shows how well the regression model explains the variation in the dependent variable in %. 0 (low) to 1 (high).
Weaknesses:
1. Can only stay the same or go higher as more variables are added. Goodness of fit could be overstated by adding many insignificant variables. (Corrected by using adjusted R^2)
2. Every model is different, so no benchmarks for fit.
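
A sketch of R^2 and adjusted R^2. The card does not give the adjusted formula, so the standard form 1 - (1 - R^2)(n - 1)/(n - k - 1) is assumed here, with k the number of independent variables; the data are hypothetical.

```python
def r_squared(actual, predicted):
    """1 - (unexplained variation / total variation)."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of independent variables k (assumed standard form)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


actual = [2, 4, 5, 4, 5]               # hypothetical observations
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical fitted values
r2 = r_squared(actual, predicted)
print(round(r2, 3), round(adjusted_r_squared(r2, n=5, k=1), 3))
```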

22
Q

To improve multiple regression model, what should you look at?

A

First, SEE and COV, then R2

23
Q

What are strata and what is the effect on R2?

A

Strata are groups made before modeling. Then each group gets a model. Eg. Neighbourhoods.

R2 may be less because a large part of the variation is removed already by the strata. What is left to be modeled forms the new basis for R2

24
Q

EQ: f-value
(formula, meaning, benchmark, weakness)

A

= variance explained by regression divided by unexplained by regression

Is the model useful or no more useful than using the mean?

Tests whether model is NOT sufficient. F<4 = not significant
F>4=significant

Sensitive to the number of variables and observations. Many variables with few observations generally give F<4
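
A sketch of the F-value as explained variance over unexplained variance. The degrees of freedom used (k for the regression, n - k - 1 for the errors) are the usual convention and an assumption here, since the card gives the formula only in words.

```python
def f_value(actual, predicted, k=1):
    """(explained variation / k) divided by (unexplained variation / (n - k - 1))."""
    n = len(actual)
    mean_y = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    sst = sum((a - mean_y) ** 2 for a in actual)                # total
    ssr = sst - sse                                             # explained
    return (ssr / k) / (sse / (n - k - 1))


actual = [2, 4, 5, 4, 5]               # hypothetical observations
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical least-squares fitted values
f = f_value(actual, predicted)
print(round(f, 2), "significant" if f > 4 else "not significant")
```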

25
Q

EQ: t-statistic

A

Confidence. How sure are we that the coefficient is NOT zero? Bigger=better.
Outside of +-2.58 = 99%
Outside of +-1.96 = 95%
Outside of +-1.64 = 90%
Significance level 0.10 means 90% confident it is not zero. 0.05 is reliable.
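
The cut-offs above can be applied mechanically. In this sketch the t-statistic is taken as coefficient / standard error of the coefficient, which is the usual definition but is not stated on the card; the numbers are hypothetical.

```python
def t_statistic(coefficient, standard_error):
    """Assumed definition: the coefficient divided by its standard error."""
    return coefficient / standard_error


def confidence_band(t_stat):
    """Map a t-statistic to the confidence levels listed on the card."""
    size = abs(t_stat)
    if size > 2.58:
        return "99%"
    if size > 1.96:
        return "95%"
    if size > 1.64:
        return "90%"
    return "below 90%"


t = t_statistic(50.0, 20.0)  # hypothetical coefficient and standard error
print(t, confidence_band(t))
```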

26
Q

Quick check list regression outputs:

A
  • Coefficients have expected signs
  • t-stat outside +-1.64, significance less than 0.10
  • f-value greater than 4
  • SEE approaching zero
  • COV less than 20%, under 10 ideal
  • adjusted R^2 close to 1 (above 0.8)
27
Q

EQ: aggregate ratio

A

= sum of assessments / sum of sale prices
Susceptible to sampling error and to high-value outliers.
(Versus mean ASR, which is the sum of (assessment / sale) divided by n and gives all observations equal weight.)
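
A short sketch contrasting the two ratios. The assessment and sale figures are made up for illustration, and the function names are assumptions.

```python
def aggregate_ratio(assessments, sales):
    """Sum of assessments divided by sum of sale prices (high-value sales weigh more)."""
    return sum(assessments) / sum(sales)


def mean_asr(assessments, sales):
    """Average of the individual assessment/sale ratios (equal weight each)."""
    ratios = [a / s for a, s in zip(assessments, sales)]
    return sum(ratios) / len(ratios)


assessments = [90, 200, 450]  # hypothetical, in $000s
sales = [100, 200, 500]
print(round(aggregate_ratio(assessments, sales), 4))
print(round(mean_asr(assessments, sales), 4))
```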

28
Q

Explain percentiles and quartiles

A

Dividing points in a data set. Median is 50th percentile and 2nd quartile
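
A quick check that the 2nd quartile and the median coincide, using the standard library. The data are hypothetical, and `statistics.quantiles` uses its default interpolation method here; other methods can give slightly different 1st and 3rd quartiles.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values

q1, q2, q3 = quantiles(data, n=4)   # the three quartile cut points
print(q1, q2, q3)
print(q2 == median(data))           # 2nd quartile is the median
```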

29
Q

EQ: average absolute deviation (AAD)

A

= sum of |ASR - median| / n
Gives the average spread, similar to std dev but measured around the median

30
Q

EQ: coefficient of dispersion
(Formula, benchmark, weakness)

A

= AAD / median x 100
Makes comparable across groups
Good is 15 or less
Weakness: Cannot state probability of accuracy of a given assessment
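
A sketch of AAD and COD together. COD is expressed here as a percentage (x 100), which is what makes the benchmark of 15 meaningful; the ratios are hypothetical.

```python
from statistics import median


def average_absolute_deviation(ratios):
    """Average absolute distance of each ASR from the median ASR."""
    med = median(ratios)
    return sum(abs(r - med) for r in ratios) / len(ratios)


def coefficient_of_dispersion(ratios):
    """AAD as a percentage of the median ASR."""
    return average_absolute_deviation(ratios) / median(ratios) * 100


ratios = [0.8, 0.9, 1.0, 1.1, 1.2]  # hypothetical assessment/sale ratios
print(round(average_absolute_deviation(ratios), 4))
print(round(coefficient_of_dispersion(ratios), 2))
```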

31
Q

EQ: price related differential (PRD) (formula, interpretation, benchmark)

A

= mean ASR divided by aggregate ratio
Greater than 1.00 means high-value properties are under-assessed (regressive)
Less than 1.00 means high-value properties are over-assessed (progressive)
Optimum: 0.98 to 1.03
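
A sketch of the PRD calculation and its reading. The function name and the figures are assumptions for illustration only.

```python
def price_related_differential(assessments, sales):
    """Mean ASR divided by the aggregate ratio."""
    ratios = [a / s for a, s in zip(assessments, sales)]
    mean_ratio = sum(ratios) / len(ratios)
    aggregate = sum(assessments) / sum(sales)
    return mean_ratio / aggregate


# Hypothetical sample where the highest-priced sale has the lowest ratio.
prd = price_related_differential([100, 200, 400], [100, 200, 500])
print(round(prd, 4), "regressive" if prd > 1 else "progressive or neutral")
```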

32
Q

IAAO standards (ASR, COD)

A
  • ASR 0.90 to 1.10
  • Each stratum's ASR should be within 5% of the overall ASR
  • SF res should be COD 5 to 15
  • ICI should be COD 5 to 20
  • vacant land COD 5 to 25
33
Q

Standard error of the mean (σx̄)

A

σx̄ = σ (population) / sqrt(n)
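
A one-line sketch of the formula above. The population standard deviation and sample size are hypothetical.

```python
import math


def standard_error_of_mean(population_std_dev: float, n: int) -> float:
    """Population standard deviation divided by the square root of the sample size."""
    return population_std_dev / math.sqrt(n)


print(standard_error_of_mean(2.0, 16))  # hypothetical: sigma = 2, n = 16
```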