QM PREREQ5 Sampling and estimation Flashcards

Question 1

Q

What is an estimator?

Answer

A

A formula (rule) applied to sample data to estimate a population parameter (e.g., the variance)

Question 2

Q

What are the desirable properties for estimators?

Answer

A

Unbiasedness: an unbiased estimator is one whose expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate

An unbiased estimator of the population mean is xbar = sum of xsubi / n
Using sum of xsubi / (n-1) instead would be biased: its expected value is mu x n/(n-1) rather than mu, so (for a positive mean) it overstates the mean by a factor of n/(n-1), not by 1
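A quick simulation makes the n versus (n-1) point concrete. This is a minimal sketch that assumes normally distributed data with a hypothetical true mean mu = 10, sigma = 2 and samples of size n = 5. It averages each formula over many samples to approximate its expected value.

```python
import random

def compare_mean_estimators(mu=10.0, sigma=2.0, n=5, trials=20_000, seed=1):
    """Average sum(x)/n and sum(x)/(n-1) over many samples to approximate their expected values."""
    rng = random.Random(seed)
    unbiased_total = 0.0   # running sum of sum(x) / n
    biased_total = 0.0     # running sum of sum(x) / (n - 1)
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        unbiased_total += sum(sample) / n
        biased_total += sum(sample) / (n - 1)
    return unbiased_total / trials, biased_total / trials

mean_n, mean_n_minus_1 = compare_mean_estimators()
print(f"average of sum/n:     {mean_n:.3f}")          # should sit near mu = 10
print(f"average of sum/(n-1): {mean_n_minus_1:.3f}")  # should sit near mu * n/(n-1) = 12.5
```

The gap between the two averages comes from scaling, not from adding a constant. With n = 5 the second formula lands roughly 25% above mu.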

Efficiency: an unbiased estimator is efficient if no other unbiased estimator has a sampling distribution with smaller variance

Compared with a less efficient estimator, a more efficient one has a sampling distribution with a taller peak and thinner tails (even though both are unbiased, i.e., centred on the true parameter)
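The sketch below shows efficiency in action. It uses the sample median as a hypothetical competing estimator. For normal data, both the sample mean and the sample median are centred on the true mean, but the mean's sampling distribution has the smaller variance. The sample size (n = 25), the number of trials and the N(0, 1) population are all illustrative assumptions.

```python
import random
from statistics import fmean, median, pvariance

def sampling_distributions(n=25, trials=5_000, seed=2):
    """Collect sample means and sample medians from repeated N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(fmean(sample))
        medians.append(median(sample))
    return means, medians

means, medians = sampling_distributions()
# Both estimators are centred on the true mean of 0 ...
print(f"centre:   mean={fmean(means):+.3f}  median={fmean(medians):+.3f}")
# ... but the sample mean's sampling distribution is tighter (smaller variance)
print(f"variance: mean={pvariance(means):.4f}  median={pvariance(medians):.4f}")
```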

Consistency: a consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as the sample size increases

For example, the sample mean is a consistent estimator of the population mean: its standard error, SE = s/sqrt(n), shrinks toward zero as n increases, so the estimates cluster ever more tightly around the true mean
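Consistency can be checked by simulation. This is a sketch under assumed inputs: a normal population with mu = 5 and sigma = 2 (treated as known for the SE column), and an arbitrary tolerance of 0.25. As n grows, the standard error falls and the share of sample means landing near mu rises toward 1.

```python
import math
import random
from statistics import fmean

def share_within(mu, sigma, n, tolerance, trials=1_000, seed=3):
    """Fraction of sample means landing within +/- tolerance of the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) <= tolerance:
            hits += 1
    return hits / trials

shares = {}
for n in (10, 100, 1000):
    se = 2.0 / math.sqrt(n)   # sigma / sqrt(n) with the assumed sigma = 2
    shares[n] = share_within(mu=5.0, sigma=2.0, n=n, tolerance=0.25)
    print(f"n={n:5d}  SE={se:.3f}  share within 0.25 of mu: {shares[n]:.3f}")
```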

Question 3

Q

What is a confidence interval?

Answer

A

A range for which one can assert with a given probability (1-alpha), called the degree of confidence, that it will contain the parameter it is intended to estimate

i.e., lower limit <- xbar -> upper limit (the interval is centred on xbar)
This is a two-sided confidence interval

Question 4

Q

What is a point estimate?

Answer

A

A single number calculated from a sample (e.g., xbar) and used as the estimate of an unknown population parameter

Question 5

Q

What are the two interpretations of a confidence interval?

Answer

A

Probabilistic: in repeated sampling, 95% (for example) of such CIs will in the long run include or bracket the population mean
Practical: we are 95% confident that a given CI contains the population mean
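The probabilistic interpretation can be checked directly. The sketch below draws many samples from a hypothetical N(100, 15) population (n = 50, sigma treated as known) and builds xbar +/- 1.96 x sigma/sqrt(n) for each sample. It then counts how many intervals bracket the true mean. Any single interval either contains mu or it does not, but about 95% of them do in the long run.

```python
import math
import random
from statistics import fmean

def ci_coverage(mu=100.0, sigma=15.0, n=50, rf=1.96, trials=4_000, seed=4):
    """Share of intervals xbar +/- rf * sigma/sqrt(n) that contain the true mean."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)      # sigma treated as known, so the z factor applies exactly
    covered = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - rf * se <= mu <= xbar + rf * se:
            covered += 1
    return covered / trials

rate = ci_coverage()
print(f"Share of intervals containing mu: {rate:.3f}")   # should be close to 0.95
```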

Question 6

Q

How do we construct a CI?

Answer

A

Take the point estimate (xbar)
Add or subtract the reliability factor multiplied by the standard error

The reliability factor can be based on a z value or a t value
The standard error is sigma / sqrt(n), or s / sqrt(n) if the population standard deviation is unknown and you only have the sample standard deviation

Multiplying reliability factor x standard error by 2 gives the width of the confidence interval, because the interval extends that far on each side of xbar (plus or minus)
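The recipe above fits in a few lines. This is a minimal sketch. The function name `confidence_interval` and the inputs (xbar = 10, s = 4, n = 64) are hypothetical, and the default reliability factor is the 95% z value of 1.96.

```python
import math

def confidence_interval(xbar, s, n, rf=1.96):
    """Point estimate +/- reliability factor x standard error."""
    se = s / math.sqrt(n)          # standard error
    half_width = rf * se           # margin of error on each side
    return xbar - half_width, xbar + half_width

lower, upper = confidence_interval(xbar=10.0, s=4.0, n=64)
print(f"({lower:.2f}, {upper:.2f})  width = {upper - lower:.2f}")   # (9.02, 10.98)  width = 1.96
```

Here SE = 4/8 = 0.5, so each side extends 1.96 x 0.5 = 0.98, and the total width is 2 x 0.98 = 1.96.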

Question 7

Q

What are the most common reliability factors?

Answer

A

90% confidence interval: 1.65
95%: 1.96
99%: 2.58
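These z reliability factors are the (1 - alpha/2) quantiles of the standard normal. Python's standard-library `statistics.NormalDist` can recover them. The unrounded values are about 1.645 and 2.576, which the card rounds to 1.65 and 2.58.

```python
from statistics import NormalDist

def z_reliability_factor(confidence):
    """Two-sided z reliability factor: the (1 - alpha/2) quantile of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_reliability_factor(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```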

Question 8

Q

Do we use z or t to find our confidence interval if we have a large sample with variance unknown?

Answer

A

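z is acceptable, because as the sample size (and so the degrees of freedom) increases, the t-distribution converges to the standard normal and t reliability factors shrink toward z values
e.g., if n=400 we would just use z
The reading tends to treat n > 30 as large enough to stop using t, but the two only really converge at around 200 to 300 observations. A “large sample size” is not really 50.
You can never be WRONG using the t value when the variance is unknown, because it converges to z anyway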
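The convergence can be seen by computing t critical values directly. Python's standard library has no t-distribution, so this sketch builds one: it integrates the Student's t density with Simpson's rule and inverts the CDF by bisection. The helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical. `t_critical(0.95, df)` should match what Excel's =T.INV(0.975, df) returns, but treat that correspondence as an assumption to spot-check.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density over [0, x]."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided t reliability factor: solve t_cdf(t) = 1 - alpha/2 by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
t_values = {df: t_critical(0.95, df) for df in (10, 30, 100, 300)}
for df, t in t_values.items():
    print(f"df={df:4d}  t={t:.3f}  z={z:.3f}")
```

The t values fall steadily toward 1.96 as the degrees of freedom rise. This is why using t is never wrong: for large samples it is practically the same number as z.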

Question 9

Q

How do we find t-value in excel?

Answer

A

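=T.INV(probability, degrees of freedom)
returns the left-tailed inverse of the t-distribution: e.g., =T.INV(0.975, df) gives the positive critical value for a 95% two-sided interval, and =T.INV(0.025, df) gives the negative one. =T.INV.2T(0.05, df) returns the positive two-tailed value directly.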

Question 10

Q

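Under what conditions would we use the z value?

Answer

A

When sampling from a normally distributed population whose variance is known (any sample size), or when the sample is large, even if the variance is unknown (see Question 8)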

Question 11

Q

How do we determine the sample size required to obtain a confidence interval of a given width (e.g., 1%)?

Answer

A

Let E be the margin of error, i.e., the half-width of the interval:
xbar +/- ( t x s/sqrt(n) ), so E = t x s/sqrt(n)

The width of the confidence interval will be 2E (so a 1% wide interval means E = 0.5%)
Solving E = t x s/sqrt(n) for n gives:

n = [ (t x s) / E ]^2

We would not expect the sample standard deviation to change as n changes, but we would expect the standard error to shrink.
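The rearranged formula in code is below. This is a sketch with hypothetical inputs: s = 10%, a target total width of 1% (so E = 0.5%), and the 95% z factor of 1.96. z is used here instead of t because t itself depends on n. With t you would have to recompute the critical value for the new n and repeat until n stops changing. The result is rounded up, since a fractional observation is not possible and rounding down would leave the interval slightly too wide.

```python
import math

def required_sample_size(rf, s, E):
    """Smallest n with rf * s / sqrt(n) <= E, i.e. n = (rf * s / E)^2 rounded up."""
    return math.ceil((rf * s / E) ** 2)

# Hypothetical inputs: s = 10%, desired total width 1%, so E = 0.5%
n = required_sample_size(rf=1.96, s=0.10, E=0.005)
print(n)   # 1537, since (1.96 x 0.10 / 0.005)^2 = 39.2^2 = 1536.64
```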

Question 12

Q

What is a data snooping bias?

Answer

A

The bias that arises from repeatedly searching the same data set for statistical patterns or relationships until something “significant” turns up. The practice is also known as data mining.

If alpha = 5%, testing 100 different variables will, on average, produce 5 significant relationships by chance alone, even if none of them is real

Data snooping is typically not theory-driven and lacks an economic rationale.
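The "5 out of 100" figure can be reproduced with pure noise. In this sketch, each of 100 hypothetical variables is a sample of 20 draws from N(0, 1), so its true mean really is 0. Each variable is tested for a non-zero mean at alpha = 5% with a z-test (sigma = 1 treated as known). The function name `count_false_discoveries` and all sizes are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

def count_false_discoveries(rng, num_variables=100, n_obs=20, alpha=0.05):
    """Test num_variables pure-noise series for a non-zero mean; count the 'significant' ones."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(num_variables):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]   # true mean is 0
        z = fmean(sample) / (1.0 / math.sqrt(n_obs))           # sigma = 1 is known
        if abs(z) > z_crit:
            hits += 1
    return hits

rng = random.Random(42)
counts = [count_false_discoveries(rng) for _ in range(400)]
average_hits = sum(counts) / len(counts)
print(f"Average 'significant' variables out of 100: {average_hits:.2f}")   # close to 5
```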

Question 13

Q

How do we minimise or avoid data snooping bias?

Answer

A

To combat data snooping bias we must start from a clear, well-formulated hypothesis with an economic rationale and accompanying theory behind it.
We split our data set into a training data set, a validation data set, and test data.
- The training data is used to build and fit the model.
- The validation data set is used to tune the model.
- The test data is used as an out-of-sample test to evaluate model fit. If data snooping is present, the model will fit the test data poorly (insignificant out-of-sample fit)!
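The three-way split can be sketched as follows. The 60/20/20 proportions and the name `split_dataset` are assumptions for illustration, not prescribed by the reading. The rows are kept in their original order rather than shuffled. For time-series data this means the test set lies strictly after the data used to build the model, which also guards against look-ahead.

```python
def split_dataset(rows, train_frac=0.6, validation_frac=0.2):
    """Split rows, in their original order, into training, validation and test sets."""
    n = len(rows)
    train_end = round(n * train_frac)
    validation_end = train_end + round(n * validation_frac)
    return rows[:train_end], rows[train_end:validation_end], rows[validation_end:]

observations = list(range(100))   # stand-in for 100 chronologically ordered observations
train_set, validation_set, test_set = split_dataset(observations)
print(len(train_set), len(validation_set), len(test_set))   # 60 20 20
```

The model is fitted on `train_set` and tuned on `validation_set`. `test_set` is evaluated only once, at the end.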

Question 14

Q

What is sample selection bias?

Answer

A

Bias from excluding some observations or time periods from the analysis (i.e., using a non-random sample)

e.g., survivorship bias: historical databases may include only companies that survived.
This overstates average performance, because the failures are missing.

Another example is hedge fund indexes. Because funds self-report, only well-performing funds may choose to report (self-selection bias).
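Survivorship bias can be shown with a toy fund universe. The numbers below are hypothetical: 1,000 funds whose annualised returns are drawn from N(5%, 15%), and an assumed rule that any fund returning below -10% closes and drops out of the database. Averaging only the survivors overstates the performance of the full universe.

```python
import random
from statistics import fmean

rng = random.Random(5)
# Hypothetical universe: 1,000 funds, annualised returns drawn from N(5%, 15%)
all_returns = [rng.gauss(0.05, 0.15) for _ in range(1000)]
# Assumed closure rule: funds returning below -10% vanish from the database
survivors = [r for r in all_returns if r >= -0.10]

print(f"All funds:      {fmean(all_returns):.2%}")
print(f"Survivors only: {fmean(survivors):.2%}  ({len(survivors)} of {len(all_returns)} funds)")
```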

Question 15

Q

What is look ahead bias?

Answer

A

Using information that was not available on the observation date.
e.g., models that combine price and accounting data from the historical record, when the accounting data may not yet have been published on that date.

For example, we can observe the price on Dec 31st and the book value (BV) for the period ending Dec 31st, but the BV may not have been reported until mid-February. Pairing that BV with the Dec 31st price would be look-ahead bias.
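One way to avoid look-ahead bias is a point-in-time lookup. It keys each accounting figure to the date it became public, not to the period it describes. In this sketch the report dates and book values are hypothetical, and the list is assumed to be sorted by report date.

```python
from datetime import date

# Hypothetical history: (date the figure became public, book value per share), sorted by date
book_value_reports = [
    (date(2023, 2, 15), 20.0),   # FY2022 book value, reported mid-February 2023
    (date(2024, 2, 15), 22.0),   # FY2023 book value, reported mid-February 2024
]

def book_value_as_of(observation_date, reports):
    """Latest book value actually published on or before observation_date (no look-ahead)."""
    available = [bv for reported_on, bv in reports if reported_on <= observation_date]
    return available[-1] if available else None

print(book_value_as_of(date(2023, 12, 31), book_value_reports))   # 20.0, not the FY2023 figure
print(book_value_as_of(date(2024, 3, 1), book_value_reports))     # 22.0
```

On Dec 31st, 2023 the lookup still returns the FY2022 value, because the FY2023 figure was not yet published on that date.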

Question 16

Q

What is time period bias?

Answer

A

Results found in one time period may be specific to that period and may not hold in others.
Time period bias is typical of SHORT time series.
However, time series that are too long risk spanning more than one regime (i.e., more than one underlying distribution).

(16 cards)