Sampling and Estimation Flashcards

1
Q

Simple random sampling

A

Process of selecting sample where each member of population has equal chance of being selected

2
Q

Sampling distribution

A

Distribution of all distinct possible values that statistic can assume when computed from samples of same size drawn randomly from same population

3
Q

Simple random vs. stratified random sampling

A

Stratified sampling divides the population into subpopulations (strata) based on one or more criteria, then draws a simple random sample from each stratum, typically in proportion to the stratum's size. Ensures that every subpopulation is represented in the sample.
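
The two schemes can be compared in a short sketch. The bond universe, the maturity buckets, and both function names below are hypothetical, chosen only to illustrate proportional allocation across strata.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical bond universe split into strata by maturity bucket.
strata = {
    "short": [f"S{i}" for i in range(60)],
    "medium": [f"M{i}" for i in range(30)],
    "long": [f"L{i}" for i in range(10)],
}
population = [bond for members in strata.values() for bond in members]

srs = simple_random_sample(population, 10)
stratified = stratified_random_sample(strata, 0.10)

# The stratified sample always holds 6 short, 3 medium, and 1 long bond;
# the simple random sample has no such guarantee.
counts = {prefix: sum(b.startswith(prefix) for b in stratified) for prefix in "SML"}
```

With simple random sampling the small "long" bucket can be missed entirely; the stratified draw cannot miss it.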

4
Q

Time series vs. cross sectional data

A

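Time series: sequence of observations on one variable over successive intervals of time. Cross-sectional: observations on some characteristic of many units (individuals, firms, countries) at a single point in time.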

5
Q

Central limit theorem and its importance

A

Given a population with mean μ and finite variance σ^2, the sampling distribution of the sample mean computed from samples of size n will be approximately normal with mean μ and variance σ^2/n when the sample size is large (rule of thumb: n ≥ 30).

Allows making precise probability statements about POPULATION mean by using sample mean, regardless of distribution of population
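
The theorem can be checked by simulation. This is a minimal sketch, assuming an exponential population with mean 1 and variance 1 (a deliberately skewed choice that is not from the card); the seed, sample size, and number of trials are arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed skewed population: exponential with mean 1 and variance 1.
mu, sigma2 = 1.0, 1.0
n = 50          # size of each sample
trials = 5000   # number of samples drawn

sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)     # should be close to mu
var_of_means = statistics.pvariance(sample_means)  # should be close to sigma2 / n
```

Even though the population is far from normal, the sample means cluster around μ with variance close to σ^2/n.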

6
Q

Calculate and interpret standard error of sample mean

A

σ/√n when the population standard deviation σ is known; s/√n when it must be estimated by the sample standard deviation s

It is the standard deviation of the sampling distribution of the sample mean.
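
A small sketch of the calculation. The function name `standard_error` and the return figures are hypothetical; the sample standard deviation is used unless a population value is supplied.

```python
import math
import statistics

def standard_error(sample, population_sd=None):
    """Standard error of the sample mean: sd / sqrt(n)."""
    n = len(sample)
    # Fall back to the sample standard deviation (n - 1 divisor) if sigma is unknown.
    sd = population_sd if population_sd is not None else statistics.stdev(sample)
    return sd / math.sqrt(n)

# Hypothetical periodic returns, n = 9.
returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.04, -0.02, 0.01, 0.02]

se_known = standard_error(returns, population_sd=0.03)  # 0.03 / sqrt(9) = 0.01
se_sample = standard_error(returns)                     # uses s in place of sigma
```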

7
Q

Desirable properties of estimator

A

Unbiased - expected value equals parameter intended to estimate

Efficient - no other unbiased estimator of same parameter has smaller variance of sampling distribution

Consistent - probability of estimates close to value of population parameter increases as sample size increases
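
Efficiency can be illustrated by comparing two unbiased estimators of the same parameter. This is a sketch under an assumed standard normal population, where both the sample mean and the sample median centre on the true mean of 0; the seed and sizes are arbitrary.

```python
import random
import statistics

rng = random.Random(7)
n, trials = 25, 4000

means, medians = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Both estimators average out near the true mean of 0 (unbiased here by symmetry).
avg_of_means = statistics.fmean(means)
avg_of_medians = statistics.fmean(medians)

# The sample mean has the tighter sampling distribution, so it is more efficient.
var_mean = statistics.pvariance(means)      # theory: 1 / 25 = 0.04
var_median = statistics.pvariance(medians)  # noticeably larger
```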

8
Q

Point estimate vs. confidence interval estimate of population parameter

A

Point estimate - single number used to estimate parameter

Confidence interval estimate - range of values expected to bracket the population parameter with probability 1 - α (the degree of confidence); referred to as the 100(1 - α)% confidence interval

9
Q

Describe properties of Student’s t-distribution

A

Symmetrical probability distribution defined by a single parameter - degrees of freedom (df). Fatter tails than the normal distribution; approaches the standard normal as df increase.

Can use to construct confidence intervals for population mean when population variance is UNKNOWN

10
Q

Calculate and interpret degrees of freedom for t-distribution

A

n-1

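Number of independent observations used in estimating the population variance; one degree of freedom is lost because the sample mean is used in computing the sample variance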
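
The effect of the n - 1 divisor can be seen in a simulation. This is a sketch assuming a normal population with variance 4; the mean, seed, and sizes are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(1)
sigma2 = 4.0           # true population variance (assumed for the simulation)
n, trials = 5, 10000

div_n, div_n_minus_1 = [], []
for _ in range(trials):
    sample = [rng.gauss(10.0, sigma2 ** 0.5) for _ in range(n)]
    div_n.append(statistics.pvariance(sample))         # divides by n
    div_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

avg_div_n = statistics.fmean(div_n)                  # near sigma2 * (n - 1) / n = 3.2
avg_div_n_minus_1 = statistics.fmean(div_n_minus_1)  # near sigma2 = 4.0
```

Dividing by n understates the population variance on average; dividing by the n - 1 degrees of freedom removes that bias.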

11
Q

Calculate and interpret confidence interval for population mean with normal distribution and known population variance

A

sample mean ± z_(α/2) × σ/√n
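
A minimal sketch of this interval using the standard library's `statistics.NormalDist`. The function name and the input figures (mean 5%, σ 20%, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for the population mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # reliability factor z_(alpha/2)
    half_width = z * sigma / math.sqrt(n)        # z times the standard error
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical inputs: sample mean 5%, known sigma 20%, n = 100.
low, high = z_confidence_interval(sample_mean=0.05, sigma=0.20, n=100)
# The 95% interval is roughly 1.08% to 8.92%.
```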

12
Q

Calculate and interpret confidence interval for population mean with normal distribution and unknown population variance

A

Use only if the sample is large or the population is (approximately) normally distributed

sample mean ± t_(α/2) × s/√n, with n - 1 degrees of freedom
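
A sketch of the t-based interval. The Python standard library has no t-distribution, so the 95% reliability factors below are hard-coded from a standard t-table for a few degrees of freedom only; the function name and the inputs are hypothetical.

```python
import math

# Two-tailed 95% reliability factors t_(0.025) by degrees of freedom,
# copied from a standard t-table (illustrative subset only).
T_025 = {9: 2.262, 24: 2.064, 29: 2.045}

def t_confidence_interval_95(sample_mean, s, n):
    """95% interval for the population mean when the variance is unknown."""
    df = n - 1
    t = T_025[df]  # raises KeyError for df outside the illustrative table
    half_width = t * s / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical inputs: sample mean 5%, sample standard deviation 20%, n = 25.
low, high = t_confidence_interval_95(sample_mean=0.05, s=0.20, n=25)
# half-width = 2.064 * 0.20 / 5 = 0.08256
```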

13
Q

Calculate and interpret confidence interval for population mean with normal distribution and unknown variance and large sample size

A

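With a large sample the central limit theorem applies whatever the population's distribution. The t reliability factor remains appropriate and is the more conservative choice; the z reliability factor is an acceptable alternative.

sample mean ± t_(α/2) × s/√n (z-alternative: sample mean ± z_(α/2) × s/√n)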

14
Q

Appropriate sample size

A

Look at need for precision, risk of sampling from more than one population, expenses of different sample sizes

15
Q

Data mining bias

A

Errors arising from misuse of data: repeatedly searching a data set until a pattern that "works" turns up. Such patterns frequently fail in the future because they were found after the fact.

Watch for:

Too much digging, too little confidence
No story, no future

16
Q

Sample selection bias

A

Data availability leads to certain assets being excluded from analysis

17
Q

Survivorship bias

A

Funds, companies etc. no longer appear in sample because they went out of business
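
A toy calculation shows the direction of the bias. All fund names, returns, and the `alive` flag below are hypothetical.

```python
import statistics

# Hypothetical fund returns; `alive` is False for funds that later closed.
funds = [
    {"name": "A", "ret": 0.12, "alive": True},
    {"name": "B", "ret": 0.08, "alive": True},
    {"name": "C", "ret": -0.15, "alive": False},
    {"name": "D", "ret": 0.05, "alive": True},
    {"name": "E", "ret": -0.20, "alive": False},
]

# Average over every fund that existed during the period.
all_mean = statistics.fmean(f["ret"] for f in funds)

# Average over survivors only, which is what a database of live funds would show.
survivors_mean = statistics.fmean(f["ret"] for f in funds if f["alive"])
```

Dropping the failed funds raises the average return, so performance measured on survivors alone is overstated.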

18
Q

Look-ahead bias

A

Using information not available on test date (e.g. looking at 12/31 price in light of year-end numbers that are not available on 12/31)

19
Q

Time-period bias

A

Basing the test on a time period that may make the results specific to that period (for example, one that is too short or that spans a structural change)

20
Q

Reliability factors for confidence intervals in t and z tests

A

Use the table column for the one-tail probability α/2 that matches the confidence level:

90% - .05
95% - .025
99% - .005

The more confidence you want, the wider (less precise) the interval must be
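
For the z case, the reliability factors can be computed instead of looked up. This is a minimal sketch using `statistics.NormalDist`; the function name is hypothetical.

```python
from statistics import NormalDist

def z_reliability_factor(confidence):
    """z value that leaves (1 - confidence) / 2 in each tail."""
    tail = (1.0 - confidence) / 2.0  # the table column: .05, .025, .005
    return NormalDist().inv_cdf(1.0 - tail)

factors = {c: round(z_reliability_factor(c), 3) for c in (0.90, 0.95, 0.99)}
# factors == {0.9: 1.645, 0.95: 1.96, 0.99: 2.576}
```

The factor grows with the confidence level, which is why a higher degree of confidence produces a wider interval for the same standard error.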