Sampling and Estimation Flashcards

1
Q

Sampling error formula

A

sampling error of the mean = $\bar{x} - \mu$

i.e., sample mean minus population mean
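As a quick illustration of the formula above, here is a minimal sketch that assumes a made-up population of 1,000 normally distributed values (the population, seed, and sample size are hypothetical). It draws one simple random sample and computes $\bar{x} - \mu$ directly.

```python
import random
import statistics

# Hypothetical population of 1,000 values (assumption for illustration only)
rng = random.Random(7)
population = [rng.gauss(100, 15) for _ in range(1000)]
mu = statistics.fmean(population)  # population mean

# Simple random sample of 50 members, drawn without replacement
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)  # sample mean

# Sampling error of the mean = sample mean - population mean
sampling_error = x_bar - mu
print(f"mu={mu:.2f}  x_bar={x_bar:.2f}  sampling error={sampling_error:.2f}")
```

A different seed gives a different sample and therefore a different sampling error; that variation is exactly what the standard error measures.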

2
Q

Systematic sampling

A
  • select every kth member of the population until the sample reaches the desired size
  • k is found by dividing the population size by the desired sample size
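The two steps above can be sketched as follows. The random starting point within the first k members is an assumption of this sketch (a common convention, not stated on the card), and the population of IDs is hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every k-th member, with k = population size // sample size.

    The random start in [0, k) is an assumed convention for this sketch.
    """
    k = len(population) // sample_size  # sampling interval
    start = random.Random(seed).randrange(k)
    # Slicing with step k picks every k-th member; cap at the desired size
    return population[start::k][:sample_size]

members = list(range(1, 101))  # hypothetical population of 100 IDs
picked = systematic_sample(members, 10, seed=1)
print(picked)
```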
3
Q

Stratified Random Sampling

A
  • the population is divided into subgroups (strata) based on one or more distinguishing characteristics
  • samples are then drawn from each subgroup, with each subgroup's sample size proportional to its size relative to the population
  • the sample will have the same distribution of key characteristics as the overall population
  • more precise than simple random sampling
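A minimal sketch of proportional allocation, assuming a hypothetical bond universe split into three rating strata (names and sizes are made up). Each stratum gets a share of the sample equal to its share of the population, and a simple random sample is drawn within it.

```python
import random

# Hypothetical bond universe grouped by credit rating (strata); sizes are made up
strata = {
    "AAA": [f"AAA-{i}" for i in range(100)],
    "BBB": [f"BBB-{i}" for i in range(300)],
    "HY": [f"HY-{i}" for i in range(600)],
}

def stratified_sample(strata, total_size, seed=None):
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; with uneven sizes, rounding may make the
        # stratum sizes sum to slightly more or less than total_size
        n_h = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, n_h)  # simple random sample within the stratum
    return sample

result = stratified_sample(strata, 50, seed=3)
print({name: len(picks) for name, picks in result.items()})
```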
4
Q

Cluster sampling

A
  • similar to stratified random sampling in that the population is divided into subpopulations, called clusters, but here a random sample of clusters is selected rather than drawing from every subgroup
  • each cluster is essentially a mini-representation of the entire population
  • e.g., voting districts or school districts; often used in market surveys
  • less accurate than other sampling methods, but more time- and cost-efficient
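The sketch below shows a one-stage variant, in which every member of each randomly selected cluster enters the sample; this is one common form, assumed here for illustration. The 20 school districts of 25 students each are hypothetical.

```python
import random

# Hypothetical clusters: 20 school districts, each a list of student IDs
clusters = {f"district-{d}": [f"d{d}-s{s}" for s in range(25)] for d in range(20)}

def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    rng = random.Random(seed)
    # Randomly pick whole clusters (sorted keys make the draw reproducible)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # One-stage: include every member of each chosen cluster
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

chosen, sample = one_stage_cluster_sample(clusters, 4, seed=5)
print(chosen, len(sample))
```

Only 4 of the 20 districts have to be surveyed, which is where the time and cost savings come from.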
5
Q

Standard error

A

SE = $\sigma / \sqrt{n}$ (population standard deviation), or $s / \sqrt{n}$ when the sample standard deviation is used
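A small helper that applies the formula, assuming hypothetical monthly returns as input. If a population standard deviation is supplied it uses $\sigma$; otherwise it falls back to the sample standard deviation $s$ (computed with n − 1 in the denominator).

```python
import math
import statistics

def standard_error(data, sigma=None):
    """SE of the sample mean: sigma/sqrt(n) if sigma is known, else s/sqrt(n)."""
    n = len(data)
    spread = sigma if sigma is not None else statistics.stdev(data)  # stdev divides by n - 1
    return spread / math.sqrt(n)

returns = [0.02, -0.01, 0.03, 0.015, -0.005, 0.01, 0.025, 0.0]  # hypothetical monthly returns
print(standard_error(returns))              # uses the sample std s
print(standard_error(returns, sigma=0.02))  # uses an assumed known population std
```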

6
Q

The three desirable properties of an estimator are:

A
  • unbiasedness
  • efficiency
  • consistency
7
Q

Unbiasedness

A
  • an estimator is unbiased if its expected value equals the parameter being estimated
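To see unbiasedness exactly rather than by simulation, this sketch enumerates every possible sample of size 2 (drawn with replacement) from a tiny hypothetical population [1, 2, 3]. Averaging an estimator over all equally likely samples gives its expected value: the sample variance with n − 1 in the denominator hits the population variance, while dividing by n does not.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # tiny hypothetical population
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)  # population variance

def var_divide_by(sample, denom):
    """Sum of squared deviations from the sample mean, divided by denom."""
    m = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - m) ** 2 for x in sample) / denom

n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely samples
# Expected value of each estimator = its average over all possible samples
e_s2 = sum(var_divide_by(s, n - 1) for s in samples) / len(samples)
e_plugin = sum(var_divide_by(s, n) for s in samples) / len(samples)
print(sigma2, e_s2, e_plugin)  # 2/3 2/3 1/3
```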
8
Q

Efficiency

A
  • an unbiased estimator is efficient if its sampling distribution has a lower variance than that of any other unbiased estimator of the same parameter
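A simulation sketch, assuming a normal population: both the sample mean and the sample median are unbiased for its center, but the mean's sampling distribution is tighter, so it is the more efficient of the two. Sample size, repetition count, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)
n, reps = 25, 2000
means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # normal population, true center 0
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Variance of each estimator across the repeated samples
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
print(f"var(mean)={var_mean:.4f}  var(median)={var_median:.4f}")
```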
9
Q

Consistency

A
  • as sample size increases, the sampling error decreases and the estimates get closer to the true parameter value
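A simulation sketch of consistency for the sample mean, assuming a hypothetical normal population with $\mu = 5$ and $\sigma = 2$: the average absolute sampling error $|\bar{x} - \mu|$ shrinks as n grows.

```python
import random
import statistics

rng = random.Random(0)
mu, sigma, reps = 5.0, 2.0, 300  # hypothetical population parameters

def mean_abs_error(n):
    """Average |x_bar - mu| over many samples of size n."""
    errors = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        errors.append(abs(x_bar - mu))
    return statistics.fmean(errors)

errors_by_n = {n: mean_abs_error(n) for n in (10, 100, 1000)}
print(errors_by_n)
```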
10
Q

Z score and “reliability factor” for:
90% CI
95% CI
99% CI

A

CI     | α/2   | reliability factor $z_{\alpha/2}$
90% CI | 0.05  | 1.65
95% CI | 0.025 | 1.96
99% CI | 0.005 | 2.58
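The reliability factors can be recovered from the standard normal distribution: $z_{\alpha/2}$ is the value with $\alpha/2$ in the upper tail. This sketch uses Python's standard-library `statistics.NormalDist`; it returns about 1.645, 1.960 and 2.576, so the 1.65 on the card is a rounded value.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """Two-sided z critical value: the z with alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {reliability_factor(level):.3f}")
```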

11
Q

Confidence Interval formula

A

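CI = point estimate ± (reliability factor × standard error)

where the point estimate is, e.g., the sample mean

Population standard deviation known (z): CI = $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

$z_{\alpha/2}$ = 1.65, 1.96, 2.58 for 90%, 95%, 99% confidence

Population standard deviation unknown (t): CI = $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$, with n − 1 degrees of freedom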
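A minimal sketch of the z-based interval, assuming the population standard deviation is known; the inputs (sample mean 50, σ = 10, n = 100) are hypothetical. The t-based version has the same shape but needs a t critical value with n − 1 degrees of freedom from a t-table or a statistics library, which the standard library does not provide.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """CI = x_bar +/- z_(alpha/2) * sigma / sqrt(n), population std sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # reliability factor
    half_width = z * sigma / math.sqrt(n)                # reliability factor * standard error
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs: sample mean 50, known population std 10, n = 100
print(z_confidence_interval(50, 10, 100))  # roughly (48.04, 51.96)
```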

12
Q

Significance level:

A

α (alpha); confidence level = 1 − α

α = 0.10: 90% CI, α/2 = 0.05, reliability factor $z_{0.05}$ = 1.65

α = 0.05: 95% CI, α/2 = 0.025, reliability factor $z_{0.025}$ = 1.96

α = 0.01: 99% CI, α/2 = 0.005, reliability factor $z_{0.005}$ = 2.58

14
Q
Which distribution (z or t) is used for:
- For a normal distribution, variance is known, n < 30
- For a normal distribution, variance is known, n >= 30
- For a normal distribution, variance is unknown, n < 30
- For a normal distribution, variance is unknown, n >= 30
A
  • For a normal distribution, variance is known, n < 30
    z distribution
  • For a normal distribution, variance is known, n >= 30
    z distribution
  • For a normal distribution, variance is unknown, n < 30
    t distribution
  • For a normal distribution, variance is unknown, n >= 30
    t or z distribution
15
Q

Which distribution (z or t) is used for:

  • For a non-normal distribution, variance is known, n < 30
  • For a non-normal distribution, variance is known, n >= 30
  • For a non-normal distribution, variance is unknown, n < 30
  • For a non-normal distribution, variance is unknown, n >= 30
A
  • For a non-normal distribution, variance is known, n < 30
    NA
  • For a non-normal distribution, variance is known, n >= 30
    z distribution
  • For a non-normal distribution, variance is unknown, n < 30
    NA
  • For a non-normal distribution, variance is unknown, n >= 30
    t or z distribution
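The z/t rules for normal and non-normal populations can be encoded as a small lookup function. This is a sketch of the decision table on these two cards; `choose_distribution` is a hypothetical helper name, and `None` stands for the "NA" cases where neither distribution applies.

```python
def choose_distribution(normal_population, variance_known, n):
    """Return "z", "t", "t or z", or None (NA) following the z/t rules above."""
    large = n >= 30
    if variance_known:
        if normal_population or large:
            return "z"
        return None  # non-normal, small sample, variance known: NA
    if normal_population and not large:
        return "t"
    if large:
        return "t or z"
    return None  # non-normal, small sample, variance unknown: NA

print(choose_distribution(True, False, 20))  # t
print(choose_distribution(False, True, 50))  # z
```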
16
Q

A larger sample size will…

A
  • decrease the CI width and improve precision
17
Q

When sample size (n) increases…

A
  • the standard error ($s/\sqrt{n}$) decreases
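The last two cards can be checked numerically: because SE = $s/\sqrt{n}$, quadrupling n halves both the standard error and the CI width. This sketch assumes a hypothetical sample standard deviation of 12 and a 95% z-based interval.

```python
import math

def ci_width(s, n, z=1.96):
    """Full width of a 95% interval: 2 * z * s / sqrt(n)."""
    return 2 * z * s / math.sqrt(n)

s = 12.0  # hypothetical sample standard deviation
for n in (25, 100, 400):
    se = s / math.sqrt(n)
    print(f"n={n:>3}  SE={se:.2f}  CI width={ci_width(s, n):.2f}")
```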
18
Q

Bootstrap

A
  • treat the randomly drawn sample as if it were actually the population
  • repeatedly draw samples (typically of the same size n) from the original sample, returning each observation to the data after it is drawn, “sampling with replacement”
  • the researcher has to decide how many repetitions are appropriate
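A minimal bootstrap sketch for the standard error of the mean, assuming hypothetical observations 1 to 20. Each resample has the same size n and is drawn with replacement via `random.Random.choices`; `repetitions` is the number the researcher has to choose.

```python
import random
import statistics

def bootstrap_se_of_mean(sample, repetitions=2000, seed=None):
    """Std of the means of `repetitions` resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample: n draws with replacement from the original sample
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(repetitions)]
    return statistics.stdev(means)

data = list(range(1, 21))  # hypothetical observations 1..20
se = bootstrap_se_of_mean(data, seed=11)
print(se)
```

Because the resamples are random, a different seed (or no seed) gives a slightly different answer on each run.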
19
Q

Jackknife

A
  • start with a sample of data
  • subsequent samples are created by leaving out one observation at a time from the set, no replacement
  • for a sample size of n, jackknife usually requires n repetitions
  • used to reduce the bias of an estimator
  • produces the same results on every run, since the leave-one-out samples are fixed
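A leave-one-out jackknife sketch: it builds the n samples that each omit one observation, then derives a bias-corrected estimate and a standard error. The data set is hypothetical. Two known checks: applying the bias correction to the plug-in variance (divide by n) recovers the usual sample variance $s^2$, and the jackknife SE of the mean equals $s/\sqrt{n}$. No randomness is involved, so every run gives the same result.

```python
import math
import statistics

def jackknife(data, estimator):
    """Leave-one-out jackknife: returns (bias-corrected estimate, standard error)."""
    n = len(data)
    theta_hat = estimator(data)
    # n leave-one-out estimates, each on n - 1 observations (no replacement)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = statistics.fmean(loo)
    bias = (n - 1) * (loo_mean - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))
    return theta_hat - bias, se

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
corrected_var, _ = jackknife(data, statistics.pvariance)
_, se_mean = jackknife(data, statistics.fmean)
print(corrected_var, statistics.variance(data))                 # bias-corrected plug-in variance vs s^2
print(se_mean, statistics.stdev(data) / math.sqrt(len(data)))   # jackknife SE of the mean vs s/sqrt(n)
```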
20
Q

List the sampling biases

A
  • data-snooping
  • sample selection
  • look-ahead
  • time period
21
Q

Data-snooping Bias

A
  • the practice of analyzing the same data again and again until a pattern that works is identified
  • signs:
    too much digging/too little confidence
    no story/no future
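A simulation sketch of why data snooping misleads, assuming 1,000 hypothetical "strategies" whose monthly returns are pure noise with a true mean of zero. Testing all of them and keeping the best one almost always turns up a t-statistic above 1.96, even though no strategy has any real edge.

```python
import math
import random
import statistics

rng = random.Random(2024)
n_strategies, n_months = 1000, 60

def t_stat(returns):
    """t-statistic for H0: mean return = 0."""
    return statistics.fmean(returns) / (statistics.stdev(returns) / math.sqrt(len(returns)))

# Hypothetical: every strategy is pure noise with a true mean return of zero
t_stats = [t_stat([rng.gauss(0.0, 0.04) for _ in range(n_months)]) for _ in range(n_strategies)]

share_significant = sum(t > 1.96 for t in t_stats) / n_strategies
print(f"best t = {max(t_stats):.2f}, share with t > 1.96 = {share_significant:.1%}")
```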
22
Q

Sample Selection Bias

A
  • survivorship bias
  • backfill bias
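A toy illustration of survivorship bias, using made-up annual returns for ten funds and assuming that funds returning below −5% closed and dropped out of the database. Averaging only the survivors overstates the performance of the full group.

```python
import statistics

# Hypothetical annual returns (%); funds below -5% are assumed to have closed
fund_returns = {"A": 12, "B": 8, "C": -9, "D": 5, "E": -15,
                "F": 7, "G": 3, "H": -7, "I": 10, "J": 6}
survivors = {name: r for name, r in fund_returns.items() if r >= -5}

all_funds_mean = statistics.fmean(fund_returns.values())
survivors_mean = statistics.fmean(survivors.values())
print(all_funds_mean, survivors_mean)
```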

23
Q

Look-ahead Bias

A
  • occurs when a test uses information that was not available to market participants at the time they would have acted in the model
  • e.g., using year-end P/B values when the decision would have taken place in Q2 of that fiscal year
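One way to avoid look-ahead bias is to stamp every data point with the date it became public and only look up values published on or before the decision date. The sketch below assumes hypothetical P/B figures and publication dates; `point_in_time` is an illustrative helper, not a library function.

```python
from datetime import date

# Hypothetical P/B figures, each stamped with the date it became public
pb_history = [
    (date(2022, 3, 15), 1.8),  # FY2021 year-end P/B, published March 2022
    (date(2023, 3, 15), 2.4),  # FY2022 year-end P/B, published March 2023
]

def point_in_time(history, decision_date):
    """Latest value published on or before decision_date (None if nothing yet)."""
    known = [(published, value) for published, value in history if published <= decision_date]
    return max(known)[1] if known else None

decision = date(2022, 5, 31)  # a Q2 2022 decision
print(point_in_time(pb_history, decision))  # 1.8, not the not-yet-published 2.4
```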
24
Q

Time-period Bias

A
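  • results may depend on the time period chosen: a series that is too short may reflect conditions specific to that period, while one that is too long may span structural changes in the market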