Sampling and Estimation Flashcards

1
Q

Compare and contrast probability samples with non-probability samples and discuss applications of each to an investment problem.

A

Probability sampling refers to sampling methods based on randomly chosen samples, assuming that members of a population are equally likely to be chosen for the sample.

Non-probability sampling refers to choosing sample data that are not random but are selected for their low cost and availability, or are specifically chosen based on the experience and judgment of the researcher.

2
Q

Explain sampling error.

A

Sampling error is the difference between a sample statistic and its corresponding population parameter (e.g., the sample mean minus the population mean).
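
As a minimal sketch of this definition, the Python snippet below draws a simple random sample from a small hypothetical population of returns (the numbers are invented for illustration) and reports the sample mean minus the population mean. The helper name sampling_error is an assumption made for this example.

```python
import random
import statistics

# Hypothetical population of ten annual returns in percent (illustrative values only).
population = [4.0, 7.5, -2.0, 10.0, 3.5, 6.0, 8.5, -1.0, 5.0, 9.5]

def sampling_error(sample, population):
    """Sample mean minus population mean."""
    return statistics.mean(sample) - statistics.mean(population)

rng = random.Random(42)              # fixed seed so the draw is repeatable
sample = rng.sample(population, 4)   # simple random sample without replacement
error = sampling_error(sample, population)
print(f"sample mean - population mean = {error:.3f}")
```

A different seed gives a different sample and therefore a different sampling error, which is why the sample statistic is itself a random variable.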

3
Q

Compare and contrast simple random, stratified random, cluster, convenience, and judgmental sampling.

A

Simple random sampling is a method of selecting a sample in such a way that each item or person in the population being studied has the same probability of being included in the sample.

Stratified random sampling involves randomly selecting samples proportionally from subgroups that are formed based on one or more distinguishing characteristics of the data, so that random samples from the subgroups will have the same distribution of these characteristics as the overall population.

Cluster sampling is also based on subgroups (not necessarily based on data characteristics) of a larger data set. In one-stage cluster sampling, the sample is formed from randomly chosen clusters (subsets) of the overall data set. In two-stage cluster sampling, random samples are taken from each of the randomly chosen clusters (subgroups).

Convenience sampling refers to selecting sample data based on ease of access, using data that are readily available. Judgmental sampling refers to samples for which each observation is selected from a large data set by the researcher, based on her experience and judgment. Both are examples of non-probability sampling and are non-random.
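
The probability-based methods described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 12-stock universe, the sector labels, and all function names are hypothetical, and the clusters are formed arbitrarily as consecutive blocks of three.

```python
import random
from collections import defaultdict

# Hypothetical universe: 12 stocks tagged by sector (names are placeholders).
population = [(f"stock_{i}", sector)
              for i, sector in enumerate(["tech"] * 6 + ["energy"] * 4 + ["utilities"] * 2)]

def simple_random(items, k, rng):
    """Every item has the same chance of selection."""
    return rng.sample(items, k)

def stratified(items, fraction, rng):
    """Sample the same fraction from each stratum (here: sector)."""
    strata = defaultdict(list)
    for item in items:
        strata[item[1]].append(item)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen

def one_stage_cluster(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member."""
    picked = rng.sample(clusters, n_clusters)
    return [item for cluster in picked for item in cluster]

def two_stage_cluster(clusters, n_clusters, k_per_cluster, rng):
    """Pick clusters at random, then sample within each picked cluster."""
    picked = rng.sample(clusters, n_clusters)
    return [item for cluster in picked for item in rng.sample(cluster, k_per_cluster)]

rng = random.Random(0)
# Clusters formed arbitrarily (consecutive blocks of three), not by sector.
clusters = [population[i:i + 3] for i in range(0, len(population), 3)]

print(simple_random(population, 4, rng))
print(stratified(population, 0.5, rng))
print(one_stage_cluster(clusters, 2, rng))
print(two_stage_cluster(clusters, 2, 2, rng))
```

With a fraction of 0.5, the stratified sample keeps the 6:4:2 sector mix of the universe (3 tech, 2 energy, 1 utilities), which a simple random sample of the same size is not guaranteed to do.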

4
Q

Explain the central limit theorem and its importance.

A
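
In its standard form, the central limit theorem says that for a population with mean mu and finite variance sigma^2, the sampling distribution of the sample mean for large samples of size n is approximately normal, with mean mu and variance sigma^2 / n, whatever the shape of the population. That is what lets us build confidence intervals and test hypotheses about a mean without knowing the population's distribution. The simulation below is a hedged sketch of that statement, not a proof: it assumes a skewed exponential population (mean 1, standard deviation 1), and the names sample_means and draw_exponential are made up for this example.

```python
import random
import statistics

def sample_means(draw, n, trials, rng):
    """Means of `trials` independent samples, each of size n."""
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]

def draw_exponential(rng):
    """One draw from an exponential population with mean 1 and standard deviation 1."""
    return rng.expovariate(1.0)

rng = random.Random(7)
n = 50
means = sample_means(draw_exponential, n, 5000, rng)

print(f"mean of sample means: {statistics.mean(means):.3f} (population mean is 1.0)")
print(f"std of sample means : {statistics.stdev(means):.3f} (sigma / sqrt(n) is {1 / n ** 0.5:.3f})")
```

Even though individual draws are strongly skewed, the sample means cluster around the population mean with a spread close to sigma / sqrt(n).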
5
Q

Calculate and interpret the standard error of the sample mean.

A
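
The standard error of the sample mean is the standard deviation of the distribution of sample means. It is computed as sigma / sqrt(n) when the population standard deviation sigma is known, and as s / sqrt(n) when the sample standard deviation s has to be used instead. A minimal worked sketch follows; all input values are hypothetical and the function name standard_error is an assumption for this example.

```python
import math
import statistics

def standard_error(std_dev, n):
    """Standard error of the sample mean: std_dev / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Case 1: population standard deviation assumed known (hypothetical: sigma = 20%, n = 100).
se_known = standard_error(20.0, 100)
print(se_known)

# Case 2: sigma unknown, so the sample standard deviation s is used instead.
returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical monthly returns in percent
s = statistics.stdev(returns)                          # divides by n - 1
se_estimated = standard_error(s, len(returns))
print(se_estimated)
```

Because n enters through its square root, quadrupling the sample size only halves the standard error.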
6
Q

Identify and describe desirable properties of an estimator.

A

Desirable statistical properties of an estimator include unbiasedness (the expected value of the estimator equals the parameter, so the sign of the estimation error is random), efficiency (lower sampling error than any other unbiased estimator), and consistency (the variance of the sampling error decreases with larger sample size).
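
Unbiasedness can be checked exactly on a toy case. The sketch below assumes a hypothetical three-value population and enumerates every equally likely sample of size 2 drawn with replacement. It then compares the average of the sample-variance estimator using divisor n - 1 with the average using divisor n. The helper name var_estimate is invented for this example.

```python
from itertools import product

population = [2.0, 4.0, 6.0]   # hypothetical three-value population
mu = sum(population) / len(population)
pop_var = sum((x - mu) ** 2 for x in population) / len(population)

def var_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))   # all equally likely samples, with replacement

avg_unbiased = sum(var_estimate(s, n - 1) for s in samples) / len(samples)
avg_biased = sum(var_estimate(s, n) for s in samples) / len(samples)

print(f"population variance            : {pop_var:.4f}")
print(f"average estimate, divisor n - 1: {avg_unbiased:.4f}")
print(f"average estimate, divisor n    : {avg_biased:.4f}")
```

Averaged over all possible samples, the n - 1 version matches the population variance, while the n version understates it; that systematic gap is what bias means.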

7
Q

Contrast a point estimate and a confidence interval estimate of a population parameter.

A

Point estimates are single-value estimates of population parameters. An estimator is a formula used to compute a point estimate.

Confidence intervals are ranges of values within which the actual value of the parameter will lie with a given probability.

The reliability factor is a number that depends on the sampling distribution of the point estimate and the probability that the point estimate falls in the confidence interval.

8
Q

Calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known variance, 2) an unknown population variance, or 3) an unknown population variance and large sample size.

A

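For a normally distributed population, a confidence interval for its mean can be constructed using a z-statistic when the variance is known, and a t-statistic when the variance is unknown. The z-statistic is acceptable in the case of a normal population with an unknown variance if the sample size is large (30+).
In general, we have:

$\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ when the variance is known, and

$\bar{x} \pm t_{\alpha/2} \dfrac{s}{\sqrt{n}}$ when the variance is unknown and the standard deviation must be estimated from the sample.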
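
A confidence interval of this kind has the form point estimate +/- reliability factor x standard error. The Python sketch below applies that form to hypothetical inputs (sample mean 8%, standard deviation 15%). The function name confidence_interval is an assumption, and because the standard library has no t-distribution, the t reliability factor is typed in from a standard t-table rather than computed.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, std_dev, n, reliability_factor):
    """Point estimate +/- reliability factor * standard error."""
    half_width = reliability_factor * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width

confidence = 0.95
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed z reliability factor, about 1.96

# Known population variance: hypothetical sample mean 8%, sigma 15%, n = 36.
ci_z = confidence_interval(8.0, 15.0, 36, z)
print(ci_z)

# Unknown variance, small sample: t reliability factor with n - 1 degrees of freedom.
# 2.064 is the t-table value for 24 degrees of freedom at 95% confidence (two-tailed).
t_24 = 2.064
ci_t = confidence_interval(8.0, 15.0, 25, t_24)
print(ci_t)
```

The t-based interval is wider than a z-based interval would be for the same inputs, reflecting the extra uncertainty from estimating the standard deviation.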

9
Q

Describe the use of resampling (bootstrap, jackknife) to estimate the sampling distribution of a statistic.

A

Two resampling techniques to improve our estimates of the distribution of sample statistics are the jackknife and bootstrapping. With the jackknife, we calculate n sample means, each with one observation of a sample of size n removed, and base our estimate of the standard error on these n means. It can remove bias from our estimates based on the sample standard deviation without resampling.

With bootstrapping, we use the distribution of sample means (or another statistic) from a large number of samples of size n, drawn from a large data set. Bootstrapping can improve such estimates when analytical methods will not.
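
Both techniques can be sketched with the standard library. The example assumes a hypothetical eight-observation sample and, as one common variant, draws the bootstrap resamples with replacement from that observed sample. The function names jackknife_se and bootstrap_se are invented for this illustration.

```python
import math
import random
import statistics

def jackknife_se(sample):
    """Standard error of the mean from the n leave-one-out means."""
    n = len(sample)
    total = sum(sample)
    loo_means = [(total - x) / (n - 1) for x in sample]   # drop one observation at a time
    center = sum(loo_means) / n
    return math.sqrt((n - 1) / n * sum((m - center) ** 2 for m in loo_means))

def bootstrap_se(sample, n_resamples, rng, statistic=statistics.mean):
    """Standard deviation of the statistic across resamples drawn with replacement."""
    n = len(sample)
    stats = [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(stats)

sample = [1.2, -0.5, 3.1, 0.8, 2.4, -1.0, 1.9, 0.3]   # hypothetical returns in percent
rng = random.Random(123)

analytical = statistics.stdev(sample) / math.sqrt(len(sample))
print(f"analytical s/sqrt(n): {analytical:.4f}")
print(f"jackknife           : {jackknife_se(sample):.4f}")
print(f"bootstrap           : {bootstrap_se(sample, 2000, rng):.4f}")
```

For the sample mean, the jackknife reproduces s / sqrt(n) exactly and needs no random draws. The bootstrap is a simulation, so its result varies with the seed, but it also works for statistics such as the median (pass statistic=statistics.median) that have no simple standard error formula.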

10
Q

Describe the issues regarding selection of the appropriate sample size, data snooping bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

A

Increasing the sample size will generally improve parameter estimates and narrow confidence intervals. The cost of more data must be weighed against these benefits, and adding data that are not generated by the same distribution will not necessarily improve accuracy or narrow confidence intervals.

Potential mistakes in the sampling method can bias results. These biases include data snooping (significant relationships that have occurred by chance), sample selection bias (selection is nonrandom), look-ahead bias (basing the test at a point in time on data not available at that time), survivorship bias (using only surviving mutual funds, hedge funds, etc.), and time-period bias (the relation does not hold over other time periods).
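
As a small illustration of one of these biases, the sketch below compares the average return of a full fund universe with the average for survivors only. The eight funds and their returns are entirely hypothetical.

```python
import statistics

# Hypothetical fund universe: (annualized return in percent, still operating at period end).
funds = [
    (9.0, True), (7.5, True), (6.0, True), (8.0, True),
    (-4.0, False), (1.0, False), (-2.5, False), (3.0, True),
]

all_returns = [r for r, _ in funds]
survivor_returns = [r for r, alive in funds if alive]

mean_all = statistics.mean(all_returns)
mean_survivors = statistics.mean(survivor_returns)

print(f"all funds      : {mean_all:.2f}%")
print(f"survivors only : {mean_survivors:.2f}%")
print(f"upward bias    : {mean_survivors - mean_all:.2f} percentage points")
```

Because funds that closed tend to be the poor performers, a database containing only survivors overstates the average return an investor could have expected at the start of the period.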
