Reading 5: Sampling and Estimation Flashcards

1
Q

An important difference between two-stage cluster sampling and stratified random sampling is that compared to stratified random sampling, two-stage cluster sampling:
uses all members of each sub-group (strata).
takes random samples from all sub-groups (strata).
will not preserve differences in a characteristic across sub-groups.

A

With cluster sampling, the randomly selected subgroups may have different distributions of the relevant characteristic relative to the entire population. Cluster sampling uses only randomly selected subgroups, whereas stratified random sampling samples all subgroups to match the distribution of characteristics across the entire population. (LOS 5.a)
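The contrast in this answer can be sketched in code. This is a minimal illustration, not a prescribed method: the population, the group labels `A`/`B`/`C`, and both function names are hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample from every subgroup (stratum)."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def two_stage_cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: pick some clusters at random.
    Stage 2: draw a random sample within each picked cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

rng = random.Random(0)
# Hypothetical population: three subgroups of ten labelled members each.
groups = {g: [f"{g}{i}" for i in range(10)] for g in ("A", "B", "C")}

strat = stratified_sample(groups, per_stratum=2, rng=rng)
clust = two_stage_cluster_sample(groups, n_clusters=2, per_cluster=2, rng=rng)

print(sorted(strat))  # every subgroup is represented
print(sorted(clust))  # only the randomly chosen subgroups are represented
```

Because the cluster sample omits at least one subgroup here, any characteristic that differs across subgroups may be misrepresented, which is the point the answer makes.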

2
Q

Sampling error is defined as:
an error that occurs when a sample of less than 30 elements is drawn.
an error that occurs during collection, recording, and tabulation of data.
the difference between the value of a sample statistic and the value of the corresponding population parameter.

A

An example might be the difference between a particular sample mean and the average value of the overall population. (LOS 5.b)

3
Q

The mean age of all CFA candidates is 28 years. The mean age of a random sample of 100 candidates is found to be 26.5 years. The difference of 1.5 years is called:
the random error.
the sampling error.
the population error.

A

The sampling error is the difference between the population parameter and the sample statistic. (LOS 5.b)

4
Q

A simple random sample is a sample drawn in such a way that each member of the population has:
some chance of being selected in the sample.
an equal chance of being included in the sample.
a 1% chance of being included in the sample.

A

In a simple random sample, each element of the population has an equal probability of being selected. Choice C allows for an equal chance, but only if there are 100 elements in the population from which the random sample is drawn. (LOS 5.c)

5
Q

To apply the central limit theorem to the sampling distribution of the sample mean, the sample is usually considered to be large if n is greater than:
20.
25.
30.

A

Sample sizes of 30 or greater are typically considered large. (LOS 5.d)

6
Q

If n is large and the population standard deviation is unknown, the standard error of the sampling distribution of the sample mean is equal to:
the sample standard deviation divided by the sample size.
the population standard deviation multiplied by the sample size.
the sample standard deviation divided by the square root of the sample size.

A

The formula for the standard error when the population standard deviation is unknown is $s_{\bar{x}} = s/\sqrt{n}$. (LOS 5.e)

7
Q

The standard error of the sampling distribution of the sample mean for a sample size of n drawn from a population with a mean of µ and a standard deviation of σ is:
sample standard deviation divided by the sample size.
sample standard deviation divided by the square root of the sample size.
population standard deviation divided by the square root of the sample size.

A

The formula for the standard error when the population standard deviation is known is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. (LOS 5.e)

8
Q

Assume that a population has a mean of 14 with a standard deviation of 2. If a random sample of 49 observations is drawn from this population, the standard error of the sample mean is closest to:
0.04.
0.29.
2.00.

A

The population standard deviation is known (σ = 2), so $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 2/\sqrt{49} = 2/7 \approx 0.29$.
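The arithmetic for this card can be checked with a short sketch. The helper name `standard_error` is illustrative only; the inputs are the ones given in the question.

```python
import math

def standard_error(std_dev, n):
    """Standard error of the sample mean: standard deviation / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Values from the card: standard deviation 2, sample size 49.
se = standard_error(2, 49)
print(round(se, 2))  # 0.29
```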

9
Q

The population’s mean is 30 and the mean of a sample of size 100 is 28.5. The variance of the sample is 25. The standard error of the sample mean is closest to:
0.05.
0.25.
0.50.

A

The sample variance is 25, so s = 5 and $s_{\bar{x}} = s/\sqrt{n} = 5/\sqrt{100} = 5/10 = 0.50$.

10
Q

Which of the following is least likely a desirable property of an estimator?
Reliability.
Efficiency.
Consistency.

A

Efficiency, consistency, and unbiasedness are desirable properties of an estimator. (LOS 5.f)

11
Q

A random sample of 100 computer store customers spent an average of $75 at the store. Assuming the distribution is normal and the population standard deviation is $20, the 95% confidence interval for the population mean is closest to:
$71.08 to $78.92.
$73.89 to $80.11.
$74.56 to $79.44.

A

Since the population variance is known and n ≥ 30, the confidence interval is determined as $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$, where $z_{0.025} = 1.96$ for 95% confidence.

So, the confidence interval is 75 ± 1.96(20/10) = 75 ± 3.92 = 71.08 to 78.92. (LOS 5.h)
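As a rough check of this interval, the sketch below computes it with Python's standard library. The function name `z_confidence_interval` is an assumption for illustration; `statistics.NormalDist().inv_cdf` supplies the critical z-value instead of a table lookup.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, pop_std, n, confidence):
    """Two-sided z-interval for the mean with a known population std dev."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    half_width = z * pop_std / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Values from the card: mean $75, sigma $20, n = 100, 95% confidence.
low, high = z_confidence_interval(75, 20, 100, 0.95)
print(round(low, 2), round(high, 2))  # 71.08 78.92
```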

12
Q

Best Computers, Inc., sells computers and computer parts by mail. A sample of 25 recent orders showed the mean time taken to ship these orders was 70 hours with a sample standard deviation of 14 hours. Assuming the population is normally distributed, the 99% confidence interval for the population mean is:
70 ± 2.80 hours.
70 ± 6.98 hours.
70 ± 7.83 hours.

A

Since the population variance is unknown and n < 30, the confidence interval is determined as $\bar{x} \pm t_{\alpha/2}(s/\sqrt{n})$, using df = n – 1 to get the critical t-value. For α/2 = 0.01/2 = 0.005 and df = 24, the critical t-value is 2.797. So, the confidence interval is 70 ± 2.797(14 / 5) = 70 ± 7.83. (LOS 5.h)
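The same calculation can be sketched in code. The critical value 2.797 is taken directly from this card rather than computed, since the standard library has no t-distribution; the function and constant names are hypothetical.

```python
import math

def t_confidence_interval(sample_mean, sample_std, n, t_critical):
    """Two-sided t-interval for the mean; t_critical comes from a t-table
    at alpha/2 with n - 1 degrees of freedom."""
    half_width = t_critical * sample_std / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Critical value as stated on the card (99% confidence, df = 24).
T_CRIT_99_DF24 = 2.797

low, high = t_confidence_interval(70, 14, 25, T_CRIT_99_DF24)
half_width = (high - low) / 2
print(round(half_width, 2))  # 7.83
```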

13
Q

What is the most appropriate test statistic for constructing confidence intervals for the population mean when the population is normally distributed, but the variance is unknown?
The z-statistic at α with n degrees of freedom.
The t-statistic at α/2 with n degrees of freedom.
The t-statistic at α/2 with n – 1 degrees of freedom.

A

Use the t-statistic at α/2 and n – 1 degrees of freedom when the population variance is unknown. While the z-statistic is acceptable when the sample size is large, sample size is not given here, and the t-statistic is always appropriate under these conditions. (LOS 5.h)

14
Q

When constructing a confidence interval for the population mean of a nonnormal distribution when the population variance is unknown and the sample size is large (n > 30), an analyst may acceptably use:
either a z-statistic or a t-statistic.
only a z-statistic at α with n degrees of freedom.
only a t-statistic at α/2 with n degrees of freedom.

A

When the sample size is large, and the central limit theorem can be relied on to assure a sampling distribution that is normal, either the t-statistic or the z-statistic is acceptable for constructing confidence intervals for the population mean. The t-statistic, however, will provide a more conservative range (wider) at a given level of significance. (LOS 5.h)

15
Q

Jenny Fox evaluates managers who have a cross-sectional population standard deviation of returns of 8%. If returns are independent across managers, how large of a sample does Fox need so the standard error of sample means is 1.265%?
7.
30.
40.

A

$1.265 = 8/\sqrt{n}$, so $n = (8/1.265)^2 \approx 40$.
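Solving the standard error formula for n can be sketched as follows. The helper name `required_sample_size` is illustrative, and rounding up to the next whole observation is an assumption of this sketch, made so that the resulting standard error does not exceed the target.

```python
import math

def required_sample_size(pop_std, target_se):
    """Smallest n whose standard error pop_std / sqrt(n) is <= target_se."""
    return math.ceil((pop_std / target_se) ** 2)

# Values from the card: sigma = 8%, target standard error = 1.265%.
n = required_sample_size(8, 1.265)
print(n)  # 40
```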

16
Q

Annual returns on small stocks have a population mean of 12% and a population standard deviation of 20%. If the returns are normally distributed, a 90% confidence interval on mean returns over a 5-year period is:
5.40% to 18.60%.
–2.75% to 26.75%.
–5.52% to 29.52%.

A

With a known population standard deviation of returns and a normally distributed population, we can use the z-distribution. The sample mean for a sample of five years will have a standard deviation of $20/\sqrt{5} = 8.94\%$.

A 90% confidence interval around the mean return of 12% is 12% ± 1.65(8.94%) = –2.75% to 26.75%. (LOS 5.h)

17
Q

Which of the following techniques to improve the accuracy of confidence intervals on a statistic is most computationally demanding?
The jackknife.
Systematic resampling.
Bootstrapping.

A

Bootstrapping, repeatedly drawing samples of equal size from a large data set, is more computationally demanding than the jackknife. We have not defined “systematic resampling” as a specific technique. (LOS 5.i)
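To make the difference in computational cost concrete, here is a minimal sketch of both techniques applied to the standard error of a sample mean. The data are randomly generated placeholders, the function names are hypothetical, and drawing the bootstrap resamples with replacement is the usual convention, assumed here rather than stated on the card.

```python
import random
import statistics

def jackknife_se(data):
    """Jackknife: recompute the mean n times, leaving out one observation each time."""
    n = len(data)
    total = sum(data)
    loo_means = [(total - x) / (n - 1) for x in data]
    centre = sum(loo_means) / n
    return ((n - 1) / n * sum((m - centre) ** 2 for m in loo_means)) ** 0.5

def bootstrap_se(data, n_resamples, rng):
    """Bootstrap: recompute the mean on many same-size resamples
    drawn with replacement (an assumption of this sketch)."""
    means = [statistics.fmean(rng.choices(data, k=len(data)))
             for _ in range(n_resamples)]
    return statistics.stdev(means)

rng = random.Random(42)
# Hypothetical data: 30 simulated annual returns (in %).
returns = [rng.gauss(12, 20) for _ in range(30)]

jk = jackknife_se(returns)                 # 30 recomputations
bs = bootstrap_se(returns, 2000, rng)      # 2,000 recomputations
print(jk, bs)
```

The jackknife needs only as many recomputations as there are observations, while the bootstrap typically uses thousands of resamples, which is why it is the more computationally demanding of the two.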

18
Q

An analyst who uses historical data that was not publicly available at the time period being studied will have a sample with:
look-ahead bias.
time-period bias.
sample selection bias.

A

The primary example of look-ahead bias is using year-end financial information in conjunction with market pricing data to compute ratios like the price/earnings (P/E). The E in the denominator is typically not available for 30–60 days after the end of the period. Hence, data that was available on the test date (P) is mixed with information that was not available (E). That is, the P is “ahead” of the E. (LOS 5.j)

19
Q

Which of the following is most closely associated with survivorship bias?
Price-to-book studies.
Stratified bond sampling studies.
Mutual fund performance studies.

A

Mutual fund performance studies are most closely associated with survivorship bias because only the better-performing funds remain in the sample over time. (LOS 5.j)