Reading 5: sampling and estimation Flashcards
An important difference between two-stage cluster sampling and stratified random sampling is that compared to stratified random sampling, two-stage cluster sampling:
uses all members of each sub-group (strata).
takes random samples from all sub-groups (strata).
will not preserve differences in a characteristic across sub-groups.
With cluster sampling, the randomly selected subgroups may have different distributions of the relevant characteristic relative to the entire population. Cluster sampling uses only randomly selected subgroups, whereas stratified random sampling samples all subgroups to match the distribution of characteristics across the entire population. (LOS 5.a)
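The contrast between the two designs can be sketched in a few lines. This is only an illustration under assumed inputs: the four subgroups `A` to `D`, their ten members each, and the function names `stratified_sample` and `two_stage_cluster_sample` are hypothetical and do not come from the reading.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample from every stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def two_stage_cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: randomly pick clusters. Stage 2: sample within each picked cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

# Hypothetical population split into four subgroups of ten members each.
groups = {g: [f"{g}{i}" for i in range(10)] for g in "ABCD"}
rng = random.Random(0)

strat = stratified_sample(groups, per_stratum=3, rng=rng)
clust = two_stage_cluster_sample(groups, n_clusters=2, per_cluster=3, rng=rng)

print(sorted(strat))  # every subgroup is represented
print(sorted(clust))  # only the randomly selected subgroups are represented
```

Because the cluster design leaves some subgroups out entirely, differences in a characteristic across subgroups need not carry through to the sample.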
Sampling error is defined as:
an error that occurs when a sample of less than 30 elements is drawn.
an error that occurs during collection, recording, and tabulation of data.
the difference between the value of a sample statistic and the value of the corresponding population parameter.
An example might be the difference between a particular sample mean and the average value of the overall population. (LOS 5.b)
The mean age of all CFA candidates is 28 years. The mean age of a random sample of 100 candidates is found to be 26.5 years. The difference of 1.5 years is called:
the random error.
the sampling error.
the population error.
The sampling error is the difference between the population parameter and the sample statistic. (LOS 5.b)
A simple random sample is a sample drawn in such a way that each member of the population has:
some chance of being selected in the sample.
an equal chance of being included in the sample.
a 1% chance of being included in the sample.
In a simple random sample, each element of the population has an equal probability of being selected. The 1% option (choice C) allows for an equal chance, but only if there are 100 elements in the population from which the random sample is drawn. (LOS 5.c)
To apply the central limit theorem to the sampling distribution of the sample mean, the sample is usually considered to be large if n is greater than:
20.
25.
30.
Sample sizes of 30 or greater are typically considered large. (LOS 5.d)
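A small simulation can make the n = 30 rule of thumb concrete. The sketch below assumes a skewed population (exponential with mean 1 and standard deviation 1), which is a choice made here for illustration and not part of the reading. It checks that means of samples of size 30 center on the population mean with a spread close to σ/√n.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from `n` independent draws."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(42)

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
means = sample_means(lambda r: r.expovariate(1.0), n=30, reps=2000, rng=rng)

center = statistics.fmean(means)   # should be near the population mean of 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(30)

print(round(center, 2), round(spread, 2))
```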
If n is large and the population standard deviation is unknown, the standard error of the sampling distribution of the sample mean is equal to:
the sample standard deviation divided by the sample size.
the population standard deviation multiplied by the sample size.
the sample standard deviation divided by the square root of the sample size.
The formula for the standard error when the population standard deviation is unknown is $s_{\bar{x}} = s/\sqrt{n}$. (LOS 5.e)
The standard error of the sampling distribution of the sample mean for a sample size of n drawn from a population with a mean of µ and a standard deviation of σ is:
sample standard deviation divided by the sample size.
sample standard deviation divided by the square root of the sample size.
population standard deviation divided by the square root of the sample size.
The formula for the standard error when the population standard deviation is known is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. (LOS 5.e)
Assume that a population has a mean of 14 with a standard deviation of 2. If a random sample of 49 observations is drawn from this population, the standard error of the sample mean is closest to:
0.04.
0.29.
2.00.
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Given $\sigma = 2$ and $n = 49$: $\sigma_{\bar{x}} = 2/\sqrt{49} = 2/7 \approx 0.29$.
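The same arithmetic as a minimal sketch. The helper name `standard_error` is hypothetical. It applies the standard deviation over root-n formula regardless of whether the standard deviation supplied is the population value or a sample estimate.

```python
import math

def standard_error(std_dev, n):
    """Standard error of the sample mean: standard deviation / sqrt(sample size)."""
    return std_dev / math.sqrt(n)

se = standard_error(2.0, 49)
print(round(se, 2))  # 0.29
```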
The population’s mean is 30 and the mean of a sample of size 100 is 28.5. The variance of the sample is 25. The standard error of the sample mean is closest to:
0.05.
0.25.
0.50.
$s_{\bar{x}} = s/\sqrt{n}$
Given $s^2 = 25$ and $n = 100$: $s = 5$, so $s_{\bar{x}} = 5/\sqrt{100} = 5/10 = 0.50$.
Which of the following is least likely a desirable property of an estimator?
Reliability.
Efficiency.
Consistency.
Efficiency, consistency, and unbiasedness are desirable properties of an estimator. (LOS 5.f)
A random sample of 100 computer store customers spent an average of $75 at the store. Assuming the distribution is normal and the population standard deviation is $20, the 95% confidence interval for the population mean is closest to:
$71.08 to $78.92.
$73.89 to $80.11.
$74.56 to $79.44.
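Since the population variance is known and n ≥ 30, the confidence interval is determined as $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$, where $z_{\alpha/2} = 1.96$ for a 95% confidence level.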
So, the confidence interval is 75 ± 1.96(20/10) = 75 ± 3.92 = 71.08 to 78.92. (LOS 5.h)
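As a check on the calculation, here is a short sketch that builds the interval from the standard normal quantile. The function name `z_confidence_interval` is hypothetical; `statistics.NormalDist` is part of the Python standard library (3.8 and later).

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided interval for the mean when the population std dev is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = z_confidence_interval(75.0, 20.0, 100)
print(f"{low:.2f} to {high:.2f}")  # 71.08 to 78.92
```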
Best Computers, Inc., sells computers and computer parts by mail. A sample of 25 recent orders showed the mean time taken to ship these orders was 70 hours with a sample standard deviation of 14 hours. Assuming the population is normally distributed, the 99% confidence interval for the population mean is:
70 ± 2.80 hours.
70 ± 6.98 hours.
70 ± 7.83 hours.
Since the population variance is unknown and n < 30, the confidence interval is determined as $\bar{x} \pm t_{\alpha/2}(s/\sqrt{n})$.
Use df = n – 1 = 24 to get the critical t-value: $t_{0.01/2}$ with df = 24 is 2.797. So, the confidence interval is 70 ± 2.797(14 / 5) = 70 ± 7.83. (LOS 5.h)
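The t-based interval can be sketched the same way. Python's standard library has no t-distribution quantile function, so this sketch takes the critical value 2.797 quoted above as an input; the function name `t_confidence_interval` is hypothetical.

```python
import math

def t_confidence_interval(sample_mean, s, n, t_critical):
    """Two-sided interval for the mean using a supplied critical t-value (df = n - 1)."""
    half_width = t_critical * s / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

low, high = t_confidence_interval(70.0, 14.0, 25, t_critical=2.797)
half_width = (high - low) / 2
print(f"70 ± {half_width:.2f} hours")  # 70 ± 7.83 hours
```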
What is the most appropriate test statistic for constructing confidence intervals for the population mean when the population is normally distributed, but the variance is unknown?
The z-statistic at α with n degrees of freedom.
The t-statistic at α/2 with n degrees of freedom.
The t-statistic at α/2 with n – 1 degrees of freedom.
Use the t-statistic at α/2 and n – 1 degrees of freedom when the population variance is unknown. While the z-statistic is acceptable when the sample size is large, sample size is not given here, and the t-statistic is always appropriate under these conditions. (LOS 5.h)
When constructing a confidence interval for the population mean of a nonnormal distribution when the population variance is unknown and the sample size is large (n > 30), an analyst may acceptably use:
either a z-statistic or a t-statistic.
only a z-statistic at α with n degrees of freedom.
only a t-statistic at α/2 with n degrees of freedom.
When the sample size is large, and the central limit theorem can be relied on to assure a sampling distribution that is normal, either the t-statistic or the z-statistic is acceptable for constructing confidence intervals for the population mean. The t-statistic, however, will provide a more conservative range (wider) at a given level of significance. (LOS 5.h)
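To illustrate why the t-based interval is the more conservative choice, the sketch below compares half-widths for a hypothetical sample of n = 31 with a sample standard deviation of 10 (both assumed here, not taken from the reading). The t critical value of 2.042 for 30 degrees of freedom is taken from a standard t-table.

```python
import math
from statistics import NormalDist

n = 31       # hypothetical large sample
s = 10.0     # hypothetical sample standard deviation

z_crit = NormalDist().inv_cdf(0.975)  # two-tailed 5% normal critical value
t_crit = 2.042                        # standard t-table value for df = 30, two-tailed 5%

z_half = z_crit * s / math.sqrt(n)
t_half = t_crit * s / math.sqrt(n)

print(t_half > z_half)  # True
```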
Jenny Fox evaluates managers who have a cross-sectional population standard deviation of returns of 8%. If returns are independent across managers, how large a sample does Fox need so that the standard error of sample means is 1.265%?
7.
30.
40.
$1.265 = 8/\sqrt{n}$, so $n = (8/1.265)^2 \approx 40$.
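Solving for n can be sketched as follows. The helper name `required_sample_size` is hypothetical, and rounding up to the next whole observation is a design choice made here so that the resulting standard error does not exceed the target.

```python
import math

def required_sample_size(sigma, target_se):
    """Smallest n with sigma / sqrt(n) <= target_se."""
    return math.ceil((sigma / target_se) ** 2)

n = required_sample_size(8.0, 1.265)
print(n)  # 40
```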