Sampling and Estimation Flashcards
Avoid biased results by:
1) phrasing questions neutrally;
2) ensuring that the sampling method is appropriate for the demographic of the target population;
3) pursuing high response rates.
normal distribution [is visualised as]
a symmetrical bell shape whose center and width are determined by its mean and standard deviation respectively.
rules of thumb for estimating probabilities for a normal distribution:
About 68% of the probability is contained in the range reaching one standard deviation away from the mean on either side
About 95% of the probability is contained in the range reaching two standard deviations (1.96 to be exact) away from the mean on either side
About 99.7% of the probability is contained in the range reaching three standard deviations away from the mean on either side
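These rules of thumb can be checked numerically. Below is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is illustrative and does not come from the flashcards.

```python
from statistics import NormalDist


def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Compare the rules of thumb (68%, 95%, 99.7%) with the computed values.
for k in (1, 1.96, 2, 3):
    print(f"within {k} standard deviations: {prob_within(k):.4f}")
```

The result does not depend on the mean or standard deviation, because the range is measured in standard deviations, so the standard normal distribution is enough for the check.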
z-value
The z-value of a point $x$ is the distance $x$ lies from the mean, measured in standard deviations: $z = \frac{x - \mu}{\sigma}$
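A small sketch of the z-value calculation. The function name `z_value` and the numbers (mean 100, standard deviation 15) are hypothetical and chosen only for illustration.

```python
def z_value(x: float, mean: float, std_dev: float) -> float:
    """Distance of x from the mean, measured in standard deviations."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


# Hypothetical distribution: mean 100, standard deviation 15.
print(z_value(130, 100, 15))  # 2.0  (two standard deviations above the mean)
print(z_value(85, 100, 15))   # -1.0 (one standard deviation below the mean)
```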
The Central Limit Theorem states that
if we take enough sufficiently large samples from any population, the means of those samples will be approximately normally distributed, regardless of the shape of the underlying population.
The mean of the Distribution of Sample Means equals
the mean of the population distribution.
The standard deviation of the Distribution of Sample Means equals
the standard deviation of the population distribution divided by the square root of the sample size. Thus, increasing the sample size decreases the width of the Distribution of Sample Means.
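The two facts about the Distribution of Sample Means can be illustrated with a short simulation. This is a sketch under assumed conditions: the skewed exponential population, the sample sizes, and the helper `sample_means` are all hypothetical and not taken from the flashcards.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 10.
population = [random.expovariate(1 / 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)


def sample_means(sample_size: int, num_samples: int = 2_000) -> list[float]:
    """Means of repeated random samples drawn from the population."""
    return [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]


for n in (25, 100):
    means = sample_means(n)
    print(
        f"n={n}: mean of sample means = {statistics.fmean(means):.2f} "
        f"(population mean {pop_mean:.2f}); "
        f"sd of sample means = {statistics.stdev(means):.2f} "
        f"(sigma/sqrt(n) = {pop_sd / n ** 0.5:.2f})"
    )
```

Quadrupling the sample size from 25 to 100 should roughly halve the standard deviation of the sample means, since it scales with one over the square root of the sample size.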
confidence interval contains
the true population mean with a certain level (e.g., 90%) of confidence.
confidence interval for LARGE samples >30
The function CONFIDENCE.NORM calculates the margin of error, which we add and subtract from the sample mean to find the confidence interval.
for SMALL samples <30
For small samples, we use a t-distribution, which is shorter and wider than a normal distribution. The t-distribution provides a wider range, a more conservative estimate of where the true population mean lies.
The function CONFIDENCE.T calculates the margin of error, which we add and subtract from the sample mean to find the confidence interval.
Q: What happens to the sample mean and standard deviation as you take new samples of equal size?
Both change. The sample mean and the sample standard deviation vary from sample to sample but remain fairly close to the population mean and standard deviation.
Excel’s RAND function
RAND assigns a random identification (ID) number between 0 and 1 to each data point—in this case, to each phone number.
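The same idea can be sketched in Python. The flashcard only says that RAND assigns a random ID; sorting by that ID and keeping the first rows is an assumption about how the IDs are then used to draw the sample. The function name and the phone numbers are hypothetical.

```python
import random


def random_sample_by_id(items, sample_size, seed=None):
    """Attach a random ID in [0, 1) to each item, sort by ID, keep the first sample_size items."""
    rng = random.Random(seed)
    tagged = [(rng.random(), item) for item in items]  # like filling a column with =RAND()
    tagged.sort(key=lambda pair: pair[0])               # like sorting the sheet by that column
    return [item for _, item in tagged[:sample_size]]


# Hypothetical phone numbers, for illustration only.
phone_numbers = [f"555-01{i:02d}" for i in range(100)]
print(random_sample_by_id(phone_numbers, sample_size=5, seed=1))
```

Because every item gets an independent random ID, each item has the same chance of landing in the first rows after sorting.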
Q: What happens to the sample mean and standard deviation as you increase the sample size?
As we increase the sample size, the sample includes more members of the population, so it is less likely to include only unusual values. Therefore, as the sample grows, the sample mean and standard deviation approach the population mean and standard deviation.
SAMPLING RULES OF THUMB
- It’s symmetric, so its mean and median are the same; both are located at the centre.
- The probability that a normally distributed value is less than its mean is always 50%, and the probability that it is greater is also 50%.
- How wide or narrow the curve is depends on its standard deviation; a normal distribution is fully specified by its mean and its standard deviation.
- Regardless of its location or width, it always keeps its bell shape.
- Rule of thumb: about 68% of the probability of a normal distribution lies between one standard deviation below its mean and one standard deviation above its mean.
What probability falls within one standard deviation of the mean?
Approximately 68%
The phrase “within one standard deviation of the mean” means “between one standard deviation below the mean and one standard deviation above the mean.” By the rules of thumb for the normal distribution, about 68% of the probability lies within one standard deviation of the mean.
What is the probability of obtaining a value less than or equal to two standard deviations below the mean?
About 95% of the probability lies within two standard deviations of the mean, so the remaining 5% is split evenly between the two tails, roughly 2.5% each. The probability of obtaining a value less than or equal to two standard deviations below the mean is the cumulative probability associated with $z = -2$, that is, the area from the far left of the curve up to $z = -2$, which is about 2.3%.
Approximately 2%
Suppose we want to know the percentage of women who are shorter than 63 inches. Since the mean is 63.5 inches, we can estimate that less than 50% are shorter than 63 inches. How do we calculate the exact percentage?
To find a cumulative probability, the probability of being less than a specified value on a normal curve, we use Excel’s NORM.DIST function.
=NORM.DIST(x, mean, standard_dev, cumulative)
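Find the probability of obtaining a value:
less than or equal to x: =NORM.DIST(x, mean, standard_dev, TRUE), for example =NORM.DIST(195,B1,B2,TRUE)
greater than or equal to x: =1-NORM.DIST(x, mean, standard_dev, TRUE), for example =1-NORM.DIST(45,B1,B2,TRUE)
Here B1 and B2 are taken to be the cells holding the mean and the standard deviation.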
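For comparison, a sketch of the same two calculations in Python, applied to the women's heights question. The mean of 63.5 inches comes from the flashcards; the standard deviation of 2.5 inches is an assumed value used only to make the example runnable.

```python
from statistics import NormalDist

# Mean is from the flashcards; sigma=2.5 is an ASSUMED value for illustration.
heights = NormalDist(mu=63.5, sigma=2.5)

# Less than or equal to 63 inches, like =NORM.DIST(63, 63.5, 2.5, TRUE)
p_shorter = heights.cdf(63)

# Greater than or equal to 63 inches, like =1-NORM.DIST(63, 63.5, 2.5, TRUE)
p_taller = 1 - heights.cdf(63)

print(f"P(height <= 63) = {p_shorter:.4f}")
print(f"P(height >= 63) = {p_taller:.4f}")
```

With these assumed numbers the first probability comes out a little below 50%, which matches the estimate that fewer than half of the women are shorter than 63 inches.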
How to find the value associated with the cumulative probability 99% for the distribution of women’s heights?
For a normal distribution, we can use Excel’s NORM.INV function to calculate a given percentile. The “INV” indicates that this function calculates the inverse of the cumulative probability.
=NORM.INV(probability, mean, standard_dev)
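A sketch of the inverse calculation in Python, using `NormalDist.inv_cdf` from the standard library. As before, the mean of 63.5 inches is from the flashcards and the standard deviation of 2.5 inches is an assumption.

```python
from statistics import NormalDist

# Mean is from the flashcards; sigma=2.5 is an ASSUMED value for illustration.
heights = NormalDist(mu=63.5, sigma=2.5)

# Height below which 99% of the distribution falls, like =NORM.INV(0.99, 63.5, 2.5)
p99 = heights.inv_cdf(0.99)
print(f"99th percentile: {p99:.2f} inches")

# Round trip: feeding the percentile back into the cumulative function returns 0.99.
print(f"cumulative probability at that height: {heights.cdf(p99):.4f}")
```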
What is the center value of the distribution of the sample means?
The population mean ($\mu$)
According to the Central Limit Theorem, if we take enough large samples, the mean of the set of sample means equals the population mean.
What is the standard deviation of the distribution of sample means?
$\frac{\sigma}{\sqrt{n}}$, the population standard deviation divided by the square root of the sample size, is the standard deviation of the distribution of sample means. Large samples will create a “tighter” distribution of sample means than smaller samples.
Suppose that you have a sample with a mean of 50. You construct a 95% confidence interval and find that the lower and upper bounds are 42 and 58. What does this 95% confidence interval around the sample mean indicate?
We are 95% confident that the population mean lies between 42 and 58.
The 95% confidence interval is a range around the sample mean. We can say that we are 95% confident that the true population mean is within this range, based on the methods we used to calculate the range. If we were to construct similar intervals for 100 samples drawn from this population, on average 95 of the intervals will contain the true population mean.
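The repeated-sampling interpretation can be illustrated with a simulation. This is a sketch with hypothetical population parameters (mean 50, standard deviation 20) and a sample size of 40; none of these values come from the flashcards.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(42)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 50.0, 20.0     # hypothetical population parameters
N, TRIALS, LEVEL = 40, 1_000, 0.95  # sample size, number of samples, confidence level

# z-value that leaves (1 - LEVEL) / 2 in each tail, about 1.96 for 95%.
z = NormalDist().inv_cdf(1 - (1 - LEVEL) / 2)


def interval_covers_mean() -> bool:
    """Draw one sample, build its confidence interval, report whether it contains the true mean."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    margin = z * stdev(sample) / N ** 0.5
    return abs(fmean(sample) - TRUE_MEAN) <= margin


coverage = sum(interval_covers_mean() for _ in range(TRIALS)) / TRIALS
print(f"{coverage:.1%} of the intervals contain the true population mean")
```

The share of intervals that contain the true mean should land near 95%, with some run-to-run variation.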
Calculate the margin of error for a confidence interval:
=CONFIDENCE.NORM(alpha, standard_dev, size)
alpha, the significance level, equals one minus the confidence level (for example, a 95% confidence interval would correspond to the significance level 0.05).
standard_dev is the standard deviation of the population distribution. We will typically use the sample standard deviation, $s$, which is our best estimate of our population’s standard deviation.
size is the sample size, $n$.
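A sketch of the same margin-of-error calculation in Python, computed as the z-value for the chosen confidence level times the standard deviation divided by the square root of the sample size. The function name `confidence_norm` mirrors the Excel function for readability, and the sample figures (mean 50, standard deviation 25, 40 observations) are hypothetical.

```python
from statistics import NormalDist


def confidence_norm(alpha: float, standard_dev: float, size: int) -> float:
    """Margin of error z * standard_dev / sqrt(size) for a large-sample confidence interval."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if standard_dev <= 0 or size < 1:
        raise ValueError("standard_dev must be positive and size at least 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha is 0.05
    return z * standard_dev / size ** 0.5


# Hypothetical sample: mean 50, standard deviation 25, 40 observations.
sample_mean = 50.0
margin = confidence_norm(alpha=0.05, standard_dev=25.0, size=40)
lower, upper = sample_mean - margin, sample_mean + margin
print(f"margin of error: {margin:.2f}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

The margin is added to and subtracted from the sample mean to get the upper and lower bounds of the interval.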
T-Distribution for small sample < 30
To calculate the margin of error (half the width of the confidence interval) for small samples: =CONFIDENCE.T(alpha, standard_dev, size)