Sampling and Estimation Flashcards

1
Q

Avoid biased results by:

A

1) phrasing questions neutrally;
2) ensuring that the sampling method is appropriate for the demographic of the target population;
3) pursuing high response rates.

2
Q

normal distribution [is visualised as]

A

a symmetric bell-shaped curve whose center and width are determined by its mean and standard deviation, respectively.

3
Q

rules of thumb for estimating probabilities for a normal distribution:

A

About 68% of the probability is contained in the range reaching one standard deviation away from the mean on either side

About 95% of the probability is contained in the range reaching two standard deviations away from the mean on either side (exactly 95% lies within 1.96 standard deviations)

About 99.7% of the probability is contained in the range reaching three standard deviations away from the mean on either side
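These percentages can be checked numerically. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for a normal table; `prob_within` is a hypothetical helper name. It prints roughly 0.6827, 0.9500, 0.9545 and 0.9973, which shows why "two standard deviations" is the rounded version of 1.96.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```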

4
Q

z-value

A

A z-value of a point $x$ is the distance $x$ lies from the mean, measured in standard deviations: $z = \frac{x - \mu}{\sigma}$
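The formula translates directly into code. A minimal sketch in Python; `z_value` is a hypothetical helper name, and the mean of 50 and standard deviation of 10 are made-up numbers for illustration.

```python
def z_value(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical numbers: mean 50, standard deviation 10.
print(z_value(65, 50, 10))  # 1.5
print(z_value(35, 50, 10))  # -1.5
```

A positive z-value means the point lies above the mean; a negative one means it lies below.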

5
Q

The Central Limit Theorem states that

A

if we take enough sufficiently large samples from any population, the means of those samples will be approximately normally distributed, regardless of the shape of the underlying population.
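A quick simulation makes the theorem concrete. This is an illustrative sketch, not a proof: it assumes a hypothetical exponential population with mean 1 (strongly right-skewed), draws 5,000 samples of size 50, and checks how many sample means fall within one and two standard deviations of their center. The shares should land close to the normal rule-of-thumb values of about 68% and 95%.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample_mean(n: int) -> float:
    # Exponential population with mean 1: strongly right-skewed, far from normal.
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

means = [sample_mean(50) for _ in range(5000)]
center = statistics.fmean(means)
spread = statistics.pstdev(means)

def share_within(k: float) -> float:
    """Fraction of sample means within k standard deviations of their center."""
    return sum(abs(m - center) <= k * spread for m in means) / len(means)

print(f"center of sample means: {center:.3f}")  # close to the population mean of 1
print(f"within 1 SD: {share_within(1):.3f}")    # close to the normal 0.683
print(f"within 2 SD: {share_within(2):.3f}")    # close to the normal 0.954
```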

6
Q

The mean of the Distribution of Sample Means equals

A

the mean of the population distribution.
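For a tiny population this can be verified exactly by listing every possible sample. A sketch under stated assumptions: the population `[2, 4, 6, 8]` is hypothetical, and samples of size 2 are drawn with replacement so that every ordered pair counts once. Exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]  # hypothetical tiny population
n = 2                       # sample size

# Every possible ordered sample of size n drawn with replacement.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

print(mean(population))    # 5
print(mean(sample_means))  # 5
```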

7
Q

The standard deviation of the Distribution of Sample Means equals

A

the standard deviation of the population distribution divided by the square root of the sample size. Thus, increasing the sample size decreases the width of the Distribution of Sample Means.
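The same exhaustive approach can check the $\sigma/\sqrt{n}$ rule. A minimal sketch, assuming the hypothetical population `[2, 4, 6, 8]` and sampling with replacement (the setting in which the rule holds exactly). The check is done on the variance scale, $\sigma^2/n$, so it can be exact; the printed columns compare the standard deviation of the sample means with $\sigma/\sqrt{n}$ as the sample size grows.

```python
from fractions import Fraction
from itertools import product
from math import sqrt
from statistics import pstdev, pvariance

population = [2, 4, 6, 8]       # hypothetical tiny population
sigma2 = pvariance(population)  # population variance: 5

for n in (1, 2, 3):
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    # Exact check on the variance scale: Var(sample mean) = sigma^2 / n.
    assert pvariance(means) == Fraction(sigma2) / n
    print(n, round(float(pstdev(means)), 4), round(sqrt(sigma2 / n), 4))
```

The standard deviation of the sample means drops from about 2.236 to 1.581 to 1.291 as $n$ goes from 1 to 3, matching $\sigma/\sqrt{n}$.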

8
Q

confidence interval contains

A

the true population mean with a certain level (e.g., 90%) of confidence.

9
Q

Confidence interval for LARGE samples (n > 30)

A

The function CONFIDENCE.NORM calculates the margin of error, which we add and subtract from the sample mean to find the confidence interval.

10
Q

Confidence interval for SMALL samples (n < 30)

A

For small samples, we use a t-distribution, which has a lower peak and fatter tails than a normal distribution. Because more of its probability lies in the tails, the t-distribution produces a wider range, a more conservative estimate of where the true population mean lies.

The function CONFIDENCE.T calculates the margin of error, which we add and subtract from the sample mean to find the confidence interval.
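The effect of the fatter tails can be seen by computing both margins side by side. This is a sketch under stated assumptions: Python's standard library has no t-distribution, so the hypothetical helpers `t_pdf`, `t_cdf` and `t_quantile` build one from the density formula using Simpson's-rule integration and bisection. Like CONFIDENCE.T, `margin_t` uses n - 1 degrees of freedom. The sample values (n = 10, s = 4, alpha = 0.05) are made up.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x), integrating the density from 0 to x with Simpson's rule."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p: float, df: int) -> float:
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def margin_t(alpha: float, sd: float, n: int) -> float:
    """t-based margin of error with n - 1 degrees of freedom."""
    return t_quantile(1 - alpha / 2, n - 1) * sd / sqrt(n)

def margin_norm(alpha: float, sd: float, n: int) -> float:
    """Normal-based margin of error for the same inputs."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sd / sqrt(n)

# Hypothetical small sample: n = 10, s = 4, 95% confidence (alpha = 0.05).
print(round(margin_t(0.05, 4, 10), 3))     # about 2.86
print(round(margin_norm(0.05, 4, 10), 3))  # about 2.48
```

In a real analysis a statistics library (or Excel's CONFIDENCE.T itself) would replace the hand-rolled integration; it is spelled out here only so the sketch runs with the standard library alone.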

11
Q

What happens to the sample mean and standard deviation as you take new samples of equal size?

A

Both the sample mean and the sample standard deviation change from sample to sample, but they remain fairly close to the population mean and standard deviation.
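Repeated sampling can be simulated to see this variation. A minimal sketch, assuming a hypothetical population of 10,000 roughly normal values (mean 100, standard deviation 15) and five samples of 200 drawn without replacement; the seed only makes the run reproducible.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the illustration is reproducible

# Hypothetical population of 10,000 values (roughly normal, mean 100, sd 15).
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)
print(f"population: mean={mu:.2f}, sd={sigma:.2f}")

# Five independent samples of the same size, each drawn without replacement.
sample_stats = []
for i in range(1, 6):
    sample = rng.sample(population, 200)
    m, s = statistics.fmean(sample), statistics.stdev(sample)
    sample_stats.append((m, s))
    print(f"sample {i}: mean={m:.2f}, sd={s:.2f}")
```

Each sample reports slightly different values, but all of them cluster around the population figures printed on the first line.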

12
Q

Excel’s RAND function

A

RAND returns a random number greater than or equal to 0 and less than 1. Entering it next to each data point (in this case, each phone number) assigns every data point a random identification (ID) number.
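Once every data point has a random ID, sorting by that ID and keeping the first n rows gives a simple random sample. A sketch of that spreadsheet approach in Python, assuming `random.random()` (which, like RAND, returns a value in [0, 1)) as the ID generator; `random_sample_by_id` and the phone numbers are hypothetical.

```python
import random

def random_sample_by_id(items, n, rng=random):
    """Give each item a random ID in [0, 1), sort by that ID, keep the first n items."""
    tagged = [(rng.random(), item) for item in items]
    tagged.sort(key=lambda pair: pair[0])
    return [item for _, item in tagged[:n]]

# Hypothetical list of phone numbers.
phone_numbers = [f"555-01{i:02d}" for i in range(100)]
print(random_sample_by_id(phone_numbers, 5, random.Random(1)))
```

In Python itself, `random.sample(phone_numbers, 5)` does the same job directly; the sort-by-ID version is shown only because it mirrors the spreadsheet steps.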

13
Q

What happens to the sample mean and standard deviation as you increase the sample size?

A

As we increase the sample size, the sample includes more members of the population, so it is less likely to include only unusual values. Therefore, as the sample grows, the sample mean and standard deviation approach the population mean and standard deviation.
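A short simulation shows the sample mean closing in on the population mean as n grows. A minimal sketch, assuming a hypothetical population of the whole numbers 1 to 10,000 (mean 5000.5) and sampling without replacement. In a single run the error tends to shrink but need not fall at every step; at n = 10,000 the sample is the entire population, so the error is exactly zero.

```python
import random
import statistics

rng = random.Random(3)
population = list(range(1, 10_001))  # hypothetical population; mean 5000.5
mu = statistics.fmean(population)

errors = {}
for n in (10, 100, 1_000, 10_000):
    sample = rng.sample(population, n)  # sampling without replacement
    errors[n] = abs(statistics.fmean(sample) - mu)
    print(f"n={n:>6}: |sample mean - population mean| = {errors[n]:.1f}")
```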

14
Q

SAMPLING RULES OF THUMB

A
  1. It is symmetric, so its mean and median are the same; both are located at the center.
  2. The probability that a normally distributed value is less than its mean is always 50%, and the probability that it is greater than its mean is also 50%.
  3. Its center is set by its mean, and how wide or narrow the curve is depends on its standard deviation.
  4. Regardless of its location or width, it always keeps its bell shape. This lets us create a few rules of thumb for the normal distribution; for example, 68% of the probability lies between one standard deviation below the mean and one standard deviation above the mean.
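The 50% property and the shape invariance can both be checked numerically. A sketch assuming Python's `statistics.NormalDist`; the two parameter sets are hypothetical. Both distributions put exactly half their probability below the mean and the same share (about 68.27%) within one standard deviation, even though their centers and widths differ.

```python
from statistics import NormalDist

results = []
# Two hypothetical normal distributions with different centers and widths.
for dist in (NormalDist(mu=0, sigma=1), NormalDist(mu=100, sigma=15)):
    below_mean = dist.cdf(dist.mean)
    within_one_sd = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)
    results.append((below_mean, within_one_sd))
    print(f"{dist}: median={dist.median}, P(X < mean)={below_mean:.2f}, "
          f"P(within 1 SD)={within_one_sd:.4f}")
```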
15
Q

What probability falls within one standard deviation of the mean?

A

Approximately 68%
The phrase “within one standard deviation of the mean” means “between one standard deviation below the mean and one standard deviation above the mean.” By the rules of thumb for the normal distribution, about 68% of the probability lies in this range.

16
Q

What is the probability of obtaining a value less than or equal to two standard deviations below the mean?

A

Approximately 2%

About 95% of the probability lies within two standard deviations of the mean, so the remaining 5% is split equally between the two tails, leaving roughly 2.5% below two standard deviations under the mean by the rule of thumb. The exact cumulative probability associated with $z = -2$ is about 2.3%, which rounds to approximately 2%.
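The exact tail probability can be read off the standard normal cumulative distribution. A minimal sketch using Python's `statistics.NormalDist`, which plays the role of the cumulative-probability lookup here.

```python
from statistics import NormalDist

p = NormalDist().cdf(-2)  # cumulative probability at z = -2
print(f"{p:.4f}")  # 0.0228
```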

17
Q

Suppose we want to know the percentage of women who are shorter than 63 inches. Since the mean is 63.5 inches, we can estimate that less than 50% are shorter than 63 inches. How do we calculate the exact percentage?

A

To find a cumulative probability, the probability of being less than a specified value on a normal curve, we use Excel’s NORM.DIST function with cumulative set to TRUE.
=NORM.DIST(x, mean, standard_dev, cumulative)
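A rough Python counterpart of NORM.DIST can make the calculation concrete. The sketch below is an assumption throughout: `norm_dist` is a hypothetical helper built on `statistics.NormalDist`, and because the card does not state the standard deviation of women's heights, 2.5 inches is used purely for illustration. With that assumption the share shorter than 63 inches comes out at about 42.1%, consistent with the "less than 50%" estimate.

```python
from statistics import NormalDist

def norm_dist(x: float, mean: float, standard_dev: float, cumulative: bool) -> float:
    """cumulative=True gives P(X <= x); cumulative=False gives the density at x."""
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

# Mean 63.5 inches is from the card; the standard deviation of 2.5 inches is an ASSUMPTION.
share_shorter = norm_dist(63, 63.5, 2.5, True)
print(f"{share_shorter:.1%}")  # 42.1%
```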

18
Q

Find the probability of obtaining a value:

less than or equal to a given value =
greater than or equal to a given value =

A

IF CALCULATING LESS THAN OR EQUAL TO - USE =NORM.DIST(195,B1,B2,TRUE)

IF CALCULATING GREATER THAN OR EQUAL TO - USE =1-NORM.DIST(45,B1,B2,TRUE)

Here cell B1 holds the mean and cell B2 the standard deviation.
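The "greater than" case is the complement of the cumulative probability. A sketch of both formulas in Python, assuming `statistics.NormalDist`; the mean of 100 and standard deviation of 50 are hypothetical stand-ins for cells B1 and B2. For a continuous distribution, "greater than or equal to" and "greater than" give the same probability.

```python
from statistics import NormalDist

# Hypothetical parameters standing in for cells B1 (mean) and B2 (standard deviation).
mean, sd = 100, 50
dist = NormalDist(mean, sd)

p_at_most_195 = dist.cdf(195)     # like =NORM.DIST(195,B1,B2,TRUE)
p_at_least_45 = 1 - dist.cdf(45)  # like =1-NORM.DIST(45,B1,B2,TRUE)
print(f"P(X <= 195) = {p_at_most_195:.4f}")
print(f"P(X >= 45) = {p_at_least_45:.4f}")
```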

19
Q

How do we find the value associated with a cumulative probability of 99% for the distribution of women’s heights?

A

For a normal distribution, we can use Excel’s NORM.INV function to calculate a given percentile. The “INV” indicates that this function calculates the inverse of the cumulative probability.

=NORM.INV(probability, mean, standard_dev)
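The inverse lookup has a direct Python counterpart in `NormalDist.inv_cdf`. A minimal sketch: the mean of 63.5 inches comes from the earlier card, while the standard deviation of 2.5 inches is an assumption made only for illustration. Under that assumption, 99% of women would be shorter than about 69.32 inches.

```python
from statistics import NormalDist

# Mean 63.5 in. is from the earlier card; sigma = 2.5 in. is an ASSUMPTION for illustration.
heights = NormalDist(mu=63.5, sigma=2.5)
p99 = heights.inv_cdf(0.99)  # like =NORM.INV(0.99, 63.5, 2.5)
print(f"{p99:.2f}")  # 69.32
```

Feeding the result back into the cumulative distribution returns 0.99, which is a handy sanity check that the inverse was applied correctly.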

20
Q

What is the center value of the distribution of the sample means?

A

The population mean ($\mu$)
According to the Central Limit Theorem, if we take enough large samples, the mean of the set of sample means equals the population mean.

21
Q

What is the standard deviation of the distribution of sample means?

A

$\sigma/\sqrt{n}$, the population standard deviation divided by the square root of the sample size, is the standard deviation of the distribution of sample means. Large samples create a “tighter” distribution of sample means than small samples.

22
Q

Suppose that you have a sample with a mean of 50. You construct a 95% confidence interval and find that the lower and upper bounds are 42 and 58. What does this 95% confidence interval around the sample mean indicate?

A

We are 95% confident that the population mean lies between 42 and 58.
The 95% confidence interval is a range around the sample mean. We can say that we are 95% confident that the true population mean is within this range, based on the methods we used to calculate the range. If we were to construct similar intervals for 100 samples drawn from this population, on average 95 of the intervals would contain the true population mean.
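The "95 out of 100 intervals" interpretation can be checked by simulation. A sketch under stated assumptions: a hypothetical normal population with mean 50 and standard deviation 10, samples of size 40, and 2,000 repetitions. To keep the coverage at exactly 95% in expectation, the population standard deviation is treated as known rather than estimated by s.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(2024)
mu, sigma, n = 50.0, 10.0, 40    # hypothetical population and sample size
z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
margin = z * sigma / sqrt(n)     # sigma treated as known here
trials = 2000

hits = 0
for _ in range(trials):
    xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

coverage = hits / trials
print(f"{coverage:.3f} of the intervals contain the true mean")  # close to 0.95
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the method succeeds over many samples.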

23
Q

Calculate a confidence interval (large samples) =

A

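=CONFIDENCE.NORM(alpha, standard_dev, size) returns the margin of error, which we add to and subtract from the sample mean to find the confidence interval.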

alpha, the significance level, equals one minus the confidence level (for example, a 95% confidence interval would correspond to the significance level 0.05).

standard_dev is the standard deviation of the population distribution. We will typically use the sample standard deviation, $s$, which is our best estimate of the population’s standard deviation.

size is the sample size, $n$.
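The margin of error that CONFIDENCE.NORM returns is $z_{1-\alpha/2} \cdot s/\sqrt{n}$. A sketch of that calculation in Python; `confidence_norm` is a hypothetical helper that only mirrors the Excel arguments, and the sample (mean 50, s = 20, n = 100) is made up.

```python
from math import sqrt
from statistics import NormalDist

def confidence_norm(alpha: float, standard_dev: float, size: int) -> float:
    """Margin of error z_(1 - alpha/2) * standard_dev / sqrt(size)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)

# Hypothetical sample: mean 50, s = 20, n = 100, 95% confidence.
margin = confidence_norm(0.05, 20, 100)
low, high = 50 - margin, 50 + margin
print(f"margin {margin:.2f}, interval ({low:.2f}, {high:.2f})")  # margin 3.92, interval (46.08, 53.92)
```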

24
Q

t-distribution for small samples (n < 30)

A

To calculate the margin of error (half the width of the confidence interval) for small samples, use =CONFIDENCE.T(alpha, standard_dev, size). The arguments have the same meaning as in CONFIDENCE.NORM.