Week 16 - Sampling distributions and interval estimation Flashcards

(73 cards)

1
Q

What is the purpose of statistical inference?

A

The purpose of statistical inference is to obtain information about a population based on information contained in a sample.

2
Q

What is a population in statistics?

A

A population is the set of all elements of interest in a particular study.

3
Q

What is a sample in statistics?

A

A sample is a subset of the population that is used to draw conclusions about the whole population.

4
Q

What do sample results tell us about a population?

A

Sample results provide only estimates of the values of population characteristics. However, with proper sampling methods, these estimates can be “good” or reliable.

5
Q

What is a parameter in statistics?

A

A parameter is a numerical characteristic of a population.

6
Q

What is a finite population in the context of sampling?

A

A finite population is one that can be defined by a list, such as an organisation’s membership roster, credit card account numbers, or inventory product numbers.

7
Q

What is simple random sampling from a finite population?

A

Simple random sampling is a method where a sample of size n is selected from a finite population of size N, such that every possible sample of size n has an equal chance of being selected.

8
Q

What is sampling with replacement?

A

Sampling with replacement means each sampled element is returned to the population before selecting the next element, allowing it to be selected again.

9
Q

What is sampling without replacement?

A

Sampling without replacement means each sampled element is not returned to the population, so it cannot be selected again. This is the method used most often.
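
The two selection schemes on cards 8 and 9 can be contrasted with a short Python sketch. The population of ten member IDs and the fixed seed are illustrative assumptions, not part of the course material.

```python
import random

# Hypothetical finite population: member IDs 1..10 (illustrative only).
population = list(range(1, 11))
rng = random.Random(0)  # fixed seed so the run is repeatable

# With replacement: an element is "returned" and may be drawn again.
with_replacement = rng.choices(population, k=5)

# Without replacement: each element can appear at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

Only the second list is guaranteed to contain five distinct IDs.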

10
Q

How are samples selected in large sampling projects?

A

In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.

11
Q

What is an infinite population in statistics?

A

An infinite population is often defined by an ongoing process where the elements are generated as if the process continues indefinitely.

12
Q

What are the conditions for simple random sampling from an infinite population?

A

Each element selected comes from the same population.

Each element is selected independently.

13
Q

Why can’t random number selection be used for infinite populations?

A

Because it is impossible to list all elements in an infinite population, random number selection methods cannot be used.

14
Q

What is point estimation in statistics?

A

Point estimation uses sample data to compute a value (a sample statistic) that serves as an estimate of a population parameter.

15
Q

What is x̄ in point estimation?

A

x̄ is the point estimator of the population mean μ.

16
Q

What is S in point estimation?

A

S is the point estimator of the population standard deviation σ.

17
Q

What is P in point estimation?

A

P is the point estimator of the population proportion π.

18
Q

When is a point estimator considered unbiased?

A

A point estimator is unbiased when its expected value is equal to the population parameter.

19
Q

What is sampling error?

A

Sampling error is the absolute difference between an unbiased point estimate and the true population parameter.

20
Q

What causes sampling error?

A

Sampling error occurs because we use a sample (a subset) instead of the entire population to make estimates.

21
Q

Can we make probability statements about sampling error?

A

Yes, statistical methods can be used to make probability statements about the likely size of the sampling error.

22
Q

What is the sampling error for the sample mean?

A

The sampling error for the sample mean is |x̄ − μ|, where x̄ is the sample mean and μ is the population mean.

23
Q

What is the sampling error for the sample standard deviation?

A

The sampling error for the sample standard deviation is |s − σ|, where s is the sample standard deviation and σ is the population standard deviation.

24
Q

What is the sampling error for the sample proportion?

A

The sampling error for the sample proportion is |p − π|, where p is the sample proportion and π is the population proportion.

25
Example: St. Andrew’s College receives 900 applications annually from prospective students. The application form contains a variety of information, including the individual’s aptitude test (SAT) score (ranging between 400 and 1600) and whether or not the individual desires on-campus housing. The director of admissions would like to know the average SAT score for the 900 applicants and the proportion of applicants that want to live on campus.
We now look at two alternatives for obtaining the desired information: conducting a census of all 900 applicants, or selecting a sample of 30 applicants using Excel.
Census: If the relevant data for all 900 applicants were in the college’s database, the population parameters of interest could be calculated using the formulae presented in previous lectures. We assume for the moment that conducting a census is practical in this example.
Population mean SAT score: $\mu = \frac{\sum x_i}{900} = 995.84$
Population standard deviation of SAT score: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 344.07$
Population proportion wanting on-campus housing: $\pi = \frac{435}{900} = 48.33\%$
Sample: Suppose instead that a sample of 30 applicants is used. The random numbers we draw are the numbers of the applicants we sample, unless a random number is greater than 900 or has already been used. We continue to draw random numbers until we have selected 30 applicants for our sample.
Computers can be used to generate random numbers for selecting random samples. For example, Excel’s function RANDBETWEEN can be used to generate random numbers between 1 and 900: =RANDBETWEEN(1,900). Then we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample.
x̄ as point estimate of μ: $\bar{x} = \frac{\sum x_i}{30} = 955.67$
s as point estimate of σ: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{30 - 1}} = 330.98$
p as point estimate of π: $p = \frac{16}{30} = 53.33\%$
Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
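The draw-and-skip-repeats procedure described in this example can be sketched in Python as a rough analogue of repeated RANDBETWEEN(1,900) calls. The function name, the seed, and the use of Python instead of Excel are assumptions made for illustration; the applicants' actual data are not reproduced here.

```python
import random

def draw_applicant_numbers(population_size=900, sample_size=30, seed=None):
    """Draw distinct applicant numbers by repeated random draws, skipping repeats."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        # randint includes both endpoints, like RANDBETWEEN(1, 900)
        number = rng.randint(1, population_size)
        if number not in chosen:  # ignore a number that was already used
            chosen.append(number)
    return chosen

sample_ids = draw_applicant_numbers(seed=42)
print(sorted(sample_ids))
```

Python's `random.sample(range(1, 901), 30)` would give a sample without replacement in a single call; the loop is written out only to mirror the steps in the card.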
26
What is the process of statistical inference using the sample mean x̄?
In statistical inference, a simple random sample of size n is selected from a population with mean μ. The sample data provide a value for the sample mean x̄, which is then used to make inferences about the population mean μ.
27
What does the sample mean x̄ help us infer?
The sample mean x̄ is used to make inferences about the population mean μ.
28
What is the sampling distribution of x̄?
The sampling distribution of x̄ is the probability distribution of all possible values of the sample mean x̄.
29
What is the expected value of x̄?
The expected value of x̄ is E(x̄)=μ, where μ is the population mean.
30
What does unbiasedness mean for x̄?
Unbiasedness means that the expected value of x̄ equals the population mean: E(x̄) = μ
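A small enumeration can make the sampling distribution and E(x̄) = μ concrete. The five-element population below is hypothetical and chosen only so that every possible sample of size 2 can be listed.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]  # hypothetical population, N = 5
mu = mean(population)

# Every possible sample of size n = 2 drawn without replacement;
# under simple random sampling each one is equally likely.
sample_means = [mean(sample) for sample in combinations(population, 2)]

# The expected value of x-bar is the average over all possible samples.
expected_value = mean(sample_means)
print(mu, expected_value)
```

The list `sample_means` is the full sampling distribution of x̄ for this toy population, and its average matches the population mean.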
31
What is the standard deviation of x̄ for a finite population?
For a finite population, the standard deviation of x̄ is given by: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$ where σ is the population standard deviation, n is the sample size, and N is the population size.
32
What is the standard deviation of x̄ for an infinite population?
For an infinite population, the standard deviation of x̄ is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ where σ is the population standard deviation and n is the sample size.
33
When is a finite population treated as an infinite population?
A finite population is treated as infinite when the sample size n is less than 5% of the population size (n/N<0.05).
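The standard-error formulas on cards 31 to 33 can be combined into one helper, sketched below. The function name and the example values (σ = 4,500, n = 36) are illustrative assumptions; the 5% cut-off follows card 33.

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """Standard error of x-bar; N=None means an infinite population."""
    se = sigma / math.sqrt(n)
    # Apply the finite population correction only when the sample is
    # 5% or more of the population (otherwise treat it as infinite).
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error_of_mean(4500, 36))         # infinite population
print(standard_error_of_mean(4500, 36, N=900))  # n/N = 0.04, no correction
print(standard_error_of_mean(4500, 36, N=100))  # n/N = 0.36, correction applied
```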
34
What is $\sigma_{\bar{x}}$ referred to as?
$\sigma_{\bar{x}}$ is referred to as the standard error of the mean.
35
What is the form of the sampling distribution of x̄ when the population has a normal distribution?
When the population has a normal distribution, the sampling distribution of x̄ is also normal, regardless of the sample size.
36
When can the sampling distribution of x̄ be approximated by a normal distribution?
By the Central Limit Theorem, the sampling distribution of x̄ can be approximated by a normal distribution when the sample size is large; as a rule of thumb, a sample size of 30 or more is sufficient.
37
What sample size may be needed when the population is highly skewed or has outliers?
In cases where the population is highly skewed or contains outliers, a sample size of 50 or more may be needed for the sampling distribution of x̄ to be approximated by a normal distribution.
38
Example: What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within ±10 of the actual population mean μ? In other words, what is the probability that x̄ will be between 980 and 1000?
The steps use μ = 990 and a standard error of $\sigma_{\bar{x}} = 14.6$.
Step 1: Calculate the z-value at the upper endpoint of the interval. z = (1000 − 990)/14.6 = 0.68
Step 2: Find the area under the curve to the left of the upper endpoint. P(Z < 0.68) = 0.7517
Step 3: Calculate the z-value at the lower endpoint of the interval. z = (980 − 990)/14.6 = −0.68
Step 4: Find the area under the curve to the left of the lower endpoint. P(Z < −0.68) = 0.2483
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval. P(−0.68 < Z < 0.68) = P(Z < 0.68) − P(Z < −0.68) = 0.7517 − 0.2483 = 0.5034
The probability that the sample mean SAT score will be between 980 and 1000 is P(980 < x̄ < 1000) = 0.5034.
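The five steps above can be checked with Python's standard library, as a sketch. Rounding z to two decimals is an assumption made to mirror a printed z-table, so the result agrees with 0.5034 only up to table rounding.

```python
from statistics import NormalDist

mu = 990     # population mean used in the worked steps
se = 14.6    # standard error of the mean used in the worked steps
lower, upper = 980, 1000

# Steps 1 and 3: z-values at the endpoints, rounded as in a z-table
z_upper = round((upper - mu) / se, 2)
z_lower = round((lower - mu) / se, 2)

# Steps 2, 4 and 5: areas to the left of each endpoint, then the difference
std_normal = NormalDist()
probability = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)
print(z_lower, z_upper, round(probability, 3))
```

Without the rounding of z, the computed probability would come out slightly higher, which is the usual gap between table look-ups and software.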
39
What is the process of making inferences about a population proportion?
To make inferences about a population proportion, a simple random sample of size n is selected from the population with proportion π. The sample data provide a value for the sample proportion P, which is then used to make inferences about the population proportion π.
40
What is the sample proportion P used for?
The sample proportion P is used to make inferences about the population proportion π.
41
What is the sampling distribution of P?
The sampling distribution of P is the probability distribution of all possible values of the sample proportion P.
42
What is the expected value of the sample proportion P?
The expected value of P is equal to the population proportion π: E(P)=π where π is the population proportion.
43
What is the standard deviation of P for a finite population?
For a finite population, the standard deviation of P is: $\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}} \sqrt{\frac{N - n}{N - 1}}$ where π is the population proportion, n is the sample size, and N is the population size.
44
What is the standard deviation of P for an infinite population?
For an infinite population, the standard deviation of P is: $\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}}$ where π is the population proportion and n is the sample size.
45
What is $\sigma_P$ referred to as?
$\sigma_P$ is referred to as the standard error of the proportion.
46
When can the sampling distribution of P be approximated by a normal distribution?
The sampling distribution of P can be approximated by a normal distribution whenever the sample size is large, specifically when nπ ≥ 5 and n(1 − π) ≥ 5.
47
What conditions must be satisfied for the sample size to be considered large when approximating the sampling distribution of P?
The sample size is considered large when nπ ≥ 5 and n(1 − π) ≥ 5.
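The standard error of the proportion (infinite-population form) and the large-sample conditions from the preceding cards can be sketched as two small helpers. The function names and the example values of π and n are hypothetical.

```python
import math

def standard_error_of_proportion(pi, n):
    """Standard error of P for an infinite (or effectively infinite) population."""
    return math.sqrt(pi * (1 - pi) / n)

def normal_approximation_ok(pi, n):
    """Large-sample rule from the cards: n*pi >= 5 and n*(1 - pi) >= 5."""
    return n * pi >= 5 and n * (1 - pi) >= 5

# Hypothetical values, for illustration only
print(standard_error_of_proportion(0.5, 100))
print(normal_approximation_ok(0.5, 30))   # 15 and 15, both at least 5
print(normal_approximation_ok(0.1, 30))   # n*pi = 3, too small
```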
48
What is interval estimation for the population mean when σ is known?
When σ (the population standard deviation) is known, interval estimation for the population mean involves using the Z-distribution to construct a confidence interval.
49
What is interval estimation for the population mean when σ is unknown?
When σ is unknown, interval estimation for the population mean involves using the t-distribution to construct a confidence interval, as the sample standard deviation s is used to estimate σ.
50
What is interval estimation for the population proportion?
Interval estimation for the population proportion involves using the sample proportion p and the standard error of the proportion to construct a confidence interval, typically with the Z-distribution.
51
Why can't a point estimator provide the exact value of the population parameter?
A point estimator cannot provide the exact value of the population parameter because it is based on sample data, which is subject to variability.
52
How is an interval estimate calculated?
An interval estimate is computed by adding and subtracting a margin of error to the point estimate: point estimate ± margin of error.
53
What is the purpose of an interval estimate?
The purpose of an interval estimate is to provide information about how close the point estimate is to the true value of the population parameter.
54
What is the general form of an interval estimate for a population mean?
The general form of an interval estimate for a population mean is: x̄ ± margin of error, where x̄ is the sample mean.
55
What do you need to construct an interval estimate of a population mean when σ is known?
To construct an interval estimate of a population mean when σ is known, you must compute the margin of error using the population standard deviation σ.
56
What happens when the population standard deviation σ is not known?
When σ is not known, the sample standard deviation s can be used as an estimate of σ, and the case is referred to as the "σ unknown" case.
57
What is the "σ known" case in interval estimation?
The "σ known" case refers to situations where the population standard deviation σ is known or can be reasonably estimated from historical data or other sources, allowing the use of σ in the margin of error calculation.
58
What is the relationship between the sample mean and the population mean in interval estimation when σ is known?
There is a 1 − α probability that the value of the sample mean will be within $z_{\alpha/2}\,\sigma_{\bar{x}}$ of the population mean μ, where $z_{\alpha/2}$ is the critical value from the standard normal distribution and $\sigma_{\bar{x}}$ is the standard error of the sample mean.
59
What is the formula for the interval estimate of the population mean μ when σ is known?
The interval estimate of μ is given by: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where:
x̄ is the sample mean
1 − α is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
σ is the population standard deviation
n is the sample size
60
What is the recommended sample size for interval estimation when σ is known?
In most applications, a sample size of n ≥ 30 is adequate. If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. If the population is roughly symmetrical but not normally distributed, a sample size of 15 may be sufficient. If the population is believed to be approximately normal, a sample size of less than 15 can be used.
61
Example: Cool Beats has 260 retail outlets throughout Northern Europe. The firm is evaluating a potential location for a new outlet, based in part on the mean annual income of the individuals in the marketing area of the new location. A sample of size n = 36 was taken; the sample mean income is €31,100. The population is believed to be not highly skewed. The population standard deviation is estimated to be €4,500, and the confidence coefficient to be used in the interval estimate is 0.95.
95% of the sample means that can be observed are within ±1.96 $\sigma_{\bar{x}}$ of the population mean μ.
The margin of error is: $z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{4{,}500}{\sqrt{36}} = 1{,}470$
At the 95% confidence level, the margin of error is €1,470.
Interval estimate of μ: €31,100 ± €1,470, or €29,630 to €32,570.
We are 95% confident that the interval contains the population mean.
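The Cool Beats calculation can be reproduced as a short sketch with Python's standard library. The variable names are illustrative; the z value is computed by `NormalDist` rather than read from a table, so it is 1.96 only after rounding.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 31_100, 4_500, 36   # values from the Cool Beats example
confidence = 0.95
alpha = 1 - confidence

# z value with an area of alpha/2 in the upper tail
z = NormalDist().inv_cdf(1 - alpha / 2)

margin = z * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(round(z, 2), round(margin), round(lower), round(upper))
```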
62
What do you do if an estimate of the population standard deviation σ is not available for interval estimation?
If an estimate of σ is not available, we use the sample standard deviation s to estimate σ. In this case, the interval estimate for μ is based on the t-distribution.
63
What is the t-distribution?
The t-distribution is a family of similar probability distributions, where each specific distribution depends on a parameter called the degrees of freedom.
64
What are degrees of freedom in the context of the t-distribution?
Degrees of freedom refer to the number of independent pieces of information that go into the computation of the sample standard deviation s. It is typically calculated as n−1, where n is the sample size.
65
How does the dispersion of a t-distribution change with the number of degrees of freedom?
A t-distribution with more degrees of freedom has less dispersion. As the number of degrees of freedom increases, the t-distribution becomes closer to the standard normal distribution.
66
What happens as the number of degrees of freedom increases for the t-distribution?
As the number of degrees of freedom increases, the difference between the t-distribution and the standard normal distribution becomes smaller and smaller. For more than 100 degrees of freedom, the standard normal z-value provides a good approximation to the t-value.
67
Where can you find the standard normal z-values for more than 100 degrees of freedom?
The standard normal z-values for more than 100 degrees of freedom can be found in the infinite degrees (∞) row of the t-distribution table.
68
What is the formula for the interval estimate of a population mean when σ is unknown?
The interval estimate for the population mean μ when σ is unknown is given by: $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
where:
x̄ is the sample mean
1 − α is the confidence coefficient
$t_{\alpha/2}$ is the t value providing an area of α/2 in the upper tail of a t distribution with n − 1 degrees of freedom
s is the sample standard deviation
n is the sample size
69
Example: A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 16 apartments within one kilometer of campus resulted in a sample mean of €650 per month and a sample standard deviation of €55. Let us provide a 95% confidence interval estimate of the mean rent per month for the population of apartments within one kilometer of campus. We shall assume this population to be normally distributed. At 95% confidence, α = 0.05 and α/2 = 0.025. $t_{0.025}$ is based on n − 1 = 16 − 1 = 15 degrees of freedom.
In the t distribution table we see that $t_{0.025} = 2.131$.
Interval estimate: $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = 650 \pm 2.131 \times \frac{55}{\sqrt{16}} = 650 \pm 29.30$
We are 95% confident that the mean rent per month for the population of apartments within one kilometer of campus is between €620.70 and €679.30.
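The rent interval can be recomputed in a few lines of Python, as a sketch. The critical value 2.131 is entered by hand from the t table, as in the example, because Python's standard library has no t-distribution quantile function; the variable names are illustrative.

```python
import math

x_bar, s, n = 650, 55, 16   # sample mean, sample standard deviation, sample size
t_crit = 2.131              # t_0.025 with n - 1 = 15 degrees of freedom (from the table)

margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"{margin:.2f} {lower:.2f} {upper:.2f}")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 15)` should give essentially the same critical value without a table look-up.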
70
Summary of Interval Estimation Procedures for a Population Mean
Can the population standard deviation σ be assumed known?
Yes (σ known case): use $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
No (σ unknown case): use the sample standard deviation s to estimate σ, and use $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
71
What is the general form of an interval estimate for a population proportion?
The general form of an interval estimate for a population proportion is: p ± margin of error, where p is the sample proportion.
72
What conditions must be met for the sampling distribution of P to be approximated by a normal distribution?
The sampling distribution of P can be approximated by a normal distribution when np ≥ 5 and n(1 − p) ≥ 5, where n is the sample size and p is the sample proportion (used in place of the unknown π).
73
What is the formula for the interval estimate of a population proportion?
The interval estimate for a population proportion is: $p \pm z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}$
where:
p is the sample proportion
1 − α is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
n is the sample size
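As a sketch, the proportion formula can be wrapped in a function that also checks the large-sample conditions from card 72. The function name is hypothetical, and applying it to the sample proportion p = 16/30 from the St. Andrew's example is an illustration, not a result stated in the course material.

```python
import math
from statistics import NormalDist

def proportion_interval(p, n, confidence=0.95):
    """Interval estimate p +/- z * sqrt(p(1-p)/n) using the normal approximation."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("sample too small for the normal approximation")
    # z value with an area of alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# 16 of the 30 sampled applicants wanted on-campus housing
lower, upper = proportion_interval(16 / 30, 30)
print(round(lower, 3), round(upper, 3))
```

With only 30 observations the interval is wide, which is the practical reason larger samples are preferred when estimating a proportion.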