HBX- BA - 2 Flashcards

Question

What if we want to find a range of values associated with a probability that is not a cumulative probability? For example, suppose we want to know the range of values associated with the “middle” 99% of a normal distribution?

Answer 1

The normal curve is symmetrical, so we know that the middle 99% of the distribution comprises 49.5% on either side of the mean and excludes 0.5% on each of the tails. Thus we can find the value corresponding to the left side of the range using the NORM.INV function evaluated at 0.5% and the right side using the NORM.INV function evaluated at 99.5%. In this case, the values associated with the middle 99% are NORM.INV(0.005,63.5,2.5)=57.1 and NORM.INV(0.995,63.5,2.5)=69.9.
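As a rough cross-check outside Excel, the same calculation can be sketched in Python. `NormalDist.inv_cdf` from the standard library plays the role of NORM.INV here; the helper name `middle_range` is made up for this illustration.

```python
from statistics import NormalDist

def middle_range(mean, sd, coverage):
    """Return (low, high) bounding the central `coverage` share of a normal distribution."""
    tail = (1 - coverage) / 2              # probability left over in each tail
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# Same inputs as the card: mean 63.5, standard deviation 2.5, middle 99%.
low, high = middle_range(63.5, 2.5, 0.99)
print(round(low, 1), round(high, 1))       # 57.1 69.9
```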

Answer 2

A theorem stating that if we take sufficiently large randomly-selected samples from a population, the means of these samples will be normally distributed regardless of the shape of the underlying population. (Technically, the underlying population must have a finite variance.)

How this works:

* Take a random sample from a population.
* That sample has a mean: plot it on a graph!
* Then take another sample. This sample ALSO has a mean (put IT on the graph).
* If we did this TONS of times, the plotted means would form a normal distribution!

No one actually does this. In the REAL world we take ONE sample! This allows us to ignore the underlying distribution of the population that we want to learn about. We now know that the mean of a sample is PART of a normal distribution. Specifically, we know that the sample mean falls somewhere in a normal distribution that is centered at the true population mean. Because of this we can disregard the underlying distribution of the population and only focus on the sample.
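The "take TONS of samples" thought experiment can be simulated. The sketch below is only an illustration: the skewed exponential population (true mean 10), the sample size of 50, and the 2,000 repetitions are all assumptions chosen for the demo, not values from the course.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN = 10.0        # hypothetical population mean (for an exponential, the sd is also 10)
SAMPLE_SIZE = 50
REPEATS = 2000

def draw_sample(n):
    """Draw n values from a strongly skewed (exponential) population."""
    return [random.expovariate(1 / POP_MEAN) for _ in range(n)]

# One mean per sample: these are the points we would plot on the graph.
sample_means = [mean(draw_sample(SAMPLE_SIZE)) for _ in range(REPEATS)]

center = mean(sample_means)    # should land near the population mean
spread = stdev(sample_means)   # should land near POP_MEAN / sqrt(SAMPLE_SIZE)
print(round(center, 2), round(spread, 2), round(POP_MEAN / sqrt(SAMPLE_SIZE), 2))
```

Even though the population itself is far from bell-shaped, the sample means cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.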

Answer 3

According to the Central Limit Theorem, if we take large enough samples, the distribution of sample means will be normally distributed regardless of the shape of the underlying population. This population distribution will result in a normally distributed distribution of sample means. Note that there are other correct answers.

Answer 4

**The population mean (μ)** According to the Central Limit Theorem, if we take enough large samples, the mean of the set of sample means equals the population mean.

Answer 5

If we take a large sample (at least 30 points), there is a 95% chance that the mean of that sample falls within about 2 standard deviations of the mean of the distribution of sample means. BUT the Central Limit Theorem tells us that the mean of the distribution of sample means is the same as the **population mean**, so we can conclude that **the mean of our sample is within about 2 standard deviations of our true population mean.** Here "standard deviation" means the standard deviation of the distribution of sample means, not of the population.

Answer 6

A range constructed around a sample mean that estimates the true population mean. The confidence level of a confidence interval indicates how confident we are that the range contains the true population mean. For example, we are 95% confident that a 95% confidence interval contains the true population mean. The confidence level is equal to 1 – significance level.

Answer 7

* **68% Confidence Interval: z ≈ 1.** About 1 standard deviation away from the mean (on both sides).
* **95% Confidence Interval: z = 1.96.** About 2 standard deviations away from the mean (on both sides).
* **99.7% Confidence Interval: z ≈ 3.** About 3 standard deviations away from the mean (on both sides).

In each case, the number shown is the value of z to use in the confidence interval equation for that confidence level.
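To see where these z-values come from, here is a small sketch using Python's standard-library `NormalDist`. The helper name `z_for_confidence` is made up for this illustration. It shows that 1, 2, and 3 are rounded rules of thumb, while 1.96 is the more precise value for 95%.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z-value: the point that leaves (1 - level) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.68, 0.95, 0.997):
    print(f"{level:.1%} confidence -> z = {z_for_confidence(level):.3f}")
```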

Answer 8

The standard deviation of the distribution of sample means is DIFFERENT from that of the population distribution. For the distribution of sample means: 2 Standard Deviations = 2 times (population standard deviation / square root of sample size)

Answer 9

We are not saying that 95% of the time our sample mean IS the population mean. What we’re saying is that for 95% of all random samples, a range that is centered at the sample mean and extends about 2 standard deviations (of the distribution of sample means) on either side contains the population mean. Another way to explain it: if we took 20 samples from a population and drew a confidence interval around each sample's mean, on average 95% of these intervals (19 out of 20) would actually contain the true population mean.
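The "19 out of 20" idea can be checked with a simulation. The sketch below is an illustration only: the normal population (mean 50, standard deviation 12), the sample size of 40, and the 2,000 trials are hypothetical choices. Each trial builds a 95% interval from the sample's own mean and standard deviation, then records whether it captured the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 50.0, 12.0   # hypothetical population parameters
N, TRIALS = 40, 2000              # hypothetical sample size and number of repeats
z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    margin = z * stdev(sample) / sqrt(N)
    # The interval "hits" when the true mean lies within one margin of the sample mean.
    if abs(mean(sample) - TRUE_MEAN) <= margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of intervals contained the true mean")
```

The share should come out close to 95%. Any single interval either contains the true mean or it does not; the 95% describes the method across many samples.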

Answer 10

* Increasing the sample mean: Increasing the sample mean affects where the confidence interval is centered, not how wide the interval is. Use the interactive and review the confidence interval equation to help answer this question.
* **_Increasing the confidence level_: Increasing the confidence level means that we must be more confident that the actual population mean lies within our range. The confidence interval must be wider to increase the likelihood that it captures the true population mean. Note that confidence level determines the z-value, which in turn drives the width of the interval. Note that another option is also correct.**
* Increasing the sample size: Increasing the sample size will result in a more accurate prediction and, therefore, a narrower confidence interval. Use the interactive and review the confidence interval equation to help answer this question. Think about how the confidence level and the sample size affect the width of a confidence interval. Note that n is in the denominator, so as n increases, s / (square root of n) decreases, that is, the width of the confidence interval decreases.
* **_Decreasing the sample size_: Decreasing the sample size will result in a less accurate prediction and, therefore, a wider confidence interval. Note that n is in the denominator, so as n decreases, s / (square root of n) increases, that is, the width of the confidence interval increases.**

**THE BOLD ONES ABOVE ARE THE CORRECT ONES**
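A quick numeric sketch makes the two effects visible. It assumes the large-sample margin of error z × s / √n; the helper name `margin_of_error` and the standard deviation of 10 are hypothetical values for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence, s, n):
    """Large-sample margin of error: z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * s / sqrt(n)

s = 10  # hypothetical sample standard deviation
base = margin_of_error(0.95, s, 100)
more_confident = margin_of_error(0.99, s, 100)   # higher confidence level
bigger_sample = margin_of_error(0.95, s, 400)    # larger sample size

print(round(base, 2), round(more_confident, 2), round(bigger_sample, 2))
```

Raising the confidence level widens the interval, while quadrupling the sample size halves the margin of error. The sample mean does not appear in the formula at all, so it cannot change the width.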

Answer 11

**We are 95% confident that the population mean lies between 42 and 58.** The 95% confidence interval is a range around the sample mean. We can say that we are 95% confident that the true population mean is within this range, based on the methods we used to calculate the range. If we were to construct similar intervals for 100 samples drawn from this population, on average 95 of the intervals will contain the true population mean.

Answer 12

**_FOR LARGE SAMPLES (30 and above)_**

**=CONFIDENCE.NORM(alpha, standard\_dev, size)**

* **alpha**, the significance level, equals one minus the confidence level *(for example, a 95% confidence interval would correspond to the significance level 0.05).*
* **standard\_dev** is the standard deviation of the population distribution. We will typically use the sample standard deviation, s, which is our best estimate of our population’s standard deviation.
* **size** is the sample size, n.

**_FOR SMALL SAMPLES (fewer than 30)_**

If we don’t know anything about the underlying population, we cannot create a confidence interval with fewer than 30 data points because the properties of the Central Limit Theorem may not hold. However, if the underlying population is roughly normally distributed, we can use a confidence interval to estimate the population mean as long as we modify our approach slightly. We can gain insight into whether a data set is approximately normally distributed by looking at the shape of a histogram of that data. There are formal tests of normality that are beyond the scope of this course.

To estimate the population mean with a small sample, we use a **t-distribution** instead of a “z-distribution”, that is, a normal distribution. **A t-distribution looks similar to a normal distribution but is not as tall in the center and has thicker tails.** These differences reflect the fact that a t-distribution is more likely than a normal distribution to have values farther from the mean. Therefore, the normal distribution’s “rules of thumb" do not apply. The shape of a t-distribution depends on the sample size; as the sample size grows towards 30, the t-distribution becomes very similar to a normal distribution.

**=CONFIDENCE.T(alpha, standard\_dev, size)**

* **alpha**, the significance level, equals one minus the confidence level *(for example, a 95% confidence interval would correspond to the significance level 0.05).*
* **standard\_dev** is the standard deviation of the population distribution. We will typically use the sample standard deviation, s, which is our best estimate of our population’s standard deviation.
* **size** is the sample size, n.

Like CONFIDENCE.NORM, CONFIDENCE.T returns the **margin of error**, which we can add and subtract from the sample mean.

Answer 13

1. **Calculate the sample mean and standard deviation of the sample**: =AVERAGE and =STDEV.S
2. **Calculate the confidence interval’s margin of error**: =CONFIDENCE.NORM(alpha, standard\_dev, size) *(.05, standard deviation, sample size)*
3. **Calculate the lower & upper bounds** of the 95% confidence interval by adding and subtracting the margin of error from the mean.

**What do the lower and upper bounds of the confidence interval tell us?**

MY ANSWER: We are 95% confident that the average body mass index (BMI) for all adults in the United States lies between 25.64 and 28.14. The lower and upper bounds of the confidence interval create a range in which this BMI average for all adults in the US (the true population mean) will most likely lie. Also, the distance from the lower bound to the upper bound is twice the margin of error. This does NOT mean that there is a 95% chance that the population mean for BMI falls in this range. Rather, it only means that we can be 95% confident that the true population mean is within this range.
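The three steps can be sketched in Python as follows. This is an illustration under stated assumptions: the BMI readings are made-up numbers (not the course data set), the function name `confidence_interval` is hypothetical, and the margin is the large-sample z × s / √n formula that CONFIDENCE.NORM computes.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(data, confidence=0.95):
    """Return (lower, upper) bounds of a large-sample confidence interval for the mean."""
    n = len(data)
    x_bar = mean(data)                        # step 1: =AVERAGE
    s = stdev(data)                           # step 1: =STDEV.S (sample standard deviation)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)                  # step 2: =CONFIDENCE.NORM(alpha, s, n)
    return x_bar - margin, x_bar + margin     # step 3: subtract and add the margin

# Hypothetical BMI readings for 32 adults, invented for this example.
bmi = [24.1, 27.3, 30.2, 22.8, 26.5, 29.9, 25.4, 31.0,
       23.7, 28.8, 26.1, 27.9, 24.6, 32.4, 25.0, 29.1,
       26.8, 23.2, 28.3, 30.7, 27.0, 25.9, 24.4, 29.5,
       26.3, 31.8, 22.5, 27.6, 28.0, 25.7, 30.0, 26.9]

lower, upper = confidence_interval(bmi)
print(round(lower, 2), round(upper, 2))
```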

Answer 14

The probability distribution of the means of all randomly-selected samples of the same size that could be taken from a population. The Central Limit Theorem states that for sufficiently large randomly-selected samples, the distribution of sample means approximates a normal distribution. The standard deviation of the distribution of sample means is equal to the standard deviation of the population divided by the square root of the sample size. If we do not know the standard deviation of the population, we can estimate it using the sample standard deviation.

Answer 15

The function T.INV.2T can find the t-value for a desired level of confidence. **=T.INV.2T(probability, degrees\_freedom)** * **probability** is the significance level, that is, 1–confidence level, so for a 95% confidence interval, the significance level=0.05. * **degrees\_freedom** is the number of degrees of freedom, which in this case is simply the sample size minus one, or n–1. For example, for the BMI example where the confidence level was 95% and n=15, the t-value would be T.INV.2T(0.05,14)=2.14.
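Python's standard library has no t-distribution, so the sketch below approximates T.INV.2T numerically. It integrates the t density with Simpson's rule and then searches for the critical value by bisection. All function names are made up for this illustration, and the approach is a teaching aid rather than how Excel computes the value. It is only suitable for moderate degrees of freedom, because `math.gamma` overflows for very large inputs.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus the lower half."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_inv_2t(alpha, df):
    """Two-tailed critical value t such that P(|T| > t) = alpha."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):                 # bisection: halve the bracket each pass
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_t(alpha, s, n):
    """Margin of error for a small sample: t * s / sqrt(n), with n - 1 degrees of freedom."""
    return t_inv_2t(alpha, n - 1) * s / sqrt(n)

# The card's example: 95% confidence, n = 15, so 14 degrees of freedom.
print(round(t_inv_2t(0.05, 14), 2))    # 2.14
```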

Answer 16

In general, we know that the **larger** the sample size, the **tighter** the confidence interval.

Answer 17

**Dummy Variable** A variable that takes on one of two values: 0 or 1. Dummy variables are used to transform categorical variables into quantitative variables. A categorical variable with only two categories (e.g. “heads” or “tails”) can be transformed into a quantitative variable using a single dummy variable that takes on the value 1 when a data point falls into one category (e.g. “heads”) and 0 when a data point falls into the other category (e.g. “tails”). For categorical variables with more than two categories, multiple dummy variables are required. Specifically, the number of dummy variables must be the total number of categories minus one.

**=IF(logical\_test,[value\_if\_true],[value\_if\_false])**

* **Enter the formula** above in the first cell.
* **Copy and paste** it (or drag it) into the other cells.

To continue the process:

* **Calculate the mean** of the dummy variable, which is equivalent to the sample proportion, and **the standard deviation.** *Remember that you can use either Excel’s descriptive statistics tool or the functions AVERAGE and STDEV.S.*
* **Calculate the confidence interval** using the appropriate formula for this sample size.
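The dummy-variable steps can be sketched in Python. The survey answers below are invented for illustration, and the list comprehension stands in for an Excel formula along the lines of =IF(A1="yes",1,0) copied down a column.

```python
from statistics import mean, stdev

# Hypothetical categorical data with two categories.
responses = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes"]

# Dummy variable: 1 for one category, 0 for the other.
dummy = [1 if answer == "yes" else 0 for answer in responses]

p_hat = mean(dummy)   # mean of the dummy variable = sample proportion of "yes"
s = stdev(dummy)      # sample standard deviation, like STDEV.S

print(p_hat, round(s, 3))
```

With the mean and standard deviation in hand, the confidence interval for the proportion follows the same recipe as for any other sample mean.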

Answer 18

Not sure if this is necessary to know for the test... but

Answer 19

Sample size is particularly important when dealing with very small (or very large) proportions.

*Suppose we are sampling to find the prevalence of Amyotrophic Lateral Sclerosis (ALS), a disease commonly known as Lou Gehrig’s disease. In the United States, an estimated six to eight people per 100,000 have ALS. That is, the likelihood that a person in the U.S. has ALS is between 0.00006 and 0.00008, or between 0.006% and 0.008%. Would our sample be useful if we surveyed 100 people? No. Since the proportion we are estimating is very small, we need to have a large enough sample to make sure that it includes at least SOME people with the disease. Otherwise, we will not have enough data to obtain a good estimate of the true proportion.*

The following guidelines are typically used when estimating proportions to ensure that a sample is large enough to provide a good estimate. The sample size n must be large enough to satisfy both conditions:

```
n = the sample size
p = the mean
```

Answer 20

* The level of confidence, * our best estimate of the population standard deviation, and * the sample size. We control only the level of confidence and the sample size.

Answer 21

**2.5%**

*130 is two standard deviations above the mean (130-100=30=2\*15=2\*stdev). We know that approximately 95% of the distribution is within 2 standard deviations of the mean. Therefore 5% must fall beyond 2 standard deviations, 2.5% at the top and 2.5% at the bottom.*
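The rule of thumb can be compared with the exact normal probability. This sketch uses the mean of 100 and standard deviation of 15 from the card; the variable names are made up for illustration.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)

# Share of the distribution above 130, i.e. more than 2 standard deviations above the mean.
exact_upper_tail = 1 - dist.cdf(130)

print(f"rule of thumb: 2.5%, exact: {exact_upper_tail:.2%}")
```

The exact figure is slightly below 2.5% because "within 2 standard deviations" covers a little more than 95% of a normal distribution.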

Answer 22

To solve problems like this, we can think in terms of cumulative probabilities and **use the NORM.INV function**. The value associated with the top 10% is the same as the value corresponding to the bottom 90%, so we need to find the value associated with a cumulative probability of 90%. Using **NORM.INV(0.90,B1,B2)=55**, we find that 90% of the distribution’s values are less than 55; thus 10% of the distribution’s values are greater than 55. If we wish, instead of first computing 100%–10%=90%, we can embed that formula in the function using NORM.INV(1–0.10,B1,B2)=55. _You must link directly to cells to obtain the correct answer._
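The "top X% equals bottom (100 – X)%" trick can be sketched in Python. The card does not show the values in cells B1 and B2, so the mean of 200 and standard deviation of 20 below are hypothetical stand-ins, and the helper name `top_share_cutoff` is made up.

```python
from statistics import NormalDist

def top_share_cutoff(mean, sd, top_share):
    """Value above which `top_share` of a normal distribution lies."""
    # Same idea as NORM.INV(1 - top_share, mean, sd).
    return NormalDist(mu=mean, sigma=sd).inv_cdf(1 - top_share)

cutoff = top_share_cutoff(200, 20, 0.10)   # hypothetical mean and standard deviation
share_above = 1 - NormalDist(200, 20).cdf(cutoff)

print(round(cutoff, 2), round(share_above, 2))
```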

Answer 23

Because our sample has fewer than 30 cases, we cannot assume that the distribution of sample means will be normal, and must use the t-distribution. The margin of error is based on the significance level (1-confidence level, or 1-0.90=0.10), the standard deviation (in B2) and the sample size (in B3).

* We can compute the margin of error using the Excel function: **CONFIDENCE.T(0.10,B2,B3)**.
* The lower bound of the 90% confidence interval is the sample mean minus the margin of error, that is **B1–CONFIDENCE.T(0.10,B2,B3)=225–4.93=220.07**.
* The upper bound of the 90% confidence interval is the sample mean plus the margin of error, that is **B1+CONFIDENCE.T(0.10,B2,B3)=225+4.93=229.93.**

You must link directly to cells to obtain the correct answer.

Answer 24

* First find the cumulative probability associated with the value 80 using the function **NORM.DIST(80,B1,B2,TRUE)=2.275%**; this is the percentage of outcomes with values less than 80.
* Then calculate the cumulative probability associated with the value 70 using the function **NORM.DIST(70,B1,B2,TRUE)=0.135%**; this is the percentage of cases with values less than 70.
* Then find the difference between the two: **NORM.DIST(80,B1,B2,TRUE)–NORM.DIST(70,B1,B2,TRUE)=0.02275–0.00135=0.02140**, or 2.140%.

2.140% of the population has values between 70 and 80. You must link directly to cells to obtain the correct answer.
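The subtraction of two cumulative probabilities can be sketched in Python. The card does not show cells B1 and B2; a mean of 100 and a standard deviation of 10 are assumed here because they are consistent with the quoted results (80 and 70 would then be 2 and 3 standard deviations below the mean). The helper name `share_between` is made up.

```python
from statistics import NormalDist

def share_between(low, high, mean, sd):
    """Probability that a normal value falls between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Same idea as NORM.DIST(high, ..., TRUE) - NORM.DIST(low, ..., TRUE).
    return dist.cdf(high) - dist.cdf(low)

# Assumed parameters: mean 100, standard deviation 10.
p = share_between(70, 80, 100, 10)
print(f"{p:.3%}")
```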

Answer 25

The value associated with the top 30% is the same as the value corresponding to the bottom 70%, so we need to find the value associated with a cumulative probability of 70%. * **Using NORM.INV(0.70,B1,B2)=451**, we find that 70% of the distribution’s values are less than 451. Thus, 30% of the distribution’s values are greater than 451. If we wish, instead of first computing 100%–30%=70%, we can embed that formula in the function using NORM.INV(1–0.30,B1,B2)=451. You must link directly to cells to obtain the correct answer.