HBX- BA - 3 Flashcards
Null Hypothesis
A null hypothesis is a statement about a topic of interest, typically based on historical information or conventional wisdom. We start a hypothesis test by assuming that the null hypothesis is true and then test to see if we can nullify it, which is why it’s called the “null” hypothesis. The null hypothesis is the opposite of the hypothesis we are trying to substantiate (the alternative hypothesis).
Alternative Hypothesis
The alternative hypothesis (the opposite of the null hypothesis) is the theory or claim we are trying to substantiate. If our data allow us to nullify the null hypothesis, we substantiate the alternative hypothesis.
How the Null/Alternative Hypothesis system works
It’s similar to a jury trial: the verdict is guilty or not guilty. (A jury can’t declare a defendant innocent, only not guilty! The same holds for these tests.)
We either REJECT the Null Hypothesis or FAIL TO Reject the Null Hypothesis
Reject the null hypothesis
The null hypothesis is that the average satisfaction rating has not changed, that is, that the population mean μ is still equal to 6.7. Drawing a sample with an average satisfaction rating of 9.9 from a population that has an average rating of 6.7 is extremely unlikely, so we would almost certainly reject the null hypothesis and conclude that the average satisfaction rating is no longer 6.7.
At what point do we reject the null hypothesis?
- We first have to define what we mean by “likely” (normally 95%).
- Construct a range of likely sample means
- We should use the HISTORICAL Mean to center our range (We do this because we always assume that the null hypothesis is true)
- Since the central limit theorem tells us that the distribution of sample means follows a normal distribution, we can use the familiar properties of the normal curve to construct this range.
For example, recall that about 95% of a normal distribution falls within 2 standard deviations of the mean. This means that the z-value associated with a 95% range is about 2 (more precisely, 1.96). We can find this 2 ways
- If the sample mean falls within this range, we FAIL TO REJECT the null hypothesis. (This does not prove that the null hypothesis is true!)
- If we take a sample and the mean of that sample does not fall within the range, we REJECT the null hypothesis. (If the null hypothesis is true, a sample mean is very unlikely to fall outside that range: it happens only 5% of the time.)
The region outside the 95% range is called the rejection region
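The rule above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function name `in_rejection_region` is made up here, and the standard error of 0.2 is an assumption borrowed from the movie theater example (standard deviation 2.8, sample size 196).

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# z-value that leaves 2.5% in each tail, i.e. 95% in the middle (about 1.96)
z_95 = std_normal.inv_cdf(0.975)

# Share of a normal distribution within 2 standard deviations (about 95%)
within_2_sd = std_normal.cdf(2) - std_normal.cdf(-2)

def in_rejection_region(sample_mean, null_mean, std_error, z_crit=z_95):
    """True if the sample mean falls outside the range of likely sample means."""
    return abs(sample_mean - null_mean) > z_crit * std_error

# Assumed standard error, taken from the movie theater example: 2.8 / sqrt(196)
std_error = 2.8 / sqrt(196)

print(round(z_95, 2), round(within_2_sd, 4))
print(in_rejection_region(9.9, 6.7, std_error))  # sample far from 6.7
print(in_rejection_region(6.8, 6.7, std_error))  # sample close to 6.7
```

The range is centered on the historical mean because the test starts by assuming the null hypothesis is true.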
6.7
We always start a hypothesis test by assuming that the null hypothesis is true. Thus, the center of the range of likely sample means is the historical average—the average specified by the null hypothesis, in this case is 6.7. Remember, the null hypothesis is that showing old classics has not changed the average satisfaction rating.
Suppose we wanted to calculate a 90% range of likely sample means for the movie theater example (historical mean 6.7, standard dev 2.8, sample size 196). Select the function that would correctly calculate this range.
6.7±CONFIDENCE.NORM(0.10,2.8,196)
The range of likely sample means is centered at the historical population mean, in this case 6.7. Since this is a 90% range of likely sample means, alpha equals 0.10.
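As a rough cross-check outside Excel, the margin that CONFIDENCE.NORM returns can be reproduced with Python's standard library. The helper name `confidence_norm` is hypothetical; it mirrors the Excel argument order (alpha, standard deviation, sample size).

```python
from math import sqrt
from statistics import NormalDist

def confidence_norm(alpha, std_dev, size):
    """Margin of error for a (1 - alpha) range, using the normal distribution."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical z-value
    return z * std_dev / sqrt(size)

historical_mean = 6.7
margin = confidence_norm(0.10, 2.8, 196)

low, high = historical_mean - margin, historical_mean + margin
print(f"90% range of likely sample means: {low:.3f} to {high:.3f}")
```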
Suppose we wanted to calculate a 90% range of likely sample means for the movie theater example but our sample size had been only 15. (same historical mean 6.7, same standard dev 2.8) Select the function that would correctly calculate this range.
6.7±CONFIDENCE.T(0.10,2.8,15)
The range of likely sample means is centered at the historical population mean, in this case 6.7. We must use CONFIDENCE.T since the sample size is less than 30.
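To see why the t-distribution matters for a small sample, the sketch below compares the t-based margin with the normal-based one for n = 15. Python's standard library has no t-distribution, so the critical value 1.761 (two-tailed alpha of 0.10, 14 degrees of freedom) is hardcoded from a standard t-table; treat it as an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: critical t-value from a standard t-table
# (two-tailed alpha = 0.10, degrees of freedom = 15 - 1 = 14)
T_CRIT_90_DF14 = 1.761

std_dev, n = 2.8, 15
std_error = std_dev / sqrt(n)

t_margin = T_CRIT_90_DF14 * std_error                # what CONFIDENCE.T is based on
z_margin = NormalDist().inv_cdf(0.95) * std_error    # what CONFIDENCE.NORM would give

print(f"t-based margin: {t_margin:.3f}")
print(f"z-based margin: {z_margin:.3f}")
```

The t-based margin is wider, which is the more conservative estimate appropriate for a small sample.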
Significance Level
The threshold for deciding whether to reject the null hypothesis. The most commonly used significance level is .05 (corresponding to a confidence level of 95%), which means we would reject the null hypothesis when the p-value < .05. The significance level is represented by the Greek letter α (alpha) and is equal to 1-confidence level.
Significance Level = 1 – Confidence Level
- The significance level defines the rejection region by specifying the threshold for deciding whether or not to reject the null hypothesis. When the p-value of a sample mean is less than the significance level, we reject the null hypothesis.
- The significance level is the area of the rejection region, meaning the area under the distribution of sample means over the rejection region.
- The significance level is the probability of rejecting the null hypothesis when the null hypothesis is actually true.
- The significance level also defines the confidence level. (The confidence level tells us how confident we can be that the range of likely sample means contains the true population mean. We should always specify the significance level, and thus the confidence level, before performing a hypothesis test.)
If we use the most commonly used significance level of 0.05, we draw our conclusions on whether the sample’s p-value is less than or greater than 0.05. If the p-value is less than 0.05, we reject the null hypothesis. If the p-value is greater than or equal to 0.05, we fail to reject the null hypothesis. It is always important to use your managerial judgment when making decisions, especially when the p-value is very close to the significance level.
If we specify a 75% confidence level, what percentage of sample means do we expect to fall in the rejection region?
25%
The significance level equals the area of the rejection region. The significance level equals 1–confidence level. In this case, 1–0.75=0.25, that is, 25%.
If the significance level of a hypothesis test is 10%, for which of the following p-values would you reject the null hypothesis? Select all that apply.
- 0.08
- 0.89
- 0.05
- 0.11
- 0.08
- 0.05
We reject the null hypothesis if the mean of our sample falls within the rejection region. The area of the rejection region is equal to the significance level, so we reject the null hypothesis when the p-value is less than the significance level. Since 0.08 and 0.05 are both less than 0.10, we would reject the null hypothesis for those p-values; 0.89 and 0.11 are greater than 0.10, so for those we would fail to reject it. Remember: the lower the p-value, the stronger the evidence is against the null hypothesis.
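The two rules from these cards (significance level = 1 - confidence level, and reject when the p-value is below the significance level) can be written as a small sketch. The function names are illustrative only.

```python
def significance_level(confidence_level):
    """Significance level (alpha) = 1 - confidence level."""
    return 1 - confidence_level

def reject_null(p_value, alpha):
    """Reject the null hypothesis only when the p-value is strictly below alpha."""
    return p_value < alpha

# A 75% confidence level leaves 25% of sample means in the rejection region
alpha_75 = significance_level(0.75)

# The quiz options, tested at a 10% significance level
quiz_p_values = [0.08, 0.89, 0.05, 0.11]
rejected = [p for p in quiz_p_values if reject_null(p, 0.10)]

print(alpha_75)
print(rejected)
```

Note that a p-value exactly equal to the significance level does not lead to rejection, matching the "greater than or equal to" rule above.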
Define P-Value and state How to Calculate in Excel.
Once we have sufficient evidence to reject a null hypothesis, we may wish to know how strong our evidence is.
A p-value can be interpreted as the probability, assuming the null hypothesis is true, of obtaining an outcome that is equal to or more extreme than the result obtained from a data sample. The lower the p-value, the greater the strength of statistical evidence against the null hypothesis.
When we take a sample, we reject the null hypothesis if the sample’s p-value is less than the significance level (typically 5%).
Although there are multiple ways to calculate a p-value in Excel, we will use a t-test, the most common method used for hypothesis tests. The t-test uses a t-distribution, which provides a more conservative estimate of the p-value when the sample size is small. Recall that as the sample size increases, the t-distribution converges to a normal distribution, so a t-distribution can be used for large samples as well. Companies tend to use the t-distribution rather than the normal distribution because it is safe for both small and large samples.
=T.TEST(array1, array2, tails, type)
- array1 is a set of numerical values or cell references. We will place our sample data in this range.
- array2 is a set of numerical values or cell references. We have only one set of data, so we will use the historical mean. (To do this, we will need to create a new column filled with the historical mean.)
- tails is the number of tails for the distribution. It can be either 1 or 2.
- 1 is for a one-sided test. Use this only if the direction of the change is ABSOLUTELY KNOWN.
- 2 is for a two-sided test.
- type can be 1, 2, or 3.
- Type 1 is a paired test and is used when the same group from a single population is tested twice to provide paired “before and after” data for each member of the group.
- Type 2 is an unpaired test in which the samples are assumed to have equal variances.
- Type 3 is an unpaired test in which the samples are assumed to have unequal variances. The variances of the two columns are clearly different in our case, so we use type 3. There are ways to test whether variances are equal, but when in doubt, use type 3.
Another way to calculate the p-value for a 1-sided test is to calculate the p-value with 2 tails and then divide the answer by 2! (This works when the sample mean lies in the direction the one-sided test is looking for.)
Calculate the p-value for the movie theater ratings from the 196 people that were sampled. Remember that the sample mean is 7.3 and the sample standard deviation is 2.8. Before we begin, we must create a second column of data.
- Have Sample information in Excel in column 1 (This is normally given in this course)
- Create a column for the Historical Mean
- Enter the function =T.TEST(array1, array2, tails, type)
Since the p-value, 0.0026, is less than the 0.05 significance level, we reject the null hypothesis and conclude that the customer satisfaction rating has changed.
If the null hypothesis is true, the likelihood of obtaining a sample with a mean at least as extreme as 7.3 is 0.26%
The p-value of 0.0026 indicates that if the population mean were actually still 6.7, there would be a very small possibility, just 0.26%, of obtaining a sample with a mean at least as extreme as 7.3. Equivalently, since 7.3–6.7=0.6, this p-value tells us that if the null hypothesis is true, the probability of obtaining a sample with a mean less than 6.7–0.6=6.1 or greater than 6.7+0.6=7.3 is 0.26%.
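The interpretation above can be approximated from the summary statistics alone. The sketch below uses a normal approximation rather than Excel's T.TEST on the raw data, so it is expected to land close to, but not exactly on, the 0.0026 reported above.

```python
from math import sqrt
from statistics import NormalDist

null_mean = 6.7     # historical mean (null hypothesis)
sample_mean = 7.3
sample_sd = 2.8
n = 196

# Standard error of the sample mean and distance from the null mean in those units
std_error = sample_sd / sqrt(n)
z = (sample_mean - null_mean) / std_error

# Two-sided p-value: chance of a sample mean at least this far from 6.7
# in either direction (below 6.1 or above 7.3), assuming the null is true
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p-value = {p_two_sided:.4f}")
print("Reject the null hypothesis" if p_two_sided < 0.05 else "Fail to reject")
```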