Reading 12 - Hypothesis Testing Flashcards
Hypothesis testing
Hypothesis testing is the process of evaluating the accuracy of a statement regarding a population parameter (e.g., the population mean) given sample information (e.g., the sample mean).
Hypothesis
A hypothesis is a statement about the value of a population parameter developed for the purpose of testing a theory. Let’s assume that we think (hypothesize) that the average points scored in each game by a basketball player throughout his career is greater than 30. First, we would need to get some sample information. Then we would conduct a hypothesis test on the sample information (average of his scores in, let’s say, 49 randomly selected games) in order to be able to comment on the accuracy of the statement pertaining to the population parameter (his average score across all the games that he played in his entire career).
Null Hypothesis (H0)
The null hypothesis (H0) generally represents the status quo, and is the hypothesis that we are interested in rejecting. This hypothesis will not be rejected unless the sample data provides sufficient evidence to reject it. Null hypotheses regarding the mean of the population can be stated in the following ways:
H0 : μ ≤ μ0
H0 : μ ≥ μ0
H0 : μ = μ0
Where:
μ = population mean
μ0 = hypothesized value of the population mean
Alternative hypothesis (Ha)
The alternate hypothesis (Ha) is essentially the statement whose validity we are trying to evaluate. The alternate hypothesis is the statement that will only be accepted if the sample data provides convincing evidence of its truth. It is the conclusion of the test if the null hypothesis is rejected. Alternate hypotheses can be stated as:
Ha : μ > μ0
Ha : μ < μ0
Ha : μ ≠ μ0
Essentially, a hypothesis test involves the comparison of a sample’s test statistic to a critical value. The test statistic is calculated as:
Test statistic = (Sample statistic – Hypothesized value)/(Standard error of sample statistic)
The critical value depends on the relevant distribution, sample size, and level of significance used to test the hypothesis.
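As an illustration of the test statistic formula, here is a minimal Python sketch. The figures for the basketball example (a sample mean of 31.5 points over 49 games, with a sample standard deviation of 7) are hypothetical and not from the reading, and the function name is our own.

```python
import math

def compute_test_statistic(sample_stat, hypothesized_value, standard_error):
    """(Sample statistic - hypothesized value) / standard error of the sample statistic."""
    return (sample_stat - hypothesized_value) / standard_error

# Hypothetical figures for the basketball example (assumed, not from the reading).
n = 49
sample_mean = 31.5
sample_std = 7.0
standard_error = sample_std / math.sqrt(n)  # standard error of the sample mean

stat = compute_test_statistic(sample_mean, 30.0, standard_error)
print(stat)
```

The resulting number is then compared to a critical value to decide whether the null hypothesis is rejected.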
Hypothesis tests can be either one‐tailed or two‐tailed. Under one‐tailed tests, we assess whether the value of a population parameter is either greater than or less than a given hypothesized value. Hypotheses for one‐tailed tests can be stated as:
- H0 : μ ≤ μ0 versus Ha : μ > μ0
When we are testing whether the population mean is greater than a given hypothesized value.
- H0 : μ ≥ μ0 versus Ha : μ < μ0
When we are testing whether the population mean is less than a given hypothesized value.
The following rejection rules apply when trying to determine whether a population mean is greater than the hypothesized value.
Reject H0 when:
Test statistic > positive critical value
Fail to reject H0 when:
Test statistic ≤ positive critical value
Under two‐tailed tests, we assess whether the value of the population parameter is simply different from a given hypothesized value. The hypotheses for two‐tailed tests are stated as:
H0 : μ = μ0
Ha : μ ≠ μ0
Two‐tailed hypothesis tests have two rejection regions, one in each tail of the distribution.
Rejection Rule for a Two-Tailed Hypothesis test
Reject H0 when:
Test statistic < Lower critical value, or
Test statistic > Upper critical value
Fail to reject H0 when:
Lower critical value ≤ test statistic ≤ Upper critical value
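The rejection rules for one‐tailed and two‐tailed tests can be sketched in Python as follows. This is only an illustration: it assumes critical values come from the standard normal distribution (a z‐test), the function name and the test statistic of 1.80 are hypothetical, and the left‐tailed rule is the mirror image of the right‐tailed rule stated above.

```python
from statistics import NormalDist

def reject_null(test_stat, alpha, tail):
    """Apply the rejection rules using z critical values.

    tail: "right" (Ha: mu > mu0), "left" (Ha: mu < mu0), or "two" (Ha: mu != mu0).
    """
    z = NormalDist()  # standard normal distribution, assumed for simplicity
    if tail == "right":
        return test_stat > z.inv_cdf(1 - alpha)
    if tail == "left":
        return test_stat < z.inv_cdf(alpha)
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)  # lower critical value is -upper
        return test_stat < -upper or test_stat > upper
    raise ValueError("tail must be 'right', 'left', or 'two'")

# Hypothetical test statistic of 1.80 at the 5% level of significance.
print(reject_null(1.80, 0.05, "right"))  # critical value is about 1.645
print(reject_null(1.80, 0.05, "two"))    # critical values are about -1.96 and +1.96
```

The same test statistic can lead to rejection in a one‐tailed test but not in a two‐tailed test, because the two‐tailed test splits the significance level across both tails.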
Hypothesis tests are used to make inferences about population parameters using sample statistics. There is always a possibility that the sample may not be perfectly representative of the population, and that the conclusions drawn from the test may be wrong. There are two types of errors that can be made when conducting a hypothesis test:
Type I error: Rejecting the null hypothesis when it is actually true.
Type II error: Failing to reject the null hypothesis when it is actually false.
Power of a test
The power of a test is the probability of correctly rejecting the null hypothesis when it is false.
Power of a test = 1 − P(Type II error)
Power of the test – 4 Important notes
1) The higher the power of the test, the better it is for purposes of hypothesis testing. Given a choice of tests, the one with the highest power should be preferred.
This statement is fairly straightforward—the test with the highest probability of rejecting the null hypothesis when it is false should be preferred.
2) Decreasing the significance level reduces the probability of Type I error. However, reducing the significance level means shrinking the rejection region, and inflating the “fail to reject the null region.” This increases the probability of failing to reject a false null hypothesis (Type II error) and reduces the power of the test.
3) For a given sample size, the power of the test can only be increased by reducing the probability of a Type II error, which means shrinking the “fail to reject the null region.” This is equivalent to enlarging the rejection region and increasing the probability of a Type I error. Basically, with the sample size held fixed, an increase in the power of a test comes at the cost of a higher probability of a Type I error.
4) The only way to decrease the probability of a Type II error given the significance level (probability of Type I error) is to increase the sample size.
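The trade‐offs in these notes can be checked numerically. The sketch below assumes a right‐tailed z‐test with a known population standard deviation; the function name and all input values (hypothesized mean 30, true mean 32, standard deviation 7) are hypothetical.

```python
import math
from statistics import NormalDist

def power_right_tailed_z(mu0, true_mean, sigma, n, alpha):
    """Power of a right-tailed z-test (H0: mu <= mu0) when the true mean is true_mean."""
    z = NormalDist()
    standard_error = sigma / math.sqrt(n)
    z_crit = z.inv_cdf(1 - alpha)
    # P(Type II error): the test statistic lands in the fail-to-reject region
    # even though the true mean differs from mu0.
    beta = z.cdf(z_crit - (true_mean - mu0) / standard_error)
    return 1 - beta

base = power_right_tailed_z(30, 32, 7, 49, alpha=0.05)
stricter = power_right_tailed_z(30, 32, 7, 49, alpha=0.01)   # lower significance level
bigger_n = power_right_tailed_z(30, 32, 7, 100, alpha=0.05)  # larger sample
print(round(base, 3), round(stricter, 3), round(bigger_n, 3))
```

Under these assumptions, lowering the significance level reduces power, while increasing the sample size raises power without changing the significance level.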
Confidence interval vs. Hypothesis test
- In a confidence interval, we aim to determine whether the hypothesized value of the population mean (μ0) lies within a computed interval with a particular degree of confidence (1 − α). Here the interval represents the “fail‐to‐reject‐the‐null region” and is centered on the sample mean, x̄.
- In a hypothesis test, we examine whether the sample mean, x̄, lies in the rejection region (i.e., outside the interval) or in the fail‐to‐reject‐the‐null region (i.e., within the interval) at a particular level of significance (α). Here the interval is centered on the hypothesized value of the population mean (μ0).
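The two views lead to the same conclusion, which a short Python sketch can show. It assumes a two‐tailed z‐test with a known population standard deviation; the helper name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def half_width(sigma, n, alpha):
    """Critical z-value times the standard error of the sample mean."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)

# Hypothetical inputs (assumed for illustration).
sample_mean, mu0, sigma, n, alpha = 31.5, 30.0, 7.0, 49, 0.05
w = half_width(sigma, n, alpha)

# Confidence interval: centered on the sample mean; does it contain mu0?
ci = (sample_mean - w, sample_mean + w)
mu0_inside_ci = ci[0] <= mu0 <= ci[1]

# Hypothesis test: interval centered on mu0; does the sample mean fall inside it?
region = (mu0 - w, mu0 + w)
mean_inside_region = region[0] <= sample_mean <= region[1]

print(mu0_inside_ci, mean_inside_region)  # the two checks agree
```

Both intervals have the same width, so μ0 lies inside the confidence interval exactly when the sample mean lies inside the fail‐to‐reject region.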
p-value
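The p‐value is the smallest level of significance at which the null hypothesis can be rejected. It represents the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one computed from the sample. The null hypothesis is rejected when the p‐value is less than the chosen level of significance.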
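A minimal Python sketch of a p‐value calculation, assuming the test statistic follows the standard normal distribution (a z‐test). The function name and the test statistic of 1.80 are hypothetical.

```python
from statistics import NormalDist

def p_value(z_stat, tail):
    """p-value of a z-statistic for a right-, left-, or two-tailed test."""
    z = NormalDist()
    if tail == "right":
        return 1 - z.cdf(z_stat)
    if tail == "left":
        return z.cdf(z_stat)
    if tail == "two":
        return 2 * (1 - z.cdf(abs(z_stat)))
    raise ValueError("tail must be 'right', 'left', or 'two'")

p_right = p_value(1.80, "right")
p_two = p_value(1.80, "two")
print(round(p_right, 4), round(p_two, 4))

alpha = 0.05
print(p_right < alpha, p_two < alpha)  # reject H0 only when p-value < alpha
```

Comparing the p‐value to the level of significance gives the same decision as comparing the test statistic to the critical value.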
Hypothesis testing: Distinguishing between statistical results and economically meaningful results
Sometimes differences between a variable and its hypothesized value are statistically significant but not practically or economically meaningful. Suppose we are testing a hypothesis that the returns from a technical trading strategy are greater than zero. If we use a large sample (n) when conducting the test, the standard error will be small, the “fail‐to‐reject region” will be narrower, and the chance that the null will be rejected will be greater. Because the standard error decreases as sample size increases, the null can be rejected even when the sample mean deviates only slightly from the hypothesized value. Even though a trading strategy might provide a statistically significant return of greater than zero (based on a hypothesis test), this does not guarantee that trading on the strategy would result in economically meaningful positive returns. The returns may not be economically significant after accounting for taxes, transaction costs, and the risks inherent in the strategy.
Even if we conclude that a strategy’s results are economically significant, we should examine whether there is a logical reason to explain the apparently‐significant returns offered by the strategy before actually implementing it.
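The gap between statistical and economic significance can be illustrated with a small Python sketch. Every number here (mean return, volatility, transaction cost, sample sizes) is hypothetical, and the critical value assumes a right‐tailed z‐test at the 5% level.

```python
import math

def z_for_mean_return(mean_return, std_return, n, hypothesized=0.0):
    """z-statistic for H0: mean return <= hypothesized value."""
    return (mean_return - hypothesized) / (std_return / math.sqrt(n))

# Hypothetical strategy figures (assumed for illustration).
mean_return = 0.0002     # 0.02% average return per trade
std_return = 0.01        # 1% standard deviation per trade
cost_per_trade = 0.0005  # 0.05% transaction cost per trade
critical_z = 1.645       # right-tailed test, 5% level of significance

z_small = z_for_mean_return(mean_return, std_return, n=250)
z_large = z_for_mean_return(mean_return, std_return, n=25_000)
net_return = mean_return - cost_per_trade

print(z_small > critical_z)  # small sample: not statistically significant
print(z_large > critical_z)  # large sample: statistically significant
print(net_return > 0)        # after costs, the return is not economically meaningful
```

The same tiny average return becomes statistically significant once the sample is large enough, yet it is still wiped out by the assumed transaction cost.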
Hypothesis tests concerning a single mean
In the process of hypothesis testing, the decision whether to use critical values based on the z‐distribution or the t‐distribution depends on sample size, the distribution of the population and whether the variance of the population is known.
When is the t-test used?
The t‐test is used when the variance of the population is unknown and either of the conditions below holds:
- The sample size is large.
- The sample size is small, but the underlying population is normally distributed or approximately normally distributed.
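The test statistic (t‐statistic) for hypothesis tests concerning the mean of a single population, used when the population variance is unknown, is:
t = (x̄ − μ0) / (s / √n)
Where:
x̄ = sample mean
μ0 = hypothesized value of the population mean
s = sample standard deviation
n = sample size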
Why is the t-test popular?
In a t‐test, the sample’s t‐statistic is compared to the critical t‐value with n‐1 degrees of freedom, at the desired level of significance. Practically speaking, the variance of the population is rarely ever known, so the t‐test is very popular.
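As a sketch of the single‐mean t‐test, the Python snippet below computes the t‐statistic and its degrees of freedom from raw data. The sample of eight game scores is hypothetical, and the critical value is read from a standard t‐table rather than computed.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a test of the population mean against mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical points scored in eight randomly selected games.
scores = [28, 35, 31, 33, 29, 36, 30, 34]
t, df = t_statistic(scores, mu0=30)

critical_t = 1.895  # one-tailed 5% critical value for 7 degrees of freedom (t-table)
print(round(t, 3), df, t > critical_t)
```

Because the population is assumed to be approximately normal, the t‐test remains usable here even though the sample is small.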
The z‐test: When & What
The z‐test can be used to conduct hypothesis tests of the population mean when the population is normally distributed and its variance is known.
The z‐test can also be used when the population’s variance is unknown, but the sample size is large.
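The choice between the z‐test and the t‐test can be summarized as a small decision helper. This is a sketch with a hypothetical function name; the branch for a known variance with a large, non‐normal sample relies on the central limit theorem and is our assumption rather than something stated in these cards.

```python
def choose_test(variance_known, large_sample, population_normal):
    """Suggest a test for a single population mean based on the conditions above."""
    if not large_sample and not population_normal:
        # Small sample from a non-normal population: neither test is reliable.
        return "not available"
    if variance_known:
        # Normal population, or large sample (assumption: central limit theorem).
        return "z-test"
    if large_sample:
        return "t-test (z-test also acceptable)"
    # Unknown variance, small sample, (approximately) normal population.
    return "t-test"

print(choose_test(variance_known=True, large_sample=False, population_normal=True))
print(choose_test(variance_known=False, large_sample=False, population_normal=True))
print(choose_test(variance_known=False, large_sample=True, population_normal=False))
```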
Hypotheses describing the tests of means of two populations can be structured as:
- H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0 when we want to test if the two populations’ means are not equal.
- H0: μ1 – μ2 ≥ 0 versus Ha: μ1 – μ2 < 0 when we want to test if the mean of Population 1 is less than the mean of Population 2.
- H0: μ1 – μ2 ≤ 0 versus Ha: μ1 – μ2 > 0 when we want to test if the mean of Population 1 is greater than the mean of Population 2.
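In tests where it is assumed that the variances of the two populations are equal, we use the pooled variance (sp²) in the calculation of the t‐stat. The test statistic, the pooled variance, and the degrees of freedom for the t‐test are calculated as follows:
t = [(x̄1 − x̄2) − (μ1 − μ2)] / √(sp²/n1 + sp²/n2)
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
Degrees of freedom = n1 + n2 − 2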
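A minimal Python sketch of the equal‐variance case, assuming the standard pooled‐variance t‐test for two independent samples. The function name and the summary statistics are hypothetical.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t, pooled variance, degrees of freedom) for two independent samples."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t, pooled_var, df

# Hypothetical sample means, sample variances, and sample sizes.
t, pooled_var, df = pooled_t(12.0, 16.0, 25, 10.0, 20.0, 25)
print(round(t, 4), pooled_var, df)
```

The pooled variance is a weighted average of the two sample variances, with weights given by each sample's degrees of freedom.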
In hypothesis tests where it is assumed that the variances of the two populations are unequal, the test statistic and the degrees of freedom for the t‐test are calculated as follows:
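As a rough sketch of the unequal‐variance case, the snippet below assumes the standard Welch t‐statistic with Welch–Satterthwaite degrees of freedom. Some curricula state a variant of the degrees‐of‐freedom formula that divides by n rather than n − 1, so treat the form used here as an assumption. The function name and inputs are hypothetical.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t, approximate degrees of freedom) without pooling the variances."""
    a = var1 / n1
    b = var2 / n2
    t = ((mean1 - mean2) - hypothesized_diff) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation (assumed form; generally not an integer).
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Hypothetical sample means, sample variances, and sample sizes.
t, df = welch_t(12.0, 16.0, 25, 10.0, 64.0, 16)
print(round(t, 4), round(df, 2))
```

The degrees of freedom are usually rounded down to the nearest integer before looking up the critical t‐value.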
The table given below summarizes the important concepts that you should bear in mind from the examination perspective. We highly doubt that any question testing the ability of financial analysts would entail memorizing the complicated formulas above and performing tedious calculations. However, questions related to recognizing the test appropriate to verify the hypotheses, the test statistic, and drawing conclusions given the critical values and the test statistic are fair game. Hypothesis tests concerning the mean of two populations.
Sometimes we may need to perform tests on the variance of normally distributed populations. We use σ² to represent the population variance and σ0² to denote the hypothesized value of the population variance.
Tests relating to the variance of normally distributed populations can be one‐tailed or two-tailed:
One-tailed tests:
H0 : σ² ≤ σ0² versus Ha : σ² > σ0²
When testing whether the population variance is greater than the hypothesized variance
H0 : σ² ≥ σ0² versus Ha : σ² < σ0²
When testing whether the population variance is lower than the hypothesized variance
Two-tailed tests:
H0 : σ² = σ0² versus Ha : σ² ≠ σ0²
When testing whether the population variance is different from the hypothesized variance
Hypothesis tests for testing the variance of a normally distributed population involve the use of the chi‐square distribution, where the test statistic is denoted as χ². Three important features of the chi‐square distribution are:
- It is asymmetrical.
- It is bounded by zero. Chi‐square values cannot be negative.
- It approaches the normal distribution in shape as the degrees of freedom increase.
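The chi‐square test statistic, with n − 1 degrees of freedom, is calculated as:
χ² = (n − 1)s² / σ0²
Where:
n = sample size
s² = sample variance
σ0² = hypothesized value of the population variance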
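A minimal Python sketch of the chi‐square statistic for a single variance, assuming the standard form (n − 1) times the sample variance divided by the hypothesized variance. The inputs are hypothetical, and the critical value is read from a standard chi‐square table rather than computed.

```python
def chi_square_statistic(sample_variance, hypothesized_variance, n):
    """Return (chi-square statistic, degrees of freedom) for a single-variance test."""
    df = n - 1
    return df * sample_variance / hypothesized_variance, df

# Hypothetical: 25 observations, sample variance 0.0025, hypothesized variance 0.0016.
chi2, df = chi_square_statistic(0.0025, 0.0016, 25)

critical_value = 36.415  # right-tail 5% value for 24 degrees of freedom (chi-square table)
print(round(chi2, 2), df, chi2 > critical_value)
```

Here the test is right‐tailed (Ha: the population variance is greater than the hypothesized variance), so the null is rejected when the statistic exceeds the critical value.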
When is the F-Test used?
Hypotheses related to the equality of the variance of two populations are tested with an F‐test. This test is used under the assumptions that:
- The populations from which samples are drawn are normally distributed.
- The samples are independent.
F-test: Hypothesis tests concerning the variance of two populations can be structured as one‐tailed or two‐tailed tests:
One‐tailed tests:
H0 : σ1² ≤ σ2² versus Ha : σ1² > σ2²
H0 : σ1² ≥ σ2² versus Ha : σ1² < σ2²
Two‐tailed tests:
H0 : σ1² = σ2² versus Ha : σ1² ≠ σ2²
Where:
σ1² = variance of Population 1
σ2² = variance of Population 2
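The test statistic for the F‐test is given by:
F = s1² / s2²
Where:
s1² = variance of the sample drawn from Population 1, with n1 − 1 degrees of freedom
s2² = variance of the sample drawn from Population 2, with n2 − 1 degrees of freedom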
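A minimal Python sketch of the F‐statistic, assuming it is the ratio of the two sample variances and following the common convention of placing the larger sample variance in the numerator. The function name and inputs are hypothetical.

```python
def f_statistic(var1, n1, var2, n2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Hypothetical sample variances and sample sizes.
f, df_num, df_den = f_statistic(0.0016, 21, 0.0025, 31)
print(round(f, 4), df_num, df_den)
```

With the larger variance in the numerator, the statistic is always at least 1, so only the right‐tail critical value needs to be looked up.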
Features of the F‐distribution:
- It is skewed to the right.
- It is bounded by zero on the left.
- It is defined by two separate degrees of freedom.
Which hypothesis tests do you use concerning the variance?
- Variance of a single, normally distributed population: chi‐square stat
- Equality of variance of two independent, normally distributed populations: F‐stat
A parametric test has at least one of the following two characteristics:
It is concerned with parameters, or defining features of a distribution.
It makes a definite set of assumptions.
Non-parametric test:
A non‐parametric test is not concerned with a parameter, and makes only a minimal set of assumptions regarding the population.
Non-parametric tests are used when:
- The researcher is concerned about quantities other than the parameters of the distribution.
- The assumptions made by parametric tests cannot be supported.
- The available data are ranked (ordinal measurement scale). For example, non‐parametric methods are widely used for studying data such as movie reviews, which receive one to five stars based on people’s preferences.