Part 6. Hypothesis Testing Flashcards
Hypothesis testing
A part of statistical inference, the process of making judgments about a larger group (a population) based on a smaller group of observations (a sample).
Definition:
We test to see whether a sample statistic is likely to come from a population with the hypothesized value of the population parameter.
Statistical inference
- Estimation
- Hypothesis testing
Hypothesis
A statement about one or more populations that we test using sample statistics.
The process of hypothesis testing
- State the hypothesis.
- Identify the appropriate test statistic.
- Specify the level of significance.
- State the decision rule.
- Collect data and calculate the test statistic.
- Make a decision.
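The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes a two-sided test of a population mean, a sample large enough for the normal approximation (so a z-statistic is used in place of a t-statistic), and hypothetical return data. The function name `z_test_mean` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test_mean(sample, mu0, alpha=0.05):
    """Two-sided large-sample test of H0: population mean == mu0 (illustrative)."""
    n = len(sample)
    # Step 2: test statistic = (sample mean - hypothesized mean) / standard error
    std_error = stdev(sample) / sqrt(n)
    # Step 5: calculate the test statistic from the collected data
    z = (mean(sample) - mu0) / std_error
    # Steps 3-4: the significance level alpha fixes the decision rule:
    # reject H0 if |z| exceeds the critical value
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 6: make the decision
    reject = abs(z) > critical
    return z, critical, reject

# Hypothetical monthly returns in percent (a deterministic made-up pattern).
monthly_returns = [0.5 + 0.1 * ((7 * i) % 11 - 5) for i in range(44)]

# Step 1: H0: mean return = 0 versus Ha: mean return != 0
z, critical, reject = z_test_mean(monthly_returns, mu0=0.0, alpha=0.05)
print(f"z = {z:.2f}, critical value = +/-{critical:.2f}, reject H0: {reject}")
```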
Null hypothesis (H0)
A statement concerning a population parameter or parameters considered to be true, unless the sample we use to conduct the hypothesis test gives convincing evidence the null hypothesis is false.
The statement we want to reject, in favour of the alternative hypothesis, Ha.
Level of significance
The probability of a Type I error in testing a hypothesis, denoted by alpha (α).
i.e. the probability of incorrectly rejecting a true H0.
Confidence level
The complement of the level of significance: 1 − α.
i.e. a 5% probability of rejecting a true H0 corresponds to a 95% confidence level.
Type 1 and 2 error dilemma
- The two errors involve a trade-off.
- If we decrease the probability of a Type 1 error by specifying a smaller significance level (1% instead of 5%), we increase the probability of making a Type 2 error, because we will reject H0 less frequently, including when it is false.
- The only way to reduce the probability of both types of error simultaneously is to increase the sample size, n.
Power of test
The probability of correctly rejecting the null, i.e. the probability of rejecting the null when it is false.
The complement of the probability of a Type 2 error.
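The trade-off between the two error types, and the effect of sample size on power, can be checked numerically. The sketch below is illustrative only: it assumes a two-sided z-test with a known population standard deviation, and the function name `power_two_sided_z` and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(true_mean, mu0, sigma, n, alpha):
    """Probability of rejecting H0: mean == mu0 when the mean is really true_mean."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Under the true mean the z-statistic is normal with this mean and unit variance.
    shift = (true_mean - mu0) / (sigma / sqrt(n))
    upper_tail = 1 - std_normal.cdf(critical - shift)   # P(z > +critical)
    lower_tail = std_normal.cdf(-critical - shift)      # P(z < -critical)
    return upper_tail + lower_tail

# Hypothetical inputs: H0 mean 6.0, true mean 6.5, sigma 2.0
for alpha in (0.05, 0.01):
    for n in (50, 200):
        power = power_two_sided_z(6.5, 6.0, 2.0, n, alpha)
        print(f"alpha={alpha}, n={n}: power={power:.3f}, P(Type 2)={1 - power:.3f}")
```

For a fixed n, moving from a 5% to a 1% significance level lowers power (raises the Type 2 error probability); for a fixed significance level, a larger n raises power.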
Critical values
The specified value or values against which the calculated test statistic is compared in order to decide whether to reject H0.
Statistically significant
A result is statistically significant when the calculated value of the test statistic is more extreme than the critical value or values, so that we reject H0.
Collecting data considerations:
- Ensure that the sampling procedure does not introduce biases, such as sample selection bias or time-period bias.
- Cleanse the data, checking for inaccuracies and other measurement errors.
Once we are assured that the sample is unbiased and accurate, the sample information is used to calculate the appropriate test statistic.
Make a decision
- Statistical decision:
- Consider a test of the mean risk premium that compares the population mean with zero: we reject H0 if the calculated test statistic falls outside the bounds set by the critical values.
- Economic decision:
- Takes into account the statistical decision plus all pertinent economic issues, e.g. we reject H0 that the risk premium is zero in favour of Ha that it is greater than zero. This is economically meaningful if it supports an investor's decision to commit funds to US equities.
- Non-statistical considerations, such as the investor's tolerance for risk and financial position, also matter.
- Statistically significant, but not economically significant?
- The smaller the standard error of the mean, the larger the t-statistic and the greater the chance that H0 is rejected, all else equal.
- The standard error decreases as the sample size, n, increases, so for large samples we can reject H0 for small departures from it.
- Even when the mean return is statistically significantly positive, the result may not be economically significant after accounting for transaction costs, taxes, and risk.
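A quick calculation shows why a large sample can make a tiny departure from H0 statistically significant. The sketch uses the usual statistic for a test of a mean, t = (sample mean − hypothesized mean) / (s / √n); the numbers are hypothetical and the function name is made up for illustration.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, sample_std, n):
    """t-statistic for a test of a single mean."""
    std_error = sample_std / sqrt(n)   # shrinks as n grows
    return (sample_mean - mu0) / std_error

# Hypothetical: mean excess return 0.1%, sample standard deviation 2%, H0 mean 0
for n in (100, 10_000):
    print(n, round(t_statistic(0.1, 0.0, 2.0, n), 2))
```

The same 0.1% departure gives a t-statistic of about 0.5 with 100 observations and about 5 with 10,000, yet a 0.1% return may still disappear after transaction costs and taxes.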
P-Value
The area in the probability distribution outside the calculated test statistic. For a two-sided test, this is the area outside plus and minus the calculated test statistic.
For a one-sided test, this is the area outside the calculated test statistic on the appropriate side of the probability distribution.
The p-value is the smallest level of significance at which the null hypothesis can be rejected.
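As an illustration of the "area outside the test statistic", the sketch below computes p-values for a z-statistic using the standard normal distribution. It assumes a normally distributed test statistic; for a t-statistic the same logic applies with the t-distribution. The function name `p_value` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail="two-sided"):
    """p-value for a standard normal test statistic z."""
    std_normal = NormalDist()
    if tail == "two-sided":
        # Area outside +|z| and -|z|
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        # Ha: parameter is greater than the hypothesized value
        return 1 - std_normal.cdf(z)
    if tail == "left":
        # Ha: parameter is less than the hypothesized value
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two-sided', 'right', or 'left'")

print(round(p_value(1.96), 3))           # two-sided
print(round(p_value(1.96, "right"), 3))  # one-sided, right tail
```

A z-statistic of 1.96 gives a two-sided p-value of about 0.05, so 5% is the smallest significance level at which H0 could be rejected in that case.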
Why are p-values not completely uniform?
- When H0 is true, p-values are uniformly distributed between 0% and 100%: under H0 there is a 5% chance of a p-value below 5%, a 10% chance of a p-value below 10%, and so on.
- With a finite number of samples (e.g. 1,000 samples of 50 observations), the observed distribution is only approximately uniform; taking more samples or larger samples brings it closer to uniform.
- When H0 is false, the p-values are not uniformly distributed: they peak near 0%, with little elsewhere.
- The p-values differ for two false hypothesised means of 6.5% and 7%: the further the false hypothesised value is from the truth, the greater the power of the test and the better its ability to detect the false hypothesis.
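The behaviour described above can be reproduced with a small simulation. This is a sketch under stated assumptions: 1,000 samples of 50 observations are drawn from a normal population with a true mean of 6%, the population standard deviation of 2% is a made-up value treated as known (so a z-test is used), and the function name `rejection_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejection_rate(hypothesized_mean, true_mean=6.0, sigma=2.0,
                   n=50, n_samples=1000, alpha=0.05, seed=0):
    """Share of simulated samples whose two-sided p-value falls below alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    std_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (mean(sample) - hypothesized_mean) / std_error
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / n_samples

for h0_mean in (6.0, 6.5, 7.0):
    print(h0_mean, rejection_rate(h0_mean))
```

With the true H0 (6%), roughly 5% of p-values fall below 5%, as a uniform distribution implies. With the false hypothesised means, the share of small p-values is much higher, and higher for 7% than for 6.5%.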
False discovery rate (FDR)
The expected proportion of false positives among the rejected null hypotheses, where a false positive is a result in which H0 is rejected even though H0 is true.
Multiple testing problem
- When drawing 1,000 samples of 50 observations each, there are samples in which we reject the true H0 that the population mean is 6%.
- If we draw enough samples and use a level of significance of 0.05, then approximately 5% of the time we reject H0 even though it is true.
- If you run 100 tests of true null hypotheses at the 5% level of significance, you will get 5 false positives on average.
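The arithmetic behind the last bullet, plus the chance of getting at least one false positive, can be written out as follows. The second function assumes the tests are independent, which the notes do not state; both function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of rejections when every tested H0 is true."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(expected_false_positives(100, 0.05))
print(round(prob_at_least_one_false_positive(100, 0.05), 3))
```

With 100 independent tests at the 5% level, about 5 false positives are expected, and at least one is almost certain.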
Paired observations
Observations that are dependent because they have something in common.
e.g. the dividend policies of companies before and after a change in the tax law affecting the taxation of dividends.
The observations come in pairs for the same companies, so the before and after samples are dependent.
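For paired observations, the usual approach is to test the mean of the paired differences. The sketch below computes the t-statistic t = (mean difference − hypothesized difference) / (s_d / √n) with n − 1 degrees of freedom, where s_d is the sample standard deviation of the differences. The data are hypothetical and the function name is made up; the resulting statistic would be compared with a critical value from a t-table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after, hypothesized_diff=0.0):
    """t-statistic and degrees of freedom for a test of mean paired differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work with one difference per pair, so the dependence is accounted for
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    std_error = stdev(differences) / sqrt(n)
    t = (mean(differences) - hypothesized_diff) / std_error
    return t, n - 1

# Hypothetical dividend payout ratios (%) for the same four companies
before_change = [1.0, 2.0, 3.0, 4.0]
after_change = [2.0, 4.0, 5.0, 8.0]
t, df = paired_t_statistic(before_change, after_change)
print(f"t = {t:.3f} with {df} degrees of freedom")
```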
Parametric test
Any test or procedure with either of the following characteristics:
- Concerned with parameters
- Validity depends on a definite set of assumptions
Non-parametric test
A test that is not concerned with a parameter, or a test that makes minimal assumptions about the population from which the sample comes.
Use of non-parametric procedures
- when data we use do not meet distributional assumptions
- when there are outliers
- when the data are given in ranks or use an ordinal scale
- when the hypotheses we are addressing do not concern a parameter.
Non-parametric characteristics:
- We must refer to specialised statistical tables to determine the rejection points of the test statistic, at least for small samples.
- Although the underlying distribution of the population may be normal, there may be extreme values or outliers that influence parametric statistics but not nonparametric statistics.
e.g. in the case of outliers, we may want to use a nonparametric test of the median rather than rely only on a test of the mean.
- For a sample in which observations are ranked, we use nonparametric tests, because parametric tests require a stronger measurement scale than ranks.
- When our question does not concern a parameter, e.g. whether a sample is random or not, we use a nonparametric test such as the runs test.
Significance test of correlation coefficient
Allows us to assess whether the relationship between 2 random variables is the result of chance.
Correlation coefficient
A number between -1 and +1, where -1 denotes a perfect negative (inverse) relationship between the variables, +1 denotes a perfect positive relationship, and 0 denotes no linear relationship.
What is the parametric pairwise correlation coefficient often referred to as?
Pearson correlation, bivariate correlation, or simply the correlation.
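Why does the magnitude of r needed to reject H0 decrease as sample size n increases?
- As n increases, the number of degrees of freedom (df = n - 2) increases, and the absolute value of the critical value of the t-statistic decreases.
- The absolute value of the numerator of the t-statistic increases with larger n, resulting in a larger magnitude of the calculated t-statistic.
e.g. n = 12, r = 0.35 gives a t-statistic of 1.182, which is not significantly different from zero at the 0.05 level (critical values t(α/2) = ±2.228, df = 10).
n = 32, r = 0.35 gives a t-statistic of 2.046, which is significant at the 0.05 level (critical values t(α/2) = ±2.042, df = 30).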
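The two examples can be reproduced with the standard test statistic for a correlation coefficient, t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The function name below is made up for illustration, and the critical values are the table values quoted in the examples rather than computed.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t-statistic for H0: population correlation == 0, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# (n, critical value at the 0.05 level, two-sided), as quoted in the examples
for n, critical in ((12, 2.228), (32, 2.042)):
    t = correlation_t_statistic(0.35, n)
    print(f"n={n}: t={t:.3f}, critical=+/-{critical}, reject H0: {abs(t) > critical}")
```

The same r = 0.35 is not significant with 12 observations but is significant with 32, because the numerator grows with n while the critical value shrinks.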
Spearman rank correlation coefficient
The equivalent of the usual correlation coefficient, but calculated on the ranks of the 2 variables within their respective samples.
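A minimal sketch of the calculation, assuming there are no tied values: rank each variable within its own sample, take the difference d between the paired ranks, and apply r_s = 1 − 6Σd² / (n(n² − 1)). With ties, the ranks would need to be averaged, which this sketch does not handle. The function names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rank_correlation(x, y):
    """Spearman rank correlation for paired samples without ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: y rises with x but not linearly, so the ranks agree exactly
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 8.0, 27.0, 64.0]
print(spearman_rank_correlation(x, y))
```

Because only ranks enter the calculation, any strictly increasing relationship gives a coefficient of 1, and outliers have limited influence.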