Hypothesis Testing: Flashcards
What are some limitations of hypothesis testing?
- Difficult to understand what a hypothesis test is telling you
- Cannot make scientific decisions based on hypothesis testing alone
- Have to consider how plausible the result is
Is 59 heads from 100 tosses evidence of an unfair coin, or random variation?
You would expect about 50 heads from a fair coin. I don't believe 59 heads is far enough from 50 to be significant evidence that the coin is biased, so I believe it is due to random variation
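A quick exact check of this answer, assuming a fair coin (p = 0.5) as the null hypothesis and a 5% significance level. This is a plain-Python sketch; the helper `binom_pmf` is illustrative and not from any library:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, heads, p0 = 100, 59, 0.5

# One-tailed: probability of 59 or more heads if the coin is fair
upper_tail = sum(binom_pmf(k, n, p0) for k in range(heads, n + 1))

# Two-tailed: the fair-coin distribution is symmetric, so double the upper tail
two_tailed = 2 * upper_tail

print(f"one-tailed p = {upper_tail:.4f}")
print(f"two-tailed p = {two_tailed:.4f}")
print("significant at 5% (two-tailed)?", two_tailed < 0.05)
```

The two-tailed p value is about 0.089, above 0.05, which supports random variation. A one-tailed test that only looks for a bias towards heads gives about 0.044, so the conclusion depends on which alternative hypothesis is chosen before the test.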
What is the threshold used to decide whether a result is due to chance or not? How is it determined?
This threshold is called the significance level
It is chosen before the test is carried out and depends on the experiment
What is the usual significance level?
5%
What symbol denotes the significance level (the proportion of false positives when the null hypothesis is true)?
α (alpha)
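How do you calculate the critical value?
The critical value marks the boundary of the extreme region at one or both ends (tails) of the distribution under the null hypothesis. It is chosen so that the tail area beyond it equals the significance level (e.g. a 5% tail area for a 0.05 significance level). For a discrete distribution such as the binomial, it is the most extreme value whose tail probability does not exceed the significance level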
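A minimal sketch of finding an upper (one-tailed) critical value for a binomial test, assuming 100 tosses, a fair-coin null hypothesis and α = 0.05. The function name `upper_critical_value` is hypothetical:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_critical_value(n, p0, alpha):
    """Smallest c with P(X >= c) <= alpha under H0, plus that tail probability.

    Returns (n + 1, 0.0) if even the single most extreme value is too likely,
    meaning no result could be significant.
    """
    tail = 0.0
    # Walk down from the top value, adding probability until the tail would exceed alpha
    for c in range(n, -1, -1):
        prob = binom_pmf(c, n, p0)
        if tail + prob > alpha:
            return c + 1, tail
        tail += prob
    return 0, tail

crit, size = upper_critical_value(100, 0.5, 0.05)
print(f"reject H0 if heads >= {crit}; tail area = {size:.4f}")
```

The tail area actually covered (about 0.044) is a little under 0.05 because a count of heads can only take whole-number values.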
List the names of the two hypotheses:
- H0 (null hypothesis)
- HA (alternative hypothesis)
What is the null hypothesis?
There is no significant difference, i.e. no effect (the initial assumption that we make)
What is an alternative hypothesis?
It states that there is a significant difference (a real effect) and is the alternative to the null hypothesis
What is the relationship between the null hypothesis and the alternative hypothesis?
They are mutually exclusive (they cannot both be true at the same time)
How does hypothesis testing work? Does it prove a hypothesis?
It assumes that the null hypothesis is true until there is significant evidence that the null hypothesis is false (rejecting the null hypothesis doesn't prove that the alternative hypothesis is correct)
What is the name given to the area of a distribution beyond the critical value?
The critical region
What happens if a value falls within the critical region?
We reject the null hypothesis in favour of the alternative hypothesis: if the null hypothesis were true, there would be less than a 5% chance (at the 5% significance level) of getting a result this extreme, so the result is statistically significant
How does the alternative hypothesis differ in a two-tailed test?
The alternative hypothesis is any result other than the null hypothesis, in either direction (towards either extreme), e.g. H0: p = 0.5 and HA: p ≠ 0.5
What happens to the significance level during a two-tailed test?
It is split in half, because the critical region is divided between both tails of the distribution while covering the same total area (e.g. 2.5% in each tail for a 5% significance level)
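A sketch of the two-tailed case, assuming 100 tosses, a fair-coin null hypothesis and α = 0.05, so each tail may hold at most α/2 = 0.025. The function name `two_tailed_critical_values` is hypothetical:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_tailed_critical_values(n, p0, alpha):
    """Return (low, high): reject H0 if X <= low or X >= high.

    low == -1 or high == n + 1 means that tail has no critical region.
    """
    half = alpha / 2  # each tail gets half of the significance level
    # Lower critical value: largest c with P(X <= c) <= alpha/2
    low, tail = -1, 0.0
    for c in range(n + 1):
        tail += binom_pmf(c, n, p0)
        if tail > half:
            break
        low = c
    # Upper critical value: smallest c with P(X >= c) <= alpha/2
    high, tail = n + 1, 0.0
    for c in range(n, -1, -1):
        tail += binom_pmf(c, n, p0)
        if tail > half:
            break
        high = c
    return low, high

low, high = two_tailed_critical_values(100, 0.5, 0.05)
# Total probability of landing in either tail when H0 is true
actual_size = sum(binom_pmf(k, 100, 0.5) for k in range(101) if k <= low or k >= high)
print(f"critical region: X <= {low} or X >= {high}, total area = {actual_size:.4f}")
```

With each tail limited to 2.5%, the critical region is 39 or fewer heads, or 61 or more, so 59 heads lies outside it.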
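What does it mean when a result doesn't fall within the critical region?
The test gives no evidence that the null hypothesis is false: the result is not statistically significant, so we do not reject the null hypothesis (this does not prove the null hypothesis is true)
What is the use of the critical value?
To decide whether the observed result is statistically significant, i.e. whether the probability of getting a result at least that extreme is below the significance level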
What function is used to carry out binomial tests in R?
binom.test(x, n, p)
What do the x, n and p parameters in the binom.test(x, n, p) function stand for?
- x is the observed number of successes
- n is the number of trials
- p is the hypothesised probability of success under the null hypothesis
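To show roughly what binom.test(x, n, p) computes, here is a plain-Python sketch of an exact two-sided binomial p value. The name `binom_test_p_value` is hypothetical (not part of R or any library). The two-sided rule used, summing the probabilities of every outcome no more likely than the one observed, is the standard exact-test rule and, to my understanding, the one R's binom.test applies; treat that correspondence as an assumption:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_test_p_value(x, n, p=0.5):
    """Two-sided exact binomial p value (hypothetical helper, not R's API).

    Sums P(X = k) over every k that is no more likely than the observed x
    under H0; the small tolerance guards against floating-point ties.
    """
    observed = binom_pmf(x, n, p)
    tolerance = 1 + 1e-7
    total = sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if binom_pmf(k, n, p) <= observed * tolerance
    )
    return min(1.0, total)

print(binom_test_p_value(59, 100, 0.5))
print(binom_test_p_value(7, 10))
```

For 59 heads in 100 tosses this gives about 0.0886; under the assumption above it should match the p value R prints for binom.test(59, 100).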
What results are given when binom.test(x, n, p) is used?
- A title
- A summary of the data input
- A description of the alternative hypothesis
- The p value (to be compared with the significance level)
- A 95% confidence interval and the sample estimate of the probability of success
What is a p value?
The probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true
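The "assuming the null hypothesis is true" part can be made concrete by simulation: repeat the experiment many times with a fair coin and count how often the result is at least as extreme as 59 heads. This sketch assumes "extreme" means at least as far from the expected count in either direction, which suits a symmetric null such as a fair coin; `simulated_p_value` is a hypothetical name:

```python
import random

def simulated_p_value(observed, n, p0, sims, seed=1):
    """Estimate a two-sided p value by simulating experiments where H0 is true."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    expected = n * p0
    gap = abs(observed - expected)   # how far the real result is from expectation
    extreme = 0
    for _ in range(sims):
        # One simulated experiment in which the null hypothesis is true
        successes = sum(rng.random() < p0 for _ in range(n))
        if abs(successes - expected) >= gap:  # at least as extreme, either direction
            extreme += 1
    return extreme / sims

estimate = simulated_p_value(59, 100, 0.5, sims=20_000)
print(f"simulated p value: {estimate:.3f}")
```

The estimate should land close to the exact two-sided value of about 0.089, and it gets closer as `sims` increases.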
What is an assumption made when using the p value?
The p value assumes that the null hypothesis is true
What is a type 1 error?
A false positive
What is it called if the null hypothesis is true and our results are not statistically significant (so we don’t reject the null hypothesis)?
A true negative
When can you get a false positive (type 1 error) in hypothesis testing?
When your results cause you to reject the null hypothesis but the null hypothesis is actually correct
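A simulation of false positives, assuming the two-tailed 5% critical region for 100 tosses of a fair coin (39 or fewer heads, or 61 or more). Every simulated coin really is fair, so every rejection is a type 1 error. `false_positive_rate` is a hypothetical name:

```python
import random

LOW, HIGH = 39, 61  # two-tailed 5% critical region for 100 tosses of a fair coin

def false_positive_rate(experiments, n=100, seed=2):
    """Fraction of experiments with a fair coin that wrongly reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        heads = sum(rng.random() < 0.5 for _ in range(n))  # the coin really is fair
        if heads <= LOW or heads >= HIGH:                   # result lands in the critical region
            rejections += 1                                 # so H0 is wrongly rejected
    return rejections / experiments

rate = false_positive_rate(10_000)
print(f"false positive rate: {rate:.3f}")
```

The exact false positive rate for this region is about 3.5%, a little below α = 5%, because a binomial count cannot split the tail area exactly.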
What is a true positive in hypothesis testing?
When we reject the null hypothesis due to statistical significance and the null hypothesis is actually incorrect
What is a type two error?
A false negative
When do type two errors (false negatives) occur in hypothesis testing?
When the null hypothesis is not true but is not rejected
What is the definition of the power of a hypothesis test?
The probability of correctly rejecting the null hypothesis when it is false
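What quantity is denoted by the letter β?
The probability of a false negative: the area of the alternative hypothesis's distribution that falls outside the critical region, where the null hypothesis is incorrectly not rejected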
What is the equation for power?
Power = 1 - β
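A sketch computing all four outcome probabilities exactly, assuming 100 tosses, the two-tailed 5% critical region for a fair coin (39 or fewer heads, or 61 or more) and, for illustration only, a true coin that lands heads 60% of the time. `prob_reject` is a hypothetical helper:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_reject(n, p, low, high):
    """P(X <= low or X >= high) when the true probability of success is p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if k <= low or k >= high)

n, low, high = 100, 39, 61
p_null, p_alt = 0.5, 0.6  # p_alt is an assumed true bias, for illustration

# If H0 is true (fair coin)
false_positive = prob_reject(n, p_null, low, high)  # type 1 error rate
true_negative = 1 - false_positive

# If H0 is false (coin lands heads 60% of the time)
power = prob_reject(n, p_alt, low, high)             # true positive rate
beta = 1 - power                                     # false negative (type 2) rate

print(f"false positive {false_positive:.4f}, true negative {true_negative:.4f}")
print(f"power {power:.3f}, beta {beta:.3f}")
```

Here the power is only about 0.46, so β is about 0.54: with 100 tosses, a 60% coin is missed more often than it is detected.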
What is β the probability of?
Getting a false negative (type 2 error), shown as the area where the alternative hypothesis's distribution overlaps the null hypothesis's non-rejection region
How do you work out the probability of getting a false negative, true positive, true negative and false positive from a distribution?
. . .
What makes a hypothesis test of two overlapping distributions more powerful?
- The peaks of the two distributions are well separated (far apart)
- The two distributions are well separated relative to their spread, so they overlap little
How does a large separation between the distributions along the horizontal axis make the hypothesis test more powerful?
There is little overlap between the distribution under the null hypothesis and the distribution under the alternative hypothesis. This overlap corresponds to β, which is subtracted from 1 to give the power, so less overlap means a more powerful test
What is a high false positive rate?
A test that often rejects the null hypothesis even though it is true (a high type 1 error rate). Accepting the null hypothesis even though it is wrong is instead a false negative
What is effect size?
A measure combining the difference between the peaks of two distribution curves and their spread
What effect does a larger effect size have on detecting a difference when the null hypothesis is wrong? Give an example
A larger effect size makes it easier to detect that the null hypothesis is wrong. When the true result differs only slightly from the null hypothesis, the distributions have similar peaks and overlap heavily, so the difference is hard to detect.
E.g. In a hypothesis test, it is easier to detect a biased coin that lands on heads 80% of the time than one that lands on heads 60% of the time
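A sketch of the coin example, assuming 20 tosses, a fair-coin null hypothesis and a two-tailed test at α = 0.05. The helpers `two_tailed_region` and `power` are hypothetical names:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_tailed_region(n, p0=0.5, alpha=0.05):
    """Critical values (low, high): reject H0 if X <= low or X >= high."""
    half = alpha / 2
    low, tail = -1, 0.0
    for c in range(n + 1):
        tail += binom_pmf(c, n, p0)
        if tail > half:
            break
        low = c
    high, tail = n + 1, 0.0
    for c in range(n, -1, -1):
        tail += binom_pmf(c, n, p0)
        if tail > half:
            break
        high = c
    return low, high

def power(n, p_true, p0=0.5, alpha=0.05):
    """Probability the test rejects H0 when the coin's true heads probability is p_true."""
    low, high = two_tailed_region(n, p0, alpha)
    return sum(binom_pmf(k, n, p_true) for k in range(n + 1) if k <= low or k >= high)

n = 20
power_60 = power(n, 0.6)  # small effect size
power_80 = power(n, 0.8)  # large effect size
print(f"power against a 60% coin: {power_60:.3f}")
print(f"power against an 80% coin: {power_80:.3f}")
```

With 20 tosses the power is roughly 0.13 against the 60% coin but about 0.80 against the 80% coin.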
What is an effect of a larger effect size on the power of a hypothesis test?
It provides a more powerful hypothesis test
How can the distributions be separated further to make the hypothesis test more powerful?
By increasing the number of trials. The distance between the peaks of the two distributions (of the number of successes) grows in proportion to the number of trials, while the spread of each distribution grows only with the square root of the number of trials, so the area of overlap becomes smaller
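A sketch showing power rising with the number of trials, assuming a fair-coin null hypothesis, a two-tailed test at α = 0.05 and a true coin that lands heads 60% of the time. The helper names are hypothetical:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_tailed_region(n, p0=0.5, alpha=0.05):
    """Critical values (low, high): reject H0 if X <= low or X >= high."""
    half = alpha / 2
    low, tail = -1, 0.0
    for c in range(n + 1):
        tail += binom_pmf(c, n, p0)
        if tail > half:
            break
        low = c
    high, tail = n + 1, 0.0
    for c in range(n, -1, -1):
        tail += binom_pmf(c, n, p0)
        if tail > half:
            break
        high = c
    return low, high

def power(n, p_true, p0=0.5, alpha=0.05):
    """Probability the test rejects H0 when the true probability is p_true."""
    low, high = two_tailed_region(n, p0, alpha)
    return sum(binom_pmf(k, n, p_true) for k in range(n + 1) if k <= low or k >= high)

# Same 60% coin, more and more tosses
powers = {n: power(n, 0.6) for n in (20, 100, 400)}
for n, pw in powers.items():
    print(f"{n:>3} tosses: power = {pw:.3f}")
```

The power rises from about 0.13 at 20 tosses to close to 1 at 400 tosses, which also illustrates the cost trade-off: detecting a small bias reliably needs many trials.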
What is a disadvantage of increasing the number of trials in order to increase the power of hypothesis testing?
This can be tedious, costly and time consuming
What effect size would result in the most statistically powerful experiment?
Low dispersion and a large difference between the peaks
What is the equation for effect size?
Effect size = peak difference / dispersion (spread), e.g. Cohen's d = difference between the means / standard deviation
When are dispersion (spread) and the number of trials closely linked in hypothesis testing?
In binomial tests: the spread of a binomial distribution depends on the number of trials n (standard deviation = √(np(1 − p)))
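A sketch of that link, assuming effect size is measured as the distance between the peaks divided by the standard deviation of the null distribution (one common convention, not the only one). With n trials the peaks are n·p0 and n·pA apart by n(pA − p0), while the spread is √(n·p0(1 − p0)), so the standardised distance grows with √n. `standardised_effect` is a hypothetical name:

```python
from math import sqrt

def standardised_effect(n, p0, p_alt):
    """Distance between the peaks divided by the null distribution's standard deviation."""
    peak_difference = n * (p_alt - p0)  # means are n*p_alt and n*p0
    spread = sqrt(n * p0 * (1 - p0))    # binomial standard deviation under H0
    return peak_difference / spread

for n in (20, 100, 400):
    print(n, round(standardised_effect(n, 0.5, 0.6), 2))
```

Quadrupling the number of trials doubles the standardised distance between the peaks, which is why more trials make the test more powerful.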
What are standard significance levels?
- 0.05
- 0.01
- 0.001