Week 6 Chapter 8 Flashcards
Step 1 of The Four Steps of a Hypothesis Test
all of these:
- statement of the hypothesis
- State the hypotheses and select an alpha level.
Step 2 of The Four Steps of a Hypothesis Test
all of these:
- setting of the criteria for a decision
- Locate the critical region.
Step 3 of The Four Steps of a Hypothesis Test
all of these:
- collection of data and computation of sample statistics
- Compute the test statistic (the z-score).
Step 4 of The Four Steps of a Hypothesis Test
decision making (about the null hypothesis)
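The four steps above can be sketched as a minimal z-test in Python. The population parameters, sample size, and sample mean below are hypothetical, chosen only for illustration:

```python
from statistics import NormalDist

# Hypothetical example: known population mu = 80, sigma = 12; a sample of
# n = 36 treated individuals produces M = 85.  All numbers are illustrative.
mu, sigma, n, M = 80, 12, 36, 85
alpha = 0.05

# Step 1: state the hypotheses (H0: mu = 80, H1: mu != 80) and select alpha.
# Step 2: locate the critical region for a two-tailed test.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96

# Step 3: compute the test statistic (the z-score).
standard_error = sigma / n ** 0.5
z = (M - mu) / standard_error

# Step 4: make a decision about the null hypothesis.
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(round(z, 2), round(z_crit, 2), decision)  # 2.5 1.96 reject H0
```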
hypothesis testing
statistical method that uses sample data to evaluate a supposition about a population
null hypothesis
all of these:
- states that in the general population there is no change, no difference, or no relationship
- The null hypothesis states that the treatment has no effect.
- is identified by the symbol H0. (The H stands for hypothesis, and the zero subscript indicates that this is the zero-effect hypothesis.)
alternative hypothesis
all of these:
- states that there is a change, a difference, or a relationship for the general population
- is also called scientific hypothesis
- is identified by the symbol H1. (The H stands for hypothesis, and the 1 subscript indicates that the treatment has an effect.)
alpha level
all of these:
- probability value that is used to define the concept of “very unlikely”
- is the small probability that the test will lead to a Type I error. That is, the alpha level determines the probability of obtaining sample data in the critical region even though the null hypothesis is true.
critical region
group of extreme sample values very unlikely to be obtained if null hypothesis is true
test statistic
indicates that the sample data are converted into a single, specific statistic (such as a z-score) that is used to test the hypotheses
Type I error
occurs when a researcher rejects a null hypothesis that is actually true. In a typical research situation, a Type I error means the researcher concludes that a treatment does have an effect when in fact it has no effect.
Type II error
all of these:
- occurs when a researcher fails to reject a null hypothesis that is in fact false
- In a typical research situation, a Type II error means that the hypothesis test has failed to detect a real treatment effect.
beta
probability of a Type II error
significant
result that is very unlikely to occur when the null hypothesis is true
directional hypothesis test
all of these:
- method wherein statistical suppositions specify either an increase or a decrease in the population mean
- In a one-tailed test, the statistical hypotheses (H0 and H1) specify either an increase or a decrease in the population mean. That is, they make a statement about the direction of the effect.
effect size
measurement of the absolute magnitude of a treatment result, independent of the size of the sample(s) being used.
Cohen’s d
all of these:
- measure of the distance between two means, typically reported as a positive number even when the formula produces a negative value.
- Cohen’s d = (M_treatment − μ_no_treatment) / standard deviation; that is, the mean difference divided by the standard deviation
- sample size is not considered when computing Cohen’s d
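A minimal sketch of the Cohen’s d computation, with illustrative (made-up) means and standard deviation; note that sample size never enters the formula:

```python
# Illustrative values: treated sample mean 85, untreated population mean 80,
# population standard deviation 12.
M_treatment, mu_no_treatment, sigma = 85, 80, 12

d = (M_treatment - mu_no_treatment) / sigma   # 5 / 12
d = abs(d)  # typically reported as a positive number
print(round(d, 2))  # 0.42
```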
power
probability that the test will correctly reject a false null hypothesis. That is, power is the probability that the test will identify a treatment effect if one really exists.
the researcher begins with a known population. This is the set of individuals as they exist before treatment. For this example, we are assuming that the original set of scores forms a normal distribution. The purpose of the research is to determine the effect of a treatment on the individuals in the population. That is, the goal is to determine what happens to the population (whether the treatment has an effect on the population mean)
after the treatment is administered.
The goal of the hypothesis test is to determine
whether the treatment has any effect on the individuals in the population
the unknown population, after treatment, is the
focus of the research question
the purpose of the research is to determine what would happen if the treatment were administered to
every individual in the population
the unknown population is actually
hypothetical (the treatment is never administered to the entire population). Instead, we are asking what would happen if the treatment were administered to the entire population.
The null hypothesis and the alternative hypothesis are mutually exclusive and exhaustive. They cannot both be true. The data will determine which one should be
rejected
if there is a big discrepancy between the data and the hypothesis, we will conclude that the hypothesis is
wrong
The alpha level, or the level of significance, is a probability value that is used to define the concept of
“very unlikely” in a hypothesis test.
The critical region is composed of the extreme sample values that are very unlikely (as defined by the alpha level) to be obtained if the null hypothesis is true. The boundaries for the critical region are determined by the alpha level. If sample data fall in the critical region,
the null hypothesis is rejected.
the data are collected after the researcher has stated the hypotheses and established the criteria for a decision. This sequence of events helps ensure that a researcher makes an honest, objective evaluation of the data and does not
tamper with the decision criteria after the experimental outcome is known.
the heart of the hypothesis test:
comparing the data with the hypothesis
In most research situations, the consequences of a Type I error can be very serious. Because the researcher has rejected the null hypothesis and believes that the treatment has a real effect, it is likely that the researcher will report or even publish the research results. A Type I error, however, means that this is a false report. Thus, Type I errors lead to
false reports in the scientific literature. Other researchers may try to build theories or develop other experiments based on the false results. A lot of precious time and resources may be wasted.
A Type I error occurs when a researcher unknowingly obtains an extreme, nonrepresentative sample. Fortunately, the hypothesis test is structured to
minimize the risk that this will occur
A Type II error occurs when the sample mean is not in the critical region even though the treatment has an effect on the sample. Often this happens when the effect of the treatment is relatively small. In this case, the treatment does influence the sample, but the magnitude of the effect is not big enough to move the sample mean into the critical region.
Because the sample is not substantially different from the original population (it is not in the critical region), the statistical decision is to fail to reject the null hypothesis and to conclude that there is not enough evidence to say there is a treatment effect.
The consequences of a Type II error are usually not as serious as those of a Type I error. In general terms, a Type II error means that the research data do not show the results that the researcher had hoped to obtain. The researcher can accept this outcome and conclude that the treatment either has no effect or has only a small effect that is not worth pursuing, or the researcher can
repeat the experiment (usually with some improvement, such as a larger sample) and try to demonstrate that the treatment really does work.
Unlike a Type I error, it is impossible to determine a single, exact probability for a Type II error. Instead, the probability of a Type II error depends on
a variety of factors and therefore is a function, rather than a specific number. Nonetheless, the probability of a Type II error is represented by the symbol β, the Greek letter beta.
By convention, the largest permissible value is
α = 0.05. When there is no treatment effect, an alpha level of .05 means that there is still a 5% risk, or a 1-in-20 probability, of rejecting the null hypothesis and committing a Type I error. Because the consequences of a Type I error can be relatively serious, many individual researchers and many scientific publications prefer to use a more conservative alpha level such as
.01 or .001 to reduce the risk that a false report is published and becomes part of the scientific literature. However, there is a different kind of risk that develops as the alpha level is lowered. Specifically, a lower alpha level means less risk of a Type I error, but it also means that the hypothesis test demands more evidence from the research results.
The trade-off between the risk of a Type I error and the demands of the test is controlled by the boundaries of the critical region. For the hypothesis test to conclude that the treatment does have an effect, the sample data must be in the
critical region. If the treatment really has an effect, it should cause the sample to be different from the original population; essentially, the treatment should push the sample into the critical region. However, as the alpha level is lowered, the boundaries for the critical region move farther out and become more difficult to reach.
an extremely small alpha level, such as .000001 (one in a million), would mean almost no risk of a Type I error but would push the critical region so far out that it would become essentially impossible to ever reject the null hypothesis; that is, it would require an enormous treatment effect before the sample data would reach the critical boundaries. In general, researchers try to maintain a balance between
the risk of a Type I error and the demands of the hypothesis test. Alpha levels of .05, .01, and .001 are considered reasonably good values because they provide a low risk of error without placing excessive demands on the research results.
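The way the critical boundaries move outward as the alpha level is lowered can be checked directly; a quick sketch using Python’s standard library:

```python
from statistics import NormalDist

# Two-tailed critical z boundaries for the conventional alpha levels:
# the boundary moves farther out as alpha shrinks, so the test demands
# stronger evidence before H0 can be rejected.
boundaries = {alpha: NormalDist().inv_cdf(1 - alpha / 2)
              for alpha in (0.05, 0.01, 0.001)}
for alpha, z_crit in boundaries.items():
    print(alpha, round(z_crit, 2))
```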
What is the consequence of increasing the alpha level (for example, from .01 to .05)
It will increase the likelihood of rejecting H0 (the null hypothesis) and increase the risk of a Type I error.
The APA style does not use a leading zero in a probability value that refers to
a level of significance
In general, increasing the variability of the scores produces
a larger standard error and a smaller value (closer to zero) for the z-score. If other factors are held constant, the larger the variability, the lower the likelihood of finding a significant treatment effect.
In general, decreasing the number of scores in the sample produces
a larger standard error and a smaller value (closer to zero) for the z-score. If all other factors are held constant, a larger sample is more likely to result in a significant treatment effect.
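Both effects can be verified numerically. A small sketch with hypothetical values (μ = 80 and a 4-point mean difference, chosen only for illustration):

```python
# With the mean difference held constant, a larger sigma or a smaller n
# inflates the standard error and shrinks the z-score toward zero.
mu, M = 80, 84  # hypothetical 4-point treatment effect

def z_score(sigma, n):
    standard_error = sigma / n ** 0.5
    return (M - mu) / standard_error

print(z_score(sigma=8, n=16))   # 2.0
print(z_score(sigma=16, n=16))  # 1.0  larger variability -> smaller z
print(z_score(sigma=8, n=4))    # 1.0  smaller sample -> smaller z
```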
Independent observations are a basic requirement for
nearly all hypothesis tests
A research report includes the statement, “z = 2.18, p < .05”
The obtained sample mean was very unlikely if the null hypothesis is true, so H0 was rejected.
A sample of n = 4 individuals is selected from a population with μ = 80 and σ = 5, and a treatment is administered to the sample. If the treatment really does have an effect, then what would be the effect of increasing the sample size to n = 25?
Increase the chances that the sample will produce an extreme z-score and increase the likelihood that you will conclude that a treatment effect exists
What assumption is needed before you can use the unit normal table to find critical values for a z-score hypothesis test?
The distribution of sample means is normal.
The hypothesis-testing procedure presented in Sections 8.2 and 8.3 is the standard, or two-tailed, test format. The term two-tailed comes from the fact that the critical region is divided between the two tails of the distribution. This format is by far the most widely
accepted procedure for hypothesis testing. Nonetheless, there is an alternative that is discussed in this section.
Notice that a directional (one-tailed) test requires two changes in the step-by-step hypothesis-testing procedure.
- In the first step of the hypothesis test, the directional prediction is incorporated into the statement of the hypotheses.
- In the second step of the process, the critical region is located entirely in one tail of the distribution.
The major distinction between one-tailed and two-tailed tests is in the criteria they use for rejecting H0.
A one-tailed test allows you to reject the null hypothesis when the difference between the sample and the population is relatively small, provided the difference is in the specified direction. A two-tailed test, on the other hand, requires a relatively large difference independent of direction.
In general, two-tailed tests should be used in research situations when there is no strong directional expectation or when there are two competing predictions. For example, a two-tailed test would be appropriate for a study in which one theory predicts an increase in scores but another theory predicts a decrease. One-tailed tests should be used only in situations when the directional prediction is made
before the research is conducted and there is a strong justification for making the directional prediction. In particular, if a two-tailed test fails to reach significance, you should never follow up with a one-tailed test as a second attempt to salvage a significant result for the same data.
A population is known to have a mean of
μ = 50. A treatment is expected to increase scores for individuals in this population. If the treatment is evaluated using a one-tailed hypothesis, then which of the following is the correct statement of the null hypothesis?
μ ≤ 50 (less than or equal to 50)
A researcher is conducting an experiment to evaluate a treatment that is expected to decrease the scores for individuals in a population which is known to have a mean of μ = 80. The results will be examined using a one-tailed hypothesis test. Which of the following is the correct statement of the alternative hypothesis (H1)?
μ < 80
A researcher expects a treatment to produce an increase in the population mean. Assuming a normal distribution, what is the critical z-score for a one-tailed test with α = 0.01?
z = +2.33
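The critical value can be checked with a one-line computation (Python standard library):

```python
from statistics import NormalDist

# One-tailed critical z for alpha = .01: the entire critical region sits in
# one tail, so the boundary is the 99th percentile of the unit normal.
z_one_tailed = NormalDist().inv_cdf(1 - 0.01)
print(round(z_one_tailed, 2))  # 2.33
```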
The primary concern about hypothesis testing is that
demonstrating a significant treatment effect does not necessarily indicate a substantial treatment effect. In particular, statistical significance does not provide any real information about the absolute size of a treatment effect. Instead, the hypothesis test has simply established that the results obtained in the research study are very unlikely to have occurred if there is no treatment effect.
the hypothesis test is making a relative comparison: the size of the treatment effect is being evaluated relative to the standard error. If the standard error is very small, then the treatment effect can also be very small and still be large enough to be significant. Thus, a significant effect does not necessarily mean a
big effect
The standard deviation is included in the calculation to standardize the size of the mean difference in much the same way that z-scores standardize locations in a distribution. For example, a 15-point mean difference can be a relatively large treatment effect or a relatively small effect depending on the
size of the standard deviation.
Cohen’s d measures the size of the treatment effect in terms of the
standard deviation. For example, a value of d = 0.50 indicates that the treatment changed the mean by half of a standard deviation; similarly, a value of d = 1.00 indicates that the size of the treatment effect is equal to one whole standard deviation.
Evaluating effect size with Cohen’s d
Magnitude of d and evaluation of effect size
d = 0.2 Small effect (mean difference around 0.2 standard deviation)
d = 0.5 Medium effect (mean difference around 0.5 standard deviation)
d = 0.8 Large effect (mean difference around 0.8 standard deviation)
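These benchmarks can be wrapped in a small helper. The hard cutoffs at exactly 0.2, 0.5, and 0.8, and the "negligible" label below them, are assumptions made for illustration rather than part of the guidelines themselves:

```python
# Assumed classification rule based on Cohen's conventional benchmarks.
def effect_size_label(d):
    d = abs(d)  # d is reported as a positive number
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

print(effect_size_label(0.42))  # small
```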
Under what circumstances can a very small treatment effect be statistically significant?
With a large sample and a small standard deviation
If other factors are held constant, then how does sample size affect the likelihood of rejecting the null hypothesis and the value for Cohen’s d?
A larger sample increases the likelihood of rejecting the null hypothesis but has no effect on the value of Cohen’s d.
Instead of measuring effect size directly, an alternative approach to determining the size or strength of a treatment effect is to measure the
power of the statistical test
Whenever a treatment has an effect, there are only two possible outcomes for a hypothesis test
1. Fail to reject the null hypothesis.
2. Reject the null hypothesis.
The first outcome, failing to reject H0 when there is a real effect, was defined earlier as a Type II error. The second outcome, rejecting H0 when there is a real effect, is defined as the power of the test. Because there are only two possible outcomes, the probability for the first and the probability for the second must add up to
1.00. We have already identified the probability of a Type II error (outcome 1) as p = β. Therefore, the power of the test (outcome 2) must be p = 1 - β
Researchers typically calculate power as a means of determining whether a research study is likely to be successful. Thus, researchers usually calculate the power of a hypothesis test
before they actually conduct the research study. In this way, they can determine the probability that the results will be significant (reject H0) before investing time and effort in the actual research.
In general, as the effect size increases, the distribution of sample means on the right-hand side moves
even farther to the right so that more and more of the samples are beyond the boundary z = 1.96. Thus, as the effect size increases, the probability of rejecting H0 also increases, which means that the power of the test increases. Thus, measures of effect size such as Cohen’s d and measures of power both provide an indication of the strength or magnitude of a treatment effect.
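A rough sketch of how power grows with sample size for a fixed, hypothetical effect. The tiny contribution of the opposite tail is ignored, so this is an approximation, and all numbers are illustrative:

```python
from statistics import NormalDist

# Hypothetical setup: sigma = 10, an assumed 8-point treatment effect,
# two-tailed test with alpha = .05.
sigma, effect, alpha = 10, 8, 0.05

def power(n):
    standard_error = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    # Probability that a treated sample mean lands beyond the upper critical
    # boundary (the lower tail contributes almost nothing and is ignored).
    return 1 - NormalDist().cdf(z_crit - effect / standard_error)

print(round(power(16), 2))  # 0.89
print(round(power(25), 2))  # 0.98  larger n -> higher power
```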
Although the power of a hypothesis test is directly influenced by the size of the treatment effect, power is not meant to be a
pure measure of effect size. Instead, power is influenced by several factors, other than effect size, that are related to the hypothesis test. Some of these factors are sample size, alpha level, one-tailed versus two-tailed
Before a study is conducted, researchers can compute power to determine the probability that their research will successfully reject the null hypothesis. If the probability (power) is too small, they
always have the option of increasing sample size to increase power.
Reducing the alpha level for a hypothesis test also
reduces the power of the test. For example, lowering
α from .05 to .01 lowers the power of the hypothesis test. The effect of reducing the alpha level can be seen by looking at Figure 8.9. In this figure, the boundaries for the critical region are drawn using α = .05. Specifically, the critical region on the right-hand side begins at z = +1.96. If α were changed to .01, the boundary would be moved farther to the right, out to z = 2.58. It should be clear that moving the critical boundary to the right means that a smaller portion of the treatment distribution (the distribution on the right-hand side) will be in the critical region. Thus, there would be a lower probability of rejecting the null hypothesis and a lower value for the power of the test.
Changing from a regular two-tailed test to a one-tailed test
increases the power of the hypothesis test. Again, this effect can be seen by referring to Figure 8.9. The figure shows the boundaries for the critical region using a two-tailed test with α = .05 so that the critical region on the right-hand side begins at z = 1.96 . Changing to a one-tailed test would move the critical boundary to the left to a value of z = 1.65. Moving the boundary to the left would cause a larger proportion of the treatment distribution to be in the critical region and, therefore, would increase the power of the test.
If the power of a hypothesis test is found to be p = 0.80, then what is the probability of a Type II error for the same test?
p = 0.20
How does the sample size influence the likelihood of rejecting the null hypothesis and the power of the hypothesis test?
Increasing sample size increases both the likelihood of rejecting H0 and the power of the test.
How is the power of a hypothesis test related to sample size and the alpha level?
A larger sample and a larger alpha level will both increase power.