Chapter 13: Null Hypothesis Significance Testing Flashcards
What is a goal of statistical evaluation?
To provide an objective or at least agreed-upon criterion to decide whether the results we obtained are sufficiently compelling to reject the null hypothesis.
What is a statistically significant effect?
An effect for which the obtained probability (p) is equal to or below the level of confidence (alpha) selected, e.g., p ≤ .05.
What is the use of statistical evaluation?
It provides consistent criteria for determining whether an effect is to be considered veridical.
How is statistical significance a function or measure of sample size?
The larger the sample size, the smaller the group differences needed for statistical significance for a given level of confidence.
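As a rough illustration of the point above, the sketch below computes the smallest standardized group difference that would reach two-tailed significance for several sample sizes. It assumes two equal-sized independent groups and a normal approximation to the t-test. The function name `min_significant_d` is made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def min_significant_d(n_per_group, alpha=0.05):
    """Smallest standardized mean difference (d) that reaches two-tailed
    significance, using a normal approximation for two equal groups."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return z_crit * sqrt(2 / n_per_group)


# The required difference shrinks as the groups get larger.
for n in (10, 50, 200, 1000):
    print(n, round(min_significant_d(n), 3))
```

With the same alpha, the difference needed for significance keeps falling as n per group grows.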
What is the minimum required power of a study?
At least .80
- If power is .80, this means that the investigator’s chance of detecting a difference in their study is 4 out of 5, if there is a real difference in the population.
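To make the "4 out of 5" reading concrete, here is a minimal sketch that approximates power for a two-group comparison. It assumes equal group sizes and a normal approximation, and it ignores the negligible chance of significance in the wrong direction. `approx_power` is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group test for a true
    standardized difference d (normal approximation, equal n)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    expected_z = d * sqrt(n_per_group / 2)  # expected test statistic under d
    return NormalDist().cdf(expected_z - z_crit)


print(round(approx_power(0.5, 64), 2))  # close to the .80 convention
print(round(approx_power(0.5, 20), 2))  # clearly underpowered
```

Under these assumptions, a medium effect (d = .5) needs roughly 64 subjects per group to reach the .80 convention. With 20 per group, a real effect would be missed more often than it is found.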
Why are underpowered studies a problem for the field of psychology?
They lead researchers to believe that the topic was studied and that a reasonable answer came out of it. As a result, researchers might quit investigating this area, even though there is an effect to be found.
How can you identify your needed sample size?
Fix the parameters of:
1. Alpha
2. Power
3. Effect size (ES)
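Once those three parameters are fixed, the sample size follows. The sketch below uses the common normal-approximation formula for two equal groups, n = 2 * ((z_alpha + z_power) / d)^2. This is an assumption made for illustration; statistical software based on the t distribution gives slightly larger values. `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed, two-group comparison,
    given alpha, desired power, and expected effect size d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Smaller expected effects require much larger samples.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

The required n rises steeply as the expected ES shrinks, which is why a conservative ES estimate leads to a larger planned sample.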
How can we decide on an effect size?
- You can consult meta-analyses to identify a likely ES for the study you want to undertake.
- If meta-analyses are unavailable, you can consult one or more studies that have compared the groups of interest or that used the measures (DV) in a related way.
- A study on the same or a closely related topic can also be consulted.
- When meta-analyses and individual studies are unavailable, ES can be estimated on a priori grounds.
- It is helpful to select a conservative estimate.
What is the difference between n and N?
N refers to the overall sample for the entire study, while n refers to the sample size within each condition.
Why do methodologists advocate attention to power?
- There is a long tradition of most studies being underpowered to detect the likely effects their manipulations will show.
- Power is easy to calculate given statistical software.
What is the error variance?
The heterogeneity of the subjects (as reflected in within-group variance).
- This is directly related to ES and statistical significance.
- The larger the error variance, the less likely the results will be statistically significant.
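The link between error variance and ES can be sketched with the usual standardized mean difference (Cohen's d), which divides the mean difference by the pooled within-group standard deviation. The means and standard deviations below are invented purely for illustration, and equal group sizes are assumed.

```python
from math import sqrt


def cohens_d(mean_1, mean_2, sd_1, sd_2):
    """Standardized mean difference for two equal-sized groups:
    mean difference divided by the pooled within-group SD."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


# Same 5-point mean difference, different within-group variability
# (hypothetical numbers).
print(cohens_d(55, 50, 5, 5))    # homogeneous subjects -> 1.0
print(cohens_d(55, 50, 20, 20))  # heterogeneous subjects -> 0.25
```

The raw difference is identical in both cases. Only the within-group variability changes, and the ES drops with it.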
What are ways to increase power?
- Increase the sample size (N or n/group).
- Increase expected differences by contrasting conditions that are more likely to vary (stronger manipulations, sharper contrasts).
- Use pretests/repeated measures that reduce the error term in the effect size.
- Vary alpha (a priori) if the case can be made to do so, e.g., when:
  - Classification of groups is imperfect.
  - Measures are not well-established.
  - Small effects/differences are predicted.
  - Consequences of the decision vary markedly as a function of the direction, and hence we wish to detect a difference in one direction rather than the other.
- Use directional tests for significance testing.
- Decrease variability in the study as much as possible by:
  - Holding constant vs. controlling sources of variability.
  - Analyzing the data to extract systematic sources of variance from the error term.
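Two of the options above, relaxing alpha and using a directional test, can be compared in a small sketch. It assumes two equal groups, a normal approximation, and a true effect in the predicted direction. `approx_power` and its `tails` parameter are illustrative names only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05, tails=2):
    """Approximate power for a two-group comparison (normal approximation).
    tails=1 is a directional test; tails=2 is non-directional."""
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    expected_z = d * sqrt(n_per_group / 2)
    return NormalDist().cdf(expected_z - z_crit)


d, n = 0.5, 30  # hypothetical planning values
print("two-tailed, alpha .05:", round(approx_power(d, n, 0.05, 2), 2))
print("two-tailed, alpha .10:", round(approx_power(d, n, 0.10, 2), 2))
print("one-tailed, alpha .05:", round(approx_power(d, n, 0.05, 1), 2))
```

In this approximation a one-tailed test at .05 uses the same critical value as a two-tailed test at .10. Both options therefore buy the same gain in power over the conventional two-tailed .05 test.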
How can you set up conditions that might show a stronger effect?
By comparing groups that are more extreme, since that might show a stronger effect.
- E.g., comparing none vs. a lot, rather than a bit vs. a lot.
What are statistical advantages of a pretest?
With various analyses, the error term in evaluating ES is reduced.
- With repeated assessment of the subjects, the within-group variance can be taken into account to reduce the error term.
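One common way to express this gain, used here as an assumption rather than as the only possible analysis, is that adjusting for a pretest correlated r with the posttest shrinks the error variance by a factor of (1 - r^2), so the effective ES becomes ES / sqrt(1 - r^2). The sketch below applies that formula. `adjusted_es` is a hypothetical helper name.

```python
from math import sqrt


def adjusted_es(es, r):
    """Effective effect size after removing pretest-related variance,
    assuming the error variance is reduced by the factor (1 - r**2)."""
    return es / sqrt(1 - r ** 2)


es = 0.5  # hypothetical unadjusted effect size
for r in (0.0, 0.3, 0.6, 0.8):  # hypothetical pretest-posttest correlations
    print(r, round(adjusted_es(es, r), 3))
```

The stronger the pretest-posttest correlation, the smaller the error term and the larger the effective ES, with no change in the raw group difference.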
What circumstances may lead the investigator to anticipate specific constraints that will reduce the likely ES and the differences between groups or conditions?
- The criterion for selecting groups in a case-control study might be known to be imperfect or somewhat tenuous.
- Thus, some persons in one group might, through imperfect classification, belong in the other group. Imperfect classification is analogous to diffusion of treatment as a threat to internal validity.
- The measures in the area of research may not be very well established. This may introduce variability into the situation that will affect the sensitivity of the experimental test. The predicted relation might be more evident with a more sensitive/reliable measure.
- Thus, it might be reasonable to use a less stringent alpha level.
- The specific comparison of interest may be expected to generate a small difference between groups. A large sample is usually required to detect such an effect, but we might not have access to one. Thus, a less stringent (higher) alpha might be useful.
- We might alter alpha based on consideration of the consequences of our decisions.