Sec 65-66 Effect Size (d), Interpretation of Effect Size (d) Flashcards
Effect Size (d)
_EFFECT SIZE (*d*)_ refers to the MAGNITUDE (i.e., size) of a DIFFERENCE expressed on a STANDARDIZED SCALE.
- When you’re working with two groups of data that use different scales on their RAW data, you must STANDARDIZE both sets of data to a COMMON SCALE in order to COMPARE THEM.
- The MAGNITUDE of the DIFFERENCE BETWEEN these scores on this standardized, shared scale is the EFFECT SIZE.
- Ex: Suppose that Experimenter A administered a new “Treatment X” for depression to an experimental group, while a control group received a standard treatment. At the end of the treatment, both groups were given a self-report test (with possible raw scores from 0 to 20) to measure their depression level. (See table 1).
- Also, suppose that Experimenter B similarly administered a DIFFERENT “Treatment Y” for depression to an experimental group, while a control group received a standard treatment. At the end of the treatment, both groups were given a self-report test (with possible raw scores from 0 to 120) to measure their depression level. (See table 2).
Which treatment is superior?
- Treatment X, which resulted in a 5-point raw score difference between the two means, or
- Treatment Y, which resulted in a 10-point raw score difference between the two means?
- Because the experimenters used different measurement scales (0 to 20 versus 0 to 120), the answer is not clear.
To compare the outcomes, we have to make the two studies COMPARABLE (on the same scale). We can do this by STANDARDIZING each set of outcomes using the STANDARD DEVIATION.
- In Experiment A, one standard deviation unit equals 4.00 raw-score points. Standardize this by:
- Take the difference in the means (12.00 - 7.00 = 5.00)
- Divide that difference by the Standard Deviation (5.00 / 4.00 = 1.25).
- That 1.25 is the value of d: the two means differ by 1.25 standard-deviation units (1.25 standard deviations above zero, where zero means no difference).
INTERPRETATION: For all practical purposes, there are only three standard-deviation units on each side of the mean. Thus, d is expressed in standard deviation units and has an effective range from 0.00 (no difference between the means) to 3.00 (the maximum difference between the means).
- Thus, for Experiment A, the experimental group is 1.25 standard-deviation units above no difference (0.00) on a standardized scale that ranges from 0.00 to 3.00. On such a limited scale, a value of d of 1.25 indicates that the difference is substantially above 0.00.
- BUT our goal is to compare EXPERIMENT A to EXPERIMENT B to see which new treatment (X or Y), if either, produced the better results relative to the standard (control) treatment.
- So we must apply the same process to Experiment B, where one standard deviation unit equals 14.00 raw-score points:
- (80.00 - 70.00) / 14.00 = d = 0.71, or 0.71 standard deviations above 0.00.
- Now that the two experiments have been standardized:
- Experiment A (Treatment X vs Standard Treatment): d = 1.25.
- Experiment B (Treatment Y vs Standard Treatment): d = 0.71.
- We can clearly see that Treatment X compared much more favorably to the Standard Treatment than did Treatment Y. So Treatment X is the more effective treatment.
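The two standardizations above can be sketched in a few lines of Python. This is a minimal sketch, assuming (as the cards do) that both groups within an experiment share the same standard deviation. The function name `standardized_difference` and its parameter names are illustrative and do not come from the source. The means and standard deviations are the ones quoted in the cards.

```python
def standardized_difference(mean_1, mean_2, sd):
    """Difference between two means in standard-deviation units (d).

    Assumes both groups have the same standard deviation `sd`.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_1 - mean_2) / sd


# Values quoted in the cards for each experiment.
d_a = standardized_difference(12.00, 7.00, 4.00)    # Experiment A
d_b = standardized_difference(80.00, 70.00, 14.00)  # Experiment B

print(f"Experiment A: d = {d_a:.2f}")  # Experiment A: d = 1.25
print(f"Experiment B: d = {d_b:.2f}")  # Experiment B: d = 0.71
```

Dividing by the standard deviation removes the raw-score units, which is why the 5-point difference on the 0 to 20 scale can be compared directly with the 10-point difference on the 0 to 120 scale.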
NOTE: WITHIN each EXPERIMENT in this section, the two standard deviations (for the new treatment and the standard treatment) are EQUAL.
- When they are UNEQUAL, a special AVERAGING PROCEDURE that results in the POOLED STANDARD DEVIATION should be used. (See Appendix I).
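Appendix I is not reproduced in these cards, so the sketch below uses the commonly taught pooled standard deviation, which weights each group's variance by its degrees of freedom. Treat it as an assumption about what the appendix describes. The function name `pooled_sd` and the sample sizes in the usage lines are hypothetical and do not come from the source.

```python
import math


def pooled_sd(sd_1, n_1, sd_2, n_2):
    """Pooled standard deviation of two groups.

    Each variance is weighted by its degrees of freedom (n - 1).
    """
    if n_1 < 2 or n_2 < 2:
        raise ValueError("each group needs at least two cases")
    pooled_variance = (
        (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    ) / (n_1 + n_2 - 2)
    return math.sqrt(pooled_variance)


# Hypothetical sample sizes; with equal SDs the pooled SD equals that SD.
equal_case = pooled_sd(4.00, 30, 4.00, 30)
# Hypothetical unequal SDs of 3.00 and 5.00 with 20 cases per group.
unequal_case = pooled_sd(3.00, 20, 5.00, 20)
print(f"{equal_case:.2f} {unequal_case:.2f}")
```

When the two standard deviations are equal, the pooled value is that same standard deviation, so the simple division used in the examples above is a special case of the pooled approach.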
Interpretation of Effect Size
EVALUATING the EFFECT SIZE – The example in the other card produced values of d of 0.71 and 1.25. Obviously, the experiment with a value of 1.25 had a LARGER EFFECT than the one with a value of 0.71. But how would we interpret the effect if we were looking at only a single experiment? Is 1.25 really such a good result?
- Researchers tend to use the subjective labels for d suggested by Cohen (1992), who proposed the small, medium, and large benchmarks; the two labels above “large” extend his scheme (see Table 1 below).
- d = 0.20 >> small effect
- d = 0.50 >> medium effect
- d = 0.80 >> large effect
- d = 1.10 >> very large effect
- d = 1.40+ >> extremely large effect
- Values as large as 1.10 or 1.40+ are RARELY found in social and behavioral research.
- So 0.71 would be described as closer to large than to medium, while the value of 1.25 would be described as between very large and extremely large.
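The reading of 0.71 and 1.25 above amounts to finding which two benchmark values surround a given d. The following is a small sketch of that lookup. The names `BENCHMARKS` and `bracketing_labels` are illustrative, and taking the absolute value of d is an assumption made here so that the direction of the difference does not affect the label.

```python
from bisect import bisect_right

# Benchmark values and labels as listed in the card above.
BENCHMARKS = [
    (0.20, "small"),
    (0.50, "medium"),
    (0.80, "large"),
    (1.10, "very large"),
    (1.40, "extremely large"),
]


def bracketing_labels(d):
    """Return the (lower, upper) labels whose benchmarks surround |d|.

    A side is None when no benchmark lies on that side of |d|.
    """
    size = abs(d)
    cutoffs = [cutoff for cutoff, _ in BENCHMARKS]
    i = bisect_right(cutoffs, size)  # count of benchmarks <= size
    lower = BENCHMARKS[i - 1][1] if i > 0 else None
    upper = BENCHMARKS[i][1] if i < len(BENCHMARKS) else None
    return lower, upper


print(bracketing_labels(0.71))  # ('medium', 'large')
print(bracketing_labels(1.25))  # ('very large', 'extremely large')
```

The function reports the surrounding pair rather than a single label because the benchmarks are reference points, not sharp cutoffs.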
SIZE does NOT necessarily indicate IMPORTANCE – Context always matters:
- A SMALL EFFECT SIZE might represent an IMPORTANT RESULT.
- Ex: Suppose researchers trying various treatments for a new and deadly disease are frustrated because they keep finding values of d near zero. If a subsequent treatment produces a value of d = 0.20 (considered ‘small’), that could be considered a very important finding and might be the difference between life and death for many people.
- In addition, the results might point the scientific community in a fruitful direction for additional research on treatments for the problem in question.
- A LARGE EFFECT SIZE might represent an UNIMPORTANT RESULT.
- A large value of d might be of limited importance. This usually occurs when the results lack practical significance in terms of cost, acceptability, and ethical and legal concerns.
- Ex: Magnetic Levitation car suspensions are FAR superior to traditional suspensions, but they are never seen in cars because they are far too heavy and far too expensive for the average consumer.
THREE STEPS for INTERPRETING the DIFFERENCE BETWEEN TWO MEANS:
- DETERMINE whether the DIFFERENCE is STATISTICALLY SIGNIFICANT at an acceptable probability level, such as p < .05, using a t-test.
- If it is not, the difference usually should be regarded as unreliable and should be interpreted as such.
- For a STATISTICALLY SIGNIFICANT DIFFERENCE, consider the value of d and the Magnitude labels in Table 1. Was the effect ‘medium’? ‘Large’?
- CONSIDER the IMPLICATIONS of the DIFFERENCE for validating any relevant theories as well as the practical significance of the results.
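The order of the three steps can be sketched as a small decision function. This is only an illustration of the sequence, not a statistical procedure: it assumes the p-value has already been obtained from a t-test, the function name `interpret_difference` and the `alpha` parameter are hypothetical, and the p-values in the usage lines are made up for the example. The third step is a matter of judgment, so the sketch only reminds the reader to carry it out.

```python
def interpret_difference(p_value, d, alpha=0.05):
    """Return a short note that follows the three-step order above."""
    # Step 1: check statistical significance first.
    if p_value >= alpha:
        return "Not statistically significant: treat the difference as unreliable."
    # Step 2: report the magnitude for comparison with the labels.
    # Step 3 (implications) cannot be automated and is left to the reader.
    return (
        f"Statistically significant: d = {abs(d):.2f}; compare with Table 1, "
        "then weigh the theoretical and practical implications."
    )


# Hypothetical p-values, chosen only to exercise both branches.
print(interpret_difference(0.30, 1.25))
print(interpret_difference(0.01, 0.71))
```

Checking significance first reflects the point in the steps above that a non-significant difference should be regarded as unreliable, whatever the value of d.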