Ch8 - Confidence Intervals, Effect Size, and Statistical Power Flashcards

1
Q

What are the new statistics?

A
  1. Effect sizes
  2. Confidence intervals
  3. Meta-analysis
2
Q

Point estimate

Confidence Intervals

A
  • A summary statistic from a sample that is just one number used as an estimate of the population parameter - “best guess”
  • The true population mean is unknown - and we take a sample from the population to estimate the population mean
  • EX: In studies on gender differences in math performance, the mean for boys, the mean for girls, and the difference between them are point estimates
3
Q

Interval estimate

Confidence Intervals

A
  • Based on a sample statistic and provides a range of plausible values for the population parameter
  • Frequently used by the media, often when reporting political polls, and are usually constructed by adding and subtracting a margin of error from a point estimate
4
Q

What is the interval estimate composed of (EQUATION)?

Confidence Intervals

A

interval estimate = point estimate ± margin of error (for a poll, the reported percentage plus and minus the margin of error)
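The equation above can be sketched in a few lines of Python; the poll figures below are made up for illustration.

```python
# Sketch of an interval estimate; the poll numbers are hypothetical.
point_estimate = 54.0   # reported percentage supporting a candidate
margin_of_error = 3.0   # reported margin of error

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error
print(f"Interval estimate: {lower}% to {upper}%")  # 51.0% to 57.0%
```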

5
Q

Confidence intervals: we’re not saying that we’re confident that the population mean falls in the interval, but rather…

Confidence Intervals

A

we are merely saying that we expect the interval to capture the population mean a certain percentage of the time - usually 95% - if we were to repeatedly conduct this same study with the same sample size

6
Q

Confidence level vs. interval:

Confidence Intervals

A
  • Level - the %
  • Interval - range between the two values that surround the sample mean
7
Q

Calculating confidence intervals with distributions

Confidence Intervals

A
  1. Draw a normal curve that has the sample mean at its center (NOTE: different from curve drawn for z test, where we had population mean at the center)
  2. Indicate the bounds of the confidence interval on the drawing
  3. Determine the z statistics that fall at each line marking the middle of 95%
  4. Turn the z statistics back into raw means
  5. Check that the confidence interval makes sense
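The five steps can be run through numerically; the sample mean, population SD, and sample size below are hypothetical example values.

```python
import math

# Sketch of steps 1-5 with hypothetical numbers: sample mean 105,
# population SD 15, sample size 36.
M = 105.0
sigma = 15.0
N = 36

standard_error = sigma / math.sqrt(N)   # sigma_M = sigma / sqrt(N) = 2.5
z_crit = 1.96                           # z marking the middle 95% (step 3)

lower = M - z_crit * standard_error     # step 4: turn z back into raw means
upper = M + z_crit * standard_error

# step 5: the sample mean should sit exactly midway between the bounds
assert abs((lower + upper) / 2 - M) < 1e-9
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # [100.10, 109.90]
```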
8
Q

Step 1 to calculating CI

Confidence Intervals

A

Draw a normal curve that has the sample mean at its center (NOTE: different from curve drawn for z test, where we had population mean at the center)

9
Q

Step 2 to calculating CI

Confidence Intervals

A
  • **2: Indicate the bounds of the confidence interval on the drawing**
  • Draw a vertical line from the mean to the top of the curve
  • For a 95% confidence interval we also draw two small vertical lines to indicate the middle 95% of the normal curve (2.5% in each tail, for a total of 5%)
  • The curve is symmetric, so half of the 95% falls above and half falls below the mean
  • Half of 95% = 47.5%, represented in the segments on either side of the mean
10
Q

Step 3 to calculating CI

Confidence Intervals

A

3. Determine the z statistics that fall at each line marking the middle of 95%

  • To do so: turn back to the z table
  • The % between the mean and each of the scores is 47.5% - when we look up this % in the z table, we find a statistic of 1.96
  • Can now add the z statistics of -1.96 and 1.96 to the curve
11
Q

Step 4 to calculating CI

Confidence Intervals

A

4. Turn the z statistics back into raw means

  • Need to identify appropriate mean and SD to use formula
  • Two important points to remember:
  • Center the interval around the sample mean (not the population mean), so use the sample mean in the calculation
  • Because we have a sample mean (rather than an individual score), we use a distribution of means - so we calculate standard error as the measure of spread: σM = σ/√N
12
Q

Step 5 to calculating CI

Confidence Intervals

A

5. Check that the confidence interval makes sense
* The sample mean should fall exactly in the middle of the two ends of the interval

13
Q

Statistically significant doesn’t/does mean…

A
  • Does NOT mean that the findings from a study represent a meaningful difference
  • ONLY means that those findings are unlikely to occur if, in fact, the null hypothesis is true
14
Q

How does an increase in sample size affect standard error and the test statistic? What does this cause?

The effect of sample size on statistical significance

A
  • Each time we increased the sample size, the standard error decreased and the test statistic increased
  • Because of this, a small difference might not be statistically significant with a small sample but might be statistically significant with a large sample
15
Q

Why would a large sample allow us to reject the null hypothesis more readily than a small sample? (EXAMPLE)

A

If we randomly selected 5 women and they had a mean score well above the OkCupid average, we might say “it could be chance”; but if we randomly selected 1000 women with a mean rating well above the OkCupid average, it’s very unlikely that we just happened to choose 1000 people with high scores

16
Q

Effect size

A
  • Indicates the size of a difference and is unaffected by sample size
  • Can tell us whether a statistically significant difference might also be an important difference
  • Tells us how much two populations DO NOT overlap - the less overlap, the bigger the effect size
  • DECREASING OVERLAP IS IDEAL!
17
Q

How can the amount of overlap between two distributions be decreased? TWO WAYS:

A

1: overlap decreases and effect size increases when means are farther apart (distance wise)
2: overlap decreases and effect size increases when variability within each distribution of scores is smaller (height of peak)

18
Q

How does effect size differ from statistical hypothesis testing?

A

Unlike statistical hypothesis testing, effect size is a standardized measure based on distributions of scores rather than distributions of means
* Rather than using the standard error σM = σ/√N, effect sizes are based only on the variability in the distribution of scores and do not depend on sample size

19
Q

Since effect sizes are not dependent on sample size, what does this allow us to do?

A

This means we can compare the effect sizes of different studies with each other, even when the studies have different sample sizes

20
Q

When we conduct a z-test, the effect size is typically

A

Cohen’s d: a measure of effect size that expresses the difference between two means in terms of SD
* AKA, Cohen’s d is the standardized difference between two means

21
Q

Formula for Cohen’s d for a z statistic:

A

d = (M − μ)/σ
- Same form as the z statistic, but with σM replaced by σ and μM replaced by μ
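A minimal sketch of the formula; the sample mean, population mean, and SD are hypothetical numbers.

```python
# Cohen's d for a z test, as in the formula above; all numbers hypothetical.
def cohens_d(M, mu, sigma):
    """Standardized distance between a sample mean and a population mean,
    measured in population-SD units."""
    return (M - mu) / sigma

print(round(cohens_d(M=105.0, mu=100.0, sigma=15.0), 3))  # 0.333
```

A d of 0.333 would fall between Cohen's small (0.2) and medium (0.5) guidelines.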

22
Q

With the results, we can determine (from Cohen’s 3 guidelines)…

Small, Medium, Large Effects

A
  • Small effects: 0.2 | 85% overlap
  • Medium effects: 0.5 | 67% overlap
  • Large effects: 0.8 | 53% overlap
23
Q

Does an effect need to be large to be meaningful?

A

Just because a statistically significant difference is small, that does not necessarily suggest no meaning; interpreting the meaningfulness of the effect sizes depends on the context

24
Q

Meta-analysis:

Meta-analysis

A
  • a study that involves the calculation of a mean effect size from the individual effect sizes of more than one study
25
Q

How do meta-analyses improve statistical power?

A

By considering multiple studies simultaneously, which helps to resolve debates fueled by contradictory research findings

26
Q

4 steps to conducting a meta-analysis:

A

1: select the topic of interest and decide exactly how to proceed before beginning to track down studies
2: locate every study that has been conducted and meets criteria
3: calculate an effect size, often Cohen’s d, for every study
4: calculate statistics - ideally, summary statistics, a hypothesis test, a confidence interval, and a visual display of the effect sizes
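Step 4's summary statistic can be sketched as a mean effect size; the per-study Cohen's d values below are invented, and real meta-analyses usually weight each d by its study's sample size.

```python
# Sketch of step 4: mean effect size across studies. The d values are
# hypothetical; a full meta-analysis typically weights by sample size.
study_ds = [0.41, 0.25, 0.60, 0.33, 0.48]

mean_d = sum(study_ds) / len(study_ds)  # unweighted mean effect size
print(round(mean_d, 3))  # 0.414
```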

27
Q

Considerations to keep in mind:

1: select the topic of interest and decide exactly how to proceed before beginning to track down studies

A
  • Make sure the necessary statistical information is available - either effect sizes or the summary stats necessary to calculate effect sizes
  • Consider selecting only studies in which participants meet certain criteria, such as age, gender, or geographic location
  • Consider eliminating studies based on the research design (EX: as they were not experimental in nature)
28
Q

Key part involves finding…

2: locate every study that has been conducted and meets criteria

A

…any studies that have been conducted but not published
* Much of this “fugitive literature” or “gray literature” is unpublished simply because the studies did not find a significant difference; the overall effect size seems larger without accounting for these studies - AKA the “file drawer problem”
* Can find by using other sources - like contacting researchers to find unpublished work

29
Q

“File drawer problem” - 2 solutions

2: locate every study that has been conducted and meets criteria

A

1: File drawer analysis: a statistical calculation, following a meta-analysis, of the number of studies with null results that would have to exist so that a mean effect size would no longer be statistically significant
* If just a few studies could render a mean effect size nonsignificant (no longer statistically significantly different from zero) then the mean effect size should be viewed as likely to be an inflated estimate
* If it would take several hundred studies in researchers’ “file drawers” to render the effect non-significant, then it’s safe to conclude that there really is a significant effect

2: Can work with replication to help draw more reliable conclusions
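Solution 1 can be sketched with one classic version, Rosenthal's (1979) fail-safe N, which works from each study's z statistic rather than from effect sizes directly; the five z values below are invented.

```python
# Sketch of a file drawer analysis via Rosenthal's fail-safe N.
# The per-study z values are hypothetical.
def fail_safe_n(z_values):
    """Number of unpublished null-result studies needed to pull the
    combined one-tailed result below significance:
    (sum of z)^2 / 2.706 - k, where 2.706 = 1.645^2 (one-tailed alpha = .05)."""
    k = len(z_values)
    return (sum(z_values) ** 2) / 2.706 - k

n_fs = fail_safe_n([2.0, 1.5, 2.5, 1.8, 2.2])
print(round(n_fs, 1))  # ~32 hidden null studies would erase this effect
```

A fail-safe N this small would suggest the mean effect size is likely inflated; one in the hundreds would support a real effect.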

30
Q

What visual display can researchers include?

4: calculate statistics - ideally, summary statistics, a hypothesis test, a confidence interval, and a visual display of the effect sizes

A

Forest plot: a type of graph that shows the confidence interval for the effect size of every study

31
Q

Statistical power is…

A

…the likelihood of rejecting the null hypothesis WHEN WE SHOULD reject the null hypothesis

32
Q

What is the probability that researchers consider the MINIMUM for conducting a study?

Statistical power

A

0.80 - an 80% chance of rejecting the null if we should reject it
* Thus, they perform power analysis prior to conducting a study: if they have an 80% chance of correctly rejecting the null, then it’s appropriate to conduct the study

33
Q

When we conduct a statistical null hypothesis test, we make a decision to either reject or fail to reject the null hypothesis. One issue is that we don’t have direct access to the truth about what we’re studying - instead…

A
  • We make inferences based on the data we collected - and the resulting decision could be right or wrong
  • Overall, a researcher’s goal is to be correct as often as possible - there are 2 ways to be right and 2 ways to be wrong
34
Q

What are 2 ways to be WRONG in rejecting/failing to reject the null hypothesis?

A

2 ways to be wrong - recap: Type I and Type II errors

35
Q

What are 2 ways to be RIGHT in rejecting/failing to reject the null hypothesis?

A

1 - Correct decision: if the null is true and we fail to reject the null, we have made the correct decision (essentially leaving the null alone)
- In this case, we’re saying that there’s no effect, when in fact there is none

2 - Correct decision (Power): if the null hypothesis is false, and we reject the null hypothesis, that’s also a correct decision
- A goal of research is to maximize statistical power

36
Q

Power is used by statisticians in a specific way - HOW?

A
  • Statistical power: a measure of the likelihood that we will reject the null hypothesis, given that the null hypothesis is false
  • In other words - statistical power is the probability that we will reject the null hypothesis when we should reject the null hypothesis; THE PROBABILITY THAT WE WILL NOT MAKE A TYPE II ERROR
37
Q

The calculation of statistical power ranges from:

A

Probability of 0.00 to 1.00 (AKA 0% to 100%)

38
Q

Conceptual calculation for power

A
  • Power = effect size x sample size
  • This means that we could achieve high power because the size of the effect is large - or because, although the effect is small, the sample is large
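The conceptual relationship can be illustrated for a two-tailed z test at α = .05; the standard normal approximation below is my addition, not a formula from the chapter, and the d and N values are hypothetical.

```python
import math

# Illustration of "power = effect size x sample size" for a z test.
def normal_cdf(x):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_z_test(d, N, z_crit=1.96):
    """Approximate power at two-tailed alpha = .05: chance the test statistic
    d*sqrt(N) lands beyond the critical value (the far tail is ignored)."""
    return normal_cdf(d * math.sqrt(N) - z_crit)

# Two routes to the same high power: big effect OR big sample
print(round(power_z_test(0.8, 25), 2))   # large effect, small sample
print(round(power_z_test(0.2, 400), 2))  # small effect, large sample
```

Both calls give the same power because d·√N is 4.0 in each case.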
39
Q

The most practical way to increase statistical power for many behavioural studies is…

A

…to add more participants

40
Q

How can researchers quantify the statistical power of their studies? 2 WAYS

A

1: By referring to a published table
2: By using computing tools like G*Power

41
Q

G*Power

A

Used in 2 ways:
1: Can calculate power AFTER conducting a study from several pieces of information
- Because we are calculating power after conducting the study, G*Power refers to these calculations as post hoc, meaning after the fact

2: Can use in reverse, BEFORE conducting a study, so as to identify the sample size necessary to achieve a given level of power
- In this case, G*Power refers to calculations as a priori, which means prior to
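An a priori calculation can be sketched by hand using the standard one-sample z-test formula; this is NOT G*Power's actual implementation, just an illustration of the idea.

```python
import math

# A-priori sketch: sample size for a one-sample z test from
# N = ((z_alpha/2 + z_beta) / d)^2, with z_alpha/2 = 1.96 (two-tailed
# alpha = .05) and z_beta = 0.84 (power = .80).
def required_n(d, z_alpha_2=1.96, z_beta=0.84):
    """Smallest N giving roughly 80% power at two-tailed alpha = .05."""
    raw = ((z_alpha_2 + z_beta) / d) ** 2
    return math.ceil(round(raw, 6))  # round() guards against float fuzz

print(required_n(0.5))  # medium effect (d = 0.5) -> 32 participants
```

Smaller expected effects demand far larger samples: d = 0.2 would require 196 participants under the same assumptions.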

42
Q

Of the two, which is more meaningful - post hoc or a priori power?

A

A post hoc power calculation is NOT as meaningful as an a priori power calculation for sample size planning

43
Q

On a practical level, statistical power calculations tell researchers…

A

…how many participants are needed to conduct a study whose findings we can trust

44
Q

Five factors that affect statistical power:

A

1: Increase alpha
2: Turn a two-tailed hypothesis into a one-tailed hypothesis
3: Increase N/sample size
4: Exaggerate the mean difference between levels of the IV
5: Decrease SD

45
Q

1: Increase alpha

Five factors that affect statistical power:

A
  • Like changing the rules by widening the goal posts in football, statistical power increases when we raise the alpha level from 0.05 (e.g., to 0.10)
  • This has the side effect of increasing the probability of a Type I error from 5% to 10%
46
Q

2: Turn a two-tailed hypothesis into a one-tailed hypothesis

Five factors that affect statistical power:

A
  • One tailed tests provide more statistical power, while two-tailed tests are more conservative
  • However, best to use two-tailed
47
Q

3: Increase N/sample size

Five factors that affect statistical power:

A
  • Increasing sample size leads to an increase in the test statistic, making it easier to reject the null hypothesis
  • Increase ⇒ the distribution of means becomes narrower and there is less overlap (larger sample size means smaller standard error)
48
Q

4: Exaggerate the mean difference between levels of the IV

Five factors that affect statistical power:

A

The mean of population 2 is farther from the mean of population 1 in part b) than it is in part a); the difference in means is not easily changed, but it can be done

49
Q

5: Decrease SD

Five factors that affect statistical power:

A

When SD is smaller, standard error is smaller and the curves are narrower

We can reduce SD in two ways:
1: by using reliable measures from the beginning of the study
2: by sampling from a more homogeneous group in which participants’ responses are more likely to be similar to begin with

50
Q

LECTURES

51
Q

CHP8 concepts push beyond the limits of NHST

A
  • Effect size
  • Confidence intervals
  • Power
52
Q

Effect size:

CHP8 concepts push beyond the limits of NHST

A
  • If the null is really false, how big is that effect?
  • Standardized numerical estimate of the population effect size using our sample data
53
Q

Confidence intervals:

CHP8 concepts push beyond the limits of NHST

A
  • Starting with the sample mean, compute a range of plausible values for the true population mean
  • Helps us prepare for replications
54
Q

Power

CHP8 concepts push beyond the limits of NHST

A
  • If the null is really false, how likely is it that we’re going to find a “significant effect” in our sample
  • If the null is really false, how likely is it that we’re going to avoid a type II error
55
Q

Effect size: after rejecting the null, we can conclude that…

A

we think we drew this sample from a different population with a different sampling distribution

56
Q

Effect size - How can we guess the mean of the population we drew from?

A

Locate the tallest point in the distribution - the most common score

57
Q

What does effect size look like, visually?

A

Distance from the highest peak of one distribution to the other (distance between group means)

58
Q

How/what does effect size help us estimate?

A

If we DID draw from a different population, how different is that new population’s mean from the null mean?

59
Q

What are some different ways to estimate an effect size for different kinds of data?

A
  • How far away is the true mean from the null hypothesis mean?
  • How far apart are the experimental and control conditions
  • Strength of correlation
  • How far from equal (50% each) is the distribution of proportions
60
Q

Which size of effect is easiest to detect?

A

smaller effects are HARDER to detect from drawing a single sample; larger effects (thus, larger Cohen’s d) are EASIER to detect

61
Q

What is one of the many “standardized” indicators of effect size?

A
  • COHEN’S D
  • Estimates the population parameter - δ (delta)
62
Q

How many SD away from the comparison value is our sample group mean?

A
  • d = |M − μ|/σ
  • NOTE: the numerator sits between absolute value bars
63
Q

What can we calculate to answer: “How likely is it that our class is a random sample from this general population? Or do we likely come from a different population?”

A

We can use a z test or we can use a confidence interval

64
Q

All CI’s follow the same pattern…

A

Subtract the margin of error (critical value × standard error) to find the lower bound; add the margin of error (critical value × standard error) to find the upper bound
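The shared pattern can be written as one small function; the estimate, critical value, and standard error in the example are hypothetical.

```python
# The CI pattern above, as a reusable function; example numbers hypothetical.
def confidence_interval(estimate, critical_value, standard_error):
    """Any CI: estimate minus/plus (critical value x standard error)."""
    margin_of_error = critical_value * standard_error
    return estimate - margin_of_error, estimate + margin_of_error

lower, upper = confidence_interval(estimate=100.0, critical_value=1.96,
                                   standard_error=2.0)
print(lower, upper)
```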

65
Q

Type I error

A
  • H0 is rejected
  • H0 is actually true
66
Q

Correct Decision - POWER (1 − β)

A
  • H0 is rejected
  • H0 is false
67
Q

Type II error

A
  • Fail to reject H0
  • H0 is actually false
68
Q

Correct Decision (1 − α)

A
  • Retain H0
  • H0 is true
69
Q

What does β mean?

A

If the null is really false (effect exists), β% of the time we’re going to make a mistake and say there’s no effect

70
Q

To identify power:

A

find the % of the curve of the H1 distribution that would lead us to correctly reject the null

71
Q

If effect size increases, what happens to type I error rate?

A

Nothing - the Type I error rate is set by the chosen alpha level, so it does not depend on effect size

72
Q

If effect size increases, what happens to type II error rate?

A

It decreases - larger effects are easier to detect, so we miss them less often

73
Q

If effect size increases, what happens to power?

A

It increases - power = 1 − β, so as the Type II error rate falls, power rises

74
Q

In a priori power analysis, we can ask two questions:

A
  • “If I’m making the assumption that there is an effect to be found, HOW MANY PEOPLE DO I NEED IN MY STUDY?”
  • “If I’m limited to N participants, will I have enough power to reject the null hypothesis if I should do so?”