Exam 2: Inferential Statistics Flashcards

1
Q

p value

A

Probability of obtaining your observed result (or a more extreme one) if the null hypothesis (its distribution) were true.

p value = the probability that your result was pulled from this unlikely (null) scenario.

p = 0.06 or p = .1 are sometimes described as marginal effects.

How extreme is this value? How likely was it to occur by chance?

2
Q

One-Tail or Two-Tail?

A

Report in the manuscript whether the test is one-tailed or two-tailed; in a two-tail case, the criterion (alpha) is split across the two tails.

  • Only use one-tail hypothesis testing if obtaining a result in the other direction is IMPOSSIBLE or UNINTERPRETABLE
  • Some might say one-tail is OK if you have a directional hypothesis, but why might that be a bad idea? If you happen to find a result in the opposite direction, you can’t interpret it.

  • Examples where one-tail is required or appropriate:
  • Implicitly, Chi-square and F-tests (only positive values; these results tell you that values differ, but not in which direction)
  • If you consider the consequences of missing a potentially large effect in the untested direction and conclude that they are negligible and in no way irresponsible or unethical
3
Q

Statistical significance

A

“Significance” is with respect to your pre-determined alpha. Significance ≠ Importance. DON’T say results are “more/less significant,” just like you wouldn’t say “Bill passed the exam more than Stacy.” For this sort of idea, we often use “effect size.” Importance is more of a theoretical idea. You can say the p value is different or the effect size is different.

4
Q

Type 1 error

A

Your research tells you to reject the null (i.e., conclude the alternative is true), but in reality the null is true.

Boy who cried wolf: the FIRST error was that he cried that there was a wolf, but in reality there was no wolf (and people believed him).

Can occur because of sampling issues, e.g., when participants in the study did not represent the population well.

5
Q

Type 2 error

A

Your research tells you to accept (fail to reject) the null and reject the alternative, but in reality the alternative hypothesis was true.

Boy who cried wolf: the SECOND error was that people believed he was lying (accepted the null) when there really was a wolf.

Might happen if the study lacks sufficient power; to increase power, recruit an appropriate number of participants.

6
Q

Power

A

When your research tells you that the alternative hypothesis is true and in reality the alternative hypothesis IS TRUE.

7
Q

Balance between error types

A

Where do you set your criterion?

  • Medical tests: A low criterion ensures you don’t miss abnormalities, but if too low, people may have unnecessary interventions
  • Law: If we wait until we’re sure beyond any possible doubt, then we won’t wrongfully convict anyone; but we might let a criminal go free.

This is why we’re obsessed with p < .05
• It makes us feel like there’s one true criterion
(see: Bayesian statistics)

8
Q

Multiple Comparisons

A
  • If each test has a 5% (independent) chance of being a false positive…
  • Actual error rate across tests (technically family-wise error rate) = 1 - (1 - α) ^ (#comparisons)

• In the cartoon, each test has a 5% chance of being a false positive; do twenty tests:
Error rate = 1 - .95^20 ≈ 1 - .36 = .64
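A quick way to check this family-wise error rate arithmetic, sketched in R (the numbers mirror the 20-test example above):

```r
# Family-wise error rate across k independent tests, each at alpha = .05
alpha <- 0.05
k <- 20
fwer <- 1 - (1 - alpha)^k
fwer   # ~0.64: about a 64% chance of at least one false positive somewhere
```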

9
Q

Bonferroni Correction

A

• Divide the p-value criterion by the number of tests you perform. In the cartoon’s color tests (there were 20 tests of jelly beans): .05/20 = .0025, so each test needs p < .0025.

  • Often overly conservative
  • Other approaches differentially balance α and β
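A minimal R sketch of the correction (the raw p-values here are hypothetical, just to show the mechanics):

```r
alpha <- 0.05
k <- 20
alpha / k                     # 0.0025: the per-test criterion
# Equivalently, inflate the p-values and keep the usual .05 criterion
p <- c(0.001, 0.004, 0.03)    # hypothetical raw p-values
p.adjust(p, method = "bonferroni")
```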
10
Q

Measures of Association

A
  • Typically Non-Experimental Designs
  • Correlation
  • Pearson
  • Spearman Rank-Ordered
  • Chi-Square
  • Regression
11
Q

Pearson Correlation

A

• Relationship between two interval/ratio measures
• Ranges from -1 to +1
• Effect size: for each unit increase in x, how much y increases
• 0.25 ~ weak
• 0.50 ~ moderate
• 0.75 ~ strong
• Not everyone agrees on the exact numbers!
+1 or -1 are graphed as perfect lines at 45 degrees
+/- 1 is a perfect correlation
closer to +/- 1 = stronger relationship

• r is the strength of the relationship in your sample (APA)
• Does not say whether that relation is meaningful or not
• R^2 = Coefficient of Determination

*** r does not say whether that relation generalizes to the population; for that you need statistical significance: the criterion r for a given p-value, with degrees of freedom (df) = # pairs of observations – 2
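A hedged R sketch with simulated (hypothetical) data, showing how r, df = n pairs − 2, the p-value, and R² come out of the same test:

```r
set.seed(1)
x <- rnorm(25)                       # e.g., working memory span
y <- 0.4 * x + rnorm(25)             # e.g., word identification
cor.test(x, y, method = "pearson")   # reports r, t, df = 25 - 2 = 23, and p
cor(x, y)^2                          # R^2: proportion of variance accounted for
```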

12
Q

Coefficient of Determination (R^2)

A

-R^2 = Coefficient of Determination
• Square of r
• Proportion of variance accounted for

example-
• r = .50
• R^2 = .25

  • Weight only accounts for 25% of the variance in height
  • Other factors may account for much more
13
Q

Example Pearson’s Correlation and Coefficient of Determination

A

A study observed a correlation between children’s working memory and benefit from CI (cochlear implant) use:
• Digit span test of working memory
• Word identification performance
• r = 0.41
• Indicates that 17% of the variability in performance was accounted for by differences in working memory
• Demographics (age at test, age at implantation, duration of use) accounted for an additional 30% of the variability

14
Q

Degrees of Freedom

A

how many points are free to vary
• For correlations, df = n pairs - 2

• To preview group differences:

  • For one group, df = number of observations - 1
  • For two groups, df = n - 1 for each group –> n1 + n2 - 2

• df and alpha determine the statistical test value (r, t, z, etc.) needed for significance.

  • If you have 4 participants, and their M = 10
  • What is the 1st person’s score? You don’t know– it’s free to vary.

• Assume the 1st score was 5. What’s the 2nd score?
You don’t know that, either – it’s also free to vary.
• Assume the 2nd score was 7. What is the 3rd score?
You don’t know that, either – it’s also free to vary.
• Assume the 3rd score was 15. What is the 4th score?
This one you do know. If the average is 10, the fourth score must be 13. So it is NOT free to vary.

15
Q

Restricted ranges

A

When the range of one or both variables is restricted (e.g., only high scorers are sampled), the observed correlation tends to be weaker; you are not as likely to find a strong correlation.

16
Q

Spearman Rank-Order Correlation

A

• Strength/direction of association between TWO ranked (ordinal) variables

• Non-parametric version of Pearson r: data are not required to fit a normal distribution, and it is less sensitive to outliers

• Symbol: ρ (rho) or r_s
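Assuming R again, the same cor.test() call handles the rank-order version (data hypothetical):

```r
set.seed(2)
x <- rnorm(20)
y <- x + rnorm(20)
cor.test(x, y, method = "spearman")  # reports rho (r_s); works on the ranks of the data
```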

17
Q

Chi-Square and Contingency Coefficient Tables

A

• Association between NOMINAL variables

  • Statistic: chi-square, χ²
  • df = (rows – 1) * (columns – 1)

Won’t be tested on the formulas of these things; just understand what the test is trying to accomplish.
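In the spirit of "what the test accomplishes" rather than its formula, a small R sketch with hypothetical counts:

```r
# Hypothetical 2 x 2 table of counts for two nominal variables
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))
chisq.test(tab)   # chi-square statistic with df = (rows - 1) * (columns - 1) = 1
```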

18
Q

Contingency Coefficient

A
  • Magnitude of association (contingency coefficient)
  • C
  • Ranges from 0 - 1

Won’t be tested on the formulas of these things; just understand what the test is trying to accomplish.

19
Q

Regression

A
  • Predictive value of association
  • Simple Regression = R^2
  • How much variance is accounted for?
  • “best fit” line: y = mx + b
  • DV = slope * IV + intercept
20
Q

Multiple Regression

A

• Strongest combination of IVs that predict a DV
- DV = slope_a * IV_a + slope_b * IV_b + intercept
- Multiple R^2 indicates how much of the DV variance all of the IVs jointly account for
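A minimal R sketch of simple vs. multiple regression with lm(); the variables (weight, age, height) and data are hypothetical:

```r
set.seed(3)
d <- data.frame(weight = rnorm(50, 70, 10), age = rnorm(50, 30, 5))
d$height <- 100 + 0.8 * d$weight + 0.3 * d$age + rnorm(50, 0, 5)

fit1 <- lm(height ~ weight, data = d)         # DV = slope * IV + intercept
fit2 <- lm(height ~ weight + age, data = d)   # DV = slope_a*IV_a + slope_b*IV_b + intercept
summary(fit1)$r.squared                       # simple R^2
summary(fit2)$r.squared                       # multiple R^2: variance jointly accounted for
```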

21
Q

Non-Linear Regression

A

• DV = slope_a * IV² + slope_b * IV + intercept
I could fit a straight line through this, but the relationship really isn’t a straight line.
Example: cognitive effort; as things get really hard, effort increases, but at some point you give up (so the curve bends rather than continuing straight).

22
Q

Writing Up Results

A

*** For Pearson correlation ***
GIVE the “r” value and the p-value (either exact or relative to the significance level)
• Some researchers also include: df, descriptive stats of each variable, R^2

Example:
“Working memory span and word identification were significantly positively correlated (r = 0.41, p < .05).” (Assuming you’ve clearly stated the sample size used in this test elsewhere.)

OR another example:
“There was a significant positive correlation between working memory span (M = 3.34, SD = .10) and word identification (M = 9.43, SD = 1.03), r(23) = 0.41, p = .042.”

  • The type of analysis conducted (i.e., two-tailed Pearson correlation) and the threshold for significance must be reported; for a short paper with simple results, possibly right before you give the results. For longer papers, an Analysis subsection of the Method may be preferable.
  • Be consistent throughout your paper!
    - Number of digits after the decimal
    - Zero before decimal or not
    - Parentheses, spacing, etc.
23
Q

What’s the difference between a correlation and a regression?

A

Regression gives the predictive value of an association.
In a regression, one variable is the independent variable used to predict the other, dependent variable.
You only plot a regression line in a regression analysis.

24
Q

Ways to write up inferential stats

A

The null hypothesis was rejected at the 0.05 level.

The difference between the means was statistically significant (p < 0.003).

The difference between groups was significant at the 0.01 level.

25
Q

Types of regression models

A

Regressions, t-tests, and ANOVAs are all ways to predict a DV from IVs.
Collectively, they are General Linear Models (GLM).

26
Q

Parametric vs. Non-Parametric

A

Parametric: more powerful if all parametric assumptions are met

Non-parametric:
• Fewer assumptions
• Better for smaller samples and data with potential measurement error
• More powerful if parametric assumptions are not met

27
Q

Parametric assumptions

A
  1. Normality (sampling distribution of the mean)
  2. Homogeneity of Variance
  3. Independence of observations (unless repeated measures are specified)
28
Q

Assumption of Normality

A

• Imagine DV = working memory (WM) score
• Take a group of participants, record WM, compute the sample mean
• Repeat this 1000s of times and plot the distribution of means (each time, get a group of people’s data and average them)
It’s not that your data are necessarily normal, but if you did this a thousand times, the means would be essentially normal.

• This distribution of MEANS must be normal (it’s the means we’re interested in for stats)

  • If N ≥ 30 per condition/group, you’re OK (but with ≥ 10, you may still be OK if non-normality is not too large)
  • If the underlying population is normal, you’re OK
  • But if not, what does your sample indicate…
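A small R simulation of the idea (hypothetical, skewed raw scores): even when individual scores are not normal, the distribution of sample means is roughly normal once N per sample is around 30.

```r
set.seed(4)
means <- replicate(1000, mean(rexp(30, rate = 1)))  # 1000 "studies", N = 30 each
hist(means)   # approximately normal, even though the raw rexp() scores are skewed
```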
29
Q

Histogram

A

• Frequency of scores across the distribution

30
Q

Normal Q-Q Plot

A
  • Circles = data points
  • Sample data quantiles plotted against those of the normal distribution; points falling near the diagonal line indicate approximate normality

31
Q

Homogeneity of Variance

A
  • Data from different groups/levels have the same variance (AKA homoscedasticity)
  • Many tests are robust to violations, so long as your group sizes are equal (That is, if: largest group size / smallest group size < ~1.5)
  • Common tests (p < .05 indicates a violation):
    • Independent Groups: Levene’s test
    • Repeated Measures: Mauchly’s Test for Sphericity
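A sketch of Levene’s test for independent groups, assuming the car package is available (data hypothetical); mauchly.test() in base R covers the repeated-measures case but needs a fitted multivariate model, so it is omitted here.

```r
# install.packages("car")   # if not already installed
library(car)
set.seed(5)
scores <- c(rnorm(20, 10, 2), rnorm(20, 12, 4))
group  <- factor(rep(c("young", "old"), each = 20))
leveneTest(scores ~ group)  # p < .05 would indicate unequal variances (a violation)
```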
32
Q

What to do with violations of normality or homogeneity of variance?

A

• Nothing
  • If your sample size is not small, linear regressions (incl. t-test, ANOVA) are robust to moderate non-normality
• Remove extreme values (if justified)
• Transform the data
  • Make it normally distributed
    • e.g., Reaction times: log transform
  • Stabilize the variance
    • e.g., Percent correct: arcsine, RAU, logit
  • Transforms are common but can be controversial
• Run a statistical test with
  • Appropriate corrections, e.g.,
    • Welch correction for unequal variances in two-sample t-tests (default in R!)
    • Greenhouse-Geisser correction for unequal variances in repeated-measures ANOVA
  • Different assumptions, e.g.,
    • Non-parametric: doesn’t assume normality
    • Logistic regression: optimized for binomial (correct/incorrect) data

Percent correct is a bounded ratio, which is hard to use for prediction; transforms address possible ceiling or floor effects (arcsine, RAU). RAU involves more math and ranges roughly from -20 to +120; logit is another transform.
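Two of these remedies sketched in R with hypothetical data: the Welch correction (the default behavior of t.test) and a log transform of skewed reaction times.

```r
set.seed(6)
scores <- c(rnorm(20, 10, 2), rnorm(20, 12, 4))
group  <- factor(rep(c("A", "B"), each = 20))
t.test(scores ~ group)                     # Welch correction: var.equal = FALSE by default
t.test(scores ~ group, var.equal = TRUE)   # classic pooled-variance t-test, for comparison

rt <- rexp(40, rate = 1 / 500)             # skewed "reaction times" in ms
hist(log(rt))                              # log transform pulls the distribution toward normal
```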

33
Q

Independent Observations

A
  • Observations between and within groups should be independent
    * One value is not affected by / does not depend on the other
  • A repeated measures analysis takes into account observations that are not independent, specifically that the same person produced two observations
34
Q

Group Comparisons (types of t-tests)

A

Three types of t-tests
• Single-sample: one group vs. baseline
• Paired-samples t-test: compares two matched/paired sets of observations (single-sample of paired differences)
• Independent-samples: compares two separate groups of observations (pooled SD)
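All three show up as variations of t.test() in R; a hedged sketch with made-up data:

```r
set.seed(7)
# Single-sample: one group vs. a baseline (e.g., chance = 0.50)
acc <- rnorm(20, mean = 0.56, sd = 0.10)
t.test(acc, mu = 0.50)

# Paired-samples: two matched sets of observations from the same people
pre  <- rnorm(15, 10, 2)
post <- pre + rnorm(15, 1, 1)
t.test(post, pre, paired = TRUE)      # equivalent to t.test(post - pre, mu = 0)

# Independent-samples: two separate groups
t.test(rnorm(20, 10, 2), rnorm(20, 12, 2))
```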

35
Q

Single-sample t-test

A
  • Imagine an experiment where people press a button to indicate if they heard the word “dish” or “ditch”
  • Is M = 56% significantly greater than chance (50%)? What about M = 85%? While 56% likely comes from the null distribution (i.e., chance), 85% likely comes from an alternative distribution.
36
Q

Two-Sample t-test

A

• Paired-samples: Do adults differ on a measure of audiotemporal processing in two different levels of background noise? Really a one-sample t-test of the difference scores vs. 0.
• Independent-samples: Do younger and older adults differ on a measure of audiotemporal processing?

37
Q

Relationship between mean difference and effect size

A

Increasing the mean difference increases effect size; decreasing the standard deviation increases effect size.

38
Q

Error

A

When the means are close together, there is a higher chance of error (the distributions overlap more).

39
Q

Effect Size

A

MAGNITUDE of the difference between variables; how far apart is one condition from another.

The standardized magnitude of the difference between groups/conditions or relation between variables (irrespective of “significance,” which is affected by sample size, alpha level)

  • Correlation: r
  • Regression: beta estimate
  • t-tests: Cohen’s d = difference between Ms / pooled SD
  • ANOVA: eta-squared or partial eta-squared, η²
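A worked sketch of Cohen’s d (difference between means / pooled SD) on hypothetical groups:

```r
set.seed(8)
g1 <- rnorm(30, 10, 2)
g2 <- rnorm(30, 11, 2)
n1 <- length(g1); n2 <- length(g2)
pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
d <- (mean(g2) - mean(g1)) / pooled_sd
d   # magnitude of the difference, independent of sample size
```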
40
Q

Power

A

power: ability to say the alternative is true, when it is actually true

41
Q

t-test reporting in APA format

A

t(24) = 4.05, p < 0.05, d = .45

where 24 = degrees of freedom value

42
Q

Confidence Intervals

A

• Likely range of the population effect size (or difference), based on sample data
  - How confident are you that you would get the same result in a new study?
  - Does the CI include 0 or some other benchmark?

• Confidence interval of the mean:
CI = M ± t_crit * SEM

• Example: M = 0.6, 95% confidence interval: 0.2 to 1.0
  • The population value is 95% certain to be in the range of 0.2 to 1.0
  • Doesn’t include zero, so we can be reasonably confident that a true positive effect occurred
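The M ± t_crit × SEM formula computed in R on hypothetical data; t.test() reports the same interval:

```r
set.seed(9)
x <- rnorm(25, mean = 0.6, sd = 1)
m     <- mean(x)
sem   <- sd(x) / sqrt(length(x))
tcrit <- qt(0.975, df = length(x) - 1)      # two-tailed 95% critical value
c(lower = m - tcrit * sem, upper = m + tcrit * sem)
t.test(x)$conf.int                          # same 95% CI
```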

43
Q

Error Bars

A
  • Horizontal lines indicate minimum distance apart for p < .05
  • ±1 SEM error bars ≈ 68% confidence interval

when you plot confidence intervals, a quarter of the length of the bar can overlap and still be significant

SEM: you need at least half the length of the error bar apart to be significantly different

44
Q

ANOVA

A
• ANalysis Of Variance
• Parametric statistic
• Similar to a t-test (signal-to-noise): mean difference relative to variability
   - Specifically, differences in the amount of variability between groups vs. within groups
   - Main effects (each of the IVs) and how these IVs interact with one another (if the effect of one IV is different at different levels of another IV)
• Factors: One-way, Two-way, Three-way
• Levels: Each factor can have 2+ levels
       • Between-subjects
       • Within-subjects (repeated measures)
       • Mixed Designs
• How many factors and levels in a 2 x 2 x 3 ANOVA?
      3 factors (count the number of numbers)
      2 levels in the first
      2 levels in the second
      3 levels in the third

F test: doesn’t say which direction the difference is in
• A significant effect only tells you there is a difference unlikely due to chance
• Does not mean ALL differences are significant (e.g., for 3 SNRs, does recognition differ between each pair?)
• Does not tell you the direction of an effect (it’s one-tailed!)
• To understand the results:
  • Plot to show the pattern
  • Test to show which levels differ significantly
  • Newman-Keuls, Scheffé, Tukey & t-tests
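A one-way sketch in R with hypothetical speech-recognition scores at three SNRs, plus a Tukey follow-up to see which levels differ (since the F test alone doesn’t say):

```r
set.seed(10)
dat <- data.frame(
  snr   = factor(rep(c("+3 dB", "0 dB", "-3 dB"), each = 20)),
  score = c(rnorm(20, 80, 8), rnorm(20, 72, 8), rnorm(20, 70, 8))
)
fit <- aov(score ~ snr, data = dat)
summary(fit)    # significant F = some difference exists, direction unspecified
TukeyHSD(fit)   # pairwise comparisons show which SNR levels actually differ
```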

45
Q

Example Multifactorial Study

A

• Do children speak differently to an infant than to an adult?

  • Factor 1: who the child was speaking to (listener)
  • Factor 2: whether the child had siblings

• Main effect of listener
  - Ignores factor 2: just looks at whether kids in general spoke differently to infants than to adults

• Main effect of siblings
  - Ignores factor 1: just looks at whether kids with siblings talk differently overall than kids without

• Interaction of the two factors

46
Q

Interactions

A

• The effect of one factor depends on another; the effects of one factor are not consistent at all levels of the second
• Example: Do kids speak differently to infants than to adults?
• You find that having siblings matters (main effect)
• Kids with younger siblings speak differently to adults than to infants, but kids without younger siblings do not
• But the effect of listener changes depending on the value of the other factor (having younger siblings or not)

On a graph: lines that cross to make an X = interaction with no main effect; one horizontal line and one diagonal line = one main effect plus an interaction.

47
Q

Writing Up ANOVAs

A

There was a significant effect of SNR on speech recognition, F(df_IV, df_error) = fval, p < .05, η² = xx. As shown in Figure 1, this effect was driven by a significant difference between the +3 and 0 dB SNR conditions, t(df) = tval, p < .05, d = xx. There was no significant difference between the other levels of SNR, all p > .10.

48
Q

Non-Parametric Statistics

A

• Mann-Whitney U: NP version of the independent-samples t-test
• Sign Test and Wilcoxon Matched-Pairs Signed-Ranks test: NP versions of the paired-samples t-test
  • The sign test is trivially easy to compute, but only gives you the direction (not the magnitude) of the difference
• Friedman’s two-way ANOVA: NP version of a one-way repeated-measures ANOVA
• Kruskal-Wallis test: NP version of a one-way independent-samples ANOVA
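R sketches of the non-parametric versions with hypothetical data; these are the base-R equivalents of the tests listed above (the sign test has no dedicated base function and is omitted):

```r
set.seed(11)
g1 <- rnorm(15, 10, 3)
g2 <- rnorm(15, 12, 3)
wilcox.test(g1, g2)                            # Mann-Whitney U (independent samples)
wilcox.test(g1, g2, paired = TRUE)             # Wilcoxon matched-pairs signed-ranks
kruskal.test(list(g1, g2, rnorm(15, 14, 3)))   # Kruskal-Wallis (3+ independent groups)
friedman.test(matrix(rnorm(30), nrow = 10))    # Friedman (10 subjects x 3 repeated conditions)
```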

49
Q

three factors for determining if a t-test will come out to be significant

A
  1. the magnitude of difference between the means
  2. the amount of variability in the data (less variability= greater likelihood for significant differences)
  3. sample size (larger sample = greater chance of significance)