Weeks 5-6 Flashcards
Statistics…
- allow us to evaluate the…
- In experiments, most statistics involve comparing…
- The statistical analysis used depends on the…
• Statistics allow us to evaluate the evidence to determine whether our IV had any effect.
• In experiments, most statistical analysis involves comparing groups (2+) on one, two, or three independent variables, i.e., mean group differences.
• The statistical analysis used depends on the design (within/between subjects) and the nature of the variables (DV: categorical or continuous; levels: 2+).
How do we test whether 28 ms is sufficient evidence to reject the null hypothesis? What does the __ ___ tell us?
Use the sampling distribution: the probability of finding the same result or a bigger one given the null hypothesis is true (i.e., the probability that the result is due to sampling error and not the manipulation of the IV).
Null hypothesis means…. it is believed to be…
- If the likelihood of finding the same results without IV manipulation, given the null hypothesis is true, is less than 5% (p-value less than .05), we reject the null hypothesis.
- The null hypothesis is believed to be true until proven otherwise (like "innocent until proven guilty").
the test statistic is ___ divided by ___. We compare it to the ___ to test….
• Group difference divided by standard error.
• = a ratio: variability between groups explained by the IV / natural variability in the sample (sampling error: how far a sample mean sits from the population mean, i.e., our confidence that it is close to the true mean).
• Compare it to the sampling distribution to ask: how often would I get a t this big or bigger if the null hypothesis is true?
what is the sampling distribution? It samples from a ___ population. Its shape is determined by __. A larger N is more likely to give us…. Its tail is where…. A t-score closer to zero will fall where in the distribution?
• The sampling distribution tells us how likely it is to get the same t-score (or bigger) if the null hypothesis is true, when sampling randomly from the same population.
• The sampling distribution shows how often I will get different values of a test statistic (e.g., t) by randomly sampling from a SINGLE population.
• The shape of the sampling distribution depends on how big my sample is (for an independent t-test, degrees of freedom = N − 2).
• The larger the N, the more tightly the t-values expected under the null cluster around 0; given the null hypothesis is true, differences between groups should be smaller, i.e., closer to 0.
• The tails of the sampling distribution contain the rejection region, where it is unlikely that the test statistic was produced by the null hypothesis being true (i.e., by sampling error rather than the IV).
• A t-score closer to 0 is more likely to have been produced by the null (sampling error) and not the IV. Bigger test statistics (in the tails) are better!
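The sampling distribution can be built by brute force: draw both groups from ONE population, so the null hypothesis is true by construction, and record the t each random sample produces. A sketch with made-up population parameters (mean 500, SD 50, and n = 10 per group are arbitrary choices):

```python
import random
from statistics import mean, variance

random.seed(0)

def t_independent(g1, g2):
    """Independent-samples t with pooled variance."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Both groups come from a SINGLE population, so the null is true by design.
ts = []
for _ in range(5000):
    g1 = [random.gauss(500, 50) for _ in range(10)]
    g2 = [random.gauss(500, 50) for _ in range(10)]
    ts.append(t_independent(g1, g2))

# Most t-values cluster near 0; big |t| values are rare under the null.
near_zero = sum(abs(t) < 1 for t in ts) / len(ts)
extreme = sum(abs(t) > 2.101 for t in ts) / len(ts)  # 2.101 = two-tailed 5% cut-off, df = 18
print(f"|t| < 1: {near_zero:.0%},  |t| > 2.101: {extreme:.0%}")
```

With df = 18, |t| > 2.101 should land in the tails about 5% of the time, which is exactly the rejection region described above.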
what do we look at first when looking at independent and dependent t-test outputs?
*not the p-value first!
1. Descriptive statistics:
mean differences between groups (the most important information). The SE here describes uncertainty around each group mean; an SE of 92.4 means roughly 90 milliseconds of variability either side of the mean, so I am not confident my sample mean is true of the group.
2. Assumptions:
are the assumptions of my t-test met? Do I need to do a non-parametric test instead?
3. Only now do we look at the test statistic, p-value, and Cohen's d.
If the t-test statistic is 0.218 and the p-value is .83, what does this tell us?
If the null hypothesis is true, 83% of the time it will produce a test statistic of 0.218 or greater. That is far too common to count as evidence against the null, which is why we compare it to the 5% significance cut-off.
- Reject null: if the likelihood of the null hypothesis producing this result is less than …
- Bidirectional …
- Directional …
- Reject null: if the likelihood of the null hypothesis producing a test statistic this size or greater is less than 5%.
- Bidirectional (two-tailed): the 5% rejection region is split across two tails, 2.5% each side; a bigger t is needed to fit in the smaller tail, i.e., more evidence to reject.
- Directional (one-tailed): the full 5% sits in one tail, so a smaller t is needed (less evidence); divide the two-tailed p-value by two to account for the bigger rejection region.
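The halving step is just arithmetic; a tiny sketch using the p-value from the worked example in these notes:

```python
# Two-tailed: the 5% rejection region is split 2.5% per tail, so a larger |t| is needed.
# One-tailed: the full 5% sits in one tail, so halve the two-tailed p.
p_two_tailed = 0.83          # two-tailed p from the worked example in these notes
p_one_tailed = p_two_tailed / 2
alpha = 0.05
print(p_one_tailed)          # 0.415 -- still above .05, so still non-significant
print(p_one_tailed < alpha)  # False
```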
P-value vs Cohen’s D: what do they each tell us?
§ P-value:
o How confident I am that the null hypothesis did not produce this data (p less than .05 means less than a 5% chance that the t-test score was due to the null hypothesis, i.e., sampling error, rather than the IV).
o P-values do not tell me how big the effect is! A smaller p-value doesn't mean a bigger effect; the p-value tells us about our confidence that the effect is due to the IV (i.e., is the effect statistically significant, not how big the effect is).
§ Cohen's D:
o Effect size: Cohen's d = difference between means / pooled SD (not SE, so it is not corrected for n).
o How big is the effect of the IV on the DV?
o The sign is arbitrary in Jamovi, so ignore it.
o Directional hypothesis = one-tailed hypothesis: divide .83 / 2 = .415.
o P-value .415 (one-tailed; still non-significant).
o P-value .83 (two-tailed; non-significant).
o d = 0.098 is smaller than "small" (small = .2, medium = .5, large = .8).
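Cohen's d can be computed directly from the formula above. A sketch with hypothetical reaction-time data (numbers invented for illustration, not the lecture's data):

```python
import math
from statistics import mean, variance

# Hypothetical reaction times (ms) for two independent groups.
angry = [712, 698, 745, 760, 731, 705, 770, 724, 739, 716]
happy = [735, 749, 722, 758, 741, 730, 765, 728, 751, 744]

n1, n2 = len(angry), len(happy)

# Pooled SD -- note: SD, not SE, so d is not shrunk by a large n.
sd_pooled = math.sqrt(
    ((n1 - 1) * variance(angry) + (n2 - 1) * variance(happy)) / (n1 + n2 - 2)
)

d = abs(mean(angry) - mean(happy)) / sd_pooled  # sign is arbitrary, so take |.|
label = "small" if d < 0.5 else "medium" if d < 0.8 else "large"  # rough Cohen benchmarks
print(f"d = {d:.3f} ({label})")
```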
three steps of reporting a t-test:
- Introduce your test (IV, DV, and t-test should be mentioned in the first sentence).
"Mean response times to angry and happy faces were compared with an independent t-test."
- Report your test statistics: t, df, p, d (to 3 dp).
If t and d are negative, drop the negative sign (it's arbitrary).
It is assumed your t is two-tailed. If you are doing a one-tailed test, divide by 2 and report that it is one-tailed (note: there is another way to do this in JAMOVI which we'll learn Friday).
P-values don't get a leading 0 (they can never be more than one); t and d get leading zeros.
Indicate whether you reject or fail to reject the null hypothesis.
"Results failed to reject the null hypothesis, t(18) = 0.218, p = .415 (one-tailed), d = 0.098."
- Describe the effect in English, providing descriptive statistics (M, SD).
"There was no significant difference in RT between participants who searched for an angry face (M = 730 ms; SD = 292 ms) and those who searched for a happy face (M = 738 ms; SD = 282 ms)."
Within-Subjects Design: each row is _ ___; it has __ SE, mean difference, and SDs.
the test statistic for a dependent-t-test is calculated by….
it will produce a bigger ___ and ___ relative to between-subjects designs?
• Each row is a person.
• Same variability in the sample, but smaller SE; the SD and mean difference are smaller (the IV effect looks weaker once the random noise in the data is removed).
• SE is 3.
• SD is 10 ms (small).
• Mean difference is 12 ms (small).
• The test statistic is the difference between conditions (not the overall responses, but the calculated mean difference of 12 ms) divided by the SE of the mean of the differences.
• The smaller error from the within-subjects design means the test statistic will be bigger and more likely to be statistically significant (though the effect is not big).
A bigger t-test statistic and Cohen's d.
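A paired t works on per-person difference scores, as described above. A sketch with hypothetical within-subjects data (numbers invented so the difference scores have a small SD):

```python
import math
from statistics import mean, stdev

# Hypothetical within-subjects data: each row (index) is one person, measured twice.
cond_a = [520, 498, 545, 510, 530, 505, 540, 515, 525, 500]
cond_b = [508, 490, 530, 500, 515, 495, 526, 505, 512, 490]

# The paired t-test is computed on the per-person difference scores.
diffs = [a - b for a, b in zip(cond_a, cond_b)]

mean_diff = mean(diffs)
se_diff = stdev(diffs) / math.sqrt(len(diffs))  # SE of the mean of the differences

t = mean_diff / se_diff
df = len(diffs) - 1  # df = N - 1 for a paired t-test
print(f"t({df}) = {t:.3f}, mean difference = {mean_diff} ms")
```

The mean difference is small (about 12 ms), but because everyone shifts in the same direction the SE of the differences is tiny, so t comes out large: exactly the within-subjects advantage described above.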
A paired t-test is also called what (3) things and is used when….
Also called
- Dependent t-test
- Matched t-test
- Repeated measures t-test
Used when comparing two within-subjects conditions (the same participants measured in both).
the larger the effect size (cohen’s d) the __ overlapped the distributions of the two groups are.
.2 =
.5 =
.8 =
2 =
less.
.2 = 83% overlap, with 58% of the control group falling below the experimental group's mean.
.5 = 67% overlap, with 69% of the control group falling below the experimental group's mean.
.8 = 53% overlap, with 79% of the control group falling below the experimental group's mean.
2 = 19% overlap.
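The "percent of the control group below the experimental group's mean" figures are just the standard normal CDF evaluated at d, assuming two normal distributions one d apart (the overlap column uses a different formula, so this sketch only reproduces the second set of figures):

```python
import math

def phi(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Assuming two normal distributions whose means are d pooled-SDs apart,
# phi(d) is the proportion of the control group below the experimental mean.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {phi(d):.0%} of controls fall below the experimental mean")
```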
Misconception about p-values
it tells me ___ but doesn’t tell me what (4) things…
A p-value is…
The likelihood/probability of getting a statistical result
(t, F, r, etc) that big (or bigger) IF THE NULL HYPOTHESIS IS TRUE.
Misconceptions about p-values:
1. The p-value tells me how big (or important) my effect
is.
2. If I reject the null hypothesis, my research hypothesis
must be true.
3. If I fail to reject the null hypothesis, the null
hypothesis must be true.
4. If I find a significant effect, I must have conducted my
experiment well (e.g., the experiment “worked”).
The replication crisis is now called the __ ___ and occurs…
The credibility revolution; it occurs in all sciences.