PSYC 301 Flashcards
simplest explanation for difference is
chance
independent samples t test equation
t = (x̄1 − x̄2) / SE
why? because under the null we assume the population mean difference = 0
what’s the idea of SE
accuracy or precision of our estimates
when it’s small, our estimates are probably pretty good
as sample size increases, SE decreases
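A quick sketch of that relationship (SE of a mean = s/√n; the numbers are made up):

```python
import math

def standard_error(s, n):
    # SE of a sample mean: sample SD divided by the square root of n
    return s / math.sqrt(n)

# Same spread of scores (s = 10), growing samples:
for n in (25, 100, 400):
    print(n, standard_error(10, n))  # SE shrinks as n grows: 2.0, 1.0, 0.5
```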
Abelson’s MAGIC criteria
M- Magnitude
A- Articulation
G- Generality
I- Interestingness
C- Credibility
what is s (standard deviation)
dispersion of scores around the means
a bigger S (SD) means
means a bigger spread; worse estimate
focus of NHST
is on a qualitative decision: does a systematic difference exist, or is the difference merely a function of chance?
NHST is a method for
deciding a difference likely exists, but does not speak to the size of that difference
bayesian statistics argues that
just bc the null is unlikely for our data, does not necessarily mean the data are likely to be drawn from a population where our systematic difference is true (alt is true)
in bayesian statistics, we calculate a?
Bayes factor
(the ratio of the likelihood of the data under the alternative hypothesis relative to the likelihood under the null hypothesis)
what bayes factor is considered moderate and strong evidence for alternative hypothesis more likely than null
3 = moderate (often treated as the threshold for making claims)
10 = strong
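A toy illustration of the ratio idea, comparing two hypothetical point hypotheses with a normal likelihood (a real Bayes factor integrates over a prior on the alternative; all numbers here are invented):

```python
import math

def normal_pdf(x, mu, sd):
    # Density of a normal distribution at x
    return math.exp(-((x - mu) / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))

# Toy BF10: likelihood of an observed mean under a point alternative
# (mu = 0.5) relative to the point null (mu = 0); x_bar and sd invented
x_bar, sd = 0.4, 1.0
bf10 = normal_pdf(x_bar, 0.5, sd) / normal_pdf(x_bar, 0.0, sd)
# bf10 is only slightly above 1: weak evidence, well below the cutoff of 3
```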
bayesian approach offers
an alternative method for assessing the viability of our random chance explanation vs a systematic explanation
bayesian statistics has same problem as NHST which is
“does increase in confidence in the alternative relative to the null really translate into magnitude of the effect and how do I interpret that”
- doesn’t tell you whether it’s big or small which is the same issue with NHST
raw effect sizes
- not used much in psych
- look at the size of the difference between the two means and treat that as an index of magnitude
- used in econ with money which is a meaningful benchmark
raw effect sizes are helpful when
when the outcome variable of interest (DV) is on a metric that is meaningful and readily interpretable in light of some clear criteria
raw effect sizes are problematic when
- the outcome variable is not easily interpretable with respect to specifiable criteria
- one needs to compare effects with outcome variables that are on different metrics
standardized effect sizes indices names
cohen’s d and pearson’s r
independent samples t test cohens d formula
ds = (x̄1 − x̄2) / pooled s
pooled S equation
pooled s = √[((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)]
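A minimal sketch of these two formulas (all means, SDs, and ns invented):

```python
import math

def pooled_sd(s1, n1, s2, n2):
    # Weighted average of the two group variances, then the square root
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_ds(m1, m2, s1, n1, s2, n2):
    return (m1 - m2) / pooled_sd(s1, n1, s2, n2)

# Equal SDs pool to themselves, so d is just the mean gap in SD units:
d = cohens_ds(105, 100, 10, 30, 10, 30)  # 0.5: "medium" by Cohen's guidelines
```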
ds ____ as the mean difference ____ and the standard deviations ____
increases; increases; decrease
ds is not influenced by
sample size
cohens d is sensitive to 2 properties of the data
- difference between the means (means far apart yield a bigger d than means close together)
- standard deviation (as the SD gets really small, effect sizes get bigger)
cohens d is an index for
for how distinct 2 groups are from each other
ds has a minimum value of __ and an upper boundary of ___
min value of 0 and no upper boundary
ds can be interpreted as the % of the SD
0.5- difference between the means is half the size of the dependent variable’s SD
1.00- indicates the difference is as big as the SD of the dependent variable
2.00- indicates a mean difference twice the size of the standard deviation of the DV
ds guidelines (cohen’s d guidelines)
0.2 small
0.5 medium
0.8 large
dav equation
dav = D / average s (where D = mean difference and average s = (s1 + s2)/2)
what does dav ignore and drm takes into account
ignores the magnitude of correlation between sets of observations
drm equation
drm = (x̄1 − x̄2) / √(s1² + s2² − 2·r·s1·s2), multiplied by √(2(1 − r)) (one common formulation; r = correlation between the paired scores)
___ will tend to be more similar to ___ than ___ except when r is low and the difference between SD are large
dav will tend to be more similar to ds than drm except when r is low and differences between SDs are large
___ is more conservative than ___ but is considered overly conservative when r is large
drm dav
pearson r coefficient r is what
r is the strength of association between variables
r can be calculated to express what
r can be calculated to express the strength and direction of association between two continuous variables and also the relationship between a dichotomous variable (ex. membership in one of two groups) and a continuous variable (ex. a dependent variable)
point-biserial correlation
r expressing the relationship between a dichotomous variable (ex. membership in one of two groups) and a continuous variable (ex. a dependent variable)
in this context r can be conceptualized as the strength of association between group membership and scores on the DV; when squared, it expresses the proportion of variance in the DV accounted for by group membership
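A sketch of the point-biserial idea: compute an ordinary Pearson r with a 0/1 group code as X (toy data):

```python
import math

def point_biserial(groups, scores):
    # groups: 0/1 membership codes; scores: continuous DV.
    # Point-biserial r is just Pearson r with a dichotomous X.
    n = len(scores)
    mx = sum(groups) / n
    my = sum(scores) / n
    cov = sum((g - mx) * (y - my) for g, y in zip(groups, scores))
    sx = math.sqrt(sum((g - mx) ** 2 for g in groups))
    sy = math.sqrt(sum((y - my) ** 2 for y in scores))
    return cov / (sx * sy)

g = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 5, 6]
r = point_biserial(g, y)  # strong group/DV association
r_squared = r ** 2        # proportion of DV variance tied to group membership
```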
interpreting r as an effect size index
r ranges from -1.00 to 1.00 with .00 indicating no association
cohen’s guidelines for r
.10 (small)
.30 (medium)
.50 (large)
if r gets bigger, our cohen’s d gets ___
smaller
will adjust down the more correlated the two scores are
large effect sizes do not directly imply practical significance, why?
- metric can be hard to interpret without reference to more concrete reference criteria
- durability of an effect might also be relevant in addition to its size
- cost/benefit analysis also can determine practicality
when are small effects impressive
when there are minimal manipulations of the IV
when it’s difficult to influence the DV
conceptual consequences of an effect also critical to evaluating importance
existence of an effect differentiates between competing theories
existence of an effect challenges reigning theory
existence of an effect demonstrates a new or disputed phenomenon
when computing confidence intervals, we typically specify
95% CIs
The confidence interval is a
range of values where you expect the true difference between the population averages to fall.
the width of confidence intervals will be determined by what
standard errors which are influenced by sample size and variability around the means
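A sketch using the normal approximation (diff ± 1.96·SE; a t critical value would be slightly wider for small samples; numbers invented):

```python
def ci95_mean_diff(diff, se):
    # Approximate 95% CI around a mean difference, normal critical value 1.96
    return (diff - 1.96 * se, diff + 1.96 * se)

lo, hi = ci95_mean_diff(5.0, 2.0)    # small SE: narrow interval, excludes 0
lo2, hi2 = ci95_mean_diff(5.0, 4.0)  # bigger SE: wider interval, includes 0
```

Same mean difference both times; only the SE (driven by sample size and variability) changes the width.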
if a confidence interval around an effect size contains 0,
that indicates the effect is not statistically significant
a really big t value says what
that it is unlikely to have come from a null population (one where the true mean difference is 0)
how unlikely does t-value need to be before concluding the chance explanation is no longer tenable
alpha (0.05)
type 1 error
concluding a mean diff exists in the pops (rejecting the null) when there isn’t actually a difference
type 2 error
concluding there is no mean difference between pops (failing to reject the null) when there is actually a diff in means between the pops
conventionally tolerated at a 20% chance (β = .20)
power
likelihood of finding an effect when it’s really there; the complement of β (power = 1 − β)
conventionally 80%
determinants of power
- alpha level
- the stricter the α, the lower the power (more likely to make a type 2 error)
- under the control of the researchers
- sample size
- as n increases, precision increases, which shrinks SE
- the larger the n, the greater the power
- under the control of the researchers
- magnitude of effect
- the larger the effect of the IV, the greater the power
- somewhat under the control of the researchers
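These determinants show up in a rough analytic power sketch for a two-sided two-sample z test (an approximation to the t test, not exact power):

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sample(d, n, z_crit=1.96):
    # Approximate power of a two-sided two-sample z test with n per group
    # (ignores the negligible lower-tail rejection region)
    return norm_cdf(d * math.sqrt(n / 2) - z_crit)

# Bigger n and bigger effects both raise power; a stricter alpha
# (larger z_crit) lowers it:
power_two_sample(0.5, 64)   # ~0.80, the conventional benchmark
power_two_sample(0.5, 30)   # same effect, smaller n: lower power
power_two_sample(0.8, 30)   # same n, bigger effect: higher power
```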
when is a between-subjects one way ANOVA used
when there are more than 2 levels and thus require comparing means from 3 or more independent samples
single factor experiment
an experiment with one IV
one way anova
an anova with a single factor
two factor experiment
an experiment with two IVS
two way anova
an anova with two factors
what does a one way anova test
tests if at least 1 mean difference exists among the levels
null and alt hypothesis in ANOVA
null says all the population means are equal
alt says at least one mean is different from the others
anova used to test mean diffs but its calculations are based on
variances
it’s all about testing mean diffs, but really it just tests variances
total variability in scores can be divided into:
between treatments variability (captures mean diffs) and within treatments variability (variability within scores)
between treatments variability: if one were to compare a single score drawn from each of two conditions, these 2 scores could be different for 3 reasons
- treatment effect: the manipulation distinguishing between conditions could influence scores
- individual differences: differences in backgrounds, abilities, attributes, and circumstances of individual people
- experimental error: chance errors that occur when measuring the construct of interest (ex. lack of attention)
- researchers try to minimize this in their studies
within treatment variability: if one were to compare two scores drawn from the same condition, these scores could be different for 2 reasons
- individual differences
- experimental error
note* no treatment effect listed bc this is a constant within the condition
what test statistic is associated with an ANOVA and what’s its formula
F-ratio statistic (F test)
F= variance between treatments/variance within treatments
in other words:
F = (treatment effect + individual diffs + experimental error) / (individual diffs + experimental error)
when the null is true for a between subjects one way ANOVA:
F = (0 + individual diffs + experimental error) / (individual diffs + experimental error)
results in a value nearly equal to 1
when the null is false for a between subjects one way ANOVA:
F = (treatment effect + individual diffs + experimental error) / (individual diffs + experimental error)
results in a value larger than 1
denominator of the f test
measures uncontrolled and unexplained (unsystematic) variability in the scores
called the error term
numerator of the f test
measures same error variability, but also variability from systematic influences (treatment effect)
other way to describe f test
systematic variability/error term
k, n, N, T, G meaning in ANOVA
k- number of levels (conditions) in the factor
n- sample size for specific condition
N- sample size for whole study
T- sum of scores within a specific condition
G- sum of all scores in the experiment (all Ts)
SS ANOVA
sums of squares (the sum of the squared deviations of each individual score from the mean)
an index of variability
ANOVA involves two parts
analysis of sums of squares
analysis of dfs
SS between formula
SS between = Σ(T²/n) − G²/N (using k, n, N, T, G as defined above)
what is SS means
deviation of group/condition means around a grand mean
represents how much spread there is
if condition means deviate a lot from the grand mean, the conditions are really different from each other
in ANOVA the term for variance is
mean square
sample variance equation
s² = SS/(n − 1) = SS/df
general formula for mean square (MS) is
MS = SS/df
gives us variance
f ratio formula
F = MS between/MS within
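Putting the SS, df, MS, and F pieces together on toy data, using the k/n/N/T/G notation defined earlier (all scores invented):

```python
# One-way between-subjects ANOVA by hand
groups = [
    [4, 5, 6],   # condition 1
    [7, 8, 9],   # condition 2
    [1, 2, 3],   # condition 3
]
k = len(groups)                         # number of conditions
N = sum(len(g) for g in groups)         # total sample size
G = sum(sum(g) for g in groups)         # grand sum of all scores

ss_total = sum(x**2 for g in groups for x in g) - G**2 / N
ss_between = sum(sum(g)**2 / len(g) for g in groups) - G**2 / N  # Σ(T²/n) − G²/N
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)       # df between = k − 1
ms_within = ss_within / (N - k)         # df within = N − k
F = ms_between / ms_within              # large F -> means differ more than chance predicts
```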
if we get an F value for which there is only a 5% or less chance of obtaining a value that large or larger, we no longer consider what?
no longer consider the null explanation tenable and conclude that at least 1 difference exists among the means
the one way anova is an ____ test that _______
the one way anova is an omnibus test that evaluates a very global and diffuse question
means that it tells us at least 1 difference exists, but not the precise number of differences or where they occur; this presents a challenge for the Articulation criterion in Abelson’s MAGIC, since as results get more complex, there are more ways in which they can be articulated
two general approaches to follow-up tests for ANOVAs
post hoc: follow up tests that are not based on prior planning or clear hypothesis
a priori tests (planned tests): planned or theoretically driven follow up tests
when is a post hoc test considered appropriate and what do they assume
when the omnibus F test is significant
assume no clear conceptual basis for comparisons and thus explore all possible pairwise comparisons
what do post hoc tests attempt to control for
attempt to control for familywise error (the type I error rate across all tests conducted on the same data)
- once you do 5 of these at α = .05, the familywise error rate is about 23%
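The ~23% figure comes from 1 − (1 − α)^m for m independent tests; the Bonferroni adjustment counters it by dividing α by the number of comparisons:

```python
def familywise_error(alpha, m):
    # Chance of at least one type I error across m independent tests
    return 1 - (1 - alpha) ** m

familywise_error(0.05, 5)       # ~0.226: the ~23% on the card

def bonferroni_alpha(alpha, m):
    # Bonferroni: traditional alpha divided by the number of comparisons
    return alpha / m

bonferroni_alpha(0.05, 5)       # each test now judged at .01
```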
relationship between familywise error and power
stricter control of familywise error comes at the cost of reduced power
common post hoc tests
LSD
- doesn’t control for family wise
Bonferroni adjustment
- keeps familywise error down with a small number of comparisons
- takes the traditional α and divides it by the # of comparisons
Tukey HSD
- tests all pairwise comparisons with strong control of familywise error
- unequal sample sizes and differences in variances are a problem
a priori tests used when
when there are expectations about specific differences or there are specific comparisons that are particularly important to the research question
planned contrasts allow us to test more specific patterns or comparisons within our omnibus f test
the precise comparisons that are conducted in an a priori test are specified by
contrast weights
what does an anova with a significant f test entail
tells us that at least 1 difference exists among the means, but not which ones differ
if the cross-products of two contrasts’ weights sum to 0, then
they are orthogonal
if the cross-products sum to anything other than 0, then
they are nonorthogonal
orthogonal meaning
slices of variance completely independent of each other
non orthogonal
when slices of variance overlap (results of contrasts are not independent of one another)
nothing wrong with using non-orthogonal contrasts so long as you recognize the lack of independence
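A quick orthogonality check: two contrasts are orthogonal when the sum of the products of their weights is 0 (toy weight vectors for a 4-group design):

```python
def orthogonal(c1, c2):
    # Two contrasts are orthogonal if their weight cross-products sum to 0
    return sum(a * b for a, b in zip(c1, c2)) == 0

# Groups 1+2 vs 3+4, then group 1 vs 2: independent slices of variance
orthogonal([1, 1, -1, -1], [1, -1, 0, 0])   # True (orthogonal)

# Groups 1+2 vs 3+4, then group 1 vs 3: overlapping slices
orthogonal([1, 1, -1, -1], [1, 0, -1, 0])   # False (nonorthogonal)
```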
most commonly reported effect size in anova
eta-squared
eta squared equation
η² = SS between / SS total
ranges from 0-1.00
cohen’s f
another effect size used for anova
f = √(SS effect / SS error)
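A sketch of both ANOVA effect sizes on made-up sums of squares:

```python
import math

def eta_squared(ss_between, ss_total):
    # Proportion of total variability attributable to the factor
    return ss_between / ss_total

def cohens_f(ss_effect, ss_error):
    # Cohen's f for ANOVA, from the effect and error sums of squares
    return math.sqrt(ss_effect / ss_error)

# Invented SS decomposition (total = between + within = 54 + 6):
eta2 = eta_squared(54, 60)   # 0.9 of the variance explained
f = cohens_f(54, 6)          # 3.0
```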
assumptions of between-subjects one way anova
- independence of observations
- the distribution of the outcome variable should be normally distributed in each group
- homogeneity (equality) of variance in the outcome variable across the groups