PSYC 301- final exam Flashcards
data fishiness- definition
properties of data or statistical tests that suggest potential problems (Abelson calls it this)
two approaches to evaluating the assumptions of normality
NHST and descriptive approaches
repeated measures (within subjects) one way ANOVA tests
mean differences in repeated measure studies with 3+ levels of a single factor
what do T, K, G, n, N, and P mean in within subjects ANOVA
T- sum of scores within a condition
K- # of levels of the IV
G- sum of all scores
n- sample size
N- total # of scores for the sample (k × n = N)
P- sum total of scores for given person in sample
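These notation quantities can be computed directly. A minimal sketch on a hypothetical data set (3 conditions, 4 subjects; the numbers are made up for illustration):

```python
# Hypothetical within-subjects data: rows = subjects, columns = conditions
data = [
    [3, 5, 7],
    [2, 4, 6],
    [4, 6, 8],
    [3, 5, 7],
]

k = len(data[0])                                      # K: # of levels of the IV
n = len(data)                                         # n: sample size (subjects)
N = k * n                                             # N: total # of scores (k x n)
T = [sum(row[j] for row in data) for j in range(k)]   # T: sum of scores per condition
G = sum(sum(row) for row in data)                     # G: sum of all scores
P = [sum(row) for row in data]                        # P: sum of scores per person

print(k, n, N, T, G, P)
```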
variability in the means of scores across conditions exists for two reasons in within subjects ANOVA
(variance between treatments)
treatment effect- the manipulation distinguishing between conditions
experimental error- random chance errors that occur when measuring the construct of interest
*note no individual differences bc these are constant across conditions; each individual serves as their own baseline
variability in the means of scores within conditions could be a result of 2 sources
(variance within treatments)
individual differences- differences in backgrounds, abilities, circumstances etc of individual ppl (this can be calculated out though)
experimental error- chance errors that occur when measuring the construct of interest
SSerror =
(4.22)
SSerror = SSwithin treatments - SS between subjects (individual diffs)
conceptually the F test for repeated measures becomes (4.16)
F = (treatment effect + experimental error) / experimental error
4.17 ** repeated measures in a nutshell
F = MSbetween treatments/MSerror
computing within treatment variability (4.19)
SSwithin treatments = ∑SSwithin each treatment
SStotal =
SStotal = SSwithin + SSbetween
total df for repeated measures ANOVA (4.23)
dftotal = N-1
df between treatments
(4.24)
df between treatments = k -1
df within treatments
(4.25)
df within treatments = N-k
formulas for specific MS values in ANOVA
4.28
4.29
MSbetween treatments = SS between treatments/df between treatments
MSerror = SSerror/dferror
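Putting formulas 4.16–4.29 together, the whole repeated-measures F can be sketched on hypothetical data (3 conditions, 4 subjects; numbers invented for illustration):

```python
# Hypothetical within-subjects data: rows = subjects, columns = conditions
data = [
    [2, 4, 6],
    [3, 6, 8],
    [4, 4, 7],
    [3, 6, 7],
]
k, n = len(data[0]), len(data)
N = k * n
G = sum(sum(row) for row in data)
T = [sum(row[j] for row in data) for j in range(k)]
P = [sum(row) for row in data]

ss_between_treatments = sum(t**2 for t in T) / n - G**2 / N
cond_means = [t / n for t in T]
ss_within = sum((row[j] - cond_means[j])**2 for row in data for j in range(k))
ss_between_subjects = sum(p**2 for p in P) / k - G**2 / N
ss_error = ss_within - ss_between_subjects      # SSerror = SSwithin - SSbetween subjects

df_between = k - 1                              # 4.24
df_error = (k - 1) * (n - 1)                    # df within (N-k) minus df between subjects (n-1)
F = (ss_between_treatments / df_between) / (ss_error / df_error)
print(round(F, 1))
```

Note how removing SSbetween subjects from SSwithin shrinks the error term, which is what gives the repeated-measures design its power advantage.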
assumptions of the repeated measures ANOVA
- independence of sets of observations
- distribution of the outcome variable should be normally distributed in each level of the IV
- sphericity (type of homogeneity of variance; equality of variances in difference scores across all levels of the IV)
- homogeneity of covariance
what is sphericity
Are the differences in performance between Program A and Program B, Program B and Program C, and Program A and Program C equally variable?
equality of variances in difference scores across all levels of the IV
data fishiness assumptions
assumption of normality
assumption of homogeneity of variance
independence of observations
assumption of normality
scores on the DV within each group are assumed to be sampled from a normal distribution
evaluating the assumption of normality
NHST approach
- tests if sample dist is sig. different from normal dist
- skew- captures symmetry
- kurtosis- captures extreme scores in tails ( 0= normal)
Descriptive approach
- look at descriptive/ graphical displays to quantify the magnitude and nature of non- normality
- skew and kurtosis threshold values ( skew greater than 2, and kurtosis greater than 7), positive kurtosis tends to be worse
- graphical displays (normal qq plots) plot your dist against normal dist with same sample size, if data is normal it looks like straight line, tails, thin or fat
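The descriptive thresholds above can be sketched in Python; skew and excess kurtosis computed from raw moments on hypothetical (invented) scores:

```python
# Descriptive check of normality: sample skew and excess kurtosis,
# compared against the rule-of-thumb thresholds (|skew| > 2, kurtosis > 7)
def skew_kurtosis(xs):
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean)**2 for x in xs) / n
    m3 = sum((x - mean)**3 for x in xs) / n
    m4 = sum((x - mean)**4 for x in xs) / n
    skew = m3 / m2**1.5                 # captures symmetry (0 = symmetric)
    kurtosis = m4 / m2**2 - 3           # excess kurtosis (0 = normal)
    return skew, kurtosis

# hypothetical positively skewed data (e.g. latency-like, one extreme score)
scores = [1, 1, 2, 2, 2, 3, 3, 4, 5, 20]
skew, kurt = skew_kurtosis(scores)
print(abs(skew) > 2, kurt > 7)          # which thresholds are crossed?
```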
pros and cons of NHST and descriptive approaches in evaluating normality
NHST bad bc of the role of sample size
- insensitive to non-normality in small samples and too sensitive to non-normality in large samples
- doesn’t take into account the type or amount of non-normality; the question itself doesn’t make conceptual sense bc what we want to know is whether the size (magnitude) of the non-normality will alter our results
Descriptive approach better than NHST bc it allows us to see magnitude and type of non normality, but there is still the element of subjectivity meaning that it’s easy to see results when clearly good or bad, but its difficult to judge if deviations are consequential in ambiguous cases
assumptions of homogeneity of variance
assumption that variances around the means are generally the same
variances in scores on the DV within each group are the same across groups
evaluating the assumption of homogeneity of variance
NHST approach
- tests if variances in groups are sig diff from each other; Levene's test, Hartley's variance-ratio (F-max) test
descriptive approach
- looks at descriptive stats/ graphical displays to quantify the magnitude of differential variances
- threshold ratio of largest to smallest variances (recommended threshold 3:1)
- graphical displays (qq plots): take data from 2 conditions and plot sorted scores against each other (lowest with lowest, etc); if the condition is satisfied it’ll be a straight line with slope of 1 and intercept equal to the difference between the means
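The variance-ratio threshold can be sketched directly; a minimal example with hypothetical (invented) scores for three groups:

```python
# Descriptive check of homogeneity of variance: ratio of largest to smallest
# group variance, against the recommended ~3:1 threshold
def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean)**2 for x in xs) / (len(xs) - 1)   # sample variance

groups = {
    "A": [4, 5, 6, 5, 4],
    "B": [2, 6, 9, 1, 7],    # much more spread out
    "C": [5, 5, 6, 4, 5],
}
variances = {g: variance(xs) for g, xs in groups.items()}
ratio = max(variances.values()) / min(variances.values())
print(ratio, ratio > 3)      # ratio above 3:1 flags a potential violation
```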
assumption of independence of observations
each observation (between subjects) or each set of observations (within subjects) comprising the data set is independent of all other observations or sets of observations in the data set
basically, no inherent structure in the nature of our data; no cluster
ex. excluding couples' data or roommates' data
positive associations inflate alpha
negative associations inflate beta
evaluating the assumption of independence of observations
examine structural properties of data to see if a basis exists for questioning the validity of the assumption
if no basis is evident, generally fine to conclude the assumption holds
if a basis exists, independence can be assessed by computing the intraclass correlation for the structural property in the data presumed to produce the violation of independence
if intraclass correlation is very small (less than .10), prob fine to use t tests or ANOVA
clear thresholds for intraclass correlations remain debated so the conceptual basis for expecting violations is important in evaluating this index
if violation occurs, best to use alt analysis that accounts for lack of independence
addressing violations of assumptions
normality
- use alt procedures
- transform data to normalize dist
- identify and remove outliers (~80% of the time this is the problem)
- eval level of measurement assumptions
homogeneity of variance
- use alt procedures
- identify and remove outliers
- eval level of measurement assumptions
indep of obs
- alt stat procedures like MLM and HLM
outliers
extreme values in a data set that differ substantially from other observations in the data set suggesting they might be drawn from a different population
often responsible for violations of normality and homogeneity of variance
have a disproportionate influence on stat results
examples of common outliers
data entry/coding errors
responses in latency data
open ended estimate data (no upper boundary)
identifying outliers
impossible values in freq tables or histograms
seen in normal qq plots as steep tails
standardized residuals (general thresholds of 4 or 5 flag sufficiently weird observations); includes the target observation in the mean, which can drag the mean toward it
studentized deleted residuals: index of deviation from the mean NOT including the target observation in the calc of the mean
- sample of 100: threshold ≈ 3.6
- sample of 1000: threshold ≈ 4.07
thin tails
fewer extreme observations than the normal dist
fat tails
more extreme observations than the normal dist
responses to outliers
impossible values should be corrected if possible or treated as missing data if not possible to correct
trimming or capping to most extreme acceptable value in data set/specified value
- conceptual basis not ideal bc no reason to assume value
philosophical issues in outliers
minimalist- data should be minimally altered
- dists should have some extreme values
maximalist- routine altering or deletion of values
- outliers create violations
intermediate- won’t throw out unless really problematic
levels of measurement
Nominal
- categorical distinctions, no mag
Ordinal
- rank ordering, no mag
Interval
- rank ordering and mag
Ratio
- rank ordering, mag, and meaningful ratios (true zero point)
for rating scales 7 points is sufficient, 5 is ambiguous, and less than 5 is problematic
argument for levels
has been argued that t tests and ANOVA are only meaningful to conduct if DV has at least interval properties
very problematic distributional properties of data can sometimes indicate level of measurement is not appropriate
factorial ANOVA and its advantages
general term for an ANOVA with more than 1 IV
modest gain in efficiency
ability to test joint effects
- additive- no interaction
- nonadditive- interaction
in a 2 way anova we test how many effects
3 effects
main effect of IV1
main effect of IV2
interaction effect of IV1 with IV2
number of levels doesn’t change number of effects!
interactions sometimes referred to as
moderator effects
- a moderator regulates the effect of another IV
if 1st IVs effects change based on the 2nd IV, 2nd IV is the moderator
F test associated with null hypotheses for 2 way ANOVA and the hypothesis (4.1)
F = variance between treatments/variance within treatments
difference is that between treatment variance will now be further divided into 3 components:
Factor A between treatment variance (MSa)
Factor B between treatment variance (MSb)
Factor AxB between treatment variance (MSaxb)
F val of 1 indicates no treatment effect (0)
F value greater than 1 indicates given treatment effect exists
what do G, N, p, q, and n mean in a 2 way between subjects ANOVA
G- grand total of all scores in entire experiment
N- total number of scores in entire experiment
p- # of levels in factor A
q- # of levels in factor B
n- # of scores in each treatment condition (each cell of the AxB matrix)
computing A x B between treatment variability (6.6)
SSaxb = SS between - SSa - SSb
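The partition SSaxb = SSbetween − SSa − SSb can be sketched on a hypothetical 2×2 between-subjects design (invented scores, n = 2 per cell):

```python
# Hypothetical 2x2 design: (level of A, level of B) -> scores in that cell
cells = {
    (0, 0): [2, 4], (0, 1): [4, 6],
    (1, 0): [6, 8], (1, 1): [10, 12],
}
p, q, n = 2, 2, 2            # levels of A, levels of B, scores per cell
N = p * q * n
G = sum(sum(xs) for xs in cells.values())

# between-treatments SS across all p*q cells
ss_between = sum(sum(xs)**2 / n for xs in cells.values()) - G**2 / N
# marginal totals for each level of A and B
a_totals = [sum(sum(xs) for (a, b), xs in cells.items() if a == i) for i in range(p)]
b_totals = [sum(sum(xs) for (a, b), xs in cells.items() if b == j) for j in range(q)]
ss_a = sum(t**2 / (q * n) for t in a_totals) - G**2 / N
ss_b = sum(t**2 / (p * n) for t in b_totals) - G**2 / N
ss_axb = ss_between - ss_a - ss_b      # 6.6: interaction SS is what's left over
print(ss_a, ss_b, ss_axb)
```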
general formula for mean square (4.10)
MS = SS/df
articulation (abelson)
extent to which results are presented in a clear and useful manner; as results get more complex, there will be more ways they can be articulated
as in 1 way anova, 2 general approaches to follow up tests exist for two way anova
a posteriori tests (post hoc)
a priori tests (planned)
analysis of simple effects
effect of 1 IV at a specific level of the other IV
once an IV has 3+ levels, the simple effect test becomes an omnibus test itself, need contrasts
setting alpha in 2 way between subjects anova
alpha almost never adjusted for these multiple tests in ANOVA, thus emphasis tends to be on confirmatory analyses
replication seen as more essential
principles for setting beta in the context of multiple tests
minimum acceptable power on the basis of the weakest anticipated effect
minimum acceptable power on the basis of the most important effect/ sets of effects
calculating standardized effect sizes in 2 way between subjects anova
np2 = SSeffect / (SSeffect + SSwithin)
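A one-liner sketch of partial eta squared with hypothetical SS values:

```python
# Partial eta squared for one effect in a 2-way between-subjects ANOVA
def partial_eta_squared(ss_effect, ss_within):
    # np2 = SSeffect / (SSeffect + SSwithin)
    return ss_effect / (ss_effect + ss_within)

# hypothetical values: SSeffect = 20, SSwithin = 80 -> effect accounts for
# 20% of the variance not attributable to the other effects
print(partial_eta_squared(20.0, 80.0))
```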
pearson correlation coefficient
index of association that assesses the magnitude and direction of linear relation between 2 variables
r = covariability/ variability separately (7.2)
sum of the products of deviations (7.3)
SP = ∑(X - X̄)(Y- Ȳ)
taking deviation products and summing them
index of covariability
- lots of above/above and below/below pairs will produce big positive SP values
-lots of below/above and above/ below pairs will produce big negative SP values
- equal mix of both will produce near 0 SP values
r coefficient is an index of covariability of X and Y relative to variability of those separately
formula (7.4)
r = SP / √(SSx × SSy)
relationship of r to z scores
z scores reflect an individual’s scores standing within the distribution for that score
tells us where they fall in the distribution of everyone
so r can be expressed in terms of z scores
r expressed in terms of z scores (7.5)
r = ∑ZxZy/n
seen by some as best formula for r
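The two formulas (7.4 and 7.5) give the same r; a minimal sketch on hypothetical paired scores (z scores computed with the population SD, i.e. dividing by n):

```python
import math

# Pearson r two equivalent ways: r = SP / sqrt(SSx*SSy) and r = sum(Zx*Zy) / n
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # sum of products of deviations
SSx = sum((x - mx)**2 for x in X)
SSy = sum((y - my)**2 for y in Y)
r_sp = SP / math.sqrt(SSx * SSy)                      # 7.4

sx, sy = math.sqrt(SSx / n), math.sqrt(SSy / n)       # population SDs
r_z = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(X, Y)) / n   # 7.5

print(round(r_sp, 4), round(r_z, 4))                  # same value both ways
```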
coefficient of determination
if pearson correlation coefficient (r) is squared, it reflects the proportion of variance in one variable linearly accounted for by the other variable
ex. r = .50 indicates that the first variable accounts for .25 (25%) of the variability in the second score
- 25% overlap
formula for t test of r (7.6)
t = r√(n-2) / √(1-r²)
bigger rs make bigger ts
the bigger the correlation gets, the smaller the denominator √(1-r²) gets
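A sketch of the t test for r with hypothetical values, showing how both a bigger r and a bigger n inflate t:

```python
import math

# t test for a Pearson r, with df = n - 2
def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

print(round(t_for_r(0.77, 30), 2))   # hypothetical r = .77, n = 30

# bigger r (numerator up, denominator down) and bigger n both make t bigger
print(t_for_r(0.90, 30) > t_for_r(0.77, 30))
print(t_for_r(0.77, 100) > t_for_r(0.77, 30))
```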
factors influencing the size of r
distributions of variables
- perfect correlations only possible if shape of dists is exactly same
reliability of measures
- perfect correlations only possible with perfect reliability in both measures
restrictions of range
- restricting the range on either variable can attenuate correlations
regression
formal procedure by which scores on one variable can be used to predict scores on another variable; it’s the statistical procedure by which we use a data set to arrive at a formula to produce the best fitting line for that data set
ex. GRE on yale undergrad grades
the better our predictor, the more tightly data points will cluster around the line
when two variables are linearly associated, can be described with basic equation (7.7)
Y = bX + a
X- scores on first variable (predictor)
Y- scores on second variable (outcome)
b- fixed constant reps the slope of the best fitting line
a- fixed constant reps the Y intercept (expected value of Y when X is 0)
the extent to which the line generated by a given regression equation fits a specific data set is defined by the following (7.8)
foundational to regression
total squared error (SSerror)
Total squared error = ∑(Y - Ŷ)²
y reps an actual data point and y hat reps the predicted value for that data point given its X value
small values reflect less error
formula for b (7.9)
b = SP/SSx
SP is a measure of covariability
SSx is a measure of total variability of X
higher SP increases b
higher SSx decreases b
formula for a (7.10)
a = Ybar -bXbar
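Formulas 7.9 and 7.10 can be sketched together on hypothetical paired scores:

```python
# Least-squares slope and intercept: b = SP/SSx, a = Ybar - b*Xbar
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n

SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))
SSx = sum((x - mx)**2 for x in X)
b = SP / SSx          # slope of the best fitting line (7.9)
a = my - b * mx       # Y intercept (7.10)
print(b, a)           # prediction line: Yhat = b*X + a
```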
when x and y are z scores, the simple regression equation becomes (7.11)
Zyhat = rZx
r becomes our slope and a becomes 0 so it can be dropped
explain how b becomes r and a becomes 0
r = SP/√(SSxSSy) and b = SP/SSx; when X and Y are z scores, SSx = SSy, so √(SSxSSy) = SSx and the two formulas are the same
and then
both x and y have means of 0 when they’re z scores so
a = Ȳ - bX̄
a = 0 - b(0)
a = 0
standard error of estimate
a measure of the standard distance between a regression line and actual data points
total squared error is in it
standard error of estimate = √(SSerror/df) = √(SSerror/(n-2))
SSerror is related to r: as r approaches 1, SSerror becomes smaller, and as r approaches 0, SSerror becomes larger
SSerror equation (7.13)
SSerror = (1-r²)SSy
this leads to an alternative formula for standard error of estimate
standard error of estimate alt formula (with r)
standard error of estimate = √((1-r²)SSy/(n-2))
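Both routes to SSerror agree; a sketch verifying 7.13 against direct computation of ∑(Y − Ŷ)² on hypothetical paired scores, plus the standard error of estimate:

```python
import math

# SSerror two ways: directly from residuals, and via (1 - r^2) * SSy
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))
SSx = sum((x - mx)**2 for x in X)
SSy = sum((y - my)**2 for y in Y)
r = SP / math.sqrt(SSx * SSy)

b = SP / SSx
a = my - b * mx
ss_error_direct = sum((y - (b * x + a))**2 for x, y in zip(X, Y))  # sum (Y - Yhat)^2
ss_error_via_r = (1 - r**2) * SSy                                  # 7.13
see = math.sqrt(ss_error_direct / (n - 2))                         # standard error of estimate
print(round(ss_error_direct, 4), round(ss_error_via_r, 4), round(see, 4))
```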
F test for the regression coefficient
F = variance predicted by the regression/ error variance
MS values of regression
MSregression = SSregression/dfregression (df = 1 for simple regression)
MSerror = SSerror/dferror (df = n-2)
F test (7.20)
F = MS regression/ MSerror
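A sketch of the regression F test on the same kind of hypothetical data, using SSregression = r²SSy and SSerror = (1 − r²)SSy:

```python
import math

# F test for the regression coefficient (7.20)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
SP = sum((x - mx) * (y - my) for x, y in zip(X, Y))
SSx = sum((x - mx)**2 for x in X)
SSy = sum((y - my)**2 for y in Y)
r2 = SP**2 / (SSx * SSy)                    # r squared

ms_regression = (r2 * SSy) / 1              # SSregression / dfregression (df = 1)
ms_error = ((1 - r2) * SSy) / (n - 2)       # SSerror / dferror (df = n - 2)
F = ms_regression / ms_error
print(round(F, 4))                          # for simple regression, F = t^2
```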
regression assumptions
independence of observations
linear relationship between X and Y
residuals (errors in prediction) normally distributed with mean of 0
homoscedasticity of residuals- equal variance around regression line
MAGIC
M- magnitude
the mag of an effect can play a role in the persuasive strength of a research claim
- big effects not always practical
- small effects sometimes impressive
- conceptual implications sometimes matter more than size of effect
A- articulation
persuasive strength of a claim will be influenced by how efficiently, accurately and clearly an analytical strategy is used to capture key conclusions from the data
G- generality
generality across studies and researchers (replication)
generality across pops and contexts
I- interestingness
interesting as function of method
interesting as function of theory
interesting as function of surprise (novelty/mag)
interesting as function of importance (prac, implications)
C- credibility
conceptual basis for credibility
- fits with existing theory
- fits with common sense
methodological basis for credibility
- data fishiness
- improper stat procedures
- alt explanations beyond IV
- IV and DV reflect their constructs?
main effect, what is it
effect of 1 IV overall on the DV
interaction
differences of differences; compares the differences in one factor across levels of another to determine whether they are consistent or not