analysing experimental studies Flashcards
how do we know if an interaction is significant?
look at the p-value of the interaction term - the interaction is significant if its p-value is smaller than the alpha level
how do we know if one variable is significant?
F-statistics tell us the overall significance of a model, so if we only include one variable in the model, the F-test tells us the significance of the variance in our outcome explained by that one variable.
incremental F-test
a regular F-test is an incremental test of our model against a null/'empty' model (just the intercept, where all β = 0)
the incremental F-test evaluates the statistical significance of the improvement in variance explained in an outcome when further predictors are added - it is based on the difference in residual sums of squares between the two models
- the model with more predictors is called model 1 or the 'full model'
- the model with fewer predictors is called model 0 or the 'restricted model'
incremental F-test equation
F(dfR - dfF, dfF) = [ (SSRr - SSRf) / (dfR - dfF) ] / ( SSRf / dfF )
where:
dfR = df of restricted model
dfF = df of full model
SSRr = residual sums of squares of restricted model
SSRf = residual sums of squares of full model
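as a minimal sketch, the same test could be computed by hand in R (df, y, x1 and x2 are hypothetical names):

m0 <- lm(y ~ x1, data = df)          # restricted model
m1 <- lm(y ~ x1 + x2, data = df)     # full model
ssr_r <- sum(resid(m0)^2)            # SSRr
ssr_f <- sum(resid(m1)^2)            # SSRf
df_r <- df.residual(m0)              # dfR
df_f <- df.residual(m1)              # dfF
F_inc <- ((ssr_r - ssr_f) / (df_r - df_f)) / (ssr_f / df_f)
pf(F_inc, df_r - df_f, df_f, lower.tail = FALSE)   # p-value of the incremental F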
anova() function in R
used to perform an incremental F-test
provides the following results:
- residual df for both models
- SSresidual for both models
- difference in their dfs
- difference in their SS
- the incremental F-statistic
- p-value for the significance of the test (needs to be smaller than our alpha level to be significant)
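a minimal usage sketch (m0 and m1 are the hypothetical restricted and full models from the card above):

anova(m0, m1)
# the second row reports Res.Df, RSS, Df, Sum of Sq, the F statistic and Pr(>F)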
Nested vs non-nested models
nested = the predictors in one model are a subset of the predictors in the other (models must also be computed on the same data)
- can use incremental F-test
non-nested = there are unique variables in both models, so one cannot be written as a restricted version of the other
- have to use AIC or BIC (smaller/more negative values indicate better fitting models)
AIC and BIC
both contain a parsimony correction - meaning they penalise models for being too complex. BIC penalises complexity more harshly
AIC = n * ln(SSresidual / n) + 2k
BIC = n * ln(SSresidual / n) + k * ln(n)
- where ln = natural log function, k = number of model parameters, n = sample size
AIC and BIC only make sense when used for model comparisons.
For BIC, a difference of 10 between two models can be used as a rule of thumb to suggest that one model is better than another (there is no equivalent rule of thumb for AIC) - we want the model with the smaller AIC or BIC value
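a short sketch comparing two hypothetical non-nested models fitted to the same data:

mA <- lm(y ~ x1 + x2, data = df)
mB <- lm(y ~ x3 + x4, data = df)
AIC(mA, mB)   # smaller value = better fitting model
BIC(mA, mB)   # a difference of ~10 is the rule-of-thumb threshold for BIC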
why do we need constraints/a reference group?
we want a model that represents our data/observations, but all we 'know' is which group an observation belongs to (µi = β0 + βi). this creates a problem: we don't want to estimate too many parameters, and with β0 in the model we have one more parameter to estimate than we have group means
- constraints fix this - e.g. the dummy coding constraint sets β0 equal to one of the group means
what is effects coding?
also called sum to zero coding
we compare groups to the grand mean (the mean of all observations)
we reduce the number of parameters we have to estimate via the constraint Σβj = 0, i.e. all β values must sum to 0
typically used in experimental settings when there isn’t always an obvious reference group
how does the dummy coding constraint help?
example: 3 treatment groups (A, B and C)
before dummy coding:
µA = β0 + βA
µB = β0 + βB
µC = β0 + βC
this is 4 βs for 3 group means
how dummy coding fixes this
µA = β0
µB = β0 + βB
µC = β0 + βC
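in R, dummy (treatment) coding is the default for factors; a minimal sketch with a hypothetical 3-level treatment factor:

df$treatment <- factor(df$treatment)   # levels A, B, C; A is the reference by default
m <- lm(y ~ treatment, data = df)
coef(m)
# (Intercept) = β0 = µA (the reference group mean)
# treatmentB  = βB = µB - µA
# treatmentC  = βC = µC - µA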
how does the effects coding constraint help?
example: 3 treatment groups (A, B and C)
before effects coding:
µA = µ + βA
µB = µ + βB
µC = µ + βC
where µ = grand mean
- this is still 4 things to estimate for 3 group means
how effects coding fixes this (sum to 0)
µA = β0 + βA
µB = β0 + βB
µC = β0 - (βA + βB)
effects coding results - general interpretation
β0 = µ = grand mean, which acts as the reference 'group'
- the sum of the group means / k (i.e. the mean of the group means)
- e.g. (µA + µB + µC) / 3 = µ
βj = difference between the coded group and the grand mean
- βj = µj - µ
steps in effects coding:
- create k-1 variables
- for all observations in the focal group, assign 1
- for all observations in the reference group, assign -1 (must sum to 0)
- for all other groups, assign 0
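a sketch of effects coding in R using contr.sum (same hypothetical treatment factor; contr.sum uses the last level as the -1 reference):

contrasts(df$treatment) <- contr.sum(3)   # k - 1 = 2 coded variables
m_eff <- lm(y ~ treatment, data = df)
coef(m_eff)
# (Intercept) = β0 = grand mean of the group means
# treatment1  = βA = µA - grand mean
# treatment2  = βB = µB - grand mean; βC is recovered as -(βA + βB)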
manual contrast testing:
allows us to test a wide variety of comparisons, so long as they can be written:
- as a linear combination of population means
- with associated weights (coefficients) that sum to zero
manual contrasts chunk groups together and compare them to other chunks, testing whether the means of the chunks are significantly different
rules for assigning weight constraints:
- weights range between 1 and -1
- the group(s) in one chunk get positive weights and the other chunk gets negative weights
- the sum of the weights of comparison must be 0
- if a group is not involved in a comparison its weight is 0
- the weights assigned to the groups in a comparison = 1/(number of groups in that chunk), e.g. if there are two groups in the positive chunk they will both be given a weighting of 1/2
- restrict yourself to running k-1 comparisons
- each contrast can only compare 2 chunks of variance
- once a group is singled out, it can not enter the other contrasts
- check if the contrasts are orthogonal
orthogonal contrasts
test independence of sources of variance
- we like manual contrasts to be orthogonal to avoid 'double dipping' into groups
for any pair of orthogonal comparisons, the sum of the products of the weights will be 0
e.g. multiply the contrast 1 and contrast 2 weights for each group, then add up the results - the sum should be 0 (see the sketch below)
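a quick check in R for two hypothetical contrasts over three groups (A and B vs C, then A vs B):

c1 <- c(1/2, 1/2, -1)
c2 <- c(1, -1, 0)
sum(c1 * c2)   # 0, so this pair of contrasts is orthogonal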
non-orthogonal contrasts
test non-independent sources of variation - presents some statistical challenges when making inferences
what are emmeans?
estimated marginal means = the group means predicted from the model
they are used to test manual contrasts (in R, using the contrast() function from the emmeans package)
interpreting manual contrast results
the estimate is the difference between the weighted group means in each chunk
e.g. add up the weighted group means in each chunk and then do chunk 1 - chunk 2
we then can use p-values or critical values to determine if there is significant difference
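a sketch using the emmeans package (m_eff is the hypothetical effects-coded model from above):

library(emmeans)
emm <- emmeans(m_eff, ~ treatment)   # estimated marginal means per group
contrast(emm, method = list("A and B vs C" = c(1/2, 1/2, -1)))
# estimate = (µA + µB)/2 - µC, reported with its SE, t-ratio and p-value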
factors vs conditions
conditions = part of our experimental design (what we manipulate)
factors = what our conditions become when we put our results into a data set. the levels of a factor are the different ways we vary/manipulate the condition
one way analysis
in a one way design we only have one condition that is manipulated
main effects
tests the overall/average effect of a condition (F-test)
contrasts
tests differences between group means (based on coding schemes and associated β coefficients)
simple contrasts/effects
the effect of one condition at a specific level of another
- e.g. the difference in emmeans for treatment A between hospital 1 and hospital 2
pairwise comparisons
compare all levels of a given predictor
- e.g. compare all levels of treatment with all levels of hospital
- CREATES THE STATISTICAL ISSUE OF MULTIPLE COMPARISONS (increases our chances of making a type 1 error)
multiple tests and type 1 error equations
P(type 1 error) = alpha level
P(not making a type 1 error) = 1 - alpha level
where m = number of tests done:
P(not making a type 1 error in m tests) = (1 - alpha)^m
P(making at least one type 1 error in m tests) = 1 - (1 - alpha)^m = family-wise error rate
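a worked example in R, assuming alpha = .05 and m = 10 tests:

alpha <- 0.05
m <- 10
1 - (1 - alpha)^m   # family-wise error rate ≈ 0.40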
corrections for multiple test errors
to fix the issue, we either have to make our alpha level more conservative or adjust our p-values
Bonferroni correction
considered a conservative adjustment
- treats individual tests within a family as if they’re independent
- equation: adjusted alpha = alpha/m, or equivalently adjusted p = p * m
sidak correction
similar to Bonferroni, but slightly less conservative
- equation: adjusted alpha = 1 - (1 - alpha)^(1/m)
Scheffé correction
makes broader adjustments
- calculates the p-value from the F-distribution
- makes the critical value of F larger for a fixed alpha level, dependent on the number of tests
Tukey’s honest significant differences (HSD) correction
less conservative
- compare all pairwise group means
- each difference is divided by SE of sum of means
- this produces a q statistic that is compared against a studentised range distribution
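a sketch of applying these corrections in R (the p-values are hypothetical; p.adjust has no Sidak option, so it is computed manually):

p <- c(0.01, 0.04, 0.03, 0.20)            # hypothetical p-values from m = 4 tests
p.adjust(p, method = "bonferroni")        # Bonferroni: p * m, capped at 1
1 - (1 - p)^length(p)                     # Sidak-adjusted p-values
TukeyHSD(aov(y ~ treatment, data = df))   # Tukey's HSD on the hypothetical one-way design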
effects interactions
categorical*categorical interactions with effects coding can also be interpreted as the difference in simple effects
assumption violation: model misspecification
model is not correct
- detected by observing violations of linearity/normality
- solved by including missing terms
assumption violation: non-linear transformations
when data is skewed, we can transform/convert it to a different scale to make it more normally distributed and make interpretations easier
assumption violation: generalised linear model
used when data is not normal or continuous and transforming would create issues (e.g. binary or count outcomes)
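a minimal sketch, assuming a hypothetical binary outcome in df:

m_glm <- glm(outcome ~ x1 + x2, data = df, family = binomial)   # logistic regression
summary(m_glm)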
assumption violation: bootstrapping inference
can help make more reliable inferences even if assumptions are violated
bootstrapping
= the process of resampling with replacement from the original data to generate multiple resamples of the same n as the original data
this means some samples may contain the same participant’s data multiple times
bootstrap distribution
- start with the original sample of size n
- take k resamples of n and calculate your statistic on each one
- as k gets bigger, the distribution of the resampled statistics begins to approximate the sampling distribution
- more resamples = a smoother, more normal-looking distribution
bootstrap SE
SE = sd of bootstrap distribution
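a minimal sketch in base R, using a hypothetical sample of n = 50:

set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)   # hypothetical original sample
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))   # k = 1000 resamples of size n
sd(boot_means)   # bootstrap SE = sd of the bootstrap distribution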