lecture 6 - experiment meets analysis Flashcards
GLM assumptions
- linearity: any change in the regressor is associated with a proportional change in the data
–> i.e., there is a linear relationship between regressor and data
- normality: residuals are normally distributed
- no multicollinearity: regressors are independent of each other (this assumption is often violated)
- independence: observations and residuals are independent of each other (e.g., different time points)
- homoscedasticity: the variance of the residuals is constant across all levels of the data (e.g., all time points)
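A minimal diagnostic sketch (not from the lecture) for checking some of these assumptions on a toy GLM: it fits ordinary least squares with NumPy, tests residual normality with a Shapiro-Wilk test, and compares residual variance across the two halves of the time series as a rough homoscedasticity check. All variable names and the simulated data are illustrative.

```python
# Minimal sketch: fit an ordinary least-squares GLM with NumPy and run
# quick checks on the residuals (normality, rough homoscedasticity).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_timepoints = 200
X = np.column_stack([np.ones(n_timepoints),               # intercept
                     rng.standard_normal(n_timepoints)])  # one regressor
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n_timepoints)

# OLS estimate of beta and the residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# normality of residuals (Shapiro-Wilk): a small p-value suggests a violation
print("Shapiro-Wilk p =", stats.shapiro(residuals).pvalue)

# rough homoscedasticity check: compare residual variance in first vs. second half
half = n_timepoints // 2
print("var(first half) =", residuals[:half].var(),
      "var(second half) =", residuals[half:].var())
```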
multicollinearity: definition + what is the problem
when two or more predictors in the model are highly correlated, or predictors are linear combinations of other predictors
- correlated regressors explain overlapping variance in the signal
- model coefficients (β) become unstable
–> i.e., small changes in the data lead to large changes in the coefficients
- in case of perfect collinearity, there are infinite solutions to the regression
–> meaning the model can’t uniquely determine the individual contributions of the correlated predictors
- bouncing beta effect: model coefficients for the same regressor can be strongly positive or strongly negative depending on the coefficients of other regressors
–> the sign and size of the coefficient for a regressor can change dramatically depending on the presence of other correlated regressors in the model
- coefficients are not reliable, and the resulting model does not generalize to new data
–> most important problem
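To make the bouncing beta effect concrete, here is a small simulation sketch (my own illustration, not from the lecture): two almost perfectly correlated regressors are fit to noisy data several times, and the individual coefficient estimates swing between large positive and negative values even though their sum stays stable.

```python
# Simulation sketch: two highly correlated regressors make the individual
# beta estimates unstable ("bouncing betas"), even though their sum is stable.
import numpy as np

rng = np.random.default_rng(42)
n = 100
for sim in range(5):
    x1 = rng.standard_normal(n)
    x2 = x1 + 0.05 * rng.standard_normal(n)            # x2 is ~99% correlated with x1
    X = np.column_stack([x1, x2])
    y = 1.0 * x1 + 1.0 * x2 + rng.standard_normal(n)   # true betas: 1 and 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"sim {sim}: beta1 = {beta[0]:+.2f}, beta2 = {beta[1]:+.2f}, "
          f"sum = {beta[0] + beta[1]:+.2f}")
```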
multicollinearity: how can we quantify the problem
- look at the data: are stimulus features/behavioral variables correlated
–> if yes, this will cause problems for EVERY VOXEL in the brain
- look at the covariance structure of the design matrix: predictors that are highly correlated after HRF convolution could be deleted if they are unnecessary and interfere with important comparisons
- compute variance inflation factors (VIF): quantifies how much the variance of a regression coefficient increases due to multicollinearity
VIF
quantifies how much variance of a regression coefficient increases due to multicollinearity
- R^2 = variance explained in a predictor by all other predictors in the model
- VIF = 1/(1-R^2)
- VIF = 1, no collinearity
- VIF = 5-10, you are in trouble: 80-90% of the variance in your predictor is explained by the other predictors
- VIF > 20 = close laptop
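A possible way to compute VIFs by hand, following the formula above: regress each column of the design matrix on all other columns and take 1/(1 − R²). The function and variable names below are illustrative.

```python
# Sketch: compute the variance inflation factor (VIF) for each column of a
# design matrix X by regressing it on all the other columns.
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X, using VIF = 1 / (1 - R^2)."""
    vifs = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        others = np.column_stack([np.ones(len(target)), others])  # add intercept
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        pred = others @ beta
        ss_res = np.sum((target - pred) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# example: two correlated regressors and one independent regressor
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
X = np.column_stack([a,
                     a + 0.3 * rng.standard_normal(200),
                     rng.standard_normal(200)])
print(variance_inflation_factors(X))   # first two VIFs are high, third is near 1
```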
‘solving’ multicollinearity
not possible, but you can
1. avoid the problem before it occurs through the experimental design
2. compensate for the problem through analytical strategies
multicollinearity: experimental considerations
- think of the analysis before designing the experiment: determine a priori which factors need to be independent
- orthogonal task designs: e.g., vary each experimental component independently from all others, balance their combination
- separate conditions in time: add inter-trial intervals with jitter, separate task phases (e.g., stimuli & button presses)
- counterbalance trial order: ensure that each condition precedes each other condition equally often (at least randomize order)
- block designs: group together trials of a certain condition to separate them from trials of another condition (unlike event-related designs)
multicollinearity: analytical considerations
- reduce model complexity: remove predictors that are not needed
–> rule of thumb: n_regressors < n_datapoints/20
- orthogonalization of regressors: decide which predictor gets credit for explaining overlapping variance
- regularized regressions (e.g., Ridge regression): penalty term (λ) added to the GLM shrinks coefficients, with larger coefficients being compressed more
–> λ value needs to be estimated through CV
–> model fits the training data less well, but it generalizes better to new data
- dimensionality reduction: find principal components of the design matrix and fit those to the data (see the sketch below)
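An illustrative sketch of the dimensionality-reduction option (my own toy example): decompose the design matrix with PCA and fit the resulting orthogonal components to the data instead of the original, correlated regressors.

```python
# Sketch: PCA-based dimensionality reduction of a correlated design matrix.
# The principal components are orthogonal by construction, so the resulting
# GLM no longer suffers from multicollinearity (at the cost of interpretability).
import numpy as np

rng = np.random.default_rng(1)
n = 300
base = rng.standard_normal(n)
X = np.column_stack([base + 0.1 * rng.standard_normal(n),
                     base + 0.1 * rng.standard_normal(n),
                     rng.standard_normal(n)])        # two correlated + one independent
y = X @ np.array([0.5, 0.5, 1.0]) + rng.standard_normal(n)

# PCA via SVD of the centered design matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_components = 2
components = Xc @ Vt[:n_components].T                # orthogonal component scores

# fit the GLM on the component scores instead of the raw regressors
design = np.column_stack([np.ones(n), components])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print("betas on principal components:", beta[1:])
```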
pros and cons of orthogonalization
pro: can be appropriate for covariate regressors that accompany a main regressor of interest
con: can be misleading
–> e.g., difference between model coefficients is ‘not real’ but rather reflects your decision
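A small sketch of what orthogonalizing one regressor with respect to another can look like (one common approach, shown as a toy example): the choice of which regressor keeps the shared variance is exactly the decision the con above warns about.

```python
# Sketch: orthogonalize regressor x2 with respect to x1.
# x1 keeps all shared variance; x2_orth only carries the variance that x1
# cannot explain -- a modelling decision, not a property of the data.
import numpy as np

def orthogonalize(x2, x1):
    """Remove from x2 the part that is linearly explained by x1."""
    design = np.column_stack([np.ones(len(x1)), x1])   # intercept + x1
    beta, *_ = np.linalg.lstsq(design, x2, rcond=None)
    return x2 - design @ beta

rng = np.random.default_rng(2)
x1 = rng.standard_normal(100)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(100)         # correlated with x1
x2_orth = orthogonalize(x2, x1)
print("corr(x1, x2)      =", np.corrcoef(x1, x2)[0, 1].round(2))
print("corr(x1, x2_orth) =", np.corrcoef(x1, x2_orth)[0, 1].round(2))  # ~0
```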
regularization
- Regularized regression is a statistical method that modifies traditional regression to prevent overfitting, which can occur when a model is too complex. It introduces a penalty term to the loss function that the optimization algorithm seeks to minimize.
- This penalty term typically increases as the absolute value of the coefficients increases, leading to a preference for smaller coefficients overall, which can lead to simpler models that generalize better to new data.
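As an illustration (not from the lecture slides), here is a ridge regression with the penalty strength chosen by cross-validation using scikit-learn's RidgeCV; the alpha grid and the simulated regressors are arbitrary examples.

```python
# Sketch: ridge regression with the penalty (lambda, called "alpha" in
# scikit-learn) selected by cross-validation. Larger penalties shrink the
# coefficients more, trading training fit for better generalization.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(3)
n = 200
base = rng.standard_normal(n)
X = np.column_stack([base + 0.05 * rng.standard_normal(n),    # two nearly
                     base + 0.05 * rng.standard_normal(n)])   # identical regressors
y = X @ np.array([1.0, 1.0]) + rng.standard_normal(n)

model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5)   # lambda grid + 5-fold CV
model.fit(X, y)
print("chosen lambda:", model.alpha_)
print("shrunken coefficients:", model.coef_)
```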
temporal autocorrelation
the signal is correlated with a delayed version of itself, meaning that each value in the time series can be predicted based on the values that came before
–> also known as serial dependence
problem with temporal autocorrelation
observations are not independent
- samples acquired close in time are very similar (e.g., because of the HRF)
- the amount of independent information in the data is reduced
- degrees of freedom are overestimated, leading standard errors to be underestimated
- autocorrelation leads to inflated t-statistics and to an increase in false positive results
Temporal autocorrelation – How can we quantify and address the problem?
- compute autocorrelogram
- prewhitening
compute autocorrelogram
An autocorrelogram is a plot that shows the correlation of the time series with itself at different lags.
- correlate time series with a delayed version of itself
- do this for all possible delays
- inspect the resulting curve (i.e., the autocorrelogram for all delays)
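A minimal way to compute an autocorrelogram by hand, following the steps above: correlate the time series with delayed copies of itself at increasing lags. The AR(1)-style toy signal below is my own example.

```python
# Sketch: compute an autocorrelogram by correlating a time series with
# delayed copies of itself at increasing lags.
import numpy as np

def autocorrelogram(signal, max_lag=20):
    """Return the correlation of `signal` with itself at lags 0..max_lag."""
    return np.array([1.0 if lag == 0
                     else np.corrcoef(signal[:-lag], signal[lag:])[0, 1]
                     for lag in range(max_lag + 1)])

# AR(1)-like toy signal: each sample depends on the previous one
rng = np.random.default_rng(4)
noise = rng.standard_normal(500)
signal = np.zeros(500)
for t in range(1, 500):
    signal[t] = 0.7 * signal[t - 1] + noise[t]

print(autocorrelogram(signal, max_lag=5).round(2))   # decays roughly as 0.7**lag
```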
prewhitening
remove autocorrelation by transforming the data such that the residuals resemble white noise
- fit a GLM model
- compute residual autocorrelation
- correct residual autocorrelation (e.g., through filtering)
- add the corrected residuals to the ‘explained (fitted) signal’
- re-run the GLM on corrected data
this improves fMRI reliability
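A simplified prewhitening sketch following the steps above, under the assumption (mine, for illustration) that the residuals follow an AR(1) process; real fMRI software uses more elaborate noise models.

```python
# Simplified prewhitening sketch assuming AR(1) residual autocorrelation.
import numpy as np

def fit_glm(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(5)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])

# toy data with AR(1) noise (rho = 0.6)
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.6 * noise[t - 1] + rng.standard_normal()
y = X @ np.array([2.0, 1.0]) + noise

# 1) fit a GLM, 2) estimate the residual lag-1 autocorrelation
beta, resid = fit_glm(X, y)
rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]

# 3) correct the residual autocorrelation with an AR(1) filter
resid_white = resid.copy()
resid_white[1:] = resid[1:] - rho * resid[:-1]

# 4) add the corrected residuals back to the fitted signal, 5) re-run the GLM
y_corrected = X @ beta + resid_white
beta_corrected, _ = fit_glm(X, y_corrected)
print("estimated rho:", round(rho, 2))
print("betas after prewhitening:", beta_corrected.round(2))
```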
Temporal autocorrelation as a feature, not a bug
Check for autocorrelations in your data. They might speak towards your research question or cause problems (e.g., violating assumptions)