Week 2: GLM part 1 Flashcards
Once we have collected the data we want to know which areas are active. How do we do this?
Through model-based techniques (e.g., linear regression) and model-free approaches (e.g., PCA)
What general steps do we follow to perform model-based analysis of fMRI data?
1) Model the predictors; 2) fit the resulting models (one per predictor) to the data (with more than one predictor, sum the models and fit this sum to the actual data); 3) assess how well the models fit (e.g., through a t-statistic)
Univariate analysis
We treat each voxel’s time course independently
Hemodynamic response function
A function that represents the change in blood-oxygen levels in response to neural activity
T-statistic definition and formula
Signal-to-noise measure. Formula: t = (beta_hat - beta0) / SE(beta_hat)
Contrast testing
Conducting hypothesis tests about our betas (for which we have one model each). Example: we have b1 and b2; we create one model for each, sum these models, and fit the sum to the actual data. Now we want to know whether the amplitude of the b1 model is the same as the amplitude of the b2 model. In this case, H0 would be: b1 = b2 and HA would be: b1 != b2. Bring everything to the left of the “=” sign (giving b1 - b2 = 0) and find the contrast vector that answers the hypothesis: solve […] * [b0, b1, b2, b3] = b1 - b2, which gives [0, 1, -1, 0].
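A minimal numeric sketch of this test (the betas and their covariance here are hypothetical stand-ins for the output of a fitted GLM):

```python
import numpy as np

# Hypothetical estimates from a fitted GLM with parameters [b0, b1, b2, b3]
betas = np.array([1.2, 0.8, 0.5, -0.1])
cov_betas = 0.01 * np.eye(4)      # assumed covariance of the estimates

c = np.array([0, 1, -1, 0])       # contrast for H0: b1 - b2 = 0

effect = c @ betas                # estimated b1 - b2
se = np.sqrt(c @ cov_betas @ c)   # standard error of the contrast
t_stat = effect / se
```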
In which of the following steps do we get the regressors that we will fit to the actual data?
- creating stimulus vector for each stimulus/task
- convolving the vector with the HRF
- fitting the model to the data
At step 2. Convolving the initial stimulus vector with the HRF gives us the regressors (Xs), which we then fit to the actual data
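A sketch of steps 1 and 2 with a toy double-gamma HRF (the gamma shapes 6 and 12 are illustrative, not any toolbox's canonical values):

```python
import numpy as np
from scipy.stats import gamma

t = np.arange(0, 32, 0.1)                        # 0.1 s resolution, 32 s long
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)  # peak minus undershoot
hrf /= hrf.max()

stim = np.zeros(600)                             # step 1: 60 s stimulus vector
stim[[100, 300, 500]] = 1                        # onsets at 10, 30, 50 s

regressor = np.convolve(stim, hrf)[:len(stim)]   # step 2: the regressor X
```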
Probability
Expected relative frequency of a particular outcome
Random variable
A variable whose value is determined by a random experiment
Expected value (mean)
Mean of random variable
Variance (of a random variable)
How the values of the random variable are dispersed around the mean
Covariance
How much two random variables vary together
Bias, variance, estimator
Bias: how far the estimate is, on average, from the true value; variance: how reliable (spread out) the estimate is across samples; estimator: a statistic that estimates a parameter
Regression equals…
…association (not causality!!)
Simple linear regression model
Yi = beta0 + beta1*Xi + ei
The error in linear regression is assumed to have a mean of ….
0 (this means that if we summed all our error terms e1, e2, e3, …, eN, the result would be 0, and dividing by N would again give 0)
The variance of Yi (Var(Yi)) equals…
sigma^2.
The formula for sigma^2 is …
sum(e^2) / (N - p), where N = the number of independent pieces of information and p = the number of parameters in the model (including b0)
The most used loss function for linear regression
The least-squares loss function (the sum of squared errors)
Gauss Markov theorem states that…
…provided the following assumptions of the GLM (linear regression model) are not violated:
1. Linearity: The relationship between X and the mean of Y is linear.
2. Homoscedasticity: The variance of the residuals is the same for any value of X
3. Independence: Observations are independent of each other (hence randomly sampled)
4. Errors have mean of 0
…then the OLS estimators b0 and b1 are the Best Linear Unbiased Estimators of b0 and b1. The OLS method is therefore used to estimate parameters of a linear regression model (e.g., GLM).
Note: “Best Linear Unbiased” means 1) the estimator is unbiased (its results will, on average, hit the bull’s eye) and 2) it has the lowest variance among all linear unbiased estimators
Diagonal matrix
A matrix whose non-zero entries appear only on the diagonal
Identity matrix
A matrix with 1s on the diagonal and 0s everywhere else
Matrix inverse
The inverse of a matrix A is the matrix A^(-1) which, if multiplied by A, yields the identity matrix I
A matrix is invertible only if…
1) it is a square matrix and 2) it has full-rank
A rectangular matrix (B) is invertible only if…
Strictly speaking, a rectangular matrix has no true inverse; it has a (left) pseudo-inverse when its columns are linearly independent, in which case B.T * B is a square, full-rank matrix that can be inverted (this is what the beta_hat formula below relies on)
Formula for estimating beta_hat
beta_hat = (X.T * X)^(-1) * X.T * Y
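A direct translation into numpy (toy data; in practice np.linalg.lstsq is numerically safer than forming the inverse explicitly):

```python
import numpy as np

# Toy design matrix: a column of 1s (intercept) plus one predictor
X = np.array([[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0], [1, 4.0]])
Y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y   # (X'X)^(-1) X'Y
```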
X (the design matrix) should have [more/less] … rows than columns
“more”. This means that we should try to have more subjects/observations than parameters
T/F: X and Y must have the same first dimension. Also explain your decision.
True; we need a Yi for each subject/observation, and since the subjects are stored in the rows of X (the first dimension of X), X and Y must have the same first dimension.
Relation between GLM, OLS and BLUE
The GLM states that Y = Xb + e. We can estimate the betas using OLS (ordinary least squares). OLS gives us the Best Linear Unbiased Estimate IF the conditions specified by the Gauss-Markov theorem are met (namely, that X and Y are linearly related and that the errors are independent, homoscedastic, and have mean 0)
In the context of fMRI data analysis, what would a TWO-sided hypothesis test be interested in?
Whether beta is different than 0
In the context of fMRI data analysis, what would a ONE-sided hypothesis test be interested in?
Whether beta is greater than 0 (or, alternatively, less than 0), i.e., an effect in one specific direction
What is the meaning behind the p-value?
Assuming our null hypothesis is true, how likely are we to obtain a value at least as extreme as our observed statistic?
Type I and Type II errors
Type I: incorrectly rejecting a true H0 (false positive); Type II: failing to reject a false H0 (false negative)
Definition of “contrast”
The difference between two (groups of) betas
Example: What contrast-vector would you need to test whether beta2 differs significantly from 0 (assuming we have beta0, beta1, beta2 and beta3)?
1) Start with:
H0: b2 = 0
HA: b2 !=0
2) Re-arrange with 0 on the right; in this case, it is already like that, so we have:
[…]* [betas] = beta2
Since we care about beta2, we set its entry in the contrast vector to 1 and all the other entries to 0, so that we extract only its value. The final answer is [0, 0, 1, 0]
What do the betas (beta1, … , betaN) represent?
The average change in the dependent variable for a 1-unit increase in the corresponding predictor (X1, … , XN), holding the other predictors constant
The intercept (b0) represents (without mean centering)…
…the value of Yi (the response) when all the Xs are 0
Some info about mean centering the predictors:
1) it shifts the scale, but retains the unit
2) the slope remains the same, but the interpretation of the intercept changes to being the mean of the dependent variable Yi
We’ve established that the meaning of b0 changes if we mean center the predictors; does the meaning of b1, … , bN change?
No; simply, instead of saying “b1 is how Y changes for a one-unit change in the predictor” we say “b1 is how Y changes for a one-unit change in the mean centered predictor”
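A quick numeric check of both mean-centering flashcards above (toy data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1                                     # true slope 2, intercept 1

X_raw = np.column_stack([np.ones_like(x), x])
X_cen = np.column_stack([np.ones_like(x), x - x.mean()])

b_raw = np.linalg.lstsq(X_raw, y, rcond=None)[0]  # [1.0, 2.0]
b_cen = np.linalg.lstsq(X_cen, y, rcond=None)[0]  # [7.0, 2.0]: intercept is now mean(y), slope unchanged
```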
The canonical HRF (definition)
A model of the change in blood-oxygenation-level-dependent (BOLD) signal through time in response to neural activation. It represents how we think a voxel is going to respond to a stimulus.
In terms of voxel activity, what does Yi represent?
It represents the time series activity of a single voxel (i): one activity value (Yi) per time point, which gives Y its shape (number of time points).
In the context of fMRI data-analysis, we use the t-statistic to…
…measure how many standard errors the estimated parameter (beta_hat) is away from beta_0 (the value under H0). We want a high t-statistic, because this means that the numerator (beta_hat - beta0) is large relative to SE(beta_hat).
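Putting the earlier formulas together, a sketch of the full t-statistic computation for one parameter (plain OLS, no autocorrelation corrections):

```python
import numpy as np

def t_statistic(X, Y, j, beta_null=0.0):
    """t-stat for parameter j of an OLS fit, tested against beta_null."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2 = resid @ resid / (n - p)        # sum(e^2) / (N - p)
    se = np.sqrt(sigma2 * xtx_inv[j, j])    # SE(beta_hat_j)
    return (beta_hat[j] - beta_null) / se
```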
LTI (linear time-invariance) assumption of the BOLD signal
Linearity + Time invariance
1) Linearity: if one stimulus gives x response, then two stimuli presented together give 2x the response, etc.; a neuronal signal with twice the magnitude results in a BOLD signal twice as large
2) Time-invariance: it doesn’t matter when the stimulus happens; the response is simply shifted in time
Example of canonical HRF function
The double-gamma model: the first gamma function captures the peak, the second captures the undershoot
Why do we wish to convolve a signal with the HRF at a high temporal resolution?
So that stimuli shorter than our sampling interval (or several stimuli falling within one sample) are not lumped together into a single estimate
fMRI is a [continuous/discrete] sampled signal
discrete
Why do we care about resampling the predictor X prior to any analyses?
The predictor X and the signal Y must be on the same timescale. If that is not the case, we downsample the predictor to the time scale of the signal
Which arguments matter most for the canonical HRF?
Repetition time (TR); oversampling factor (we first want the HRF on the same ORIGINAL, high-resolution time scale as the predictor; then, once we have the convolved signal, we can resample it once more to the time scale of the signal); length of the HRF
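A sketch of this oversample-convolve-downsample logic (TR, oversampling factor, and onsets are illustrative; the HRF is the same toy double-gamma as above):

```python
import numpy as np
from scipy.stats import gamma

TR, oversampling = 2.0, 20                       # illustrative values
dt = TR / oversampling                           # 0.1 s high-resolution grid

t = np.arange(0, 32, dt)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)  # toy HRF on the fine grid

stim = np.zeros(100 * oversampling)              # 100 scans on the fine grid
stim[::30] = 1                                   # hypothetical onsets every 3 s

high_res = np.convolve(stim, hrf)[:len(stim)]    # convolve at high resolution
regressor = high_res[::oversampling]             # downsample to one value per TR
```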
T/F: The HRF is a lot smoother when defined on a less precise time scale
False; The HRF is a lot smoother when defined on a more precise time scale
MSE formula
MSE = sum(e^2) / N ; we want a low MSE
R^2 formula (coefficient of determination)
R^2 = 1 - [ sum((y - y_hat)^2) / sum((y - y_bar)^2) ] ; we want a high R^2. It provides information about the goodness of fit of the model.
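Both fit measures in a few lines (y is the data, y_hat the model prediction):

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)      # sum(e^2) / N

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)     # unexplained variance
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variance
    return 1 - ss_res / ss_tot
```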
Temporal basis functions
They model the hemodynamic response as a combination of basis functions; we convolve the predictors with multiple functions instead of a single HRF. For example, when using the canonical HRF plus its derivatives as a basis set, we convolve the predictor with both the original HRF AND with the temporal derivative of the HRF. The advantage is that the temporal derivative can correct lag/onset with more precision than the canonical HRF alone, while the dispersion derivative (if used) can add precision to the width of the BOLD response
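A sketch of building such a basis set numerically (the derivative is taken with np.gradient; the HRF is the same toy double-gamma used earlier):

```python
import numpy as np
from scipy.stats import gamma

dt = 0.1
t = np.arange(0, 32, dt)
hrf = gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 12)
hrf_dt = np.gradient(hrf, dt)               # temporal derivative of the HRF

stim = np.zeros(600)
stim[[50, 250, 450]] = 1                    # hypothetical onsets

X1 = np.convolve(stim, hrf)[:len(stim)]     # canonical regressor
X2 = np.convolve(stim, hrf_dt)[:len(stim)]  # derivative regressor (absorbs onset shifts)
X = np.column_stack([X1, X2])               # two regressors for one condition
```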