Lecture 6 - Computational Modelling Flashcards
THINKING STATUS
▪ If we were to build a mind for a machine, would it be an image of a mind or an actual mind?
Would it just be a representation?
▪ How do we know whether a computer can think?
o According to Turing, asking whether machines can think is the wrong question. He backed this up with the Turing test: if a human cannot tell the difference between a human and a computer during an imitation game, then why would we not grant this computer a thinking status?
▪ Why would we require an entity to be of a specific material
(biological wetware) to grant it thinking status?
o A philosophical riddle: when do we know that someone or something is thinking, or even conscious? Is there a difference between simulating a mind and having a mind?
▪ A counter-argument to the Turing test is the Chinese room argument, in which someone in a room receives Chinese input and gives Chinese output back based on a dictionary or set of rules, without understanding a word of Chinese himself. He would not understand Chinese; he would just follow rules.
o If the behaviour is similar to a human's, are we happy to say it understands semantics? Searle believes that there is something about the brain that makes it understand.
MODELLING
▪ “All models are wrong, but some are useful” ~ George Box. Models are abstractions of reality, but only through abstraction can we gain an understanding of the underlying processes. Models can be used for several purposes:
o Forcing the scientist to conceptually analyse, specify, and formalise intuitions and
ideas which would otherwise remain implicit or unexamined.
o Useful models show us when/how/why something works, rather than describing
observations about the brain.
o Useful models make predictions about future experiments and data, and do not only
explain existing data (overfitting).
o Models that replicate data in simpler, more abstract terms yield a deeper understanding of the mechanisms under study.
o “What I cannot create I do not understand.” - Richard Feynman
▪ Reasons for computational modelling in neuroscience:
o Theory construction: which building blocks are required to recreate neural phenomena, and how and why?
o Theory testing: behavioural (spike timing), model fitting, prediction of untested
experimental findings
o Mechanistic understanding: which details matter, which do not?
o Normative modelling: what properties emerge from objective functions, and why?
o Data synthesis: connecting the dots across experiments, in vitro, in vivo, etc.
o Implementational robustness: many implementations may give rise to the same
phenomenon
o Exploration: perform otherwise impossible virtual experiments
o Animal welfare: Better experimental planning, more sensible testing, ‘in silico’
electrophysiology
o Practical use: simulate the effects of drugs on neural activity patterns, etc.
▪ What do we model?
o Behaviour: reaction time distributions, behavioural accuracy, behavioural certainty,
response profiles, error patterns
o Neural data: M/EEG amplitudes and patterns, fMRI amplitudes and patterns, LFPs,
spike density etc.
o Neural phenomena: Test whether similar phenomena as observed in the brain
emerge in model systems. This can be done w/o direct alignment with brain data.
o The better a model's predictions for unseen data/modalities, the more confident we can be that it mirrors cortical processes. Still, because of multiple realizability, we cannot just take any model to explain brain behaviour: if a complex model's output behaviour is the same as the brain's, this is no guarantee that the brain actually works in the same way.
▪ Model fitting: adjusting free parameters to match model predictions to the data.
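A minimal sketch of model fitting in this sense, using made-up reaction-time data and a hypothetical two-parameter model (none of the numbers come from the lecture): the free parameters are adjusted so that the model's predictions match the observed data in a least-squares sense.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical data: mean reaction times (s) for five difficulty levels.
difficulty = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
observed_rt = np.array([0.42, 0.51, 0.63, 0.70, 0.81])

# A simple two-parameter model: RT = baseline + slope * difficulty.
def rt_model(d, baseline, slope):
    return baseline + slope * d

# "Model fitting": adjust the free parameters (baseline, slope) so that
# the model's predictions match the data as closely as possible.
params, _ = curve_fit(rt_model, difficulty, observed_rt)
baseline, slope = params

print(f"fitted baseline = {baseline:.3f} s, slope = {slope:.3f} s per unit difficulty")
print("predicted RTs:", np.round(rt_model(difficulty, *params), 3))
```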
▪ Modelling gives mechanistic insights into otherwise impossible experiments.
DEEP NEURAL NETWORKS
▪ Deep learning (DL) is one of many machine learning techniques that allow computer systems
(deep neural networks) to learn from experience.
▪ Deep Neural Networks (DNN) learn a mapping from an input space (e.g. an image) to an
output space (e.g. a probability distribution over object categories → is this a dog or a cat) via
a cascade of non-linear transformations.
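A minimal NumPy sketch of such a cascade of non-linear transformations (layer sizes, weights, and the dog/cat labels are arbitrary illustrations, not taken from the lecture): an input vector is pushed through two weighted layers, each followed by a non-linearity, ending in a probability distribution over categories.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Non-linearity: units output 0 below threshold, linear above it.
    return np.maximum(0.0, x)

def softmax(x):
    # Turns raw scores into a probability distribution over categories.
    e = np.exp(x - x.max())
    return e / e.sum()

x = rng.normal(size=16)               # toy "image" input
W1 = rng.normal(size=(8, 16)) * 0.1   # input -> hidden weights
W2 = rng.normal(size=(2, 8)) * 0.1    # hidden -> output weights (2 classes: dog/cat)

hidden = relu(W1 @ x)                 # first non-linear transformation
output = softmax(W2 @ hidden)         # probability distribution over categories
print("P(dog), P(cat) =", np.round(output, 3))
```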
▪ The output depends not only on the current input to the network, but also on previous inputs, outputs, weights, or states of the network; adapting these over time is called learning.
o Learning is performed by changing network weights in order to minimise a pre-defined error function (normative modelling determines the error function: what should the cost of mistakes be?). This can be supervised, semi-supervised, or unsupervised.
▪ DNNs are multi-layered: they have more than one hidden layer. Units in a DNN apply a non-linear mapping: their output is 0 until a threshold is reached, after which an output analogous to an action potential or a certain firing rate is produced.
▪ DNNs are less dependent on the engineer: in classic, hierarchical pattern recognition, the engineer had to decide which features would make sense for training the classifier. A DNN has a trainable classifier, but also trainable features, which means it is not limited by the imagination of the engineer, as it figures out the features from the input itself.
▪ DNNs are frequently used across domains, from AI to computational neuroscience.
▪ Weight adjustments: the deviation of the network output from the target (a one-hot vector) is the cost. The goal is to minimise this cost by adjusting the weights and biases of all units.
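A minimal sketch of this cost-minimisation idea, assuming a cross-entropy cost on a one-hot target and plain gradient descent (the lecture does not specify the exact cost or optimiser): the output layer's weights and biases are repeatedly nudged along the negative gradient of the cost.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h = rng.normal(size=8)                 # activity of the last hidden layer (toy values)
W = rng.normal(size=(3, 8)) * 0.1      # weights of the output layer (3 classes)
b = np.zeros(3)                        # biases of the output layer
target = np.array([0.0, 1.0, 0.0])     # one-hot target: the true class is class 1

lr = 0.1                               # learning rate
for step in range(100):
    p = softmax(W @ h + b)             # network prediction
    delta = p - target                 # gradient of the cross-entropy w.r.t. the scores
    W -= lr * np.outer(delta, h)       # adjust weights to reduce the cost
    b -= lr * delta                    # adjust biases to reduce the cost

p = softmax(W @ h + b)
cost = -np.sum(target * np.log(p))     # cross-entropy: deviation from the one-hot target
print("final cost:", round(cost, 4), "prediction:", np.round(p, 3))
```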
▪ Receptive field sizes and feature complexity increase with network depth.
▪ Adding multiple hidden layers leads to an increase in
compositionality, fewer parameters and less
overfitting.
▪ From the very beginning, DNNs were inspired by findings from neuroscience.
▪ Different approaches to deep learning in computational neuroscience:
o Normative approach: train a network on an external objective, fix the parameters, and test its agreement with the experimental data. These models are image-computable and can be used for experiments that would otherwise be deemed unethical; they thereby differ markedly from hypothetical models that have no grounding in the real world. What “should” the brain look like if it were optimised for a given task/objective?
1. Train DNN on a complex task (say object recognition)
2. Estimate internal representations in brain data.
3. Compare the representations with each other
4. Test multiple input statistics, architectures, etc. to gain insight into the
conditions that lead to the emergence of brain-like representations. This way,
we can learn about how/why the brain computes the way it does.
o Direct modelling (aka data-driven modelling): adjust the weights to best match the
experimental data
METHOD 1: REPRESENTATIONAL SIMILARITY ANALYSIS (RSA)
▪ RSA analyses the geometry of the points in a high-dimensional response-pattern space, where each point represents the response to a particular stimulus.
▪ Despite there being differences in the positions of individual neurons (due to individual differences), the distances between stimulus representations will be the same, and we can simply draw different separating hyperplanes (since we perceive the world the same way). We can make the exact same inferences about the world even though the neural populations differ.
▪ Using RSA is very elegant, as you can forget about the exact positions of the neurons: you only need the distances between the response patterns evoked by pairs of stimuli. This makes the Representational Dissimilarity Matrix (RDM) comparable between, e.g., several people.
▪ Despite perfect class-separability, representational
geometries can differ widely.
▪ RSA provides us with a rich data structure, which can
be compared across methods, species, participants,
levels of explanation!
o Because you can compare a brain RDM with a model RDM, you can ask how good the representation in the model is at explaining or predicting what the brain will be doing.
▪ You can compare the RDM of the brain with the RDM of a model by correlating them (a minimal sketch is given below).
o Later layers of the network give better predictions of the later stages of visual processing. If this were the case for the earlier layers, you should re-examine your model, as it would likely not be representative of the processing in IT.
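A minimal sketch of this RDM comparison using synthetic response patterns (in practice the brain patterns would come from fMRI/electrophysiology and the model patterns from a DNN layer; correlation distance and Spearman correlation are common but assumed choices here):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

n_stimuli, n_brain_units, n_model_units = 20, 100, 50

# Synthetic response patterns: rows = stimuli, columns = neurons/units.
brain_patterns = rng.normal(size=(n_stimuli, n_brain_units))
# A "model" whose representation is partly related to the brain data.
model_patterns = brain_patterns[:, :n_model_units] + rng.normal(size=(n_stimuli, n_model_units))

def rdm(patterns):
    # RDM: pairwise dissimilarity (1 - correlation) between stimulus patterns.
    return squareform(pdist(patterns, metric="correlation"))

brain_rdm = rdm(brain_patterns)
model_rdm = rdm(model_patterns)

# Compare the two geometries: correlate the upper triangles of the RDMs.
iu = np.triu_indices(n_stimuli, k=1)
rho, _ = spearmanr(brain_rdm[iu], model_rdm[iu])
print(f"RDM correlation (Spearman): {rho:.2f}")
```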
▪ DNNs are currently the best available image-computable model of IT representations.
▪ To determine the noise ceiling (which reflects the signal-to-noise ratio of the data), you can take the average of all your subjects minus one and use that to predict the left-out subject. This gives the lower bound of the noise ceiling (see the sketch below).
o The noise ceiling tells you the limit of the conclusions you can draw from the data. Shifting the noise ceiling up would require getting better data: cleaner data → higher noise ceilings. A model's performance should be compared against the noise ceiling estimated from the brain data.
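A minimal sketch of the lower-bound computation described above, using synthetic subject RDMs (the Spearman correlation is an assumed choice of similarity measure): each subject's RDM is predicted from the average RDM of all remaining subjects.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)

n_subjects, n_pairs = 10, 190           # 190 = upper triangle of a 20x20 RDM
true_rdm = rng.random(n_pairs)          # the (unknown) "true" representational geometry
# Each subject's RDM is the true geometry plus measurement noise.
subject_rdms = true_rdm + 0.5 * rng.normal(size=(n_subjects, n_pairs))

lower_bound = []
for s in range(n_subjects):
    others = np.delete(subject_rdms, s, axis=0).mean(axis=0)   # average of all subjects but one
    rho, _ = spearmanr(subject_rdms[s], others)                # predict the left-out subject
    lower_bound.append(rho)

print(f"lower bound of the noise ceiling: {np.mean(lower_bound):.2f}")
```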
▪ The brain operates under different constraints. Deeper is not always the best solution.
Recurrence may be more effective.
METHOD 2: ENCODING MODELS
▪ An encoding model is a rebranding of a general linear model (GLM).
o This is a different approach from comparing geometries (RSA): here you target individual neurons (or voxels) and try to explain variance in the individual neurons.
o The model learns different weights for different units in order to predict the firing of an individual neuron. This way, you can show an image and the model will predict how responsive each neuron in the brain will be.
▪ Collect data and divide into training
and validation sets.
▪ Select a feature space and estimate
encoding model for each voxel.
▪ Use the encoding model to predict responses in the validation data (see the sketch below).
o Then you can look at individual voxels and colour-code them by what they are selective for.
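A minimal sketch of an encoding model as a regularised GLM on synthetic data (feature space, voxel counts, and the ridge penalty are arbitrary assumptions): one weight vector per voxel is estimated on the training set and then used to predict held-out responses.

```python
import numpy as np

rng = np.random.default_rng(4)

n_train, n_test, n_features, n_voxels = 200, 50, 20, 30

# Feature space: e.g. DNN-layer activations or category labels per stimulus.
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))

# Synthetic voxel responses generated from hidden "true" weights plus noise.
true_w = rng.normal(size=(n_features, n_voxels))
y_train = X_train @ true_w + rng.normal(size=(n_train, n_voxels))
y_test = X_test @ true_w + rng.normal(size=(n_test, n_voxels))

# Ridge-regularised GLM: one weight vector per voxel, estimated on training data.
lam = 1.0
w_hat = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n_features),
                        X_train.T @ y_train)

# Use the encoding model to predict responses in the validation data.
y_pred = X_test @ w_hat
r_per_voxel = [np.corrcoef(y_test[:, v], y_pred[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean prediction accuracy across voxels: r = {np.mean(r_per_voxel):.2f}")
```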
▪ Predictors do not have to be categorical. Same overall modelling idea, but more complex
models.
▪ DNNs are the only model class that reaches 50% accuracy in both V4 and IT while being image-computable and task-trained.
o Models that are better at predicting neural data are also better at classifying images, which could suggest that neurons in IT are geared towards object recognition, because models that are better at recognizing things are also better at predicting what these neurons are doing.
▪ Idea: have different labs compete with their deep neural network models in predicting brain
data. There is now a system called Brain-Score by MIT, which tells you how good the model
is at predicting the ventral stream.
▪ To solve object recognition, the network passes information through a sequence of feature
extractions that is comparable to the brain.
METHOD 3: NETWORK READOUT
▪ DNN-generated insight into the feature
complexity of human animacy classification
OVERFITTING AND UNDERFITTING
▪ Encoding models in DNNs often have access to individual network units. Each unit gets its own parameter while fitting to the data. This means encoding models have a lot of flexibility. This flexibility makes encoding models very powerful, but at the same time allows them to fit to noise instead of more general, true patterns. This is called overfitting.
▪ Underfitting occurs when the model does not have enough parameters to fit the data at all.
o With too many free parameters, we start fitting the noise rather than the signal.
o Overfitting leads to terrible future predictions, as the model is overtrained on the training data.
o Cross-validation is the way to find out whether we are only fitting noise (see the sketch below).
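A minimal sketch of how cross-validation exposes overfitting, using synthetic 1-D data and polynomial models of different complexity (the degrees are arbitrary): it prints the training error and the cross-validated error for an underfitting, a reasonable, and a highly flexible model.

```python
import numpy as np

rng = np.random.default_rng(5)

x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=x.size)   # signal + noise

def cv_error(degree, n_folds=5):
    # k-fold cross-validation: fit on k-1 folds, evaluate on the held-out fold.
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, n_folds)
    errors = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        coeffs = np.polyfit(x[train], y[train], degree)     # fit the free parameters
        pred = np.polyval(coeffs, x[f])                      # predict held-out data
        errors.append(np.mean((pred - y[f]) ** 2))
    return np.mean(errors)

for degree in (1, 3, 15):   # underfit, reasonable fit, overfit
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, CV MSE = {cv_error(degree):.3f}")
```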
DNNs FOR DATA MODELLING
▪ Task: find an image-computable network to model
the first 300ms of representational transformations
in the ventral stream.
▪ Dynamic Representational Distance Learning (dRDL): directly train different network architectures with brain RDMs as the target. Once trained, test the networks on unseen stimuli and test their match against unseen brain data (a sketch of an RDM-matching loss is given below).
o This method is beneficial as we can test deep neural network structures, and it enables us to directly inject neural data into the networks.
▪ Goal: generate insight into the computational mechanisms underlying human vision.
o Use deep neural networks to predict time-varying representational geometries in response to unseen stimuli.
▪ Result: recurrence is required to capture the computations of the human visual system.
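A crude NumPy sketch of the underlying idea of using brain RDMs as a training target (this is not the actual dRDL implementation: the one-layer model, Euclidean distances, and finite-difference optimisation are simplifying assumptions): the loss measures the mismatch between the model's RDM and the target brain RDM, and the weights are adjusted to reduce it.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(6)
n_stimuli, n_inputs, n_units = 12, 20, 10

stimuli = rng.normal(size=(n_stimuli, n_inputs))              # toy input stimuli
target_brain_rdm = pdist(rng.normal(size=(n_stimuli, 50)))    # toy "brain" RDM (Euclidean)

def rdm_loss(W):
    responses = np.tanh(stimuli @ W)                          # a one-layer stand-in for a network
    model_rdm = pdist(responses)                              # the model's representational geometry
    return np.mean((model_rdm - target_brain_rdm) ** 2)       # mismatch to the brain RDM

W = rng.normal(size=(n_inputs, n_units)) * 0.1
print(f"loss before training: {rdm_loss(W):.3f}")

# Crude finite-difference gradient descent (a real implementation would use autograd).
lr, eps = 0.05, 1e-5
for step in range(20):
    grad = np.zeros_like(W)
    base = rdm_loss(W)
    for idx in np.ndindex(*W.shape):
        W[idx] += eps
        grad[idx] = (rdm_loss(W) - base) / eps
        W[idx] -= eps
    W -= lr * grad

print(f"loss after training:  {rdm_loss(W):.3f}")
```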
CRITICISM
- DNNs are nothing but black boxes, as DNNs often get far too complex, with far too many parameters, to interpret them.
o Dr. Kietzmann disagrees: you have full access to the weights and you can probe the entire model. Networks are derived from an architecture (network structure, hyperparameters), input statistics (dataset), an objective function (e.g. classification/cross-entropy) and a learning algorithm (e.g. SGD). Experimenters have FULL CONTROL over every single one of these aspects.
- Neural networks are far too simple.
o All levels of looking at brain processes (proteins, channels, spikes, etc.) are just different lenses on such processes, and no single level can explain everything. Also, each level is based on assumptions. The appropriate level of detail is an empirical question.
o DNNs serve as a functional, task-performing starting point. Explanatory merit is not gained by biological realism.
- Neural networks are far too complex.
o The cerebrum in the brain has over 16 billion neurons. In terms of
abstraction, we are doing quite well. The complexity of the brain
requires many parameters, as too few parameters will not get you anywhere. Models
should be as simple as possible, but not simpler.
▪ Models can suffer from all three critique points at once: the units can be far too simple to tell us anything about the brain, there can be millions of such units, which makes the model far too complex and powerful with the wrong type of units, and each of these units is a parameter, which results in yet another black box.