FINAL Flashcards
2 higher order theories of consciousness (HOT)
- higher-order perception
- higher-order thought
higher-order perception
idea that consciousness arises when you acknowledge yourself perceiving other things
higher-order thought
idea that humans are aware of thought processes, and this brings about consciousness
3 criticisms for higher order theories
- we are aware of stimuli, not thoughts about stimuli (meta-thought is not necessary for consciousness)
- doesn’t take conscious action into account
- no neuro basis
global workspace theory (GWT)
- input processors compete for attention (similarly to pandemonium -> loudest will enter consciousness/workspace)
- (conscious) global workspace broadcasts input to other brain areas for voluntary action
what 3 things does GWT account for
- intentional action
- information retention
- planning and creative thought
global neural workspace theory (GNW)
used fMRI to map brain areas with different functions; these propagate signals to motor systems to organize voluntary action
what structure is thought to account for consciousness and why according to GNW
pyramidal neurons because of widespread structure that can connect many areas including PFC and temporal lobes for voluntary action
monism
body and mind made of one substance
dualism
body and mind made of two distinct substances (possibly connect through pineal gland)
functionalism
mental states are constituted only by their functional roles (i.e. accounts for multiple realizability); brain = hardware, mind = software
reductionism
break everything down into parts, no longer discuss the whole
advantage of reductionism
easy to test each individual part
disadvantage of reductionism
some things can’t be broken down
emergence
sum > parts e.g. janitor’s dream
rat memory experiment setup
play a tone, then shock the rat. if they remember, they should be afraid every time they hear the tone.
how to prevent a rat from learning new things
administer drug while / near the time of the initial shock
how to prevent a rat from remembering things
administer drug during recall, it will alter the original memory
can the rat memory experiment be done in humans and is it effective
yes, drug was used with PTSD patients and was somewhat effective, it makes the painful memory less painful
best way to preserve a memory in its original form
don’t recall it
why can we never be 100% certain about material reality (& order of processing pipeline?)
real world -> senses -> processing -> awareness; reality is always filtered through senses and processing therefore never 100% objective
cartesian theatre
idea of a central ‘theatre’ in the head where inputs come together to be viewed by an inner observer (homunculus); an example of higher-order perception (perceiving yourself perceive something)
problem with cartesian theatre
if homunculus is another person inside your head, what about their consciousness?
Dennett multiple drafts model
different senses have different processing streams where they process things intermittently, and objects can reach ‘fame’
‘fame’ (multiple drafts model) & what is it related to
related to GWT; something that reaches awareness when not being processed, and is broadcast to other brain modules
‘draft’ (multiple drafts model)
there can be multiple versions of one stimulus in one stream due to lots of processing
can stimuli continue to be processed after they reach fame
yes
David Eagleman task summary
- monitor motor cortex
- high-precision watch
- any time you feel urge to press button, note the time
- up to 2 seconds before reporting feeling urge, brain has already made the decision for you (without your awareness)
therefore do we have free will?
eagleman democracy of mind
rivalling streams competing for attention (similar to pandemonium)
3 features of eagleman democracy of mind
redundancy, competition, emotional + rational processing
redundancy of democracy of mind
each stream is processed by multiple competing systems
competition of democracy of mind
e.g. pandemonium, which stream is loudest
emotional + rational processing of democracy of mind
both kneejerk processing and more rational
what does IIT slicing allow for
seeing which systems are affected lets you determine which systems are dependent on each other vs independent
what is phi value when slicing
number of dependent subsystems
what does a high phi value mean (theoretically)
more consciousness
negatives of IIT (5)
- panpsychism
- hard to calculate
- possible to ‘hack’
- just an opinion
- no definition for consciousness but now we have a number?
structural brain imaging (2)
plus 2 examples
- shows anatomy
- used for tumors, strokes, lesions. etc
- CT and MRI
functional brain imaging (2) plus 3 examples
- shows blood flow / electricity
- used during experiments and diagnosis
- fMRI, PET, EEG
Computed Axial Tomography (CAT / CT) scan
X-ray images taken slice by slice through the brain (white = bone)
MRI
machine sends radio waves at tissue; only hydrogen (H) nuclei respond and become excited; H stores the energy, then its release is measured as H relaxes
CT advantages over MRI (3)
- better spatial res
- cheaper
- faster
MRI advantage over CT
- better contrast
fMRI
(multiple scans)
oxygenated blood -> processing -> deoxygenated blood
areas where blood is being deoxygenated light up (blood oxygen level-dependent (BOLD) signal)
Positron emission tomography (PET) scan
(rainbow scans) inject radioactive glucose; brain activity consumes the sugar -> more active areas glow more
fMRI advantages over PET (3)
- better spatial res
- better temporal res
- no radioactivity
PET advantages over fMRI (3)
- faster
- quieter
- cheaper
EEG advantages (5)
- fast
- cheap
- safe
- direct relation to brain activity
- good temporal res
what type of scan do we not have yet
full CNS scan
how do diff types of memory differ (3)
capacity, duration, content
2 types of sensory memory
iconic (visual = 200-250ms) and echoic (auditory = several secs)
why no olfactory, gustatory and tactile sensory memory
difficult experimental protocol
normal capacity of sensory memory
4-5 items e.g. letters
what did Sperling find sensory memory capacity could be extended to with training
9-12 items e.g. letters
what happens to sensory memory with a 1s distractor (masking)
removes majority of it (back down to like 3 items)
what happens to sensory memory with a cue to indicate direction (too short for conscious awareness)
restores performance
how do blinking vs blank screen affect sensory memory
blinking = disrupts performance
blank screen does not
working memory duration without rehearsing
18s
working memory capacity (& how to improve it)
7 +- 2
chunking
3 types of coding for working memory
- acoustic (e.g. get confused because stimuli sound the same)
- semantic (e.g. get confused because categories have similar meanings)
- visual (e.g. rotation tasks are processed degree by degree)
is working memory scanning done in serial or parallel, and how do we know (in lab) (2)
& caveat to these results
- RT linearly correlated with set size (7+-2) = serial
- we don’t terminate as soon as we find the number = exhaustive (not self-terminating)
- caveat: processing may really be parallel; the serial/exhaustive pattern may only appear because we want to do well in the lab
2 types of LTM
explicit/declarative and implicit
2 types of explicit/declarative memory
semantic and episodic
implicit memory =
procedural
why is assessing LTM capacity hard
you would have to max it out, and it’s unclear how
- also hard because of memory reorganization
LTM duration (3 phases)
- rapid decline over first 3y without reinforcement
- stable at 75% for ~25-30 years
- another decline, possibly due to general cognitive decline
how does better learning/memorization affect LTM duration
higher starting point, but curve stays the same
coding explicit/declarative memory
various locations across cortex (distributed representation)
coding implicit memory
production (e.g. if then rules) in cerebellum
what did Lashley find about LTM location (engram)
retraining takes longer the larger the chunk of brain removed, but the rat never fully forgets (no single engram location)
equipotentiality (Lashley)
brain areas can take over for each other after damage
Hebb rule
neurons that fire together wire together
long-term potentiation (LTP)
high freq = more receptors develop on receiving neuron
HM (procedure and result)
removal of hippocampus = anterograde amnesia (no new memories), but all memories up to that point are intact
hippocampus function
memory consolidation
why is cognitive science reverse-engineering
we are trying to figure out how an already existing thing works
machine definition
any cause-effect system
4 features of computation (and what is it ultimately)
- rule-based
- shape-based
- implementation-independent (i.e. multiply realizable)
- semantically interpretable
aka symbol manipulation
weak Church/Turing hypothesis
turing machine can compute anything that can be computed by a general-purpose digital computer
strong Church/Turing hypothesis
turing machine can compute anything that can be computed
what is mathematics fundamentally
syntax/semantics (manipulating shapes based on rules)
computationalism (strong AI)
‘cognition is computation’ (not really true)
what premise must be adopted because of computationalism’s definition
that computation is implementation-independent
Searle’s Chinese Room Argument (cognition is not computation)
even if you memorize and execute the whole rule book, you do not understand chinese
Searle’s periscope (the implementation-independence of computation)
no implementation of the rule book leads to understanding mandarin
turing test hierarchy
t1 = toy (regurgitating patterns)
t2 = verbally indistinguishable e.g. chatgpt
t3 = verbal + robotic
t4 = verbal + robotic + neuro
which level of turing hierarchy is disputed by Searle’s Chinese room argument
t2
what level does Harnad think is correct
t3
symbol grounding problem
you can keep looking up mandarin symbols defined by other mandarin symbols, but without a referential meaning for any symbol the cycle goes on indefinitely
minimal grounding set
the ~1000 words that can be used to define every other word in the dictionary
how does minimal grounding set get its meanings (2)
- direct sensorimotor grounding (DSG) (e.g. trial and error in real world)
- indirect verbal grounding (IVG) (e.g. describing the sensorimotor stuff)
what is a powerful way of grounding new words, but what is the issue with it
language; only works if you understand the words being used
why is computationalism the cogsci dominant theory
because it allows everything to be expressed as an algorithm
how are real neurons similar to fake neurons
lots of inputs converge on a neuron/’black box’ to give 1 output
what is first step after input in artificial neuron
weigh each input
what comes after weighing each input in artificial neuron
sum of all inputs x weights
what comes after the sum of inputs x weights in artificial neuron
bias = add 1 number to every value to simulate base excitation level
what comes after bias in artificial neuron
activation function e.g. threshold
what is the implementation of an artificial neuron
dot product aka matrix multiplication (multiple input, outputs, weights, and biases)
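The pipeline above (weighted inputs, sum, bias, activation) can be sketched in plain Python; all numbers here are made up for illustration:

```python
# A single artificial neuron: weighted sum of inputs, plus a bias,
# passed through a threshold activation.
def neuron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights))  # dot product
    total += bias  # bias simulates a base excitation level
    return 1 if total >= 0 else 0  # threshold activation function

# example: 3 inputs with arbitrary weights and bias
out = neuron([1.0, 0.5, -1.0], [0.4, 0.2, 0.1], bias=-0.3)
```

With many neurons per layer, the same computation becomes a matrix multiplication over input, weight, and bias matrices.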
what are the weights (and biases) in a neuron implementation called
parameters
how many parameters does chatgpt have
1.76 trillion
what is a CPU core (e.g. Intel) good for
good for complex math (smart but few)
what is each GPU (graphics) core good for
very basic math (dumb but many)
what is sigmoid function an analogy for in artificial neurons
neuron threshold + maxing out
ReLU (rectified linear unit)
1 to 1 ratio of increase (above 0 on x-axis), output of 0 (below 0 on x-axis)
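The two activation functions just described can be sketched in a few lines of Python:

```python
import math

def sigmoid(x):
    # smooth version of a threshold: ~0 for very negative x,
    # ~1 for very positive x (models a neuron maxing out)
    return 1 / (1 + math.exp(-x))

def relu(x):
    # output 0 below zero, 1-to-1 increase above zero
    return max(0.0, x)
```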
what do OR, AND, NOT functions have in common
each component just looks at its inputs, then transmits a 0 or 1 signal
perceptron model
same as neuronal model (inputs, weights -> output)
perceptron update rule (2)
small delta = desired output - actual output
big delta = random small value * small delta * input
* must calculate big delta separately for each weight in perceptron model
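The update rule above can be sketched as a tiny training loop. The learning rate `lr` stands in for the ‘small value’ (fixed at 1.0 here, rather than random, so the toy arithmetic stays exact), and learning the AND function is an assumed example task:

```python
def train_perceptron(data, epochs=20, lr=1.0):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, desired in data:
            total = sum(x * w for x, w in zip(inputs, weights)) + bias
            actual = 1 if total >= 0 else 0
            small_delta = desired - actual              # error
            for i, x in enumerate(inputs):
                weights[i] += lr * small_delta * x      # big delta, per weight
            bias += lr * small_delta
    return weights, bias

def predict(inputs, weights, bias):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) + bias >= 0 else 0

# supervised learning: each sample pairs an input with the desired output
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
```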
what does big delta mean
how much you should change each weight in the perceptron model to achieve desired output
why does applying big delta not always lead to exact desired output
because of the small random value (E)
what type of learning is applying big delta
supervised learning
multi-layer perceptron (MLP) / dense neural network
every layer is connected to every element in the next layer
‘deep’ neural network
contains hidden layers that are not directly trained
NN training (general) (5)
- randomly initialize all weights
- put input through model (feed-forward), receive predicted output
- calculate loss (desired output - predicted output)
- backpropagate loss through network
- update all weights and biases
convolutional kernel
the pattern it looks for; lets the network search for a feature anywhere in the image rather than at one fixed location
how does convolution work
a patch of inputs (e.g. 3x3) is multiplied by the kernel and summed into 1 single output pixel = convolved feature; repeat across the whole image
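The patch-by-patch operation can be sketched as follows; the image and kernel values are made up for illustration:

```python
def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # multiply the kernel with the patch under it, sum -> one output pixel
            s = sum(kernel[i][j] * image[r + i][c + j]
                    for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

# a 3x3 vertical-edge kernel slid over a 4x4 toy image
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 0, 1]] * 3
feature_map = convolve2d(image, kernel)
```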
AlexNet concept
same image can go through multiple convolutional kernels at the same time to look for more than 1 feature
physical symbol system
a set of entities (symbols) that are physical patterns that can occur as components of another type of entity called an expression/symbol structure
what are all rules in the brain theoretically saved as
expressions
how are concepts stored in the brain
combinations of many features
simon’s symbolic model of emotion
(in higher level system) start -> physical symbol system (expression) -> done
what does simon’s symbolic model of emotion not account for (4)
- classification of emotions
- which emotions are more important
- physiological markers
- neuronal basis of emotion
how can simon’s symbolic model of emotion reprioritize based on emotions / urgency
CNS can cause interruption
how did Ekman find basic emotions
tested the Fore people of Papua New Guinea on emotion identification
ekman’s 6 basic emotions
fear
anger
surprise
sadness
happiness
disgust
2 claims about ekman’s emotions
- distinct emotions with distinct physiological features
- evolutionary functions (hardwired)
2 criticisms on ekman’s basic emotions
- link between emotion and physiological response?
- sociological impact necessary for proper development?
how do emotions differ (2)
duration and role
axes of russel’s circumplex model of emotions
valence (goodness) and arousal (engaging) (2D scale, arranged in a circle)
Adolph & Anderson modern alternative to emotions
7 dimensions to create interspecies framework of emotions
7 dimensions of Adolph & Anderson emotion framework
- scalability: varying intensity
- valence: pleasantness
- persistence: outlast stimulus
- generalization: specificity to stimulus
- global coordination: engage whole organism
- automaticity: how challenging to control
- social coordination: social functions
appraisal theory of emotions
emotions lead to change in the perception of the environment (how emotions relate to cognition)
how an emotional episode is created (3)
- triggering stimuli and context interact to form perception
- perception creates somatic and neural responses as well as cognitive evaluation/appraisal and emotional feelings
- leads to behavioural / verbal responses
5 tools for studying emotions
- neural responses
- somatic responses
- affective responses
- genetic tools
- lesion studies (temp and permanent)
3 examples of genetic tools to study emotions
knockout experiments, optogenetics, pharmacogenetics
can we use language without communicating
not really
Paul Watzlawick’s Axioms of communication (5)
- one cannot not communicate
- communication is interpreted differently by different people
- communication is punctuated
- communication involves digital + analogic modalities (verbal + non verbal)
- communication can be symmetrical or complementary
Clark’s language characteristics (5)
- communication
- arbitrary (e.g. why is ‘truck’ a truck)
- structured (syntax rules)
- generative (infinite)
- dynamic (constantly adding new words)
3 types of linguistic representation
auditory, visual, haptic (braille)
language processing pipeline (4)
phonemes, morphemes, syntax and semantics, pragmatics
phonology (3)
sounds of letters, IPA, spectrogram
coarticulation
phonemes modify each other
why is absence of freq on spectrogram not good for identifying word boundaries
silent gaps occur within words too
temporal induction
strong top-down influence on phoneme perception
2 types of morphemes
stem and bound morphemes
how does auditory word recognition occur
all possible words get activated until stimulus narrows down options
how does written word recognition occur
we don’t read letter by letter, we fixate and process the rest in our periphery
what causes increased word processing time (2)
- diff phonemes e.g. pint vs mint
- double meanings
garden-path sentence
the immediate parse sounds wrong, so you must reparse and arrive at a different meaning (involves syntactic ambiguity)
Chomsky vs Lakoff
chomsky = syntax
lakoff = semantics
how are all sentences represented according to lakoff
pictograms
5 pragmatic features
-assertives
-directives
-commissives (commit to later action)
-expressives (about mental state)
-declaratives
what do decision trees involve
lots of arbitrary biases
2 features of expert systems
explainable, can be hand-crafted
2 disadvantages of expert systems
difficult to handcraft, falls apart with large data/edge cases/nonlinear correlations
what is visual hierarchical processing similar to
CNN processing
3 CNN features
- sparse connectivity (every input does not connect to every output)
- shared weights
- invariance under translation (look for 1 feature across whole image)
autoencoders
network trained to reconstruct its own input through a compressed bottleneck; the compressed representation is learned via backpropagation of the reconstruction error
what do smaller bottlenecks lead to in CNN autoencoders
worse reconstruction
latent space interpolation
reconstructing interpolated vectors allows you to see what comes between the 2 start and end vectors
latent space arithmetic
allows you to generate data that was not part of data set
e.g. smiling woman - neutral woman + neutral man = smiling man
why must we do arithmetic on latent space rather than image itself
image - image could lead to 2 noses, all background space, etc.
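Latent-space interpolation and arithmetic can be sketched with toy, made-up 3-d vectors (real latent vectors would come from a trained encoder):

```python
def interpolate(z1, z2, t):
    # linear interpolation between two latent vectors (t from 0 to 1)
    return [(1 - t) * a + t * b for a, b in zip(z1, z2)]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def vec_sub(a, b):
    return [x - y for x, y in zip(a, b)]

# toy 3-d latent vectors, invented for illustration
smiling_woman = [1.0, 1.0, 0.0]
neutral_woman = [1.0, 0.0, 0.0]
neutral_man   = [0.0, 0.0, 1.0]

# "smiling woman - neutral woman + neutral man" ~= smiling man
smiling_man = vec_add(vec_sub(smiling_woman, neutral_woman), neutral_man)
# reconstructing this midpoint shows what lies between the two endpoints
midpoint = interpolate(neutral_woman, neutral_man, 0.5)
```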
why are some word embeddings e.g. dictionary position of a word not useful
because not meaningful
why is latent space arithmetic with words sometimes problematic
can lead to biases
recurrent neural nets (RNN)
loop model into itself
what does RNN involve
sliding window (3 words as input -> following word as output)
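The sliding-window setup can be sketched as follows (the sentence is an arbitrary example):

```python
def sliding_windows(words, n=3):
    # each sample: n consecutive words as input, the next word as target
    return [(words[i:i + n], words[i + n]) for i in range(len(words) - n)]

text = "the sky is blue and the grass is green".split()
samples = sliding_windows(text)
# first sample: (["the", "sky", "is"], "blue")
```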
what do RNNs allow for (in theory)
learn truth value of certain statements
what makes RNNs better
storage
why does storage help RNN
because otherwise separator characters have no data to draw a conclusion from
neural machine translation with encoder and decoder RNN
encode until ‘end token’, then decode and output
2 RNN types
GRU cells and LSTM
GRU cells
gated recurrent unit, only short term memory
LSTM
long short term memory; built in mechanism so you cannot delete stuff from beginning
problem with predictive text
often produces cyclic phrases if you always pick the #1 option
how to solve predictive text problem
add randomness i.e. pick an option in top 10
does looking at character frequency in isolation lead to language models
no (e.g. sd n oeiam etc.)
bigrams
co-occurrences of letters/words (e.g. ‘on inguman ise forenoft’ etc - not real words but they resemble language more)
how to get good predictive text
use word bigrams/trigrams/n-grams to form word sequences
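A word-bigram generator along these lines can be sketched as follows (the corpus and random seed are arbitrary):

```python
import random
from collections import defaultdict

def build_bigrams(words):
    # map each word to the words observed following it in the corpus
    table = defaultdict(list)
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, start, length, rng):
    out = [start]
    for _ in range(length - 1):
        options = table.get(out[-1])
        if not options:
            break
        # sample among observed followers instead of always picking #1,
        # which helps avoid cyclic phrases
        out.append(rng.choice(options))
    return out

corpus = "the cat sat on the mat and the cat ran".split()
table = build_bigrams(corpus)
sentence = generate(table, "the", 5, random.Random(0))
```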
cosine similarity
cosine of angle between 2 vectors = quantifiable similarity between 2 things (dot product between 2 vectors)
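Cosine similarity is a few lines of Python; the two ‘big 5’ score vectors below are made up to show that pointing in the same direction gives a similarity near 1:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))       # dot product
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# made-up big-5 score vectors: same direction, different magnitude
alice = [4.0, 2.0, 5.0, 1.0, 3.0]
bob = [8.0, 4.0, 10.0, 2.0, 6.0]  # = 2 * alice
sim = cosine_similarity(alice, bob)
```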
person embedding
turn big 5 scores into vectors, find cosine similarity; if vectors point in roughly the same direction, the people should get along better
what can be done with vector values (similar to latent space arithmetic)
make a slider out of each value in a vector to see what meaning each specific number corresponds to (e.g. dimension 4 = personhood)
proxy tasks
given a scene where you know how many of each object there is, ask something like “what is there the highest quantity of”
- implicitly teaches the system to count and rank things
CBOW (continuous bag of words)
(proxy task to learn word embeddings)
- ask system to fill in blank (implicitly teaches system similarities between concepts e.g. king and queen can both sit on throne)
what type of learning are proxy tasks
supervised
GPT context window
number of words a transformer looks at before giving output; the more words you look at, the more contextual questions you can answer
what does chatgpt use instead of word embeddings
tokens (allows for certain characters to be grouped together therefore system can run better)
what must transformers learn
correspondence
key-value storage
key = concept
value = explanation
query = question about a concept (answer should be value)
how to execute key-value queries (6)
- calculate word embedding (e.g. make a vector for word ‘is’)
- turn vector into 3 vectors: query, key, value
- make key and value vectors for ‘sky’ and ‘the’
- run query of ‘is’ against its own key
- also run the query of ‘is’ against keys of ‘the’ and ‘sky’ (cosine similarity test)
- highest cosine similarity will be the value (in this case, probably sky; value of sky=blue therefore output blue)
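The query-against-keys steps above can be sketched with toy vectors; all key/query/value entries here are invented for illustration (in a real transformer they are learned):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# made-up key vectors for the tokens of "the sky is"
keys = {
    "the": [1.0, 0.0, 0.0],
    "sky": [0.0, 1.0, 0.0],
    "is":  [0.0, 0.0, 1.0],
}
# made-up values: what each token 'knows'
values = {"the": "article", "sky": "blue", "is": "linking-verb"}

# the query vector of 'is', assumed here to point toward the key of 'sky'
query_is = [0.1, 0.9, 0.0]

# run the query against every key; the best-matching key's value wins
best = max(keys, key=lambda tok: cosine(query_is, keys[tok]))
answer = values[best]
```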
narrow/weak AI
good at one task (possibly better than humans)
artificial general intelligence
good at all tasks, can learn new tasks, can learn anything a human can and do it better
why do datasets matter for transformers
can learn different things from different datasets
problems w datasets
biases
reinforcement learning from human feedback
collect human feedback on how good each response was and feed it back into the model as additional training signal
problem w reinforcement learning from human feedback
sparse reward