Lectures Flashcards
different types of computational modelling have radically different…
assumptions about the nature of cognition
most forms of computational modelling…
involve some form of simulating a cognitive process
ie. input -> “model” -> behavioural output
models differ in their level of analysis
Marr’s levels:
- computational
- algorithmic
- implementational (neural)
how does computational modelling aid in understanding human behaviour?
by establishing a concrete definition of a cognitive process
origins of modelling
computer simulations have been popular since early years of psychology
the importance of computation was recognized at an early stage, e.g. Turing in 1950
Wiener (1948) and Shannon (1949) developed early mathematical theories of information and communication
Society for Computation in Psychology
Wiener and Shannon
Wiener (1948) and Shannon (1949)
conducted early work in mathematical theories of information and communication
Society for Computation in Psychology
formed in 1971
one of the early subgroups of cognitive psychology
prof is a member
2 types of analytical models
- recognition memory experiment
- signal detection theory
recognition memory experiment
presented with a list of words
presented with pictures of those words
tested on whether words are old or new
people sometimes falsely accept items that were never presented
signal detection theory
measures the ability to distinguish between two distinct patterns: signal and noise
the first pattern (the signal) is the one you’re supposed to pay attention to
the second pattern (the noise) is random variation that interferes with a person/machine’s ability to collect and process info
essentially looks at how easy/difficult it is for someone to process info and respond to it when they’re also exposed to background noise/distractions
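a minimal sketch of how that detection ability can be quantified (toy counts assumed, not from the lecture; d′ is the standard sensitivity measure in signal detection theory):

```python
# d-prime: how well someone separates signal (old items) from noise (new items)
from scipy.stats import norm

hits = 80               # "old" responses to old words (signal trials)
misses = 20
false_alarms = 15       # "old" responses to new words (noise trials)
correct_rejections = 85

hit_rate = hits / (hits + misses)                             # 0.80
fa_rate = false_alarms / (false_alarms + correct_rejections)  # 0.15

# d' = z(hit rate) - z(false-alarm rate); higher d' = easier discrimination
d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
print(round(d_prime, 2))  # ~1.88
```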
the primary model type we’ll look at in this course…
simulation models
output of model isn’t deterministic
underlying randomness in the model (typically implemented with random number generators)
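a minimal sketch of what that randomness looks like in practice (an assumed toy evidence-accumulation model, not a model from the course):

```python
# a noisy random walk toward one of two response boundaries;
# the random number generator makes every run (and its timing) different
import random

def simulate_trial(drift=0.1, boundary=10.0, noise=1.0):
    evidence, steps = 0.0, 0
    while abs(evidence) < boundary:
        evidence += drift + random.gauss(0, noise)  # underlying randomness
        steps += 1
    return ("A" if evidence > 0 else "B", steps)    # choice + response time

print([simulate_trial() for _ in range(3)])  # non-deterministic output
```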
mind as computer
Pylyshyn 1984
mind takes in information from senses
integrates them and creates perceptual experience and behaviour
knowledge acquisition: empiricists vs Plato/Chomsky
empiricists (e.g. Locke): knowledge must be gained via experience
Plato and Chomsky: we are born with innate knowledge and learning mechanisms
poverty of the stimulus
we cannot possibly hear every form of language we produce before we learn it
we produce more language than we experience
and all possible language is even greater than the language we produce
the difference between ‘language experienced’ and ‘language produced’ is accounted for through…
innate knowledge
possible solution: Simon (1969)
discussing the path taken by an ant on a beach, Simon noted that the ant’s path is “irregular, complex, hard to describe. But its complexity is really a complexity in the surface of the beach, not a complexity in the ant.”
big data and natural language processing
collection of large text sources has changed how we think about studying language
possible to propose learning mechanism and train on realistic data
a model can be “born” into a realistic language environment
we then gain insights into cognition and language performance by examining how the model learns and functions
also is a powerful natural language processing tool
T/F: virtual environments are approaching real world complexity levels
true
language learning: bi-directional benefit
we benefit from using large, realistic text sources because we can train models on them
the models give us insight into cognition/language performance/learning
also become powerful natural language processing tools
corpus-driven modelling
identifies strong tendencies for words/grammatical constructions to pattern together in particular ways
while other theoretically possible combos rarely occur
corpus-driven modelling allows for…
connections between lexical experience and lexical behaviour
first corpus ever
Brown corpus of Kucera and Francis
1967
consisted of about 1 million words, sampled from different areas
examples of text-based resources now available for use for corpus-driven modelling
Grade 1-12 textbooks
Scientific journal articles
Newspaper articles
Wikipedia
TV and movie subtitles
Books
Urban dictionary
distributional models of semantics
usage-based model of meaning
based on assumption that statistical distribution of linguistic items in context plays key role in characterizing their semantic behaviour
distributional models build semantic representations by extracting co-occurrences from corpora
internal versus external theories of cognition
internal: involves attending internally to thoughts, memories and mental imagery
external: involves attending to stimuli in the external environment
brain, body, environment
organization of long term memory
long term memory
splits into:
explicit/declarative (conscious) and implicit (unconscious)
explicit/declarative splits into:
semantic memory (facts, concepts) and episodic memory (events, experiences)
implicit splits into:
priming and procedural memory (skills, tasks)
explicit/declarative memory splits into…
- semantic memory (facts, concepts)
- episodic memory (events, experiences)
implicit memory splits into…
- priming
- procedural memory (skills, tasks)
semantic memory
refers to what you know
facts, concepts
how is semantic memory tied to language?
not necessarily tied to language, but intimately connected
language is a general organizing principle of memory
lexical semantic memory
memory of word meanings
study of semantic memory examines…
storage and retrieval
modern theories of semantics
based in experience
environment serves as model/constraints
2 branches of “based in experience” theories of semantics
- grounded/embodied theories: our perceptual world (and our brains, which are embodied) is used as our main info source to understand the world around us
- text-based machine learning
frontal lobe
language processing
emotional regulation
executive functioning
planning
organizing
memory
impulse control
problem solving
selective focus
decision making
behavioural control
temporal lobe
episodic memory
(involved in comprehension, storage and retrieval of memory)
hearing ability
- first area that processes speech info, turns it into a linguistic code
memory acquisition
some visual perceptions
categorization of objects
comprehension
memory retrieval
perisylvian region
area of brain responsible for language
composed of:
- primary auditory cortex
- wernicke’s area
- angular gyrus
- arcuate fasciculus
- primary motor cortex
- broca’s area
wernicke’s area
constructs rep of meaning for linguistic info
damage from stroke to this area = fluent/receptive aphasia
- loss of ability to understand and create meaningful language
- speech stays grammatically correct but carries incorrect meaning
broca’s area
responsible for linguistic production
damage from stroke to this area = non-fluent/productive aphasia
- loss of ability to produce fluent language
- but can still understand language
wernicke’s location
posterior temporal lobe
many connections to primary auditory cortex
heavily connected to Broca’s area
wernicke’s = important for…
storage and retrieval of word representations, meanings, grammar
broca’s location
posterior inferior frontal region
next to primary motor cortex (responsible for muscles used to produce speech)
sometimes called the motor speech area
arcuate fasciculus
connection between Wernicke’s and Broca’s area
important for BOTH phonological and lexical-semantic processing
early theory of semantic memory - devised by Collins & Quillian
hierarchical networks
hierarchical networks
Collins & Quillian
suggest our info in memory is organized hierarchically - can be repped by a tree
- superordinate at the top
- as you continue down the network, get more subordinate info
what kind of info is at the bottom of the tree in hierarchical networks?
actual instances of a category
if information is stored in the brain in the way suggested by hierarchical networks, then there should be a corresponding connection between…
network distance and the amount of time it takes you to verify connections between these properties
direct connections will be verified faster
think about it like walking from point to point
living thing: example of hierarchical network
living thing - connects via the propositions “is” and “can” to the properties “living” and “grow”
living thing - connects via the proposition “is a” to either:
1. plant
2. animal
plant - connects via “is a” to:
1. tree
2. flower
these eventually link to specific examples
- pine, oak, rose, daisy
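a minimal sketch of this hierarchy in code (node names and property placements assumed for illustration); retrieval is a walk up the “is a” links, so properties stored further up take more steps:

```python
# properties live at the highest node they apply to
isa = {"canary": "bird", "bird": "animal", "animal": "living thing"}
properties = {"canary": {"can sing"}, "bird": {"can fly"},
              "animal": {"has skin"}, "living thing": {"can grow"}}

def links_to_verify(concept, prop):
    steps = 0
    while concept is not None:
        if prop in properties.get(concept, set()):
            return steps                      # fewer links -> faster verification
        concept, steps = isa.get(concept), steps + 1
    return None                               # property never found

print(links_to_verify("canary", "can sing"))  # 0 links: direct
print(links_to_verify("canary", "has skin"))  # 2 links: slower
```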
how did Collins and Quillian test if the timing of their network in validating closeness of associations actually applies to human processes?
gave people a sentence that was true or false
had them say whether it was true or false
ie. ‘a canary can sing’, ‘a canary can fly’, ‘a canary has skin’
- looking at properties stored progressively higher up in the network
turns out that properties stored higher up take longer to validate
are Collins & Quillian’s findings supported in all categories?
no, not validated in all categories
a good first step, but not exhaustive
2 pieces of theoretical refinement: Smith, Shoben & Rips
- proposed that items can be repped as a SET OF FEATURES: each concept is described by a set of features that define it
- meaning can be described as a position in a geometric space (vectors)
vectors
look at how similar and different certain vectors are
use trigonometry to calculate the angles between different vectors
once you have the numerical similarity between the vectors, you can plot how they are distributed in space
vector cosine
calculated using trigonometry that examines angles between different vectors
will come up with value between 1 and -1
1 = the same (very similar)
-1 = opposite
multidimensional scaling
uses the vector cosines to place words in a 2D space
visually shows their similarity
more similar items will be closer to each other within the space
helps visualize how we connect things in our minds
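a minimal sketch of both steps (toy feature vectors assumed; sklearn’s MDS is one possible implementation):

```python
import numpy as np
from sklearn.manifold import MDS

words = ["canary", "robin", "daisy"]
vectors = np.array([[1.0, 0.9, 0.1],   # toy features, e.g. "flies", "sings", "has petals"
                    [1.0, 0.4, 0.0],
                    [0.0, 0.1, 1.0]])

def cosine(a, b):                      # angle-based similarity, from 1 down to -1
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sim = np.array([[cosine(a, b) for b in vectors] for a in vectors])
coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(1 - sim)
print(coords)  # similar words land closer together in the 2D space
```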
what are features?
classical approaches propose that they are properties of categories
ie. features of cars: “has wheels”, “used for transportation”, “has doors”, “has an engine”
uninterpretable features: multidimensional scaling models
multidimensional scaling models don’t carry interpretable features
the locations of things in space don’t map onto features like “has wheels” or “has a door”
can’t say that location x in a matrix means that word y has a door
how do machine learning models construct features?
from text
not typically based in perceptual environment
some are interpretable, others are not
in neural networks, all info is distributed across…
the WIDTH of the network
if you damage the network, all information decays together (not like you just lose a chunk of it)
topic models
probabilistically matches words to features/topics (assigns a probability that a word has a given feature or not)
ie. a probability value that a certain word is a living thing, or is red, or can move etc.
good for information organization, can categorize info well
topic models are good at ______ but aren’t really used as a _______
good at information organization/categorization
but aren’t really used as a theory of cognition
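a minimal sketch using sklearn’s LDA, one common topic-model algorithm (toy corpus assumed; the lecture doesn’t name a specific implementation):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["birds fly and sing", "flowers grow in soil", "canaries sing and fly"]
X = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
# rows = topics, columns = words: probability-style weights used to organize info
print(lda.components_ / lda.components_.sum(axis=1, keepdims=True))
```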
Rogers and McClelland worked on what kind of model
neural network
basic idea behind Rogers and McClelland’s neural network
based off of an interest in how children acquire language
take propositions (sentences)
give the model sentences derived from a hierarchical representation network
give the model a word (canary) and proposition (can)
then have an output layer with all sorts of possible options
want model to produce certain options, and not produce others
ie. want it to produce ‘sing’, ‘grow’, ‘fly’ but not ‘swim’
if the model gets something wrong, it uses back-propagation to adjust the weights so that next time it’s less likely to make the same mistake
can do this because it’s a supervised network (we know what we want the network to produce, so we know when it’s wrong)
by end of training cycle, model produces the correct output
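a minimal sketch of such a supervised network (toy items, relations, and target attributes assumed; two weight layers adjusted by back-propagation):

```python
import numpy as np

rng = np.random.default_rng(0)
# input units: [canary, fish, "can", "is"]; output units: [grow, fly, swim]
X = np.array([[1, 0, 1, 0],    # canary + "can"
              [0, 1, 1, 0]])   # fish + "can"
Y = np.array([[1, 1, 0],       # canary can grow, fly -- but not swim
              [1, 0, 1]])      # fish can grow, swim -- but not fly

W1 = rng.normal(0, 0.5, (4, 8))            # input -> hidden weights
W2 = rng.normal(0, 0.5, (8, 3))            # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):                      # many trials, small learning rate
    H = sigmoid(X @ W1)
    out = sigmoid(H @ W2)
    err = Y - out                          # supervised: we know the target
    d_out = err * out * (1 - out)
    W2 += 0.1 * H.T @ d_out                # back-propagate the error...
    W1 += 0.1 * X.T @ ((d_out @ W2.T) * H * (1 - H))  # ...to earlier weights

print(out.round(2))  # approaches the target pattern by end of training
```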
models of Collins & Quillian versus models of Rogers & McClelland
Collins & Quillian:
- hierarchical networks
Rogers & McClelland:
- neural networks
supervised networks
we know what we want the network to produce
so we know when it is wrong
allows for back-propagation/error-driven learning
ie. back-propagation networks like Rogers & McClelland’s are supervised
back-propagation
error-driven learning
possible in supervised networks
when we know the output that we want the model to produce
at first, the network will produce “noise” (the wrong things)
but since we know what we want it to produce, we can CHANGE THE CONNECTION WEIGHTS
so that next time it’s incrementally more likely to produce the correct activations
do this hundreds of thousands, millions of times
eventually the network will produce the right activation
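in standard textbook notation (not from the slides), each weight takes a small step downhill on the error:

```latex
% gradient-descent weight update used in back-propagation
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}
% \eta = learning rate: kept small so each trial changes the weights only slightly
```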
error-driven learning is really just…
reinforcement learning
each arrow in a network…
reps a diff weight/numerical value
which is adjusted depending on how incorrect the network is
do we want a high or low learning rate?
low learning rate
so that small changes are made to each input
means that a lot of learning trials are required
generally must be trained multiple times on same corpus
other term for backpropagation
the backward pass
what comes out as the output is essentially just the…
most activated node in the output layer (after activation has flowed through the hidden layers)
2 main approaches to neural networks
- localist network:
- each node reps only one entity
- people tend to think these are neurologically implausible
- distributed representation:
- info is spread across the nodes
- instead of being confined to one node
- preferred, because more similar to the brain’s function
issue with the whole ‘input = output from many many hidden layers’ thing
results in a kind of black box model
what exactly is happening in the hidden layers is unclear
can’t “get into the head of the model” - can’t map it onto what humans do in experimental tasks
led to Bayesian models (back-propagation networks that feed into other back-propagation networks; each layer is trained separately, so you don’t have to go all the way back to the first layer)
three names associated with back-propagation
Rumelhart, Hinton & Williams (1986)
the trajectory of learning followed by Rogers and McClelland’s model maps onto…
learning trajectories of children as they acquire language
in the beginning, model produces noise (outputs are all equally likely, close together in 2D space)
but with training, they begin to split apart and are weighted differently (just like how kids begin to learn words)
closed versus open models
closed models:
- restricts the model to working with the training materials
- assumes all of the knowledge about the world = contained in the training materials
- allows for clarity in resulting explanation
open models:
- uses millions of samples
- noise is eventually reduced through greater levels of experience
- better ecological validity than closed models
Rogers and McClelland models = based on what assumption? open or closed networks?
based on the SIMPLIFICATION ASSUMPTION
they are closed networks
“the more detail we incorporate, the harder the model is to understand”
- think of the growing complexity and non-interpretability of ChatGPT
simplification assumption
linked to closed models
suggests when you’re training a model you should give it simple training data
because complicated materials make it unclear as to whether the model is succeeding/failing because of the quality of the data
simple data provides researchers with clarity regarding how good the model was
closed models and ecological validity
closed models have low ecological validity
not reflective of tasks that humans actually perform
language is very noisy, lots of info all the time
so using simple training materials doesn’t reflect the task that humans face when they’re learning
open models require _____ information
more
BEAGLE model on 300 versus 300 000 propositions
300 propositions = closed model
- only takes 300 trials to learn propositions
- can cluster info right away
- not error-driven
- presents sentences as more structured than they are in reality
300 000 propositions = open model
- derived from a large corpus of language
- takes much longer to train, about 300 000 trials
why does it take the larger BEAGLE model longer to learn?
because the learning corpus and the actual corpus are different (open model)
the actual corpus has more noise and nuance
therefore takes longer to settle and to produce the correct output
because open models learn from actual sentences, it takes more examples of info to come up with the correct structure
Current NLP Machine Learning Wars
people keep building bigger models, competing against each other
BERT, RoBERTa, GPT-2, T5, Turing NLG, GPT-3
GPT-3 is winning
NLP
natural language processing
is ChatGPT a good model for the brain?
not really
it contains way more info than the human brain does
not really an applicable model with which to assess human cognition
LLM
large language model
e.g. ChatGPT (OpenAI), and models from Facebook/Meta and Google
perceptual symbol systems
proposed by Barsalou as a general theory of cognition
classic view: amodal symbols in cognition
amodal systems have NO CONNECTION to perceptual environment
amodal systems have no connection to…
perceptual environment
amodal symbol system transduces a partial perceptual experience into a completely new representation language that is INHERENTLY NON-PERCEPTUAL
3 problems with amodal approach
- neurological evidence:
- findings show that damage to sensory-motor cortex impairs processing of certain modality-based categories (ie. birds)
- failure of transduction:
- no system can elegantly go from perception to symbols
- symbol grounding problem:
- how does the system know what it’s computing?
an alternative to amodal systems
neural representations
neural representations
not a physical copy of the perceptual experience
instead a RECORD OF THE NEURAL ACTIVATION that arises during perception
similar to representations of imagery
likely stored in CONVERGENCE ZONES: integrate info in sensory-motor maps to represent knowledge
never completely transduced; perceptual traces are reconstructed
8 examples of semantic memory tasks
many diff behaviours are studied
- word similarity
- false memory
- free association
- semantic priming
- verbal fluency
- sentence comprehension
- discourse comprehension
- feature judgments
semantic memory models: word similarity
most common type of data used for these models
used in model development and model evaluation
give people two words and get them to RATE HOW SIMILAR THEY ARE on a scale
collect ratings from people and average them
compare this number to computational model that’s also learning these words
semantic memory models: verbal fluency
used in more applied situations
ie. diagnosing conditions like alzheimer’s or schizophrenia
give people a category and ask them to generate as many things as possible from that category
compare the model’s output to the output of humans - e.g. see if a person’s responses fit the profile built from patients with schizophrenia
models and dementia
models can examine how language use changes prior to diagnosis
because they’re based on data from people in the years leading up to their diagnosis
can quantitatively see how their memory systems are changing
models = a tool to understand how the mem systems of people with dementia change over time
representation types: network models
words are connected within a semantic network
(ie. ‘release’ connects to ‘capture’ connects to ‘pirate’ connects to ‘sailor’ connects to ‘anchor’)
generate representation of each item based on the nodes they’re connected to
how are network models typically derived?
from free association data
give people a word (like ‘car’) and get them to generate words they associate with it
this is how they generate the semantic networks/network models
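a minimal sketch of such a network (edges assumed, echoing the release-capture-pirate chain above), with networkx as one possible tool:

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("release", "capture"), ("capture", "pirate"),
                  ("pirate", "sailor"), ("sailor", "anchor")])

# a word's representation is based on the nodes it connects to
print(list(G.neighbors("pirate")))                      # ['capture', 'sailor']
print(nx.shortest_path_length(G, "release", "anchor"))  # 4 links apart
```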
Turk problems
issue with network models
explains human behaviours using other human behaviours
Turk problems arise when the representational input is derived directly from human behavioural data
COMPLEXITY OF THE MODEL = HIDDEN WITHIN THE REPRESENTATION
who coined the Turk problems?
Jones, Hills, Todd
are back-propagation models feature models?
yes!
features are the activation values of the hidden layer
activation of hidden layer can be used as featural rep of a word
important changes occurring in the 1990s-2000s that helped progress Big Data and Natural Language Processing
pre-1990s - didn’t have large enough language corpora to train models on
but with the internet, larger texts were gathered
2000s - further movement to digitize existing/old texts
large corpora of text brought in a different domain of modelling
COLLECTION OF LARGE TEXT HAS CHANGED HOW WE THINK ABOUT STUDYING LANGUAGE
large corpora has changed how we think about studying language…
now possible to PROPOSE LEARNING MECHANISMS and to TRAIN ON REALISTIC DATA
model can be “born” into a realistic language environment
we gain insights into cognition and language performance by examining how it learns/functions
T/F: virtual environments are approaching real world complexity levels
true
NLPs not only help us understand cognition and language performance, but also…
are powerful natural language processing tools
quantification of the natural language environment: Herbert Simon’s take
Herbert Simon said “the apparent complexity of our behaviour over time is largely a reflection of the complexity of the environment in which we find ourselves”
behaviour is adaptive: we shape our cognition to the requirements of our environment
- cognitive system is built such that we can change our behaviours to match the needs of our environment
classic goal in the cognitive sciences
quantification of the natural language environment
quantification of the natural language environment: William Estes’ take
William Estes stated that theories of behaviour should shift “the burden of explanation from hypothesized processes in the organism to statistical properties of environmental events”
saying we should look at how people are learning from the environment/responding to it
he was particularly interested in mathematical properties
distributional models
these types of models learn the meanings of words from the distribution of how they’re used in language
aka embedding models
learn meaning of words from co-occurrence statistics
first major distributional model
Landauer & Dumais (1997)
Latent Semantic Analysis model
Landauer and Dumais wanted to move away from existing algorithms, which were simply cued with specific words and returned the documents with the most word overlap
they wanted a more MEANING-BASED approach
- reduce the problems caused by polysemy
- introduce recognition of synonymous meanings
LSA works by…
- examining a large corpus of text
- extracting information about how words are used
- information is based on frequency usage for particular words
- build a vector that reps the meaning of the word in terms of its similarity to other words
- decompose the matrix into smaller number of features
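a minimal LSA-style sketch (toy corpus assumed; real LSA trains on large corpora and applies weighting steps not shown here):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the canary can sing and fly",
        "the shark can swim",
        "a robin can sing and fly"]
X = CountVectorizer().fit_transform(docs)  # word-usage frequencies per document

svd = TruncatedSVD(n_components=2)         # decompose into fewer latent features
word_vectors = svd.fit_transform(X.T)      # rows = words, cols = features
print(word_vectors.shape)                  # each word now has a small feature vector
```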
is LSA error-free?
yes, there’s no error signal in the model’s learning (unlike neural networks)
simply accumulate information in memory and use that to drive the model
not using predictive process to hone the model’s learning - it treats each lexical experience equally
LSA: supervised or unsupervised?
unsupervised
just learns the structure of the dataset
4 things we need for distributional models
- input:
- corpus for the model to learn from
- processing:
- learning algorithm by which info is gleaned from the input, extracted and stored in memory
- memory:
- feature space - representation of where we keep info about the word’s meaning
- output:
- the task/problem
distributional models: processing/learning mechanism details
neural embedding models take a sentence
they sequentially activate each word on its own
want the model to predict the words that surround that word in that context
predictions = in the output layer
see if the predictions are correct
back-propagate to increase accuracy
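a minimal sketch of this predict-the-surrounding-words setup, using gensim’s skip-gram Word2Vec as one standard implementation (toy corpus assumed; real models see millions of sentences):

```python
from gensim.models import Word2Vec

sentences = [["canary", "can", "sing", "and", "fly"],
             ["robin", "can", "sing", "and", "fly"],
             ["shark", "can", "swim"]]

# sg=1 selects skip-gram: each word predicts its neighbours within the window,
# and prediction errors are back-propagated into the embedding weights
model = Word2Vec(sentences, vector_size=20, window=2, sg=1, min_count=1)
print(model.wv.similarity("canary", "robin"))  # cosine of the learned vectors
```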
problem with chatGPT
it’s too complex
too many layers - we don’t really know what’s happening
it’s a “black box”
Firth quote about word co-occurrences
“you shall know a word by the company it keeps”
context (source of text) for distributional models
many different possibilities
paragraphs, documents, books, authors etc.
when processing a sentence, distributional models pre-process. how?
pre-processing modifies the sentences/inputs to improve processing
- stop list
- subsampling
stop list
stop list of high frequency function words
any word included on the stop list is removed from the sentence
subsampling
first a frequency distribution is run (custom to the corpus in question)
creates a probability distribution - words with very high frequencies are probabilistically skipped
if you don’t use stop lists or subsampling to get rid of certain words, then…
the model is quickly overwhelmed
every single word will be understood to be similar to “the”
are there parallel processes to stop lists/subsampling in real people?
yes
eye tracking studies show that when people read a page, they generally skip function words
which is better? stop list or subsampling?
subsampling
gives you more control over what the model is processing
and it’s controlled by parameters
more training flexibility
example of sentence before and after stop list/subsampling
if the solvent is insoluble the mixture can be decanted
solvent insoluble mixture decanted
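a minimal sketch of both pre-processing steps applied to the example above (stop list and frequencies assumed; the discard formula is the common word2vec-style one, which may differ from the course’s):

```python
import math, random

stop_list = {"if", "the", "is", "can", "be"}  # high-frequency function words
freqs = {"solvent": 1e-5, "insoluble": 1e-6,  # assumed corpus frequencies
         "mixture": 1e-5, "decanted": 1e-7}

def preprocess(tokens, t=1e-4):
    kept = [w for w in tokens if w not in stop_list]  # stop list: remove outright
    # subsampling: probabilistically skip words with very high frequency
    return [w for w in kept
            if random.random() > max(0.0, 1 - math.sqrt(t / freqs.get(w, t)))]

print(preprocess("if the solvent is insoluble the mixture can be decanted".split()))
# -> ['solvent', 'insoluble', 'mixture', 'decanted']
```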
after pre-processing…
the remaining words are examined
specifically their co-occurrences with each other word in the corpus
each pair found increments the count in the matrix (strength increases with each pair found)
done word by word: find all the pairs for one word first, then move onto the next word…
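a minimal sketch of that pair counting (toy contexts assumed):

```python
from collections import defaultdict
from itertools import combinations

contexts = [["solvent", "insoluble", "mixture", "decanted"],
            ["solvent", "mixture", "filtered"]]

counts = defaultdict(int)
for context in contexts:
    for w1, w2 in combinations(sorted(set(context)), 2):
        counts[(w1, w2)] += 1          # strength grows with each pair found

print(counts[("mixture", "solvent")])  # 2: co-occurred in both contexts
```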
fundamental component of the processing in these distributional models…
similarity between words
typical similarity metric in distributional semantics
cosine
use a vector cosine: gives a value between 1 (very similar) and -1 (opposite); 0 = no similarity
value reflects how closely aligned the vectors are in the feature space
highly aligned in terms of featural reps = high similarity value
to determine if our model actually captures any semantic info…
we examine its performance on a word similarity task:
- get people to rate how similar a pair of words is on a scale
- get a set of values pertaining to the relations between words
- TAKE THE COSINE SIMILARITY OF EACH WORD PAIR (between 1 and -1)
TAKE THE CORRELATION between the cosine values the model has produced and the similarity values that people produce
use these values to see how similar the model’s and people’s results are
ideally you want a strong positive correlation
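a minimal sketch of that final comparison (ratings and cosines assumed for illustration; Spearman correlation is one common choice):

```python
from scipy.stats import spearmanr

human_ratings = [6.8, 5.9, 1.2, 3.4]      # averaged ratings for four word pairs
model_cosines = [0.82, 0.75, 0.05, 0.31]  # model's cosine for the same pairs

rho, p = spearmanr(human_ratings, model_cosines)
print(rho)  # ideally a strong positive correlation
```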