Week 11: Language Flashcards

Question

Why should we study speech perception?

Answer 1

* For a long time, a question in speech perception has been: how do we segment speech?
* Audio is completely continuous – there are no actual breaks between words when we speak, even though it sounds like there are!
* How do we mentally divide up continuous audio into discrete words?

Answer 2

* TRACE is basically the interactive activation model applied to speech perception
  * Feature layer, phoneme layer, and a word layer
* “Phonemes” refer to the basic sounds in a language
  * Phonemes and letters are not necessarily the same thing!
  * Letters can correspond to multiple phonemes
  * The “c” in count is not the same as the “c” in cylinder
* One key difference:
  * In reading, all of the letters are available simultaneously
  * In TRACE, the phonemes are activated one at a time as the speech signal is processed

Answer 3

Right context effects (Thompson, 1984)
* Many times spoken words have missing phonemes – they are either misheard or not pronounced – but we can understand the words just fine!
  * “Gift” being pronounced as “ift” – we likely still hear it as “gift”
  * Because “ift” comes after the /g/ sound, this implies that what we hear in the present can alter our understanding of the past
* How does TRACE explain this?
  * “Gift” may be the only word that “ift” can activate!
  * “Gift” becomes activated and feeds back to /g/ in the phoneme layer
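
The feedback idea in this card can be sketched with a toy example. This is not the TRACE model itself (there are no feature units, no time-aligned copies, and no inhibition). It is a single bottom-up pass followed by one top-down fill-in, assuming a made-up four-word lexicon and using letters as stand-ins for phonemes. The names `LEXICON`, `word_activations`, and `restore_missing` are hypothetical.

```python
# Toy lexicon (an assumption for illustration): letters stand in for phonemes.
LEXICON = {
    "gift": ["g", "i", "f", "t"],
    "golf": ["g", "o", "l", "f"],
    "list": ["l", "i", "s", "t"],
    "raft": ["r", "a", "f", "t"],
}


def word_activations(heard):
    """Bottom-up pass: each word gains one unit of support per matching phoneme."""
    return {
        word: sum(1 for h, p in zip(heard, phonemes) if h == p)
        for word, phonemes in LEXICON.items()
    }


def restore_missing(heard):
    """Top-down pass: the most active word fills in phonemes that were not heard."""
    activations = word_activations(heard)
    winner = max(activations, key=activations.get)
    restored = [p if h is None else h for h, p in zip(heard, LEXICON[winner])]
    return winner, restored


# None marks the phoneme that was never heard ("ift").
winner, restored = restore_missing([None, "i", "f", "t"])
print(winner, restored)  # prints: gift ['g', 'i', 'f', 't']
```

In this sketch “gift” wins because it matches three of the heard phonemes, and it then supplies the missing /g/, which is the sense in which later input changes what is perceived earlier.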

Answer 4

Influence of semantics on word perception!
* Let’s say you hear “The _ing had feathers”
  * Which word do you think this was? “WING” or “RING”?
  * People generally think it’s “WING” because it’s semantically consistent with what came after (“feathers”; Szostak & Pitt, 2013)
* Another example: “BIG GIRL” and “BIG EARL” can sound almost identical! How do we hear one but not the other?
  * Linguistic context! We say “Big girls don’t cry”, not “Big Earls don’t cry!”
* TRACE would require some additional semantic layer to further constrain it

Answer 5

* Reading and speech perception can be explained simply as an interaction between bottom-up perception and top-down knowledge
* The model can perceive these without using rules and exceptions! And this might be a good thing!
  * Many linguists describe word perception in terms of rules, e.g., English words tend to end with hard consonants like /k/
  * But there are always exceptions – how does the system know how to manage both rules and the many exceptions that are present?

Answer 6

* This is an extremely old question! Many philosophers have debated whether language is inborn or acquired
* B. F. Skinner argued in 1957 that language is learned via operant conditioning
  * Example: If a child says “Mom can you give me milk?” and receives milk, there is reinforcement of the successful use of language
  * Repeated reinforcement of successful uses of language leads to its acquisition

Answer 7

* Chomsky, a linguist, wrote a scathing review of Skinner’s book
* He argued that it was virtually impossible for language to be learned via operant conditioning
* Key problem: the poverty of the stimulus
  * Translation: Children just aren’t exposed to that much language!
  * Children often produce sentences that they have never even heard before
  * Example: a child saying “I hate you mommy!”
* Chomsky argued that language learning is innate and due to a universal grammar
  * All languages are mapped onto this grammar

Answer 8

* Enormous!
* Led to a renewed interest in nativist accounts of language learning (biologically preprogrammed, innate)
  * Many researchers have documented the extremely rapid rise in language use through the early years
  * 5-year-olds learn an average of 2,000-3,000 words a year – many words a day!
* Was also one of the cornerstones of the cognitive revolution in the 1960s and the death of behaviorism
  * Behaviorism was entirely about stimulus-response associations
  * After the cognitive revolution, researchers began considering internal representations as a mediator between stimuli and responses

Answer 9

* Noam Chomsky argued that sentence comprehension is first and foremost dependent on syntax
  * Syntax: rules for word order
* This is another example of a “classical” approach to language comprehension, also referred to as a “structural” approach
  * Sentences are divided into their parts of speech and grouped into noun phrases and verb phrases
  * Syntax is processed first, then the meanings of the words are used to create the meaning of the sentence

Answer 10

Sentence interpretations cannot always be recovered using rules!
* “The spy saw the policeman with binoculars” vs. “The spy saw the policeman with a revolver”
  * The first case: the spy had the binoculars
  * The second case: the policeman had the revolver
* But the structure is nearly identical
* The word meanings determine the structural interpretation, not the other way around

Answer 11

* In the 1980s, a number of neural network models of language acquisition were developed, also referred to as connectionist models
* The interactive activation model and TRACE are similar, but those two models do not learn
  * No changes in connections between words, letters, or phonemes occur during training
* These networks embody the following principles
* Learning by the difference between predictions and what was heard
  * On each iteration, the model makes a prediction of some kind
  * If the prediction is in error, the connections in the network are modified to better predict future outputs
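
The predict-then-correct cycle in this card can be sketched with a simple error-driven update (often called the delta rule). This is a minimal sketch with one linear output unit and made-up training examples. It is not any specific published language model, and `train_delta_rule`, the learning rate, and the epoch count are assumptions chosen for illustration.

```python
def train_delta_rule(examples, n_inputs, learning_rate=0.1, epochs=50):
    """Adjust connection weights in proportion to the prediction error."""
    weights = [0.0] * n_inputs  # the network starts with no knowledge
    errors_per_epoch = []
    for _ in range(epochs):
        total_error = 0.0
        for inputs, target in examples:
            # 1. The model makes a prediction from its current connections.
            prediction = sum(w * x for w, x in zip(weights, inputs))
            # 2. The error is the difference between the target and the prediction.
            error = target - prediction
            # 3. Each connection is nudged to shrink that error next time.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            total_error += abs(error)
        errors_per_epoch.append(total_error)
    return weights, errors_per_epoch


# Hypothetical training set: the target depends only on the first input.
examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
weights, errors = train_delta_rule(examples, n_inputs=2)
print("first epoch error:", round(errors[0], 3))
print("last epoch error:", round(errors[-1], 3))
print("learned weights:", [round(w, 3) for w in weights])
```

The summed error shrinks across epochs, and the first weight drifts toward 1 while the second drifts toward 0, purely from repeated small corrections.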

Answer 12

* Knowledge is distributed across many connections, like in the human brain
  * Knowledge is not stored in fixed units anywhere like in classical accounts of language
* This allows for graceful degradation
  * If you cut certain units or connections in the network, it doesn’t lose entire words or phrases
  * Each word is represented across many units, so losing a small number is not consequential
* The only learning that takes place is modifications of connections in the network – no new units are added
* Connections in the model can be thought of as associations or relationships
* The models learn relationships between different levels of language
  * These can be relationships between the way words appear and how they sound (Plaut et al., 1996)
  * These can also be relationships between words in a sentence (Elman, 1990)
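
Graceful degradation can be illustrated with a toy distributed code. This is only a sketch under stated assumptions: four made-up words, 16 units, and hand-built +1/-1 patterns. It is not taken from Plaut et al. (1996) or Elman (1990), and `distributed_pattern`, `lesion`, and `recognize` are hypothetical names.

```python
WORDS = ["cat", "dog", "sun", "map"]
N_UNITS = 16


def distributed_pattern(word_index):
    """Spread one word across all units as +1/-1 values (an assumed toy code)."""
    return [1 if (unit >> word_index) & 1 else -1 for unit in range(N_UNITS)]


PATTERNS = {word: distributed_pattern(i) for i, word in enumerate(WORDS)}


def lesion(pattern, damaged_units):
    """Silence the damaged units by setting their activity to zero."""
    return [0 if unit in damaged_units else value
            for unit, value in enumerate(pattern)]


def recognize(pattern):
    """Return the stored word whose pattern overlaps most with the input."""
    def overlap(word):
        return sum(a * b for a, b in zip(pattern, PATTERNS[word]))
    return max(WORDS, key=overlap)


damaged = {0, 1, 2, 3}  # a quarter of the units are removed
for word in WORDS:
    print(word, "->", recognize(lesion(PATTERNS[word], damaged)))
```

Every word is still recognized after the damage, because each one is carried by many units and losing four only weakens the match. If each word instead lived in a single unit, removing that unit would erase the word entirely.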

Answer 13

* The networks do not start with any knowledge!
  * Models often begin performing very poorly
  * They learn across many, many iterations – adjusting in response to the errors they make
  * Performance gradually increases through the course of training until it approximates human performance
* Errors made during the course of training are another testbed for such models
  * These errors should resemble the errors that humans make

Answer 14

* Past tenses are of interest because of irregular verbs
  * Most verbs are made past tense by adding “-ed”
  * However, there are several other past tense verbs such as ran and went that don’t conform to this pattern
* Even crazier – children often go through a phase where they get worse in their use of irregular past tense forms
  * E.g., a child will use the word “ran” at around age 3
  * Later, the child says “runned”!
  * Eventually, the child properly uses both regular and irregular verbs
* Steven Pinker and others have argued that this is due to the usage of rules
  * The erroneous “runned” reflects an overuse of the rule
  * The exceptions to the rule are eventually learned and this allows children to perform well

Answer 15

* Present tense verb is presented to the network on the INPUT layer
  * Word is converted into “Wickelfeatures” – trigrams of the letters
  * The word “Foster” is broken up into all consecutive three-letter combinations: FOS-OST-STE-TER
* The Wickelfeatures of the present tense verb are used to produce a past tense version of the word
  * Converted back into the letter string of the predicted past tense word
* How does the model learn to produce past tense verbs?
  * Connections are present between each layer
  * When an error is made, the connections are adjusted based on the difference between the current prediction and the correct past tense verb
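
The trigram breakdown described in this card can be sketched in a few lines. This only illustrates the letter-trigram step as the card presents it. `letter_trigrams` is a hypothetical helper name, not code from the original model.

```python
def letter_trigrams(word: str) -> list[str]:
    """Return every run of three consecutive letters in `word`, in order."""
    letters = word.upper()
    return [letters[i:i + 3] for i in range(len(letters) - 2)]


print("-".join(letter_trigrams("Foster")))  # prints: FOS-OST-STE-TER
```

Words shorter than three letters produce an empty list in this sketch, since there is no full trigram to extract.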

Answer 16

* The authors of the model argued that general learning mechanisms are sufficient, with no need for rules or syntax
* Their model – and other PDP models – merely learns from the difference between the correct output and the prediction from the network

Answer 17

* Steven Pinker and colleagues heavily criticized this model on a number of grounds
* The model does not succeed on all exception words
* There are certain neurological double dissociations that support the idea that verb use is subserved by two systems
  * “Double dissociations” – manipulation of a variable affects system 1 but not system 2, manipulation of another variable affects system 2 but not system 1
  * Patients with Alzheimer’s, who have LTM deficits, have difficulty with irregular past tense verbs
  * Patients with Parkinson’s disease, who have damage to the basal ganglia, have difficulty with regular past tense verbs but not irregular ones
  * Severing connections in connectionist models tends to affect the irregular verbs but not the regular ones

Answer 18

* They are sensitive to the training sets
  * Very different behavior emerges from different training sets!
* They can learn things people can’t learn
  * The learning algorithms are so powerful they can reproduce just about any pattern with enough training
* They are difficult to understand!
  * If you don’t understand how the model worked, you’re not alone
  * They often reproduce patterns of interest after very small incremental adjustments to connections across thousands of iterations of training
  * Often the creators of the models cannot explain how the models succeed

Answer 19

Modern neural network models: deep learning models, which are used by Google and others
* Neural network models with many layers (around 10-20 layers)
* These models are used a lot for web searches, face recognition, etc.
* Modern language production models like GPT-3 are extremely impressive, but not without their criticisms


(43 cards)