Chapter 2: Perception Flashcards
What are the two most important human perceptual systems?
Auditory & Visual Perception
What % of our brain is devoted to visual processing?
50%
What are the two visual pathways?
“What” visual pathway & “Where” visual pathway.
Visual Agnosia
An inability to recognize visual objects, resulting from damage to certain brain regions (the person is not blind).
Apperceptive Agnosia
A form of visual agnosia marked by the inability to recognize simple shapes such as circles and triangles.
Generally believed to reflect problems with early processing of information in the visual system.
Associative Agnosia
A form of visual agnosia marked by the inability to recognize complex objects, but with retention of the ability to recognize simple shapes and to copy drawings of complex objects.
Reflects difficulty with pattern recognition, which occurs later in the visual system.
Visual Perception can be divided into an ___ phase, in which shapes and objects are extracted from the visual scene, and a ___ phase, in which the shapes and objects are recognized.
Early ; Later
Retina
The innermost layer of cells within the eye; it includes the photoreceptor cells, bipolar cells, and ganglion cells.
What are the two types of photoreceptor cells?
Cones & Rods
Cones
Involved in colour vision and high-acuity vision (high resolution).
Rods
Principally responsible for the less acute, black-and-white vision we experience at night; rods need less light than cones to trigger a response.
Fovea
The area of the retina with the greatest concentration of cones and therefore the greatest visual acuity. When we focus on an object, we move our eyes so that the image of the object falls on the fovea.
Peripheral vision
Detects global information (Ex., movement).
P cells -> B cells -> G cells (where do their axons lead?)
Photoreceptor cells -> Bipolar cells -> Ganglion cells.
The axons of the ganglion cells leave the eye and form the optic nerve!
How many ganglion cell axons form the optic nerve in each eye?
800,000.
Receptive Field
In vision, the region of the retina from which a cell in the visual system encodes information.
Primary visual Cortex
The first cortical area to receive visual input, organized according to a topographic representation of the visual field.
There is a double reversal in the mapping of the visual field
The information presented to the left visual field gets processed in the right hemisphere, and vice versa.
The image also gets flipped vertically: information presented in the upper half of the visual field gets processed in the bottom portion of the primary visual cortex.
“What” visual pathway
A neural pathway carrying visual information from the primary visual cortex to regions of the temporal lobe that are specialized for identifying objects.
“Where” visual pathway
A neural pathway carrying visual information from the primary visual cortex to regions of the parietal lobe that are specialized for representing spatial information and for coordinating vision with action.
What happens when the “what” visual pathway is cut?
Difficulty learning to identify objects.
What happens when the “where” visual pathway is cut?
Difficulty learning to identify specific locations.
Kuffler (1953)
Showed how information is encoded by the ganglion cells in the retina and by cells in the lateral geniculate nucleus.
Describe an “on-off” cell.
If light falls on a small region of the retina at the center of the cell’s receptive field, the cell’s spontaneous rate of firing increases. If light falls in the region just around this sensitive center, however, the spontaneous rate of firing decreases. Light farther from the center elicits no change in the spontaneous firing rate, neither an increase nor a decrease.
Describe an “off-on” cell
Light at the center decreases the spontaneous rate of firing, and light in the surrounding areas increases that rate.
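Illustrative sketch (not from the textbook): center-surround receptive fields of this kind are often modeled as a difference of Gaussians. The function name, grid size, and sigma values below are invented for the example.

```python
import numpy as np

def difference_of_gaussians(size=15, sigma_center=1.5, sigma_surround=4.0):
    """Illustrative center-surround receptive field as a difference of Gaussians.

    A positive center with a negative surround approximates an "on-off" cell;
    negating the result approximates an "off-on" cell.
    """
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_center**2)) / (2 * np.pi * sigma_center**2)
    surround = np.exp(-r2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    return center - surround

rf = difference_of_gaussians()

# A spot of light at the center raises the response; light in the surround lowers it.
spot_center = np.zeros_like(rf); spot_center[7, 7] = 1.0
spot_surround = np.zeros_like(rf); spot_surround[7, 12] = 1.0
print(np.sum(rf * spot_center) > 0)    # True: firing rate above baseline
print(np.sum(rf * spot_surround) < 0)  # True: firing rate below baseline
```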
Hubel & Wiesel (1962)
Studied the primary visual cortex of cats.
Found four patterns in the cortical cells.
Describe cortical cells
Their receptive fields are elongated in shape (in contrast with the circular receptive fields of the on-off/off-on cells).
Edge Detectors
Cells in the visual cortex that respond most to edges in the visual field. Their receptive field is split vertically into two halves.
They respond positively to light on one side of the line and negatively to light on the other side.
Bar Detectors
Cells in the visual cortex that respond most to bars in the visual field. Their receptive field is split vertically into three parts.
A bar detector with a positive center will respond most if a bar of light just covers its center (positive in the middle, negative on the flanks). The reverse arrangement (negative center, positive flanks) also occurs.
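Illustrative sketch (not from the textbook): edge- and bar-detector responses can be mimicked with small weight matrices whose dot product with an image patch plays the role of the cell's response. The kernels and stimuli below are invented for the example.

```python
import numpy as np

# Illustrative receptive fields (not the actual cortical wiring): weight matrices
# whose element-wise product with an image patch, summed, mimics the cell's response.
edge_detector = np.array([[-1, -1, +1, +1]] * 4)   # negative left half, positive right half
bar_detector  = np.array([[-1, +1, +1, -1]] * 4)   # positive center, negative flanks

vertical_edge = np.array([[0, 0, 1, 1]] * 4)       # light covering the right half only
vertical_bar  = np.array([[0, 1, 1, 0]] * 4)       # a bright bar covering the center

print(np.sum(edge_detector * vertical_edge))  # 8: strong response of the edge cell to the edge
print(np.sum(bar_detector * vertical_bar))    # 8: strong response of the bar cell to the bar
print(np.sum(bar_detector * vertical_edge))   # 0: the bar cell barely responds to the edge
```

A kernel like this responds strongly only to a stimulus at a particular position, orientation, and width, which is exactly the specificity the next card describes.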
Bar & Edge Detector cells are specific with respect to: __, __, & __.
Position, Orientation, and Width.
Hubel & Wiesel (1977)
Visual cortex is divided into 2 × 2 mm regions (called hypercolumns)
Explain how hypercolumns work
Each hypercolumn processes information from one small region of the visual field. Within a hypercolumn, cells are organized into columns: along one dimension the columns alternate between input from the left eye and the right eye (ocular dominance), and along the other dimension each column contains cells tuned to a slightly different orientation, so that the full range of orientations is represented.
Other than line orientation, size, and width, what other information does the visual system extract from the visual signal?
Colours of objects and whether they are moving.
Livingstone & Hubel (1988)
Form, colour, and movement are processed separately.
How many visual areas are there?
32.
Feature Map
A representation of the spatial locations of a particular visual feature.
There are separate maps for colour, orientation, and movement. (Ex. Moving red vertical bar: separate feature maps represent its colour as red, its orientation as vertical, and its movement as occurring in that location).
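Illustrative sketch (not from the textbook): one simple way to picture feature maps in code is a set of boolean grids, one per feature value, each marking where that feature occurs. The encoding below is invented for the example.

```python
import numpy as np

# Illustrative feature maps (invented encoding): one boolean grid per feature value.
shape = (5, 5)
colour_map      = {"red": np.zeros(shape, dtype=bool)}
orientation_map = {"vertical": np.zeros(shape, dtype=bool)}
movement_map    = {"moving": np.zeros(shape, dtype=bool)}

# A moving red vertical bar at row 2, column 3 is registered in all three maps.
location = (2, 3)
colour_map["red"][location] = True
orientation_map["vertical"][location] = True
movement_map["moving"][location] = True

# Binding the features back into one object means finding the location the maps share.
combined = colour_map["red"] & orientation_map["vertical"] & movement_map["moving"]
print(np.argwhere(combined))   # [[2 3]]
```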
What patterns do cells in the inferior temporal cortex respond to?
Complex patterns (like hands and faces).
What is a fundamental problem in the visual field?
Information is laid out on the retina as a 2-D image, and it needs to be reconstructed into a 3-D representation of the scene.
What cues does the visual system use to infer distance?
Texture gradient, stereopsis, and motion parallax.
(other important cues involve features such as: size, position, and lighting).
Texture gradient
Items that we assume are equal in size and evenly spaced appear to decrease regularly in size and pack more closely together the farther away they are.
Ex., Standing on a balcony and looking over a crowd (The Pope example).
Stereopsis
Ability to perceive 3D depth based on the fact that each eye receives a slightly different view of the world.
Ex. 3-D glasses: turning two 2D images into a 3D image.
Motion parallax
Provides information about 3D structure when the observer and / or the objects in a scene are in motion.
Ex: The image of distant objects will move across the observer’s retina more slowly than the images of closer objects.
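Minimal arithmetic sketch (not from the textbook, numbers invented): under a small-angle approximation, the image of an object roughly abeam of a sideways-moving observer sweeps across the retina at an angular rate of about v / d radians per second, so nearer objects move faster.

```python
# Motion parallax: angular rate of image motion falls off with distance (approximation).
v = 1.5                        # observer's walking speed in metres per second (invented)
for d in (2.0, 10.0, 50.0):    # near, middle, and far objects, in metres
    print(f"distance {d:>5} m -> about {v / d:.3f} rad/s across the retina")
```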
David Marr (1982): proposed what?
The 2½-D sketch.
2½-D sketch
As proposed by David Marr, a visual representation that identifies where various visual features are located in space relative to the viewer.
3-D model
As proposed by David Marr, a representation of objects in a visual scene.
Object Segmentation
How lines and bars go together to form objects.
Gestalt Principles of Organization
Principles that determine how a scene is organized into components; the principles include:
- Proximity
- Similarity
- Good Continuation
- Closure
- Good form.
Principle of Proximity
Elements close together tend to be grouped together.
Principle of Similarity
Elements that look alike tend to be grouped together.
Principle of Good Continuation
Elements are grouped so as to produce the smoothest flowing lines: lines or curves already running in a particular direction are seen as continuing in that direction without breaks.
Principle of Closure
Elements that form a closed (or nearly closed) figure tend to be grouped together; we perceptually fill in small gaps to see complete objects.
Principle of Good Form
Elements tend to be grouped so as to produce the simplest, most regular figure possible (e.g., symmetric and uniform shapes).
Pattern Recognition
Identifying what objects are
Template Matching theory of perception
A retinal image of an object is faithfully transmitted to the brain, and the brain attempts to compare the image directly to various stored patterns, called templates.
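Minimal sketch (not from the textbook) of the template-matching idea: store a small binary "retinal image" per pattern and pick the stored template with the greatest cell-by-cell overlap. The 3x3 letter patterns below are invented for the example.

```python
import numpy as np

# Hypothetical 3x3 binary "retinal images" used as stored templates.
TEMPLATES = {
    "T": np.array([[1, 1, 1],
                   [0, 1, 0],
                   [0, 1, 0]]),
    "L": np.array([[1, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]]),
}

def match_scores(image):
    """Overlap (out of 9 cells) between the input image and each stored template."""
    return {name: int(np.sum(image == template)) for name, template in TEMPLATES.items()}

perfect_t = TEMPLATES["T"]
shifted_t = np.array([[0, 1, 1],   # the same T shifted one column to the right
                      [0, 0, 1],
                      [0, 0, 1]])

print(match_scores(perfect_t))  # {'T': 9, 'L': 3}  (clear match)
print(match_scores(shifted_t))  # {'T': 4, 'L': 2}  (the match collapses after a mere shift)
```

The second result illustrates the next card's point: a shifted, resized, rotated, or non-standard image no longer lines up with the stored template.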
What can go wrong with template matching? (When considering letter-matching).
The image could fall on the wrong part of the retina.
The image could be the wrong size.
The image could be in the wrong orientation.
The image might be non-standard (the wrong shape).
Where is template-matching used?
- Machine vision.
- fMRI brain imaging (e.g., matching brain images to a standard template brain).
What is a CAPTCHA?
“Completely Automated Public Turing test to tell Computers and Humans Apart.”
Feature Analysis
A theory of pattern recognition that claims that we extract primitive features and then recognize their combinations.
Feature analysis model: advantages over template-matching
- Because features are simpler, the system can correct for the kinds of difficulties the template-matching model faces in recognizing full patterns.
- Feature analysis makes it possible to specify the relationships among features that are most important to a pattern.
- Using features rather than larger patterns reduces the number of templates needed: because the same features occur in many patterns, the number of distinct entities to be represented is reduced considerably.
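Minimal sketch (not from the textbook) of the feature-analysis idea: describe each letter as a set of primitive features and recognize whichever stored letter shares the most extracted features. The feature names below are invented for the example.

```python
# Letters described as sets of primitive features (invented feature vocabulary).
LETTER_FEATURES = {
    "A": {"horizontal_bar", "left_diagonal", "right_diagonal"},
    "T": {"horizontal_bar", "vertical_bar"},
    "L": {"horizontal_bar", "vertical_bar"},   # same features as T; only their arrangement differs
}

def recognize(extracted_features):
    """Pick the letter whose stored feature set overlaps most with the extracted features."""
    return max(LETTER_FEATURES,
               key=lambda letter: len(LETTER_FEATURES[letter] & extracted_features))

# Even if one feature is missed (say, an occluded diagonal), recognition can still succeed.
print(recognize({"horizontal_bar", "left_diagonal"}))  # "A"
```

The deliberate T/L tie in the table shows why specifying the relationships among features (the second advantage above) still matters.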
Kinney (1966)
Showed that there is behavioural evidence that features are used as components in pattern recognition.
*Be able to explain study! The likeness of letters C & G (pg 52).
Psychological nystagmus
Very slight eye tremors.
Psychological nystagmus occurs at a rate of:___
30 to 70 cycles / second.
Why is psychological nystagmus important?
It is critical for the perception of whatever it is that we are looking at.
When techniques are used to keep an image in the exact same position on the retina regardless of eye movement, parts of the object start to disappear from our perception.
If the exact same retinal and nervous pathways are used uninterruptedly, they become fatigued and stop responding.
Pritchard (1961)
The HB letter experiment
Stabilized objects disappear slowly over time.
Findings:
- Features are the important units in perception.
- The remaining features are then combined into recognizable patterns.
Even though our perceptual system may extract features, what we actually perceive are patterns composed from these features.
Deep convolutional networks
Computerized systems typically applied to object recognition tasks (including face recognition), based on layers of successively more complex pattern recognizers.
Explain how deep convolutional networks work
- Image processing starts with a stimulus (pixel representation of an image)
- This is followed by 5 layers of pattern recognition.
- Layer 1 acts similar to bar & edge detectors in the primary visual cortex.
- Layer 5 has elements that respond to more complex patterns, similar to cells in the inferior temporal lobe.
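Minimal sketch (not from the textbook) of a small convolutional network with five feature-extraction layers, written with PyTorch (assumed to be available); the layer sizes are invented and do not correspond to any particular published model.

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Illustrative 5-layer convolutional network: early layers act like bar/edge
    detectors; later layers respond to progressively more complex patterns."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # layer 1
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # layer 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # layer 3
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # layer 4
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),                 # layer 5
        )
        self.classifier = nn.Linear(128 * 14 * 14, num_classes)

    def forward(self, x):                       # x: pixel representation of an image
        x = self.features(x)                    # successive layers of pattern recognizers
        return self.classifier(torch.flatten(x, 1))

# A random 224x224 RGB "image", just to show how shapes flow through the network.
logits = TinyConvNet()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 10])
```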
How deep can these deep convolutional networks go?
150+ layers.
Why might shallower convolutional networks be preferred as models of human vision?
They appear to have properties more like those of the human visual system.
Prosopagnosia
A neurological disorder (damage to the temporal lobe) characterized by the inability to recognize faces.
Fusiform Gyrus
A region in the temporal cortex involved in recognition of complex patterns, such as: faces and words.
- The response is much stronger in the right fusiform gyrus.
- Responds when faces are present in the visual field.
Yin (1969)
People are much better at recognizing faces than other objects when the stimuli are presented upright.
- When faces are presented upside down, however, there is a dramatic decrease in recognition; this is not true of other objects.
Phonemes
The minimal units of speech that can result in a difference in a spoken message.
- Ex. B/A/T. Each letter is a phoneme.
- Letters and phonemes are not always one-to-one. Ex. Knight = n/i/t.
Problems with phoneme recognition
- Difficult because words flow from one to the other.
- Variety among speakers of even the same language. Ex. Women and children have higher pitched voices and men have lower voices.
- Variations among speakers of different languages.
Coarticulation
When one phoneme flows into the other phonemes in the word. The phonemes overlap.
- The actual sound produced for one phoneme will be determined by the context of the surrounding phonemes.
- Ex. ‘a’ can sound soft or hard depending on where it is in the word and what other phonemes are surrounding it.
Problems with speech recognition
- Patients have lost the ability to recognize speech as a result of injury to the left temporal lobe.
- They could detect other sounds and speak still. Their deficit was specific to speech perception.
The Feature-Analysis of Speech
Among the features of phonemes are: the consonantal feature, voicing, and the place of articulation.
Consonantal feature
A consonant-like quality in a phoneme.
Voicing
A feature of a phoneme produced by vibration of the vocal cords.
For example, the phoneme /z/ in the word zip has voicing, whereas the phoneme /s/ in the word sip does not. (EXPLAIN Pg. 59).
Place of Articulation
The place at which the vocal tract is closed or constricted in the production of a phoneme.
1.) Bilabial place of articulation
/p/, /m/, and /w/ are considered to have a bilabial place of articulation because the lips are closed (or constricted, in the case of /w/) while they are being generated.
2.) Labiodental
The phonemes /f/ and /v/ are considered labiodental because the bottom lip is pressed against the front teeth. A related place is dental: two different phonemes are represented by /th/, one in thy (with voicing) and the other in thigh (without voicing), and both are dental because the tongue presses against the teeth.
3.) Alveolar
The phonemes /t/, /d/, /s/, /z/, /n/, /l/, and /r/ are all alveolar because the tongue presses against the alveolar ridge of the gums just behind the upper front teeth.
4.) Palatal
The phonemes /sh/, /ch/, /j/, and /y/ are all palatal because the tongue presses against the roof of the mouth just behind the alveolar ridge.
5.) Velar
The phonemes /k/ and /g/ are velar because the tongue presses against the soft palate, or velum, in the rear roof of the mouth.
Miller & Nicely (1955)
Had participants try to recognize phonemes such as /b/, /d/, /p/, and /t/ by distinguishing between the sounds ba, da, pa, and ta presented in noise. Participants exhibited confusion, thinking they had heard one sound in the noise when in reality another sound had been presented. The experimenters were interested in which sounds participants would confuse with which other sounds. Participants most often confused consonants that were distinguished by just a single feature.
Ex. when presented with /p/, participants more often thought that they had heard /t/ than that they had heard /d/. The phoneme /t/ differs from /p/ only in place of articulation, whereas /d/ differs both in place of articulation and in voicing. Similarly, participants presented with /b/ more often thought they heard /p/ (differing only in voicing) than /t/ (differing in both features).
Voiced consonants
In the case of a voiced consonant such as /b/, the release of air and the vibration of the vocal cords are nearly simultaneous, and the vocal cord vibration continues into the articulation of the following vowel /a/.
Unvoiced consonants
In the case of the unvoiced consonant /p/, the release occurs 60 ms before the vibration begins for the vowel.
What are we detecting when we perceive a voiced vs. an unvoiced consonant?
The presence or absence of a 60-ms interval between release and voicing.
- This period of time is referred to as: voice-onset time.
The factor controlling the perception of a phoneme is: __
The delay between the release of air and the vibration of the vocal cords.
Categorical perception
The perception of stimuli as belonging in distinct categories without gradual variation.
Lisker & Abramson (1970). pg 60.
the delay between the release of air and the onset of voicing was varied from −150 ms (voicing occurred 150 ms before release) to +150 ms (voicing occurred 150 ms after release). The participant’s task was to identify which syllables began with /b/ and which with /p/. Figure 2.24 plots the percentage of /b/ identifications and /p/ identifications against voice-onset time. Throughout most of the continuum, participants agreed 100% on what they heard, but there was a sharp switch from /b/ to /p/ at about 25 ms. At a 10-ms voice-onset time, participants were in nearly unanimous agreement that the sound was a /b/; at 60 ms, they were in nearly unanimous agreement that the sound was a /p/. Because of this sharp boundary between identifications of the voiced and unvoiced phonemes, perception of this feature is referred to as categorical.
Discrimination studies of Categorical perception
People are very poor at discriminating between pairs of syllables beginning with /b/ or pairs beginning with /p/ that differ in voice-onset time but are on the same side of the phonemic boundary. However, they are good at discriminating between pairs that have the same difference in voice-onset time when one item of the pair is on the /b/ side of the boundary and the other item is on the /p/ side. It seems that people can identify the phonemic category of a sound but cannot discriminate sounds within that phonemic category. Thus, people are able to discriminate two sounds only if they fall on different sides of a phonemic boundary.
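Minimal sketch (not from the textbook) of what "categorical" means here: identification imposes a sharp boundary on the voice-onset-time continuum, and discrimination succeeds only across that boundary. The boundary value follows the roughly 25-ms figure from the card above; the function names are invented.

```python
def identify(vot_ms, boundary_ms=25):
    """Categorical identification: a sharp /b/-/p/ boundary on the voice-onset-time continuum."""
    return "/b/" if vot_ms < boundary_ms else "/p/"

def discriminable(vot1_ms, vot2_ms):
    """Listeners reliably tell two sounds apart only if they fall in different categories."""
    return identify(vot1_ms) != identify(vot2_ms)

print(identify(10), identify(60))   # /b/ /p/
print(discriminable(0, 20))         # False: both /b/, same side of the boundary
print(discriminable(20, 40))        # True: same 20-ms difference, but it straddles the boundary
```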
Two views on Categorical Perception
- (Weaker) The weaker view is that we experience stimuli as coming from distinct categories. There seems to be little dispute that the perception of phonemes is categorical in this sense.
- (Stronger) A stronger viewpoint is that we cannot discriminate among stimuli within a category.
Massaro (1992)
There is increased discriminability between categories (acquired distinctiveness) and decreased discriminability within categories (acquired equivalence). But, discriminability within categories is still possible.
Analysis by Synthesis
- We perceive voicing by unconsciously determining how the consonants are spoken.
- We determine how we would generate the speech sounds and that we recognize them in terms of the generation process. Thus, the reason for the categorical discrimination between voiced and unvoiced is that they are generated in distinct ways (i.e., with or without vocal cord vibrations, respectively).
Pisoni (1977) & Kuhl (1987) studies: What was the objective?
Objective: There is evidence that categorical perception is not tied to human processing of language but rather reflects a general property of how certain sounds are perceived.
- Categorical perception depends on neither the signal being speech (Pisoni, 1977) nor the perceiver having a human vocal or auditory system (Kuhl, 1987).
Pisoni (1977)
Created nonlinguistic tones that had a distinguishing acoustic feature similar to the feature of voice-onset time in voicing — a low-frequency tone that was either simultaneous with a high-frequency tone or lagged it by 60 ms. His participants showed abrupt boundaries.
Kuhl (1987)
Trained chinchillas to discriminate between da (beginning with voiced /d/) and ta (beginning with voiceless /t/). Even though these animals do not have a human vocal tract, they showed the sharp perceptual boundary between these stimuli that humans do.
__ is also used to recognize objects.
Context.
*Perception can proceed successfully when only some of the features are recognized, with context filling in the remaining features.
Top-down processing
Perceptual processing of a stimulus in which information from the general context is used to help recognize the stimulus.
Bottom-up processing
Perceptual processing of a physical stimulus in which information from the stimulus, rather than from the general context, is used to help recognize the stimulus.
Reicher & Wheeler (1969-1970)
Participants were presented very briefly with either a letter (such as D) or a word (such as WORD). Immediately afterward, they were given a pair of alternatives and instructed to report which alternative they had seen. (The initial presentation was sufficiently brief that participants made a good many errors in this identification task.) If they had been shown the letter D, they might be presented with D and K as alternatives. If they had been shown WORD, they might be given WORD and WORK as alternatives. Note that both choices differed only in the letter D or K. Participants were about 10% more accurate in identifying the word than in identifying the letter alone. Thus, they discriminated between D and K better in the context of a word than as letters alone — even though, in a sense, they had to process four times as many letters in the word context. This phenomenon is known as the word superiority effect.
Word Superiority Effect
The superior recognition of letters when the letters are presented in a word context than when they are presented alone.
Massaro argument (1979)
Massaro has argued that the perceptual information provided by the stimulus and the information provided by the context are independent sources of information about the identity of the stimulus, which are combined to provide the best inference about what the stimulus might be.
FLMP (fuzzy logical model of perception) [Massaro]
Massaro’s theory of perception, which proposes that information provided by the stimulus and information provided by the context combine to determine perception.
*STUDY
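Minimal sketch (not the textbook's presentation): for two alternatives, the combination rule associated with FLMP multiplies the independent stimulus and context supports and then normalizes. The numbers below are invented.

```python
def flmp_support(stimulus_support, context_support):
    """Support for alternative A (vs. a single alternative B): multiply the independent
    stimulus and context evidence (both 0..1 degrees of support for A), then normalize."""
    support_a = stimulus_support * context_support
    support_b = (1 - stimulus_support) * (1 - context_support)
    return support_a / (support_a + support_b)

# The acoustic signal is ambiguous between /b/ and /p/ (0.5), but the word context favours /b/ (0.8).
print(flmp_support(0.5, 0.8))   # 0.8: the percept leans toward /b/
# Stimulus and context each weakly favour /b/: the combined support is stronger than either alone.
print(flmp_support(0.7, 0.7))   # ~0.845
```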
Phoneme-Restoration Effect
The tendency to hear phonemes that make sense in the speech context even if no such phonemes were spoken.
-Originally demonstrated by Warren (1970).
Warren (1970).
Study: Presented participants with sentences such as the following:
It was found that the *eel was on the axle.
It was found that the *eel was on the shoe.
It was found that the *eel was on the orange.
It was found that the *eel was on the table.
In each case, the * denotes a phoneme replaced by a nonspeech sound. For the four sentences above, participants reported hearing wheel, heel, peel, and meal, depending on context. The important feature to note about each of these sentences is that they are identical through the critical word and beyond, up to the last word. The identification of the critical word is determined by what occurs after it — that is, by the last word. Thus, the identification of words often is not instantaneous but can depend on the perception of subsequent words.
McGurk Effect
- Named after Harry McGurk
The effect involves watching the lips of someone making a sound like ga while hearing the sound ba. Depending on various factors, such as the quality of the acoustic input, listeners report hearing da (a fusion or compromise between what was seen and what was heard); this type of compromise is the McGurk effect.
Where else (other than in written text or speech) is context used?
Context is also important in visual perception, especially in the identification of objects within scenes.
Change Blindness
The inability to detect a change in a scene when the change matches the context.
Primal Sketch
In Marr’s model, the level of visual processing in which the visual features have been extracted from a stimulus.
- These features are combined with depth information to get a representation of the location of surfaces in space; this is Marr’s 2½-D sketch.
The overall process of perception
Pg 69. Diagram