Mid-level Vision Flashcards

1
Q

Mid-level Vision

A

Organizes the elements of a visual scene into groups that we can recognize as objects. Key question: how do we know which edges go with which object?

2
Q

Kanizsa figure

A

Illustrates illusory contours, i.e., borders that are perceived even though nothing changes from one side to the other in an image

Visual system makes inferences about contours

Based on our best guess about what is happening in the world (experiences)

Kanizsa figure may be due to occlusion, i.e., when one object blocks another (e.g., the white arrow is on top of the vertical lines and circles)

3
Q

early theory of structuralism and what breaks this theory

A

Early psychologists (Wundt, Titchener) thought that perceptions could be understood by the analysis of the components (e.g., the sum of colour, orientation)

(what we see is the sum of its parts; we see what is there)

According to structuralism, if you break down all the basic components (e.g., color, shape, orientation), you can fully explain what we perceive.

This approach assumes we only perceive what is physically present in the stimulus.

Illusory contours break these rules
Illusory contours are shapes or edges we see, even though no actual line or boundary is present.

Example: The Kanizsa triangle, where we perceive a white triangle on top, even though it’s not really drawn.

🧠 This challenges structuralism because:

There’s no physical edge, color change, or explicit feature at the contour

Yet, we still perceive a coherent shape

4
Q

Gestalt school of thought

A

perceptual whole can be greater than the apparent sum of the parts

  • Created a set of organizing principles (Gestalt grouping rules) that describe the visual system's interpretation of the raw image (i.e., which elements of an image will appear to be grouped together)
  • Gestalt rules are useful because they represent regularities in our world (based on our experience and can explain some illusions)
5
Q

Gestalt Grouping Rules : Texture segmentation

A

carving an image into regions of common texture properties

6
Q

Gestalt Grouping Rules: Similarity

A

image chunks that are similar are more likely to be grouped together

7
Q

Gestalt Grouping Rules: Good continuation

A

a rule that states that two elements will tend to group together if they seem to lie on the same contour. Elements that are collinear (lying on the same straight line) are most likely on the same contour. Ex: when two birds' beaks cross, good continuation tells us which contour belongs to which beak

9
Q

Gestalt Grouping Rules: Proximity

A

items near each other are more likely to group together (e.g., an array of stars groups as rows rather than columns when the stars are closer together within rows)

10
Q

Gestalt Grouping Rules: Parallelism and symmetry

A

a rule for figure-ground assignment stating that parallel or symmetrical contours are likely to belong to the same object

11
Q

Camouflage

A

The same Gestalt principles that help us find objects can be exploited to hide objects

an object’s features group with the environment so as to persuade the observer that the object doesn’t form a perceptual group of its own

Ex: stick bug (similarity: its shape and colour match the tree), tiger (parallelism: its stripes mimic the grass)

12
Q

How do we separate figure from ground (figure-ground assignment) = 5 ways

A

Process is governed by a collection of principles acting together:

  1. Surroundedness: if one region is entirely surrounded by another, it is likely that the surrounded region (e.g., the red part is surrounded by yellow) is the figure
  2. Size: the smaller region is likely to be the figure
  3. Symmetry: a symmetrical region is more likely to be seen as a figure
  4. Parallelism: regions with parallel contours are more likely to be seen as figure
  5. Relative motion: how surface details move relative to an edge can help determine which portion is figure and which is ground
13
Q

Summarizing Mid-Level Vision

A
  1. Bring together that which should be brought together. We have the Gestalt grouping principles (similarity, proximity, parallelism, symmetry, and so forth), and we have the processes that complete contours and objects even when they are partially hidden behind occluders (e.g., good continuation).
  2. Split asunder that which should be split asunder. Complementing the grouping principles are the edge-finding processes that divide regions from one another. Figure-ground mechanisms separate objects from the background. Texture segmentation processes divide one region from the next on the basis of image statistics. Use our world knowledge to separate object from background
  3. Use what you know. Two-dimensional edge configurations are taken to indicate three-dimensional corners or occlusion borders, and objects are divided into parts on the basis of an implicit knowledge of the physics of image formation. Use physics knowledge (we know how things in the world work, so we can rule out interpretations that couldn't work)
  4. Avoid accidents. Avoid interpretations that require assumptions of highly specific, accidental combinations of features or accidental viewpoints. Avoid misinterpretations, e.g., the photo illusion of a woman appearing to hold up the Leaning Tower of Pisa, which depends on an accidental viewpoint
  5. Seek consensus and avoid ambiguity. Every image is ambiguous. There are always multiple, even infinite, physical situations that could generate a given image. Using the first four principles, the “committees” of mid-level vision must eliminate all but one of the possibilities, thereby resolving the ambiguity and delivering a single solution to the perceptual problem at hand. Use committees of Gestalt principles to find the interpretation with the most support and eliminate the wrong ones.
14
Q

Bayesian approach

A

formal way to model how knowledge about our world can be used to make inferences about what we see (how likely is a hypothesis given what we see)

Bayes’ theorem is a mathematical model that enables us to calculate the probability (P) that the world is in a particular state (A), given a particular observation (O)

P(A|O) = P(A) x P(O|A)/P(O)

Bayesian approach asks us to think about 2 factors:
1. How likely is what you’re proposing (i.e., prior probability)?
2. How consistent is each hypothesis with the observation?

Ex: P(it's raining | you see storm clouds) = P(rain) x P(clouds | rain) / P(clouds)
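
A minimal numeric sketch of this example in Python; all probability values below are invented for illustration, not taken from the lecture:

    # Bayes' theorem applied to the rain/clouds example.
    # All probability values are made up for illustration.
    p_rain = 0.20                 # prior: P(A), how likely rain is in general
    p_clouds_given_rain = 0.90    # likelihood: P(O|A)
    p_clouds = 0.30               # evidence: P(O), how often storm clouds are seen

    p_rain_given_clouds = p_rain * p_clouds_given_rain / p_clouds   # P(A|O)
    print(p_rain_given_clouds)    # 0.6: the observation raises our belief in rain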

15
Q

Object Recognition (after perception): pathway

A

There is a progressive change in the responses of cells along the what pathway (Cells in V2 are sensitive to “border ownership” and illusory contours. By V4, cells are interested in more complex attributes.)

As you move forward along the ventral stream:

✅ V1: Basic features
Detects edges, orientation, color, and simple shapes

Small receptive fields

✅ V2: Intermediate features
Responds to border ownership → which side of a border “belongs” to an object

Detects illusory contours (edges that don’t actually exist but are perceived)

✅ V4: Complex features
Sensitive to curves, shapes, patterns, and color constancy

Likely contributes to recognizing object components

✅ IT cortex: Full object recognition
Neurons respond to entire objects, often invariant to size, position, and viewpoint

E.g., some IT neurons respond specifically to faces or hands

16
Q

Kobatake et al. (1994) Findings

A

Recorded neural activity in monkeys across visual areas

Showed random arrays of stimuli

Found that:
Early areas (like V1) responded to simple features
V4 neurons responded selectively to more complex combinations of features

This suggests V4 is an intermediate step toward full object representation

(There is a progressive change along the what pathway such that cells in more anterior locations are more sensitive to more complex stimuli. The stimuli that activate V4 cells may form the basis of object perception.)

17
Q

Subtraction method

A

Researchers show participants visual stimuli (like faces, scenes, or objects) while recording brain activity using fMRI.

They use the subtraction method:
Present two types of stimuli:
-One that includes the mental process of interest (e.g., recognizing a face)
-One that does not (e.g., a scrambled face image)

Subtract brain activity in the control condition from the experimental condition.

✅ The difference reveals brain areas specifically involved in that mental process (e.g., object or face recognition).
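
A toy Python sketch of the subtraction logic; the activation values are invented placeholders, not real fMRI data:

    import numpy as np

    # Invented voxel activation values, one per voxel, for illustration only
    faces_condition     = np.array([2.1, 3.5, 1.0, 4.2])   # viewing intact faces
    scrambled_condition = np.array([2.0, 1.2, 0.9, 1.1])   # viewing scrambled faces

    # Subtract control from experimental condition, voxel by voxel
    difference = faces_condition - scrambled_condition
    # Voxels with a large positive difference are candidates for
    # face-specific processing (here, indices 1 and 3).
    print(difference)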

18
Q

Scrambling and Brain Activation

A

Scrambled images: low-level features remain (edges, contrast), but no coherent object
-Increased activity in posterior regions (closer to occipital pole, like V1/V2) → basic visual analysis
-Decreased activity in anterior regions (like V4, IT) → no object-level meaning to process

Recognizable (unscrambled) images:
-Trigger stronger activation in anterior brain regions involved in complex object representation

📌 Key idea: As image structure increases (face, scene, body), activation shifts forward along the ventral stream.

19
Q

Category-Specific Brain Areas
Fusiform Face Area (FFA)
Parahippocampal Place Area (PPA)
Extrastriate Body Area (EBA)

A

faces
places/scenes
body/body parts

20
Q

Object-decoding methods in fMRI

A

Training:
o Observer in fMRI scanner is presented with images of many objects
o Researchers record brain responses to these “training” images

Test:
o Present observer with a never-before-seen image
o Researchers use brain activity to try to guess what the object is

→ Shows that info about object identity is present in that region
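
A schematic Python sketch of the decoding idea using a generic off-the-shelf classifier (scikit-learn's LogisticRegression); the voxel patterns here are random placeholders, so this only illustrates the training/test workflow, not a real result:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Training: placeholder voxel patterns (rows) and the object category shown on each trial
    train_patterns = rng.normal(size=(100, 50))      # 100 trials x 50 voxels
    train_labels = rng.integers(0, 4, size=100)      # 4 object categories

    decoder = LogisticRegression(max_iter=1000).fit(train_patterns, train_labels)

    # Test: predict the category for a never-before-seen trial
    new_pattern = rng.normal(size=(1, 50))
    print(decoder.predict(new_pattern))
    # Above-chance accuracy on real data would show that the region
    # carries information about object identity.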

21
Q

Computational models of visual processing: pandemonium model
“Demons” are roughly analogous to neurons:

steps

A
  1. Signal is detected - capture what we see
  2. “Feature demons”: features are extracted (e.g., oriented lines, curves) - pick features of letter B
  3. “Cognitive demons”: each looks for a specific pattern (e.g., a particular letter) and yells louder the more the detected features resemble it
  4. “Decision demons” pool info across third layer demons and choose loudest (decide what letter we are seeing based on activity of cognitive demons and how loud they yell)
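
A toy Python sketch of this cascade for letter recognition; the feature lists for each letter are invented for illustration:

    # Pandemonium-style sketch: each "cognitive demon" shouts in proportion to
    # how many of its letter's features were detected. Features are made up.
    letter_features = {
        "B": {"vertical_line", "upper_curve", "lower_curve"},
        "P": {"vertical_line", "upper_curve"},
        "D": {"vertical_line", "large_curve"},
    }

    detected = {"vertical_line", "upper_curve", "lower_curve"}   # feature demons' output

    # Cognitive demons: shout loudness = number of matching features
    shouts = {letter: len(features & detected)
              for letter, features in letter_features.items()}

    # Decision demon: pick the loudest shout
    print(max(shouts, key=shouts.get))   # -> "B"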
22
Q

Template

A

internal representation of a stimulus that is used to recognize the stimulus in the world (e.g., an Arial letter A: we compare every A that we see to this stored one)
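
A minimal Python sketch of template matching on tiny made-up binary patterns, assuming recognition is simply a pixel-overlap comparison with the stored template:

    import numpy as np

    # Invented 3x3 binary "images": a stored template and a new input
    template = np.array([[0, 1, 0],
                         [1, 1, 1],
                         [1, 0, 1]])
    new_input = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [1, 0, 1]])

    # Recognition = how well the input overlaps the stored template
    match_score = np.sum(template == new_input) / template.size
    print(match_score)   # 1.0 = the input exactly matches the internal template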

23
Q

Structural description

A

description of an object in terms of the nature of its constituent parts and the relationships between them (anything matching this description will be an A: two lines meet at an angle; a third line spans the angle created between them)

24
Q

Biederman’s recognition-by-components model: Geons

+ problems

A

objects are recognized by the identities and relationships of their component parts

Geons: the basic building blocks of perceptual objects

Problems with this: not all objects can be built from geons, and we need more information to characterize objects (size, texture, colour)

25
Artificial neural networks
inspired by the structure of the brain: They consist of layers of units ("nodes") that mimic neurons. Nodes are interconnected, like axons and synapses in the brain. Learning occurs as the strength of connections changes with experience — similar to synaptic plasticity.
26
Deep Neural Networks (DNNs)
A type of ANN with many layers; this "depth" allows it to learn very complex patterns (the number of layers, i.e., the depth of the network, is what distinguishes a simple neural network from a deep learning model). Used in AI applications like facial recognition, Google Home, self-driving cars, and medical image analysis (e.g., reading mammograms). Modern advances in computing power and memory allow for deeper networks with millions of parameters. DNNs learn on their own by adjusting connections based on training data; no manual programming is required.
27
How DNNs Recognize Objects (Layer-by-Layer Process)
  1. Input Layer: the image is fed into the system.
  2. First Layer: a set of basic features is extracted from the image, like edges (similar to simple cells in the visual cortex).
  3. Pooling Layer: combines information to detect patterns (like complex cells).
  4. Repeated Layers: each layer extracts higher-order features based on the previous one; these operations create a new image from which the next layer of the DNN will extract features.
  5. Final Layer: contains "neurons" that fire in response to specific object categories (similar to the idea of a "grandmother cell" in the brain); see the sketch below.
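
A minimal sketch of such a layered network, written here in PyTorch for illustration; the layer sizes, image size, and 10-category output are arbitrary choices, not values from the lecture:

    import torch
    import torch.nn as nn

    # Sketch of a small DNN: feature extraction -> pooling -> deeper features
    # -> category "neurons". Sizes are arbitrary.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first layer: edge-like features
        nn.ReLU(),
        nn.MaxPool2d(2),                               # pooling layer: combine info
        nn.Conv2d(16, 32, kernel_size=3, padding=1),   # higher-order features
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),                     # final layer: 10 object categories
    )

    image = torch.randn(1, 3, 32, 32)    # placeholder 32x32 RGB input image
    category_scores = model(image)       # one score per object category
    print(category_scores.shape)         # torch.Size([1, 10])

In a trained network, the connection weights would be adjusted from labeled examples rather than left at their random starting values.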
28
jumbled features
The features of the faces don’t jumble together like the houses do. (we can more easily separate and tell apart overlapping faces from overlapping houses)
29
holistic processing for faces
Instead of analyzing individual parts (eyes, nose, mouth) in a feature-by-feature analysis, we process the face as a whole: a single unified representation. This is called holistic processing. It aligns with Gestalt principles, where "the whole is greater than the sum of its parts." ➡️ Example: You might not notice if someone gets a haircut or has a blemish, because your brain doesn't fixate on individual features; it sees the overall facial pattern.
30
What Disrupts Holistic Processing
Face Inversion: Turning a face upside down disrupts holistic processing. You’re forced to look at features individually, like with other objects. This explains the "face inversion effect", where it’s much harder to recognize upside-down faces. Low contrast or unusual lighting can also impair holistic face recognition.
31
Prosopagnosia
Face blindness: a neurological disorder where someone can't recognize faces, even familiar ones.
🔸 Types:
-Congenital (developmental): the person is born with it; the brain has normal face-selective regions/face patches, but the connections between them are impaired.
-Acquired: caused by damage to the temporal lobes, especially the Fusiform Face Area (FFA) in the ventral "what" pathway.
People may know they are looking at a face, and can detect emotion or gender, but can't identify the person. They often use voice, clothing, or other cues to recognize others.
32
External or Internal Attention
-External: attending to stimuli in the outside world (e.g., music)
-Internal: attending to thoughts inside your head
33
Overt or Covert Attention
-Overt: orienting your sensory receptors toward what you want to attend to (e.g., looking directly at it)
-Covert: attending without obviously orienting toward it (e.g., eavesdropping on a nearby conversation)
34
Divided attention
Attending to two things at once (multitasking), so attention is divided, e.g., driving while having a conversation
35
Sustained attention
Maintaining focus on one activity (e.g., reading or sewing); the task holds attention for a long, sustained period
36
Selective Attention
Paying attention to one specific stimulus while ignoring others, e.g., the cocktail party effect
37
Visual Search Experiments
In a visual search task, an observer looks for a target item among distractors. These tasks simulate real-world search behaviors. Typically, the target is present in 50% of trials. Reaction time (RT) is measured as the time it takes to say “yes” (target present) or “no” (target absent). -As set size increases (more items on screen), reaction time increases. -Saying "yes" (target present) is usually faster than "no" (target absent) because: To say “no,” you often need to check every item.
38
How is Search Efficiency measured
Efficiency is measured by the slope of the RT vs. set size graph: Shallow slope = more efficient search Steep slope = less efficient search
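A quick Python sketch of estimating that slope from made-up reaction-time data:

    import numpy as np

    # Invented reaction times (ms) at each set size for two hypothetical searches
    set_sizes = np.array([4, 8, 16, 32])
    rt_feature_search     = np.array([450, 452, 455, 458])   # pop-out: flat
    rt_conjunction_search = np.array([480, 560, 720, 1040])  # serial-like: steep

    # Slope of the RT vs. set size line (ms per additional item)
    slope_feature, _ = np.polyfit(set_sizes, rt_feature_search, 1)
    slope_conjunction, _ = np.polyfit(set_sizes, rt_conjunction_search, 1)
    print(slope_feature, slope_conjunction)   # shallow slope = efficient search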
39
Efficient Search (Feature Search)
Definition: Target differs from distractors by a single, obvious feature (e.g., color, shape, orientation). Example: finding a red dot among blue dots. Key Traits: salient features make the target "pop out"; processing is parallel (the brain checks all items at once); reaction time stays the same even as set size increases. ✅ Highly efficient
40
Inefficient Search : serial self-terminating search (Conjunction or Similar Feature Search)
Definition: Target shares features with distractors (e.g., color and shape). Example: finding a red circle among red squares and blue circles. Key Traits: -Requires examining items one-by-one -Known as a serial self-terminating search: it ends when you find the target or finish checking all items -Reaction time increases with set size ❌ Less efficient. Where's Waldo: what type of search? ✅ Serial self-terminating search: Waldo shares many features with other objects, so you must scan carefully, item by item (no "pop out"); you cannot process everything at once = inefficient
41
Guided Search Model
Visual search in the real world is not purely parallel or purely serial: it's guided. Attention is directed to the most likely candidates based on basic features (e.g., color, size, orientation). You don't check every item blindly; you use what you know about the target to narrow down the search. 🧠 Example: Looking for a red apple in a fruit bowl? You can ignore all non-red items.
42
Conjunction Search
Target is defined by a combination of features (not just one). Example: searching for a black suitcase with a Canadian flag pin, where both color and icon matter. Another example: looking for tomatoes = red + round shape. Conjunction searches are slower than feature searches, but faster than fully serial searches, thanks to feature-based guidance.
43
Visual Priming
Definition: Exposure to a stimulus facilitates faster or easier recognition of that stimulus (or similar ones) later. In visual search, seeing an object once can "prime" your brain to find it more quickly the next time. 🧠 Example: If you spot a monkey puppet in one part of a scene, you'll be faster at spotting other monkey puppets in later parts — even if you're not consciously thinking about it. ✅ Priming improves reaction time and accuracy.
44
Scene-Based Guidance
Your understanding of how typical environments are structured helps guide your attention during search. You use contextual knowledge to focus your search in likely locations. 🧠 Example: If you're looking for a faucet, you’ll naturally search near a sink, not in the middle of the floor. This type of guidance relies on your experience and memory of the real world.
45
The Binding Problem
Our brain processes different features of an object (color, shape, motion, orientation) in separate neural circuits. The challenge: how do we combine ("bind") these features to perceive a single, unified object? Proposed answer: Feature Integration Theory (Anne Treisman).
46
Feature Integration Theory (Anne Treisman)
Main Idea: Some features (e.g., color, orientation) are processed automatically and in parallel, even before we focus attention. However, correctly binding features to the right object requires focused attention. 🔄 Two Stages of Processing: 1) Preattentive stage: fast and parallel; processes basic features (e.g., "there's something red"); doesn't yet bind features to objects. 2) Focused attention stage: required for binding features into a coherent perception (e.g., "that red object is a tomato"). Binding = conscious attention. 🧠 Example: You notice red before you know you're searching for a tomato.
47
Illusory Conjunctions
Occur when attention is not fully deployed: you miscombine features from different objects. Example: you see a brown "B" and a red "C" but report a red "B"; the features were present, but not correctly bound. ✅ Evidence that binding requires attention
48
Disorders of Visual Attention (Parietal Lobe Damage) Neglect (usually from right parietal lobe damage):
Contralateral neglect: the patient ignores one side of space (typically the left side after right-hemisphere damage). Line cancellation task: they cross out only the lines on the right side. The patient may be unaware that the left side of space even exists.
49
Disorders of Visual Attention (Parietal Lobe Damage) Extinction (Milder Form of Neglect):
The patient can detect a stimulus on either side if it is shown alone, but when both sides are stimulated at once, they fail to notice the contralesional one. Competition for attention → the stimulus on the contralesional side is "extinguished". 👁️ Therapy: train patients to scan their full visual field