CH. 4. Recognizing Objects Flashcards

1
Q

Recognizing Objects

A

RECOGNITION – Identification of something or someone from memory.

  • AGNOSIA – the inability to interpret sensations, leaving the person unable to recognize objects and people; usually the result of brain damage.
    • APPERCEPTIVE AGNOSIA – patients seem able to see an object’s shape, color, and position, but they can’t put these elements together to perceive the entire object.
    • ASSOCIATIVE AGNOSIA – patients can see but cannot link what they see to their basic visual knowledge.
  • STIMULUS INPUT – the thing that you are sensing.
    • Object recognition is influenced by the STIMULUS itself – that is, by the features that are in view.
    • Recognition can take place with incomplete Stimulus Input.
      • EX: objects that are partially obscured, warped, or viewed from an odd angle.
    • Recognition of various objects is influenced by the CONTEXT in which you encounter those objects.
  • DATA-DRIVEN – Processes that are directly shaped by the stimulus.
    • BOTTOM-UP PROCESSING – Processing that begins with the STIMULUS and works upward until a representation is formed in our mind.
  • CONCEPT-DRIVEN – Processing that is shaped by our knowledge and expectations.
    • TOP-DOWN PROCESSING – Processing that begins with knowledge and expectations to determine what the object in front of us actually is.
2
Q

Importance of Features

A

FEATURES – the lines, curves, and diagonals that make up every object’s shape. RECOGNITION uses such features as a sort of “alphabet” for all objects.

  • RECOGNITION may begin by identifying visual features in the input pattern.
    • EX: the vertical lines, curves, diagonals, and so on.
  • With these features, you can start assembling larger, more complex units.
    • EX: A horizontal together with a vertical line gives you a right angle; if you’ve detected four right angles, you know you’re looking at a square.
  • This lines up with the specialized cells in the visual system that act as feature detectors.
  • Focusing on FEATURES, therefore, might allow us to concentrate on elements shared by the various A’s (of different fonts) and so might allow us to recognize A’s despite their apparent diversity.

VISUAL SEARCH TASKS – tasks in which participants are asked to examine a display and to judge whether a particular target is present in the display or not.

  • People are generally slower in searching for a target defined as a COMBINATION OF FEATURES. This is just what we would expect if FEATURE analysis is an early step in the analysis of the visual world – and separate from the step in which you combine the features you’ve detected.
    • Further support for these claims comes from studies of brain damage. At the start of the chapter, we mentioned apperceptive agnosia — a disorder that involves an inability to assemble the various aspects of an input into an organized whole.
      • INTEGRATIVE AGNOSIA – derives from damage to the parietal lobe. Patients with this disorder appear relatively normal in tasks requiring them simply to detect features in a display, but they are markedly impaired in tasks that require them to judge how the features are bound together to form complex objects.
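The feature-vs-combination contrast in VISUAL SEARCH TASKS can be sketched as a toy timing model. All parameters here (base time, per-item cost) are hypothetical, chosen only to illustrate the qualitative pattern: feature search stays flat as displays grow, while searching for a COMBINATION OF FEATURES slows with display size.

```python
# Toy model of a VISUAL SEARCH TASK (timing parameters are hypothetical).
# Feature search: a single-feature target "pops out" via parallel feature
# analysis, so response time is roughly flat regardless of display size.
# Conjunction search: the target is defined by a COMBINATION of features,
# so items must be inspected in turn and response time grows with set size.

def search_time_ms(display_size: int, conjunction: bool,
                   base: float = 400.0, per_item: float = 50.0) -> float:
    """Predicted response time in milliseconds (illustrative numbers only)."""
    if conjunction:
        # The feature-combining step is serial: each item is checked in turn.
        return base + per_item * display_size
    # The feature-detection step is parallel: no dependence on set size.
    return base

# Feature search is flat; conjunction search slows as displays grow.
for n in (4, 16):
    print(n, search_time_ms(n, conjunction=False),
          search_time_ms(n, conjunction=True))
```

The flat-vs-rising pattern is exactly what the text predicts if feature analysis is an early, separate step from feature combination.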
3
Q

Word Recognition

A

WORD RECOGNITION – Because features (curves, lines, dots) form letters, letters form words, and words form sentences, word recognition is a good place to learn how RECOGNITION works: it reflects how FEATURES build into complex objects.

  • Object recognition begins with the detection of simple features.
  • Then, separate mechanisms are needed to put the features together.

TACHISTOSCOPE – a device designed to present stimuli for precisely controlled amounts of time. More modern research uses computers, but the brief displays are still called “tachistoscopic presentations.”

  • Used in experiments; each stimulus is followed by a post-stimulus MASK – often a random pattern of lines and curves, or a random jumble of letters such as “XJDKEL” – that masks what had just been flashed on the screen.
    • The mask interrupts any continued processing that participants might try to do for the stimulus just presented.
  • Recognizing these briefly visible stimuli depends on many factors, including:
    • how FAMILIAR a stimulus is (i.e., the FREQUENCY with which you encounter it).
      • EX: The more you see a word or object in everyday life, the easier it will be to recognize.
    • the RECENCY of viewing.
      • EX: If you’ve seen a word or object recently, it will be easier to recognize when it appears again.
        • The first exposure PRIMES the participant for the second exposure; more specifically, this is a case of REPETITION PRIMING – presenting a stimulus and then repeating the same stimulus later on.

WORD-SUPERIORITY EFFECT (WSE) – A letter is easier to recognize when it appears in the CONTEXT of a word.

  • Even when a letter is properly printed and quite unambiguous, it’s easier to recognize if it appears within a word than if it appears in isolation.
  • Recognizing an entire word is easier than recognizing isolated letters.

WELL-FORMEDNESS – Letter strings that look like common English words (even if they aren’t) are more easily recognized than letter strings that do not look like English words.

  • EX: “FIKE” and “LAFE” are well-formed and easily pronounced (and thus easily recognized), while “HZYE” and “SBNE” are not.
    • This works because strings like these produce a context effect. The context is our familiarity with common English letter strings.
  • How well the letter sequence conforms to the usual spelling patterns of English – its WELL-FORMEDNESS – is a good predictor of word recognition: the more English-like the string is, the easier it will be to recognize, and the greater the context benefit it will produce.
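One simple way to picture well-formedness is as bigram familiarity: score a string by how many of its letter pairs are common in English. This is a minimal sketch; the tiny bigram list below is invented for illustration, not drawn from a real corpus.

```python
# Sketch of WELL-FORMEDNESS as bigram familiarity (the bigram list is
# illustrative, not a real corpus). Strings built from familiar English
# letter pairs score higher, predicting easier recognition.

COMMON_BIGRAMS = {"FI", "IK", "KE", "LA", "AF", "FE", "TH", "HE", "IN", "ER"}

def well_formedness(string: str) -> float:
    """Fraction of the string's adjacent letter pairs that are familiar bigrams."""
    pairs = [string[i:i + 2] for i in range(len(string) - 1)]
    return sum(p in COMMON_BIGRAMS for p in pairs) / len(pairs)

# "FIKE" is built entirely from familiar pairs; "HZYE" is not.
print(well_formedness("FIKE"), well_formedness("HZYE"))
```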

MAKING ERRORS – Because your brain defaults to the common spelling of English strings, incomplete input of nonsensical strings may be incorrectly interpreted as actual words.

  • EX: “TPUM” is likely to be misread as “TRUM” or even “DRUM.” But the reverse errors are rare: “DRUM” is unlikely to be misread as “TRUM” or “TPUM.”
    • You are using your knowledge of spelling patterns when you look at, and recognize, the words you encounter.
    • Misspelled words, partial words, or nonwords are read in a way that brings them into line with normal spelling
      • In effect, people perceive the input as being more regular than it actually is – meaning our recognition seems to be guided by some knowledge of spelling patterns.
4
Q

Font

A

FONT – How a letter is stylistically designed.

  • For best reading:
    • Avoid obscure styles (unless you want less-fluent reading!)
    • However, using a moderately challenging font can actually help readers process and remember what you’ve written (e.g., Bodoni MT printed in 60% grayscale).
  • DESIRABLE DIFFICULTY – memory is promoted by active engagement with the to-be-remembered materials in the learning process. The difficulty is just enough to ENGAGE the reader, raising their attention and focus on the material – as when using the moderately challenging font.
  • ALL CAPS – one of the reasons WHY IT IS MORE DIFFICULT TO READ CAPITALIZED TEXT is that capitalized words all have the same rectangular shape; gone are the portions of the letters that hang below the line or stick up above it. Essentially, capitalization removes many of the unique aspects of the letters that otherwise make them easier to recognize.
5
Q

Feature Nets and Word Recognition

A

DESIGN of a FEATURE NET – a system for recognizing patterns and shapes consisting of a network of feature detectors, organized in layers.

  • The “bottom” layer is concerned with FEATURES (lines, curves, diagonals), and that is why networks of this sort are often called FEATURE NETS.
    • As we move “upward” in the network, each subsequent layer is concerned with larger-scale objects; using the term we introduced earlier, the flow of information is BOTTOM-UP – from the lower levels toward the upper levels.
  • Each detector in the network has a particular ACTIVATION LEVEL, which reflects the status of the detector at that moment – roughly, how energized the detector is.
    • When a detector receives some input, its activation level increases.
      • A strong input will increase the activation level by a lot, and so will a series of weaker inputs.
    • In either case, the activation level will eventually reach the detector’s RESPONSE THRESHOLD and at that point, the detector will fire – that is, send its signal to the other detectors to which it is connected.
      • These points parallel our description of neurons.
  • What determines a detector’s starting activation level?
    • Overall, a detector’s ACTIVATION LEVEL depends on principles of:
      • RECENCY – detectors that have fired recently will have a higher activation level (think of it as a “warm-up” effect).
      • FREQUENCY – detectors that have fired frequently in the past will have a higher activation level (think of it as an “exercise” effect).
  • Why are frequent words in the language easier to recognize than rare words?
    • FREQUENT words, by definition, appear often in the things you read. Therefore, the detectors needed for recognizing these words have been frequently used, so they have relatively high levels of activation. Thus, even a weak signal (e.g., a brief or dim presentation of the word) will bring these detectors to their response threshold and will be enough to make them fire. As a result, the word will be recognized even with a degraded input.
    • REPETITION PRIMING – Presenting a word once will cause the relevant detectors to fire. Once they’ve fired, activation levels will be temporarily lifted (because of recency of use). Therefore, only a weak signal will be needed to make the detectors fire again. As a result, the word will be more easily recognized the second time around.
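The activation-level story above can be sketched as a single detector with a RESPONSE THRESHOLD. All numbers here are hypothetical; the point is only the mechanism: firing leaves the detector with a raised resting level (recency), so a weaker input suffices the second time — repetition priming.

```python
# Minimal sketch of one detector in a FEATURE NET (numbers are hypothetical).
# Activation accumulates with input; the detector fires once activation
# reaches its RESPONSE THRESHOLD. Firing leaves a raised resting level
# (RECENCY, the "warm-up" effect), so a weaker input suffices next time --
# this is REPETITION PRIMING. FREQUENT use would raise the resting level too.

class Detector:
    def __init__(self, resting_level: float = 0.0, threshold: float = 1.0,
                 recency_boost: float = 0.5):
        self.activation = resting_level
        self.threshold = threshold
        self.recency_boost = recency_boost

    def present(self, input_strength: float) -> bool:
        """Feed input to the detector; return True if it fires."""
        self.activation += input_strength
        if self.activation >= self.threshold:
            # Fire, then settle at a raised resting level (recency/warm-up).
            self.activation = self.recency_boost
            return True
        self.activation = 0.0  # activation decays away between trials
        return False

d = Detector()
first = d.present(0.7)    # weak input, unprimed detector: does not fire
strong = d.present(1.0)   # strong input: fires, leaving the detector primed
second = d.present(0.7)   # the same weak input now fires (repetition priming)
```

The same weak input (0.7) fails before priming and succeeds after, which is the pattern tachistoscopic studies show for recently viewed words.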
6
Q

Feature Net and Well-Formedness

A

FEATURE NET and WELL-FORMEDNESS

The net we’ve described so far cannot, however, explain all of the data.

  • People are able to read letter strings like “PIRT” or “HICE,”
  • but not strings like “ITPR” or “HCEI.” How can we explain this finding?
    • One option is to add another layer to the net, a layer filled with detectors for letter combinations.

BIGRAM DETECTORS – detectors for letter combinations.

  • You have never seen the sequence “HICE” before, but you have seen the letter pair HI (in “HIT,” “HIGH,” or “HILL”) and the pair CE (“FACE,” “MICE,” “JUICE”).
    • The detectors for these letter pairs, therefore, have high activation levels at the start, so they don’t need much additional input to reach their threshold. As a result, these detectors will fire with only weak input.
  • None of this is true for “IJPV” or “RSFK.” Because none of these letter combinations are familiar, these strings will receive no benefits from priming. As a result, a strong input will be needed to bring the relevant detectors to the threshold, and so these strings will be recognized only with difficulty.

RECOVERY FROM CONFUSION – Confusion caused by partial information at one level of the feature net can be sorted out at the next level higher using preferences for combinations that are FAMILIAR as a result of FREQUENT exposure (FREQUENCY).

  • EX: Suppose we have partial information at the feature level that allows us to see a “C” and also part of another letter. This leads to confusion at the letter level.
    • Suppose the partial information on the second letter would allow us to detect what could be CO or maybe CU or maybe CQ or maybe CS.
    • Luckily, the confusion is sorted out at the next level – in this case, the BIGRAM level.
      • That’s because the four detectors don’t all respond in the same way. The “CO” detector is well primed (because this is a frequent pattern in English spelling), so the activation it’s receiving will probably be enough to fire this (primed) detector. The “CU” detector is less primed (because this is a less frequent pattern); the “CQ” and “CS” detectors, if they even exist, are not primed at all.
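This recovery-from-confusion step can be sketched in a few lines. The priming values below are invented for illustration; the mechanism is that all four candidate bigram detectors get the same partial input, but only the well-primed one crosses threshold.

```python
# Sketch of RECOVERY FROM CONFUSION at the bigram level (priming values
# are hypothetical). Partial feature input activates several candidate
# bigram detectors equally; only a detector whose resting activation
# (priming from FREQUENT exposure) plus the input reaches threshold fires.

THRESHOLD = 1.0

# Resting activation reflects how often each letter pair occurs in English.
bigram_priming = {"CO": 0.6, "CU": 0.3, "CQ": 0.0, "CS": 0.0}

def fired(ambiguous_input: float) -> list[str]:
    """Return the bigram detectors that reach threshold on this input."""
    return [b for b, prime in bigram_priming.items()
            if prime + ambiguous_input >= THRESHOLD]

# A "C" plus part of a second letter gives the same weak input to all four
# candidates; only the well-primed "CO" detector crosses threshold.
print(fired(0.5))
```

With a strong, unambiguous input every candidate would fire, but with partial input the priming differences alone decide the outcome — the confusion at the letter level is resolved one level up.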

AMBIGUOUS INPUTS – The same applies to ambiguous inputs as it does to PARTIAL INPUTS.

  • EX: Suppose you are shown a word. It’s either “THE”­ or “TAE” but you’re not sure because the middle letter is strangely drawn and could be perceived as either an “H” or an “A”.
    • Here, the “THE” detector is well primed because “THE” is so commonly found in English, while “TAE” is rare. A weak stimulus is therefore sufficient only for the (well-primed) “THE” detector, so only it will respond. In this way, the net will recognize the ambiguous pattern as “THE,” not “TAE.”
      • The letter will be better detected in context than in isolation. This isn’t because context enables you to see more; instead, context allows you to make better use of what you see.

RECOGNITION ERRORS – Our bias for FAMILIAR patterns can also cause errors – making the input look more regular than it really is.

  • EX: Imagine that we present the string “CQRN” to participants.
    • “CQRN” will lead to the perception of “CORN.”
  • The reason is simply that the detectors for the frequent pattern are well primed and therefore easier to trigger.

DISTRIBUTED KNOWLEDGE – knowledge that’s represented by a pattern of activations that’s distributed across the network and detectable only if we consider how the entire network functions.

  • Preference for one BIGRAM/WORD or another is strictly determined by the fact that some BIGRAMS/letter strings are well-primed and others are not.
  • These detectors don’t “KNOW” or “EXPECT” anything, they are simply well-primed or not. But as a group, they give the appearance of knowledge, though they have none.
    • That’s how “expectations” or “inferences” emerge – as a direct consequence of the activation levels of these detectors.
  • The level of priming that each detector has cannot tell you that a particular letter combination is a frequently seen bigram.
    • Instead, we need to look at the relationship between these priming levels, and we also need to look at how this relationship will lead to one detector being more influential than the other.
      • In this way, the knowledge about bigram frequencies is contained within the network via a DISTRIBUTED REPRESENTATION – it’s knowledge, in other words, that’s represented by a pattern of activations that’s distributed across the network and detectable only if we consider how the entire network functions.
  • The NET appears to make inferences and to know the rules of English spelling. But the actual mechanics of the net involve neither inferences nor knowledge.
    • You and I can see how the inferences unfold by taking a bird’s-eye view and considering how all the detectors work together as a system. But nothing in the net’s functioning depends on that bird’s-eye view. Instead, the activity of each detector is locally determined, influenced by just those detectors feeding into it. When all these detectors work together, though, the result is a process that acts as if it knows the rules. But the rules themselves play no role in guiding the network’s moment-by-moment activities.

EFFICIENCY VERSUS ACCURACY – There is a cost associated with extreme accuracy (time and scrutiny) and so we usually find a balance between efficiency and accuracy, allowing our FEATURE NET to make inferences about the gaps in information.

To maximize accuracy, you could, in principle, scrutinize every character on the page. That way, if a character were missing or misprinted, you would be sure to detect it. But the cost associated with this strategy would be intolerable.

McCLELLAND and RUMELHART MODEL – Says that activation of one detector serves to activate other detectors, AND ALSO that detectors can inhibit one another, so that the activation of one detector can decrease the activation of other detectors.

  • This network is better able to identify well-formed strings than irregular strings.
  • It is more efficient in identifying characters in context than characters in isolation.
    • This net makes it possible to accomplish all this without bigram detectors.
    • SEE FIGURE 4.11 below
  • EXCITATORY CONNECTIONS – connections that allow one detector to activate its neighbors.
  • INHIBITORY CONNECTIONS – connections that allow one detector to inhibit its neighbors
  • In the McClelland and Rumelhart model:
    • higher-level detectors (word detectors) can influence lower-level detectors, and
    • detectors at any level can also influence other detectors at the same level.
      • EX: detection of a T could EXCITE the “TRIP” detector because there is a T in TRIP, but detection of a G would INHIBIT the “TRIP” detector because there is no G in TRIP.
      • The activation of the “TRIP” detector would then INHIBIT the detectors for other words, as well as the detectors for letters that are not T, R, I, or P (while continuing to excite the T, R, I, and P detectors).
      • If, say, TRIP were flashed in front of you but only the R, I, and P were recognized, those three letters would excite the “TRIP” detector. The “TRIP” detector would in turn excite the T detector. Once the excitation from the “TRIP” detector primes the T detector, it’s more likely to fire, even with a weak input.
      • The network, therefore, responds to this suggestion by “preparing itself” for a T.
  • Let’s also note that the two-way communication in play here fits well with how the nervous system operates:
    • Just as visual processing is not a one-way flow from the eyes to the brain, detector excitation and inhibition flow in both directions.
    • Visual signaling occurs in both an ascending (toward the brain) and a descending (away from the brain) direction.
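The excitatory/inhibitory flow can be sketched as a toy interactive-activation step. The words and weights below are hypothetical; the sketch shows the key two-way idea — the word level priming a letter that was never detected bottom-up.

```python
# Toy version of the McCLELLAND and RUMELHART interactive-activation idea
# (word set and weights are hypothetical). Letter detectors EXCITE word
# detectors that contain them and INHIBIT words that do not; the winning
# word detector then feeds excitation back DOWN to its letters, so a missed
# letter (here, the T in "TRIP") gets primed from the word level.

WORDS = {"TRIP": set("TRIP"), "TRAP": set("TRAP"), "GLOW": set("GLOW")}

def word_activations(detected_letters: set) -> dict:
    """Each detected letter adds +1 to words containing it, -1 otherwise."""
    return {w: sum(1 if ltr in letters else -1 for ltr in detected_letters)
            for w, letters in WORDS.items()}

def letter_feedback(detected_letters: set) -> set:
    """Letters primed top-down by the most active word detector."""
    acts = word_activations(detected_letters)
    best = max(acts, key=acts.get)
    # The winning word detector excites ALL of its letters, including
    # ones that were never detected bottom-up.
    return WORDS[best] - detected_letters

# "TRIP" is flashed but only R, I, P are picked up bottom-up;
# the "TRIP" detector wins and primes the undetected T.
print(letter_feedback({"R", "I", "P"}))
```

The network thus “prepares itself” for a T, exactly the pattern described above, without any bigram detectors.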

RECOGNITION BY COMPONENTS – We recognize many objects other than print, including the three-dimensional objects that fill our world.

RECOGNITION BY COMPONENTS (RBC) MODEL – This model includes an intermediate level of detectors, sensitive to GEONS.

  • GEONS – (short for “geometric ions”) the basic building blocks of all the objects we recognize – the alphabet from which all objects are constructed.
    • Geons are simple shapes, such as cylinders, cones, and blocks
      • We need only about 30 different geons to describe every object in the world, just as 26 letters are all we need to spell all the words of English.
  • The RBC MODEL, like the other networks we’ve been discussing, uses a hierarchy of detectors.
    • The lowest-level detectors are feature detectors,
      • EX: edges, curves, angles, and so on.
    • These detectors in turn activate the GEON detectors.
    • Higher levels of detectors are then sensitive to combinations of geons – more precisely, geons are assembled into complex arrangements called “GEON ASSEMBLY,” which explicitly represent the relations between geons.
      • These assemblies, finally, activate the OBJECT MODEL, a representation of the complete, recognized object.
    • VIEWPOINT INDEPENDENT – geons can be identified from virtually any angle of view.
      • As a result, recognition based on geons is viewpoint-independent.
        • Thus, no matter what your position is relative to a cat, you’ll be able to identify its geons and identify the cat.
        • objects can be recognized from just a few geons.
        • As a consequence, geon­based models like RBC can recognize an object even if many of the object’s geons are hidden from view.
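The RBC hierarchy (features → geons → geon assemblies → object models) can be sketched as a simple partial-match scheme. The geon inventories below are invented for illustration; the point is that because geons are viewpoint-independent and only a few are needed, recognition survives occlusion.

```python
# Sketch of the RBC MODEL's upper levels (geon inventories are invented
# for illustration). Geon detectors feed GEON ASSEMBLIES, which activate an
# OBJECT MODEL. Because geons are VIEWPOINT-INDEPENDENT and a few suffice,
# an object can be recognized even when some of its geons are hidden.

OBJECT_MODELS = {
    "mug":  {"cylinder", "curved_handle"},
    "lamp": {"cone", "cylinder", "block"},
}

def recognize(visible_geons: set, min_match: float = 0.5):
    """Return the object model whose geons best match the visible geons."""
    best, best_score = None, 0.0
    for obj, geons in OBJECT_MODELS.items():
        score = len(visible_geons & geons) / len(geons)
        if score >= min_match and score > best_score:
            best, best_score = obj, score
    return best

# Even with one of the lamp's three geons occluded, recognition succeeds.
print(recognize({"cone", "cylinder"}))
```

A real RBC account also encodes the *relations* between geons (the assembly level); this sketch collapses that to a set match for brevity.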

RECOGNITION VIA MULTIPLE VIEWS – This model says that people have a number of different views stored in memory of each object they can recognize.

  • EX: An image of what a cat looks like when viewed head­on, an image of what it looks like from the left, and so on.
    • Thus, you’ll recognize Felix as a cat only if you can match your current view of Felix with one of these remembered views.
  • But the number of views in memory is limited – maybe a half dozen or so – and so, in many cases, your current view won’t line up with any of the available images.
    • In that situation, you’ll need to “rotate” the current view to bring it into alignment with one of the views in memory, and this mental rotation will cause a slight delay in the recognition.
    • This means that the speed of recognition will be viewpoint-dependent, and evidence confirms this claim.
  • At the top of the hierarchy are detectors that respond to the sight of whole objects.
    • These representations are probably supported by tissue in the inferotemporal cortex, near the terminus of the “what pathway”.
  • Recording from cells in this area has shown that many neurons here seem OBJECT-SPECIFIC – that is, they fire preferentially when a certain type of object is observed.
  • many of these neurons are VIEW-TUNED: They fire most strongly to a particular view of the target object.
    • So some brain tissue is sensitive to a viewpoint, and some brain tissue is not sensitive.
    • PERCEPTION MATTERS:
      • Categorization tasks (“Is this a cup?”) may rely on viewpoint-­independent processing in the brain, while
      • Identification tasks (“Is this the cup I showed you before?”) may rely on viewpoint-­dependent processing
  • Obviously, there is disagreement in this domain. Even so, let’s be clear that all of the available proposals involve the sort of hierarchical network we’ve been discussing.
    • In other words, no matter how the debate about object recognition turns out, it looks like we’re going to need a network model along the lines we’ve considered (i.e. with a hierarchy of detectors)
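The RECOGNITION VIA MULTIPLE VIEWS idea above can be sketched as a toy timing model. The stored angles and the per-degree rotation cost are hypothetical; what matters is the shape of the prediction: zero delay when the current view matches a stored view, and a growing delay the farther the view sits from the nearest stored one.

```python
# Sketch of RECOGNITION VIA MULTIPLE VIEWS (stored angles and rotation cost
# are hypothetical). A handful of views of each object are stored; the
# current view is matched to the NEAREST stored view, and any angular gap
# must be closed by mental rotation, which adds a delay -- so recognition
# speed is VIEWPOINT-DEPENDENT.

STORED_VIEWS_DEG = [0, 60, 120, 180, 240, 300]   # about a half dozen views

def recognition_time_ms(view_deg: float, base: float = 500.0,
                        ms_per_degree: float = 2.0) -> float:
    """Base recognition time plus a mental-rotation cost to the nearest view."""
    # Circular distance to the closest stored view (handles the 360-deg wrap).
    gap = min(abs((view_deg - v + 180) % 360 - 180) for v in STORED_VIEWS_DEG)
    return base + ms_per_degree * gap

print(recognition_time_ms(0))    # matches a stored view: no rotation cost
print(recognition_time_ms(30))   # 30 degrees from the nearest view: slower
```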
7
Q

Face Recognition

A

FACE RECOGNITION – Requires a different approach than object or word recognition. Face recognition is strongly dependent on orientation in ways that other forms of object recognition are not.

  • AGNOSIA – an inability to recognize certain stimuli, due to brain damage.
    • PROSOPAGNOSIA – patients generally have normal vision, but they can’t recognize individual faces — not even those of their own parents or children, whether from photographs or “live.”
      • Implies the existence of special neural structures involved almost exclusively in the recognition and discrimination of faces.
  • SUPER-RECOGNIZERS – people at the opposite extreme, who are magnificently accurate in face recognition.
  • Face recognition is strongly dependent on orientation, and so it shows a powerful Inversion effect.
    • INVERSION EFFECT – the difficulty in recognizing a stimulus when it is inverted (upside-down).
      • With non-faces, the (relatively small) effect of inversion becomes even smaller with practice; with faces, the effect of inversion remains in place even after practice.
  • FUSIFORM FACE AREA (FFA) — at first thought to be specifically responsive to faces…
    • HOWEVER, it actually seems to be used whenever you are trying to recognize specific individuals within a highly familiar category.
    • EX: In addition to faces, this might include distinguishing the nuanced appearances of similar animals (warblers) or objects (cars).

HOLISTIC RECOGNITION – Recognition that requires the “whole” of something in order to recognize it.

  • The networks we’ve been considering so far all begin with an analysis of a pattern’s parts (e.g., features, geons); the networks then assemble those parts into larger wholes (Bottom-up processing)
  • Face recognition, in contrast, seems not to depend on an inventory of a face’s parts; instead, this process seems to depend on HOLISTIC PERCEPTION of the face.
    • In other words, face recognition depends on the face’s overall configuration
    • Of course, a face’s features still matter in this holistic process. The key, however, is that it’s the relationships among the features, not the features on their own, that guide face recognition.

BRAIN AREAS CRUCIAL FOR FACE PERCEPTION:

Several brain sites seem to be especially activated when people are looking at faces. These sites include the FUSIFORM FACE AREA (FFA), the OCCIPITAL FACE AREA (OFA), and the SUPERIOR TEMPORAL SULCUS (fSTS).

  • COMPOSITE EFFECT – in face recognition, the composite-face effect (CFE) shows up when images of two faces are split horizontally and stuck together. It’s easier to identify the top half-face when it’s misaligned with the bottom one than when the two halves are fitted smoothly together.
    • This is because the brain is unable to separate the features of the two halves when they are aligned, thus perceiving the two halves as a whole even when the viewer knows they come from different faces.
  • The identification task is difficult if the two halves are properly aligned: participants seem unable to focus only on the top half; instead, they see the top of the face as part of the whole.
    • The task is relatively easy, though, if the halves are misaligned.
  • In recognizing FAMILIAR FACES, you rely more heavily on the relationships among the internal features of the face;
  • For UNFAMILIAR FACES, you may be more influenced by the face’s outer parts such as the hair and the overall shape of the head

TOP-DOWN INFLUENCES ON OBJECT RECOGNITION – TOP-DOWN EFFECTS are effects driven by your knowledge and expectations.

  • EX: the letter V is easier to recognize in the context “VASE” (A word in our knowledge base) or even the nonsense context “VIMP,” than it is if presented alone.
  • Other top­down effects, however, require a different type of explanation. A long list of cognitive requirements goes into a simple priming effect.
    • Think about what PRIMING involves. Suppose a person who is about to be flashed a word is PRIMED with the hint that the word names something you can eat.
      • First, the person needs to understand each of the words in the instruction. If she didn’t understand the word “eat” (e.g., if she mistakenly thought we had said, “something that you can beat”), we wouldn’t get the priming.
      • Second, the person must understand the relations among the words in the instruction. For example, if she mistakenly thought we had said, “something that can eat you,” we would expect a very different sort of priming.
      • Third, the person has to know some facts about the world — namely, the kinds of things that can be eaten;
      • Without this knowledge, we would expect no priming.
  • Knowledge that is external to an object influences the process of recognizing that object.
    • In other words, these examples (unlike the ones we considered before) don’t depend just on the specific stimuli you’ve encountered recently or frequently.
  • Instead, what’s crucial for this sort of priming is what you know coming into the experiment, knowledge derived from a wide range of life experiences.

THE FLOW OF TOP-DOWN PROCESSING – Sensibly, top-down processing plays a larger role when bottom-up processing is somehow limited or inadequate.

  • This is TOP-DOWN PRIMING that draws on knowledge from outside of object recognition itself.
8
Q

Speed Reading

A

SPEED READING – speed-reading isn’t really “reading faster”; it is instead “reading less and inferring more.”
