Everything Flashcards

1
Q

Why CV is hard

A

  • AI-complete: involves representations, machine learning, and interfacing with plans
  • signal-to-symbol conversion: signals explicitly express very little information usable by plans, so they must be converted to symbols for manipulation
  • inverse optics / inverse graphics
  • little cognitive penetrance (no introspective access to how we see)
  • CNNs utilise prior knowledge
  • poverty of the signal data with respect to purely bottom-up analysis (edges and foxes); need top-down, model-driven vision using prior knowledge
  • a face's pixel intensity array is unrecognisable when plotted as a 3D surface

2
Q

Ill posed problems

A

  • figure–ground segmentation
  • inferring 3D arrangement, e.g. occlusion
  • surface properties (texture, colour) from image statistics
  • volumetric properties from 2D image projections
  • all in real time
  • depth property inference; surface property inference
  • colour inference invariant w.r.t. illumination
  • structure from motion, shading, texture, shadows
  • 3D shape from a 2D line drawing
  • pose-invariant recognition
  • understanding objects never seen before
  • Hadamard: a problem is well-posed if a solution exists, is unique, and depends continuously on the data; vision problems typically violate these conditions

3
Q

Pixel arrays

A
  • CCD: dense array of independent sensors; accumulated charge ∝ incident light energy
  • CCD uses local charge coupling for readout; CMOS is the alternative technology
  • sensing elements are only a few microns in width
  • photon flux limits resolution growth via ever denser sensors
  • spatial resolution of the image is determined by sensor density and the optical figure of merit of the lens
  • luminance resolution is the number of distinguishable grey levels
  • luminance resolution is determined by bits per pixel (digitiser) + SNR of the CCD array
  • colour: three subarrays preceded by RGB filters in a Bayer pattern, with twice as many G elements to reflect cone sensitivity
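The Bayer arrangement can be illustrated with a minimal sketch (the GRBG tile orientation assumed here is one common variant):

```python
def bayer_pattern(rows, cols):
    """Tile a GRBG Bayer mosaic: each 2x2 cell holds two green filters
    (on one diagonal) plus one red and one blue."""
    tile = [["G", "R"],
            ["B", "G"]]
    return [[tile[r % 2][c % 2] for c in range(cols)] for r in range(rows)]

mosaic = bayer_pattern(4, 6)
flat = [f for row in mosaic for f in row]
counts = {c: flat.count(c) for c in "RGB"}
# Green sites outnumber red and blue 2:1, mirroring cone sensitivity
```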
4
Q

Data in video streams

A
  • composite video: colour encoded as a high-frequency chrominance burst
  • S-video: separate luma/chroma signals
  • separate RGB component signals
  • colour requires less information than luminance – exploited by coding schemes
  • framegrabber: strobed sampling and a block high-speed ADC discretise video into a byte stream, a sequence of frames
  • NTSC: 30 fps with interlacing of alternate lines, giving 60 fields per second; PAL: 25 fps
  • vast flood of data in a video stream, even without HDTV
  • PAL: ≈11 million pixels/sec × 8 bits per pixel × 3 colour planes ≈ 264 Mbit/s – coping with this data flux is the challenge
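The PAL figure can be checked with back-of-envelope arithmetic (the 768×576 frame size is an assumption; the card itself only states the resulting pixel rate):

```python
# Back-of-envelope check of the PAL data flux.
pixels_per_frame = 768 * 576            # ~442k pixels per frame (assumed frame size)
fps = 25                                # PAL frame rate
pixel_rate = pixels_per_frame * fps     # ~11 million pixels/sec
bits_per_sec = pixel_rate * 8 * 3       # 8 bits/pixel in each of 3 colour planes
print(f"{pixel_rate / 1e6:.1f} Mpixels/s, {bits_per_sec / 1e6:.0f} Mbit/s")
```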
5
Q

Image formats and sampling theory

A
  • Rectangular array of sampled intensities
  • Separate colour planes
  • redundancy: neighbouring pixels are highly correlated, so images are highly compressible
6
Q

Examples of image formats and encodings

A
  • JPEG: controllable Q-factor; quantised DCT coefficients of tiles, with frequency-dependent quantisation depth
  • JPEG 2000: improved version of JPEG; smooth Daubechies wavelets avoid block-quantisation artefacts
  • MPEG: stream-oriented; individual frames JPEG-coded, with temporal redundancy removed via inter-frame predictive coding and interpolation
  • GIF: sparse binarised images for bandwidth-limited media
  • PNG: lossless compression
  • TIFF: tagged image file format; non-compressive, with randomly embedded tags
  • BMP: non-compressive bit map; individual pixel values easily extractable
  • colour spaces are used for colour separation
  • in compressed formats the image payload is actually in a transform domain, so pixel values must be obtained by an inverse transform
7
Q

Information content in an image

A
  • bit count relates neither to optical properties nor to frequency analysis
  • Nyquist: the highest spatial-frequency component of the information contained is 1/2 the sampling density of the pixel array
  • 640 columns → 320 cycles/image is the highest spatial-frequency component
  • 30 fps → the highest temporal frequency is 15 Hz
  • RGB-D sensors additionally capture depth
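The Nyquist limits above follow directly from halving the sampling rates; a minimal sketch:

```python
def nyquist_limits(cols, fps):
    """Highest representable frequencies are half the sampling rates
    (Nyquist): spatial in cycles/image width, temporal in Hz."""
    return cols / 2, fps / 2

spatial, temporal = nyquist_limits(640, 30)
# 640 columns -> 320 cycles/image; 30 fps -> 15 Hz, as on the card
```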
8
Q

Second order pixel statistics to aid segmentation

A
  • low-level metrics useful for segmentation
  • NIR iris imaging: compute pixel variance and mean in local patches; their ratio delineates eyelid/eyelash boundaries
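The local-patch statistics can be sketched as follows (a toy illustration with made-up values, not the actual iris-segmentation code):

```python
def patch_stats(img, r0, c0, size):
    """Mean and (population) variance of intensities in a size x size
    patch whose top-left corner is (r0, c0)."""
    vals = [img[r][c] for r in range(r0, r0 + size)
                      for c in range(c0, c0 + size)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

# Toy 4x4 "image": a uniform region beside a high-contrast region
img = [[10, 10, 200, 10],
       [10, 10, 10, 200],
       [10, 10, 200, 10],
       [10, 10, 10, 200]]
smooth = patch_stats(img, 0, 0, 2)  # uniform patch: variance 0
edgy = patch_stats(img, 0, 2, 2)    # boundary-like patch: large variance
```

The variance-to-mean ratio is near zero in uniform regions and large where a boundary crosses the patch, which is what makes such second-order statistics useful for segmentation.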

9
Q

Neuron properties

A

Neurones are sluggish but richly interconnected cells having both analogue and discrete aspects, with nonlinear, adaptive features.

10
Q

What is a neurone

A

Fundamentally they consist of an enclosing membrane that can separate electrical charge, so a voltage difference generally exists between the inside and outside of a neurone.

11
Q

Neuronal membrane properties

A

A bilipid layer with a capacitance of about 1 µF/cm² (10,000 µF/m²), plus pores that are differentially selective to different ions (Na+, K+, Cl−)

12
Q

Catastrophic breakdown

A
  • the neuronal membrane is differentially selective to different ions
  • ion species cross the membrane via protein pores (discrete conductances/resistors)
  • the resistances for Na+ and K+ are voltage dependent
  • as Na+ flows into the neurone, the voltage becomes more positive inside, further reducing membrane resistance to Na+, so still more enters
  • this catastrophic breakdown in resistance to Na+ constitutes a nerve impulse
  • within a msec, a slower but opposite effect involving K+ restores the original transmembrane voltage
  • a refractory period restores electro-osmotic equilibrium, after which the neurone is ready to fire again
13
Q

Refractory period duration

A

2 msec
Prevents clocking faster than about 300 Hz – roughly 10^6 times slower than a PC clock
Balanced by massive interconnectivity

14
Q

Nerve impulse propagation speed down axons

A

100 m/sec

15
Q

Character of impulse signalling

A

Impulse signalling can be described as discrete, but the antecedent summations of current flows into a neurone from other neurones at synapses, triggering an impulse, are essentially analogue events.

16
Q

Synchrony of neural activity

A

In general, neural activity is fundamentally asynchronous: there is no master clock on whose edges the events occur.

17
Q

Brain tissue density

A

10^5 neurones / mm^3

18
Q

Number of synapses per neuron

A

10^3 - 10^4 synapses per neurone to other neurones within ~3cm

19
Q

Processing/communications

A

Not possible to distinguish processing from communications – because of, e.g., steerable axonal branches

20
Q

Brain tissue wiring

A

3 km of wiring per mm^3

21
Q

Number of neurones in brain, number of synapses

A

10^11 neurones, making 10^15 synapses

22
Q

Fraction of brain receiving visual input

A

about 60% of the brain receives visual input; we are fundamentally visual creatures

at least 30 different visual areas with reciprocal connections

PVC (primary visual cortex) is in the occipital lobe

23
Q

Retina

A

An extruded ventricle of the brain

24
Q

Number of light sensitive photoreceptors, types in retina

A

The retina is about 1 mm thick and it contains about 120 million light-sensitive photoreceptors, of which only 6 million are cones (with photopigments specialised for red, green, or blue wavelengths). The rest are rods which do not discriminate in wavelength bands.

25
Q

Rods

A

Rods are specialised for much lower light intensities. They subserve our “night vision” (hence the absence of perceived colour at night), and they pool their responses, at the cost of spatial resolution.

26
Q

Cones

A

Cones exist primarily near the fovea, in about the central 20°, where their responses are not pooled, giving much higher spatial resolution.
As cones function only at higher light levels, we really have a dual system with two barely overlapping sensitivity ranges.

27
Q

Human vision dynamic intensity range

A

The total dynamic range of human vision (range of light intensities that can be processed) is a staggering 10^11 to 1. At the lowest level, we can reliably “see” individual photons (i.e. reliably have a visual sensation when at most a few photons reach the retina in a burst).

28
Q

Distribution of photoreceptors

A

Mainly near the fovea

29
Q

Phototransduction and colour separation

A
  • the most distal neurones in the retina are analogue devices
  • photoreceptors don't generate impulses – they respond to the absorption of photons by hyperpolarisation (increased trans-membrane voltage)
  • a photon causes a photochemical isomerisation that closes a pore to Na+ ions
  • as Na+ ions are actively pumped, this increased resistance causes an increased trans-membrane voltage
  • the voltage change is sensed synaptically by bipolar and horizontal cells
  • the three colour-selective classes of cone have cis-retinal embedded in different opsin molecules
  • these quantum-mechanically affect the probability of photon capture as a function of wavelength
30
Q

Photoreceptor sampling arrays

A
  • rods and cones are distributed in hexagonal lattices with varying relative densities depending on eccentricity (distance from the fovea)
  • the lattice is incoherent, not crystalline, which helps prevent aliasing of high-resolution information
31
Q

Retina: not a sensor, but a part of the brain

A
  • the retina is part of the brain
  • ~100:1 ratio of receptors to output channels (optic nerve fibres)
  • by the first synapse the retina has already performed image processing
  • temporal processing occurs at the second synapse
32
Q

Lateral and longitudinal signal flows in the retina

A
  • three nuclear layers
  • two plexiform layers (synaptic interconnections)
  • photoreceptors at rear
  • longitudinal signal flows: photoreceptors -> bipolar cells -> ganglion cells
  • lateral signal flows: horizontal and amacrine cells, in the outer/inner plexiform layers respectively
33
Q

Centre-surround opponent spatial processing in the retina

A
  • excitatory and inhibitory spatial structure creates a bandpass filter
  • such linear filters exist in both polarities: “on-centre” and “off-centre”
  • activation is encoded temporally by the frequency of impulses
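A centre-surround bandpass profile is commonly modelled as a difference of Gaussians; a minimal 1D sketch (the sigma values are illustrative assumptions):

```python
import math

def dog(x, sigma_c=1.0, sigma_s=2.0):
    """1D difference of Gaussians: a narrow excitatory centre minus a
    broader inhibitory surround gives a bandpass ("on-centre") profile;
    negate it for the "off-centre" polarity."""
    g = lambda s: math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return g(sigma_c) - g(sigma_s)

centre = dog(0.0)    # positive: excitatory centre
surround = dog(3.0)  # negative: inhibitory surround annulus
```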
34
Q

Summary of image processing and coding in the retina

A
  • sampling by photoreceptor arrays
  • pooling of signals from rods
  • both convergence and divergence of signals present
  • bipolar cells perform spatial centre-surround comparisons (direct central input from photoreceptors minus surround inhibition from horizontal cells, in an annular structure of either polarity)
  • Temporal differentiation by amacrine cells for motion detection
  • Separate channels for sustained versus transient image information by different classes of ganglion cells (parvo-cellular, magno-cellular)
  • initial colour separation by opponent-processing mechanisms (yellow vs blue, red vs green) coupled with spatial centre-surround structure (in that case, double opponency)
  • generation of nerve impulses in a parallel temporal modulation code
35
Q

“Receptive field profile” as an image operator

A
  • receptive field: visual area to which a neurone responds
  • in both space and time, retinal neurones can be described as filters whose response profiles are convolved with the visual input.
  • An important aspect of retinal receptive fields – as distinct from those found in most neurones of the visual cortex – is that their field structure is quite isotropic (circularly symmetric), not oriented.
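The filter view of a receptive field can be sketched with a discrete 1D convolution (the profile values are a toy assumption):

```python
def convolve(signal, kernel):
    """Discrete 1D convolution (valid positions only): the neurone's
    response is its receptive-field profile convolved with the input."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A centre-surround-like profile responds to changes, not to uniform input
profile = [-1, 2, -1]
flat = convolve([5, 5, 5, 5, 5], profile)  # uniform input -> all zeros
step = convolve([0, 0, 5, 5, 5], profile)  # responds at the step edge
```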
36
Q

Brain projections and visual cortical architecture

A

The right and left visual fields project to different brain hemispheres.

two quasi-independent brains

The optic nerve from each eye splits into two at the optic chiasm.

The portion from the nasal half of each retina crosses over to project only to the contralateral (opposite side) brain hemisphere

The optic nerve portion bearing signals from the temporal half of each eye projects only to the ipsilateral (same side) brain hemisphere.

Therefore the left-half of the visual world (relative to gaze fixation) is directly seen only by the right brain, while the right-half of the visual world is directly seen only by the left brain.

Ultimately the two brain hemispheres share all of their information via a massive connecting bundle of 500 million commissural fibres called the corpus callosum.

37
Q

What is the thalamus doing with all that feedback?

A

The projections to each visual cortex first pass to the 6-layered lateral geniculate nucleus (LGN), part of a polysensory organ of the midbrain called the thalamus.

this “relay station” actually receives three times more descending (efferent) fibres projecting back down from the cortex than it gets ascending (afferent) fibres from the eyes

perhaps this signal confluence compares cortical feedback, representing hypotheses about the visual scene, with the incoming retinal data, in a kind of predictive coding or hypothesis-testing operation

Several scientists have proposed that “vision is graphics” (i.e. what we see is really our own internally generated 3D graphics, modelled to fit the 2D retinal data, with the model testing and updating occurring here in the thalamus via this cortical feedback loop).

38
Q

Interweaving data from the two eyes for stereo vision

A

Right-eye and left-eye innervations from each LGN to the PVC in the occipital lobe of that hemisphere are interwoven into slabs or columns in which neurones receive input primarily from just one of the eyes – R and L eyes alternate

Ocular dominance columns have a cycle of about 1 mm and resemble fingerprints in scale and flow

each hemisphere is trying to integrate together the signals from both eyes in a way suitable for stereoscopic vision by computing relative retinal disparities of corresponding points in the images

these disparities reflect the relative positions of the points in depth

39
Q

New tuning variable in visual cortex: orientation selectivity

A

Orthogonal to the ocular dominance columns in the visual cortical architecture there runs a finer-scale sequence of orientation columns

Neurones in each such column respond only to image structures (e.g. bars/edges) in a certain preferred range of orientations

Firing rates reveal an orientation selective tuning curve

40
Q

Origin of cortical orientation selectivity

A

Orientation selectivity might arise from the alignment of isotropic subunits in the LGN, summated together in their projection to V1

Both on and off polarities exist

Orientation columns form a regular sequence of systematically changing preferred directions

This sequence regularity is one of the most crystalline properties seen in visual cortical architecture

41
Q

Hypercolumns

A
  • a 3D block of about 100K cortical V1 neurones that includes one R–L cycle of ocular dominance columns and (orthogonally organised) about ten orientation columns spanning 360 degrees of preferred orientations in discrete steps – this is called a hypercolumn

Going down through the depth: six layers, in which neurones vary mainly in the sizes of their receptive fields

42
Q

Quadrature phase relationships among paired V1 neurones

A

Recording from adjacent pairs of neurones simultaneously, using a kind of double-barrelled micro-electrode, showed that neurones with the same receptive field position, orientation preference, and size were often in quadrature phase (had a 90-degree spatial phase offset) when responding to a drifting sinusoidal luminance grating

43
Q

Summary of spatial image encoding in primary visual cortex

A
  • five main degrees of freedom in the spatial structure of cortical receptive field profiles: position in visual space (2 coordinates), orientation preference, receptive field size, and phase (even or odd symmetry)
  • these parameters can be inferred from the boundaries between the excitatory and inhibitory regions, usually either bipartite or tripartite
  • for about 97% of such neurones studied, the receptive field profiles could be well described as 2D Gabor wavelets (or phasors), with differences that are statistically insignificant
  • so it seems the brain's visual cortex discovered during its evolution the valuable properties of such 2D wavelets for image coding and analysis!
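A 2D Gabor wavelet with these degrees of freedom can be written down directly (parameter names and default values are illustrative):

```python
import math

def gabor_2d(x, y, sigma=1.0, f=0.5, theta=0.0, phi=0.0):
    """2D Gabor wavelet: a Gaussian envelope modulating a plane wave.
    The parameters mirror the degrees of freedom on the card: (x, y)
    position, theta orientation, sigma field size, f spatial frequency,
    and phi phase (0 = even symmetry, pi/2 = odd symmetry)."""
    envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
    xr = x * math.cos(theta) + y * math.sin(theta)  # coordinate along preferred orientation
    return envelope * math.cos(2 * math.pi * f * xr + phi)

even = gabor_2d(0, 0, phi=0.0)         # even profile peaks at the centre
odd = gabor_2d(0, 0, phi=math.pi / 2)  # quadrature partner is zero there
```

Two such wavelets differing by a 90° phase offset form the quadrature pair described in the previous card.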