Low Level Vision Flashcards

1
Q

How do we get a computer to recognise images?

What is the range of a 3 bit image?
What is the range of an 8 bit image?

A

Each pixel is represented by a value. This is the colour of the pixel.

For 3 bits, this is 8 colours with a range [0-7]
For 8 bits, this is 256 colours with a range [0-255]
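
A minimal Python check of this relationship (a sketch, not from the course; the function name is illustrative):

```python
# A b-bit image can represent 2**b distinct values,
# giving the range [0, 2**b - 1].
def pixel_range(bits):
    levels = 2 ** bits
    return 0, levels - 1

print(pixel_range(3))  # (0, 7)   -> 8 values
print(pixel_range(8))  # (0, 255) -> 256 values
```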

2
Q

What is noise in regards to images?

Why does it make computer vision more difficult?

How can we get a clear image?

A

Noise refers to random variations in pixel values that don’t correspond to the true scene being captured.

This arises due to limitations in the imaging sensor, environmental conditions, or errors in data transmission.

Noise makes computer vision tasks more challenging because it obscures the true features of an image.
- it can distort features or alter their appearance

There are methods for de-noising and algorithms that are built to deal with noise.

3
Q

What are some factors that can make computer vision difficult?

A
  • Different viewpoints greatly vary an object's appearance (orientation, rotation, retinal location, scale)
  • Noise
  • Illumination
  • Deformations
  • Small size / Far away objects
  • Occlusion: if one object is partially obstructing another, the occluded object is hard to label
  • Truncation: if part of the object is cut off at the image boundary, it is hard to label
4
Q

What is the difference between image recognition and object detection?

A
  • Image recognition is when an image is matched with a label of what object is featured in the image
  • Object detection is when a bounding box with a label is placed around the object within the image
5
Q

What is edge detection?

A

Given an image, detect the edge of an object or several objects.

6
Q

What helps us to identify the edge of an object?

A

The edge of an object usually has different features to the other regions in the image, for example, different colours or thicknesses which help to differentiate edges.

7
Q

What is semantic segmentation?

A

Given an image, it outputs segments with labels at a pixel level. So for each pixel we want to say which category it belongs to.

8
Q

What is instance segmentation?

A

Given an image, it outputs segments with labels at a pixel level, but per instance: if there are multiple cows in an image, each separate cow gets its own segment.

9
Q

What is the difference between semantic and instance segmentation?

A

Semantic segmentation groups objects of the same type into the same category, whereas instance segmentation separates each object instance into its own category/segment.

10
Q

What is image retrieval?

A

Retrieving relevant images from a large dataset based on a query input.

11
Q

What is image generation?

A

Generating images using image retrieval, image recognition and object segmentation techniques.

12
Q

Describe how a pinhole camera works.

A
  • it simulates the function of our eyes: light comes through our pupils from different angles and hits the back of our eye (the image plane). The position where it hits the image plane tells our brain where the object is (distance away from us, position left/right).
  • A pinhole camera allows the light to enter through a pinhole and the light hits the plane at the back. The coordinates on this plane translate to pixels in a 2D image.
  • The 2D image is inverted, size is reduced and there’s no depth information
13
Q

What are the 4 stages of coordinates?

A

1) Real world coordinates
2) Camera coordinates
3) Image coordinates
4) Pixel coordinates

14
Q

Why do we want to use homogeneous coordinates?

A

Using homogeneous coordinates, we can easily convert many transformations into the form of matrix multiplication.

15
Q

How do we translate cartesian coordinates to homogeneous coordinates?

A

We add another dimension and give it the value 1:
(x, y) becomes (x, y, 1)
(x, y, z) becomes (x, y, z, 1)
(4, 3) becomes (4, 3, 1)

16
Q

How do we translate homogeneous coordinates to cartesian coordinates?

A

We divide by the last dimension:
(x, y, z, 1) becomes (x/1, y/1, z/1) = (x, y, z)
(x, y, z, w) becomes (x/w, y/w, z/w)
(3, 4, 6, 2) becomes (3/2, 2, 3)
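
A minimal numpy sketch of both conversions (the function names are illustrative, not from the course):

```python
import numpy as np

def to_homogeneous(p):
    # Append a 1 as an extra dimension: (x, y) -> (x, y, 1).
    return np.append(np.asarray(p, dtype=float), 1.0)

def to_cartesian(p):
    # Divide by the last component and drop it: (x, y, z, w) -> (x/w, y/w, z/w).
    p = np.asarray(p, dtype=float)
    return p[:-1] / p[-1]

print(to_homogeneous([4, 3]))      # [4. 3. 1.]
print(to_cartesian([3, 4, 6, 2]))  # [1.5 2.  3. ]
```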

17
Q

What are the 3 steps for translating real world coordinates into pixel coordinates?

A

1) translate real world coordinates into camera coordinates
2) translate camera coordinates into image coordinates
3) translate image coordinates into pixel coordinates

18
Q

What steps are performed to translate real world coordinates into camera coordinates?

A

1) turn real world coordinates into homogeneous coordinates:
(80, 25, 510) becomes (80, 25, 510, 1)
2) Perform matrix multiplication with camera extrinsic matrix (usually identity matrix)
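
A small numpy sketch of this step, assuming the identity extrinsic matrix from the card:

```python
import numpy as np

world = np.array([80, 25, 510, 1])  # homogeneous real-world coordinates
extrinsic = np.eye(4)               # camera extrinsic matrix (identity here)
camera = extrinsic @ world
print(camera)                       # unchanged by the identity: [80 25 510 1]
```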

19
Q

What is a Camera Extrinsic matrix?

A

It describes the camera's position and orientation in the world
- In exams it’s usually specified as an identity matrix, but if it isn’t, it will be specified.

20
Q

What are the usual things we need to calculate to turn camera coordinates into image coordinates, and how is this simplified in an exam question?

A

The focal distance f is the distance between the pinhole and the image plane. Using similar triangles, f lets us calculate where a real-world point x has been inverted/reduced to on the image plane. An exam question gives us f, and we use matrix multiplication (translating the similar-triangles equations into matrices) to translate the camera coordinate into the image coordinate.

21
Q

What steps are performed to translate camera coordinates into image coordinates?

A

After step 1 we have camera coordinates; we turn these into image coordinates by performing matrix multiplication between the matrix below, formed from the focal distance, and the homogeneous camera coordinates:
[ f, 0, 0, 0 ]
[ 0, f, 0, 0 ]
[ 0, 0, 1, 0 ]
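
A numpy sketch of this multiplication; the focal distance value is illustrative:

```python
import numpy as np

f = 2.0  # focal distance (illustrative value; an exam question supplies it)

P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]])

cam = np.array([80, 25, 510, 1])  # homogeneous camera coordinates
img_h = P @ cam                   # homogeneous image coordinates [f*Xc, f*Yc, Zc]
print(img_h[:2] / img_h[2])       # cartesian image coordinates (f*Xc/Zc, f*Yc/Zc)
```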

22
Q

What steps are performed to translate image coordinates into pixel coordinates?

A

After step 2 we have image coordinates; we turn these into pixel coordinates with a matrix multiplication that translates the image-plane origin to the pixel origin (and rescales from mm to pixels):
You take the image coordinates [x, y, 1] and multiply them with this matrix:
[ 1/dx, 0, u0 ]
[ 0, 1/dy, v0 ]
[ 0, 0, 1 ]
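
A numpy sketch with illustrative values for the pixel size (dx, dy) and pixel origin (u0, v0):

```python
import numpy as np

dx, dy = 0.1, 0.1   # width/height of one pixel in mm (illustrative)
u0, v0 = 320, 240   # pixel coordinates of the image-plane origin (illustrative)

K = np.array([[1/dx, 0,    u0],
              [0,    1/dy, v0],
              [0,    0,    1]])

img = np.array([1.5, -2.0, 1.0])  # homogeneous image coordinates (mm)
pix = K @ img
print(pix[:2] / pix[2])           # (u, v); round to integers for pixel indices
```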

23
Q

What is the difference between image coordinates and pixel coordinates?

A

Image coordinates are in mm, pixel coordinates have their own scale and also need to be integers.

24
Q

Where is the pixel origin/origin of an image?

A

In the top left corner

25
Q

What size is the camera extrinsic matrix?

A

4 x 4
if identity matrix:
[ 1, 0, 0, 0 ]
[ 0, 1, 0, 0 ]
[ 0, 0, 1, 0 ]
[ 0, 0, 0, 1 ]

26
Q

For the sake of computational convenience, what do we usually assume about Zc and Zw and why is this?

A
  • We assume Zc and Zw are equal, so where we see Zc we can substitute in Zw
  • This is because for most cases, Zc and Zw are very close to each other.
27
Q

What must be true for two matrices to perform matrix multiplication?

A

The number of columns of the first matrix must equal the number of rows of the second matrix

28
Q

What does the value of each pixel represent?

A

The light intensity at that point

29
Q

When representing an 8-bit image what do 0 and 255 represent?

A

0 represents black and 255 represents white

30
Q

In a 1-bit image, what do 0 and 1 represent?

A

0 represents black and 1 represents white

31
Q

What is RGB?

A

Red Green Blue is a colour image representation, which consists of 3 channels.

32
Q

How do you convert an RGB image into a greyscale image?

A
  • You take the weighted average over the three colour channels
  • I(grey) = (r*Ir + g*Ig + b*Ib) / (r + g + b)
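
A minimal numpy sketch of this weighted average; the weights r, g, b are parameters (equal weights give a plain average; the common luma weights 0.299/0.587/0.114 are another choice):

```python
import numpy as np

def rgb_to_grey(img, r=1.0, g=1.0, b=1.0):
    # Weighted average over the three colour channels.
    return (r * img[..., 0] + g * img[..., 1] + b * img[..., 2]) / (r + g + b)

rgb = np.random.randint(0, 256, size=(4, 4, 3))
print(rgb_to_grey(rgb))
```
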
33
Q

How do you convert a greyscale image into binary?

A
  • Choose a threshold value
  • If the pixel value is above the threshold, we assign it 1 or white
  • If the pixel value is below the threshold we assign it 0 or black
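
A one-line numpy version of this thresholding (the threshold value is illustrative):

```python
import numpy as np

def to_binary(grey, threshold=128):
    # 1 (white) above the threshold, 0 (black) otherwise.
    return (grey > threshold).astype(np.uint8)

grey = np.array([[10, 200],
                 [130, 90]])
print(to_binary(grey))  # [[0 1]
                        #  [1 0]]
```
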
34
Q

What is shrinking / sub-sampling?

A

A method of downsampling
- decreases resolution by removing pixels depending on a pattern
- example: only keep every other row of pixels in an image

35
Q

What is max pooling?

A

Reduces resolution/number of pixels by finding the maximum value for a region and using that to represent the region.
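
A numpy sketch of both downsampling methods from the last two cards:

```python
import numpy as np

img = np.arange(16).reshape(4, 4)

# Sub-sampling: keep every other row and every other column.
shrunk = img[::2, ::2]

# 2x2 max pooling: the maximum of each 2x2 block represents that region.
pooled = img.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(shrunk)  # [[ 0  2]  [ 8 10]]
print(pooled)  # [[ 5  7]  [13 15]]
```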

36
Q

What should we be aware of when shrinking/sub-sampling an image?

A

That we will lose some detail

37
Q

What is zooming or up-sampling?

A

Used to increase resolution
- a simple method is approximation by the nearest available pixel or neighbour

38
Q

What is nearest neighbour interpolation?

What is one pro and con about this method?

A

An up-sampling method involving copying the adjacent pixel value from the same colour channel

It’s fast but inaccurate

39
Q

What is bilinear interpolation?

What is one pro and con about this method?

A
  • an up-sampling method involving taking the average value of the nearest two or four pixels from the same colour channel
  • It’s fast and accurate in smooth regions
  • inaccurate at edges
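
A sketch of both up-sampling methods (nearest neighbour via block copying, and bilinear sampling at a fractional position; the helper function is illustrative):

```python
import numpy as np

img = np.array([[1, 3],
                [5, 7]], dtype=float)

# Nearest-neighbour up-sampling: copy each pixel into a 2x2 block.
nearest = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
print(nearest)

def bilinear(img, x, y):
    # Weighted average of the 4 pixels surrounding the fractional point (x, y).
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bot

print(bilinear(img, 0.5, 0.5))  # 4.0, the average of all four pixels
```
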
40
Q

What are three types of image manipulation?

A
  • Translation
  • Scaling
  • Rotation
41
Q

What is translation and how is it calculated?

  • Calculate the new position of this point:
    P(3, 4) and V(6, 2)
A

Translation is a type of image manipulation that moves a point to another location by adding amounts (usually given as a vector) to the coordinates of the point
- New point is: (3 + 6, 4 + 2)
= P’(9, 6)

42
Q

What is scaling and how is it calculated?

Scale this point P(3, 4) by a factor of 0.5 and separately by 2

A
  • Scaling is a type of image manipulation that moves points to make things smaller or bigger. If the scale factor is larger than 1, the object gets bigger; if it is smaller than 1, the object gets smaller

P(3, 4) * 0.5 = P’(1.5, 2)
P(3, 4) * 2 = P’(6, 8)

43
Q

What is rotation and how is it calculated?

A

Rotation is a type of image manipulation that moves a point when the image is rotated:

Calculated by using these equations, where a is the angle by which the point (x, y) is rotated:
x’ = xcos(a) - ysin(a)
y’ = xsin(a) - ycos(a)

44
Q

What are cross-correlation and convolution and how are they different?

  • Don’t need to remember formula’s
A

They are both filtering operations that slide a filter/kernel over an image.
- Cross-correlation applies the filter exactly as it is given.
- Convolution first rotates the filter by 180 degrees before applying it.

45
Q

How are cross-correlation and convolution values calculated?

A
  • Cross-correlation is calculated by first adding padding; then, for each pixel, we multiply the corresponding values and add them.
  • Convolution is calculated by rotating the filter (NOT THE IMAGE) by 180 degrees first, then multiplying and adding over all 9 pixels in the grid (see the sketch below).
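
A minimal numpy sketch of both computations as described (zero padding, multiply and add; convolution rotates the filter first). The function names are illustrative:

```python
import numpy as np

def cross_correlate(img, kernel):
    # Zero-pad, then for each pixel multiply the overlapping values and add.
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def convolve(img, kernel):
    # Convolution = cross-correlation with the filter rotated 180 degrees.
    return cross_correlate(img, np.rot90(kernel, 2))
```
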
46
Q

What must we remember about the convolution used in convolution neural networks?

A

That the formula we use is actually for cross-correlation; we just call it the convolution filter

47
Q

Where does indexing start in an image made of pixels?

A

It starts at 0, so the top left corner is (0,0)

48
Q

What is padding?

What does it mean to use padding with zeros as the default?

A
  • Padding is adding values to the outside of the image
  • That when we pad an image, the border of padding consists of the value 0.
49
Q

What happens when we perform convolution on an image that doesn’t have padding?

What is the output size if we do a convolution between a 7x7 image and a 3x3 filter/kernel with no padding?

A

It reduces the size of an image:

7-3+1 = 5

so output is 5x5
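
The same arithmetic as a general relation (the padding and stride parameters go beyond this card; they are included as an assumption for completeness):

```python
def conv_output_size(n, k, padding=0, stride=1):
    # For an n x n image and k x k kernel: (n - k + 2p) / s + 1.
    return (n - k + 2 * padding) // stride + 1

print(conv_output_size(7, 3))  # 5, so the output is 5x5
```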

50
Q

What is box blur?

A

Using a convolution filter whose values are all equal and sum to 1; this averages neighbouring pixels and has the effect of blurring the image
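
For example, a minimal sketch of a 3x3 box-blur kernel (each value 1/9, summing to 1):

```python
import numpy as np

# Convolving with this kernel replaces each pixel by the average
# of its 3x3 neighbourhood, blurring the image.
box_blur = np.ones((3, 3)) / 9
print(box_blur)
```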

51
Q

What are the features of edges?

A
  • depth discontinuity
  • surface colour discontinuity
  • surface normal discontinuity
  • Illumination discontinuity
52
Q

What is the derivative of y = x^2 + x^4
and of y = sin(x) + e^-x?

A

dy/dx = 2x + 4x^3
dy/dx = cos(x) - e^-x

53
Q

How do we represent images in a discrete form?

A

Each pixel value is a discrete integer and we calculate the discrete derivative

54
Q

Why is it useful to calculate the discrete derivative in 1D?

A
  • emphasizes areas where the intensity changes abruptly (happens at edges)
  • identifies points where the change is maximal
55
Q

What are the 3 difference filters and what do they represent or find?

A
  • The backward difference filter finds the derivative at a pixel point considering the difference between the pixel point and the one before it
  • the forward difference filter finds the derivative at a pixel point considering the difference between the pixel point and the one ahead of it
  • the central difference filter finds the derivative of a pixel point by considering the difference between the pixel ahead of it and the pixel before it
56
Q

Where is each difference filter commonly used?

A
  • Forward filters are useful to use at the start of an array since the first pixel in an array doesn’t have anything before it
  • Backward filters are useful to use at the end of an array since the last pixel in an array doesn’t have anything following it
  • a central filter is useful to use for any pixel that’s not at the start or end of an array as it has a pixel point before and after it
57
Q

What are the formulas for the difference filters?

A

Backward: df/dx = f(x) - f(x-1)
Forward: df/dx = f(x) - f(x+1)
Central: df/dx = f(x+1) - f(x-1)

58
Q

What are the equivalent filters/kernels for the 3 difference filters?
Backward: df/dx = f(x) - f(x-1)
Forward: df/dx = f(x) - f(x+1)
Central: df/dx = f(x+1) - f(x-1)

A

Backward: [-1 1 0]
Forward: [0 1 -1]
Central: [-1 0 1]

59
Q

How do we apply 1D difference filters to an object?

Apply a backward difference filter on the following array (ignore border area) [10, 15, 10]

A

We do normal convolution (multiply then add)

Backward filter = [-1 1 0]
- No padding, so the first (border) output is 0:
- (-1*10) + (1*15) = 5
- (-1*15) + (1*10) = -5
[0, 5, -5]
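
The same computation in numpy (skipping the border element, as in the card):

```python
import numpy as np

signal = np.array([10, 15, 10])

# Backward difference f(x) - f(x-1); the border output is left as 0.
out = np.zeros_like(signal)
out[1:] = signal[1:] - signal[:-1]
print(out)  # [ 0  5 -5]
```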

60
Q

How do we calculate the derivative for a 2D image?

A

Find the derivatives, i.e. the gradient vector (df/dx, df/dy), and use it to calculate the gradient magnitude and the gradient direction.

  • apply difference filter, horizontally for x axis and vertically for y axis for pixel point.
  • calculate gradient direction
  • calculate gradient magnitude
61
Q

What does applying a filter in the vertical and horizontal directions give us?

A
  • Applying a filter in the vertical direction gives us the 2D derivative for the y-direction
  • Applying a filter in the horizontal direction gives us the 2D derivative for the x-direction
62
Q

How do we know that calculating the derivative shows us if we have a point that’s part of an edge?

A
  • Light intensity changes are reflected in the gradient
  • an edge point corresponds to an extremum (maximum/minimum) of the derivative
63
Q

How do we calculate gradient direction?

A

angle = tan^-1( (df/dy) / (df/dx) )

64
Q

How do we calculate edge strength?

A

It’s given by the gradient magnitude
- ||edge strength|| = sqrt( (df/dx)^2 + (df/dy)^2 )

65
Q

For this pixel point, use a backward filter to calculate the gradient direction and magnitude:

[2,3,2]
[3,5,3]
[6,3,2]

A

Backward filter = [-1 1 0]
apply for x-axis:
(-1*3) + (1*5) + (0*3) = 2
apply for y-axis:
(-1*3) + (1*5) + (0*3) = 2

direction = tan^-1(2/2) = 45°
magnitude = sqrt(2^2 + 2^2) = 2*sqrt(2)
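
The same worked example in numpy (np.arctan2 and np.hypot are just convenience forms of tan^-1 and the square-root formula):

```python
import numpy as np

patch = np.array([[2, 3, 2],
                  [3, 5, 3],
                  [6, 3, 2]])

# Backward difference at the centre pixel, horizontally then vertically.
dfdx = patch[1, 1] - patch[1, 0]  # 5 - 3 = 2
dfdy = patch[1, 1] - patch[0, 1]  # 5 - 3 = 2

direction = np.degrees(np.arctan2(dfdy, dfdx))  # 45.0 degrees
magnitude = np.hypot(dfdx, dfdy)                # 2*sqrt(2) ~= 2.83
print(direction, magnitude)
```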

66
Q

How do you create a histogram from pixel values?

A
  • For each distinct pixel value, count how many times it occurs, then plot these counts as a histogram
67
Q

How do you normalise a histogram?

A

By dividing the count for each value by the total number of pixels: so if it's a 5x5 image, we divide each count by 25
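
A numpy sketch for a 5x5 image with 3-bit values:

```python
import numpy as np

img = np.random.randint(0, 8, size=(5, 5))      # 3-bit image: values 0-7

counts = np.bincount(img.ravel(), minlength=8)  # occurrences of each value
normalised = counts / img.size                  # divide each count by 25
print(counts)
print(normalised.sum())                         # 1.0 after normalisation
```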

68
Q

What is a low pass filter?

A

A low-pass filter smooths images and removes noise by reducing high frequency information and retaining low frequency information