Introduction Flashcards by Fabian Bakkum

What is computer vision?

Computer vision is the field of study that aims to create software that can interpret images and return information about what they represent.

What is an image mathematically speaking?

An image is a function I(x,y) that gives back the intensity at position (x,y).

I: R² -> R. In practice the domain is a finite grid of pixel positions and the intensity is usually a discrete value in a fixed range, e.g. [0, 255].

Name at least six uses of computer vision.

Optical Character Recognition (OCR) - Used to read characters from an image
Facial recognition - Used in cameras with smile shutters and to unlock your phone
Object recognition - Detect objects in images; can be used to detect theft in stores
3D modelling - Convert images into 3D models
Motion capture - Track the movements of actors and transfer them onto digital characters (Davy Jones in Pirates of the Caribbean)
Structure from motion - Reconstruct a 3D model from a series of 2D images
Smart cars - Computer vision is used in car collision detection systems
Sports - Detect who’s first at the finish line
Vision-based interaction - Computer vision is used in the Wii controller and in Kinect
Security / Surveillance - Computer vision is used by security cameras to detect thieves
Medical imaging - Computer vision and augmented reality can help surgeons operate

Define a color image as a vector-valued function.

I(x,y) = [ r(x,y) g(x,y) b(x,y) ]

What is noise in an image mathematically speaking?

Like an image, noise is a function, η(x,y). An image with noise can be defined as the sum of the original image and the noise:

I’(x,y) = I(x,y) + η(x,y)

What is salt and pepper noise?

Salt and pepper noise sets pixels at random positions to black (pepper) or white (salt). The affected pixels are sparsely distributed.
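
A minimal sketch in Python, assuming a grayscale image stored as a list of rows with intensities in [0, 255]. The function name `add_salt_and_pepper` and its `density` parameter (the fraction of pixels that get corrupted) are illustrative, not a library API.

```python
import random


def add_salt_and_pepper(image, density, seed=None):
    """Return a copy of `image` in which roughly `density` of the pixels are
    set to 0 (pepper) or 255 (salt); all other pixels are left untouched."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < density:
                # Pick black or white with equal probability.
                row[x] = 0 if rng.random() < 0.5 else 255
    return noisy


image = [[128] * 8 for _ in range(8)]
noisy = add_salt_and_pepper(image, density=0.1, seed=0)
changed = sum(p != 128 for row in noisy for p in row)
print(changed, "of 64 pixels were replaced by 0 or 255")
```

Replacing the coin flip with a constant 255 would give white-only impulse noise instead.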

What is impulse noise?

Impulse noise as a function returns white pixels at random positions.

What is Gaussian noise?

Gaussian noise as a function returns intensities that are picked from a normal distribution.

What makes computer vision difficult?

Viewpoint - It’s difficult for computer vision algorithms to deal with objects seen from an unfamiliar angle.
Illumination - Light in different levels of brightness and from different angles can complicate the detection of features.
Scale - The distance between the camera and the object makes the same object appear at different sizes.
Motion - Moving objects or cameras can complicate matters.
Intra-class variation - The object you are looking for may appear in different colors or shapes (there are many types of cars, for example).
Occlusion - Other objects may be in front of the object you are trying to identify.
Background clutter - The object you are looking for may blend into the background if the two look very similar (lack of contrast).
Local ambiguity - A feature may be present in multiple objects in an image.

What does sigma determine in the context of Gaussian noise?

Sigma is the standard deviation of the normal distribution the noise values are drawn from (equivalently, the factor a standard normal sample is multiplied with). A larger sigma value results in more visible noise in the resulting image.
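
A sketch of the additive model I’(x,y) = I(x,y) + η(x,y), with η drawn from a normal distribution with mean 0 and standard deviation sigma. The function name and the rounding and clipping back to [0, 255] are assumptions made for an 8-bit grayscale image stored as a list of rows.

```python
import random


def add_gaussian_noise(image, sigma, seed=None):
    """I'(x, y) = I(x, y) + eta(x, y), with eta drawn from N(0, sigma^2).
    Results are rounded and clipped to the valid range [0, 255]."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


flat = [[128] * 32 for _ in range(32)]
for sigma in (5, 50):
    noisy = add_gaussian_noise(flat, sigma, seed=1)
    mean_dev = sum(abs(p - 128) for row in noisy for p in row) / (32 * 32)
    print(f"sigma={sigma}: mean absolute deviation {mean_dev:.1f}")
```

The printed deviation grows with sigma, which is the "more visible noise" described in the card.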

What does sigma determine in the context of a Gaussian smoothing filter?

Sigma determines the standard deviation of the kernel, a larger sigma value will result in more blur in the resulting image.
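
A sketch of how such a kernel can be built: sample exp(-(x² + y²) / (2σ²)) on a grid centred on the kernel and normalise the weights so they sum to 1, so the filter preserves overall brightness. `gaussian_kernel` and its parameters are illustrative names, and the kernel size is chosen by the caller.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel (size odd) normalised to sum to 1."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre cell")
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


narrow = gaussian_kernel(5, 0.5)
wide = gaussian_kernel(5, 2.0)
print(round(narrow[2][2], 3), round(wide[2][2], 3))
```

A larger sigma spreads the weight outwards, so the centre cell gets a smaller share and the blur is stronger.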

Why are non-uniform Gaussian kernels preferred over uniform kernels when smoothing images?

In a Gaussian kernel the weights fall off with distance from the center, so the center pixel and its nearest neighbours contribute the most to the weighted average. This gives a smoother-looking blur than a uniform (box) kernel, which weights every pixel in the window equally.

Give an example of a Gaussian kernel.

[ 1 2 1
2 4 2 * 1/16
1 2 1 ]

What is a linear operator?

An operator is linear if two properties hold:

Additivity: H( f1 + f2 ) = H( f1 ) + H( f2 )
Multiplicative scaling: H( a * f1 ) = a * H( f1 )
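
The two properties can be checked numerically on concrete inputs. The sketch below (hypothetical helper names) compares a 1D box filter, implemented as cross-correlation, with a 3-sample median filter. A failed check proves an operator is non-linear; a passed check on a few inputs is only evidence, not a proof.

```python
def correlate_valid(signal, kernel):
    """1D cross-correlation, keeping only positions where the kernel fits."""
    w = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(w))
            for i in range(len(signal) - w + 1)]


def median3(signal):
    """Non-linear filter: median of each window of three samples."""
    return [sorted(signal[i:i + 3])[1] for i in range(len(signal) - 2)]


def is_linear(H, f1, f2, a):
    """Check additivity and multiplicative scaling of operator H on concrete inputs."""
    additive = H([p + q for p, q in zip(f1, f2)]) == [p + q for p, q in zip(H(f1), H(f2))]
    scaling = H([a * p for p in f1]) == [a * p for p in H(f1)]
    return additive and scaling


f1 = [1, 5, 2, 8, 3]
f2 = [4, 0, 6, 1, 7]
box = lambda s: correlate_valid(s, [1, 1, 1])
print(is_linear(box, f1, f2, 3))      # True
print(is_linear(median3, f1, f2, 3))  # False
```

Integer inputs keep the arithmetic exact, so plain equality comparisons are safe here.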

What’s the difference between cross-correlation (G = H ⊗ F) and convolution (G = H * F)?

Convolution is very similar to cross-correlation, but in convolution the kernel is first flipped by 180 degrees (both horizontally and vertically).
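
A sketch of both operations for small grayscale images stored as lists of rows. Pixels outside the image are treated as 0 here for simplicity, and the kernel is assumed to have odd dimensions so it has a centre cell. The function names are illustrative.

```python
def correlate2d(image, kernel):
    """Cross-correlation with zero padding; the output has the same size as image."""
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ch, x + j - cw
                    if 0 <= yy < h and 0 <= xx < w:
                        total += kernel[i][j] * image[yy][xx]
            out[y][x] = total
    return out


def convolve2d(image, kernel):
    """Convolution = cross-correlation with the kernel rotated by 180 degrees."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate2d(image, flipped)


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shift = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
print(correlate2d(img, shift))  # [[2, 3, 0], [5, 6, 0], [8, 9, 0]]
print(convolve2d(img, shift))   # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

With this asymmetric shift kernel, cross-correlation moves the image one pixel to the left while convolution moves it to the right; for a symmetric kernel such as a box filter both give the same output.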

Would convolution and cross-correlation produce different results when using a box or Gaussian filter?

No. Box and Gaussian kernels are symmetric, so flipping them by 180 degrees leaves them unchanged and the results are identical. It does matter for asymmetric kernels, such as derivative filters.

Name some mathematical properties of convolution.

Linear and shift invariant (the filter responds the same way to the same values, regardless of where they are located in the image)
Commutative: f * g = g * f
Associative: (f * g) * h = f * (g * h)
Identity: f * e = f, where e is the unit impulse [..., 0, 0, 1, 0, 0, ...]
Differentiation: d/dx (f * g) = df/dx * g
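
A quick numerical check of several of these properties with a 1D "full" convolution on integer sequences, so the arithmetic is exact and equality tests are safe. The helper name is illustrative. The finite difference [1, -1] stands in for d/dx, so in this discrete setting the differentiation property follows from associativity.

```python
def convolve_full(f, g):
    """1D 'full' convolution: (f * g)[n] = sum_k f[k] g[n - k]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


f, g, h = [1, 2, 3], [2, -1], [1, 0, 4]
e = [1]        # unit impulse
d = [1, -1]    # discrete derivative (finite difference)

print(convolve_full(f, g) == convolve_full(g, f))                                       # commutative
print(convolve_full(convolve_full(f, g), h) == convolve_full(f, convolve_full(g, h)))   # associative
print(convolve_full(f, e) == f)                                                         # identity
print(convolve_full(d, convolve_full(f, g)) == convolve_full(convolve_full(d, f), g))   # derivative
```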

If you have an image that has NxN pixels and a non-separable kernel with a size of WxW, what’s the time complexity of convolution?

Now suppose the kernel is separable, what would the time complexity be?

Non-separable: O(N^2 * W^2)
Separable: O(2 * W * N^2)

Why?

Consider that H is our kernel, which can be split into a column vector C and a row vector R:

H = C * R

Instead of one convolution we could now do two:

G = H * F = (C * R) * F = C * (R * F)

Each of these 1D convolutions costs W multiplications per pixel, so W * N^2 in total. Because we do two convolutions, R * F and C * (R * F), the complexity will be:

2 * W * N^2
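
A sketch that checks the argument on a small example, using the integer kernel [1 2 1; 2 4 2; 1 2 1], which is the outer product of C = R = [1, 2, 1] (divide by 16 afterwards to get the normalised Gaussian kernel). Both functions are hypothetical helpers that work in "valid" mode (no padding) and count multiplications; cross-correlation is used, which equals convolution for these symmetric kernels.

```python
def correlate2d_valid(image, kernel):
    """Direct 2D cross-correlation in 'valid' mode.
    Returns the output and the number of multiplications performed."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    mults = 0
    out = []
    for y in range(h - kh + 1):
        out_row = []
        for x in range(w - kw + 1):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[y + i][x + j]
                    mults += 1
            out_row.append(total)
        out.append(out_row)
    return out, mults


def correlate_separable_valid(image, col, row_kernel):
    """Same result for a kernel H[i][j] = col[i] * row_kernel[j]: first filter
    every row with row_kernel (R), then every column of that result with col (C)."""
    mults = 0
    w = len(row_kernel)
    tmp = []
    for r in image:                                   # pass 1: R applied to rows
        tmp_row = []
        for x in range(len(r) - w + 1):
            tmp_row.append(sum(row_kernel[j] * r[x + j] for j in range(w)))
            mults += w
        tmp.append(tmp_row)
    h = len(col)
    out = []
    for y in range(len(tmp) - h + 1):                 # pass 2: C applied to columns
        out_row = []
        for x in range(len(tmp[0])):
            out_row.append(sum(col[i] * tmp[y + i][x] for i in range(h)))
            mults += h
        out.append(out_row)
    return out, mults


C = [1, 2, 1]
R = [1, 2, 1]
H = [[c * r for r in R] for c in C]      # [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
image = [[(3 * x + 5 * y) % 17 for x in range(10)] for y in range(10)]

direct, m2d = correlate2d_valid(image, H)
separable, m1d = correlate_separable_valid(image, C, R)
print(direct == separable, m2d, m1d)     # True 576 432
```

For such a tiny image the saving (432 vs 576 multiplications) is smaller than the ideal ratio 2/W, because the row pass also runs over the border rows that the column pass later discards; for large N the counts approach 2 * W * N^2 and W^2 * N^2.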

Imagine that you have to convolve an image using a kernel whose bottom right cell is first aligned with the top left pixel of the image. How does this affect the output image compared to when the center cell of the kernel is first aligned with the top left pixel of the image?

This is known as “full” convolution and will produce an image that is larger than the original.

“same” convolution (center cell first) results in an image of the same size as the original.

“valid” convolution (top left cell first) results in an image that is smaller than the original.
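
A 1D sketch of the three output modes, treating values outside the signal as 0 and assuming an odd-length kernel for "same". For a signal of length N and a kernel of length W the outputs have length N + W - 1, N and N - W + 1; in 2D the same holds per dimension. The function name and `mode` strings are illustrative.

```python
def convolve1d(signal, kernel, mode="full"):
    """1D convolution with 'full', 'same' or 'valid' output."""
    n, w = len(signal), len(kernel)
    full = [0] * (n + w - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            full[i + j] += s * k
    if mode == "full":
        return full                        # length n + w - 1
    if mode == "same":
        start = (w - 1) // 2               # kernel centre on the first sample
        return full[start:start + n]       # length n
    if mode == "valid":
        return full[w - 1:n]               # kernel fully inside: length n - w + 1
    raise ValueError("mode must be 'full', 'same' or 'valid'")


signal = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]
for mode in ("full", "same", "valid"):
    print(mode, convolve1d(signal, kernel, mode))
# full [1, 3, 6, 9, 12, 9, 5]
# same [3, 6, 9, 12, 9]
# valid [6, 9, 12]
```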

When convolving, you might have to add a border around the image to deal with the boundaries. Name some methods for doing this.

Clip filter - treat the pixels outside the image as black. [ leaks black into the edges of the resulting image :/ ]
Wrap around - treat the image as periodic: pixels beyond the right edge are taken from the left edge, pixels beyond the bottom edge from the top edge, and so forth. [ even worse :( ]
Copy edge - replicate the nearest edge pixels. [ pretty good :) ]
Reflect - mirror the pixels across the edge. [ best method :D ]
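
A sketch of the four boundary methods for a padding width p (assumed not to exceed the image size). `pad1d` and `pad2d` are hypothetical helpers; the 2D version pads every row and then every column. The "reflect" variant here repeats the edge pixel; NumPy's `np.pad` calls this mode `"symmetric"` and uses `"reflect"` for mirroring that excludes the edge pixel.

```python
def pad1d(values, p, method):
    """Extend a 1D list by p samples on both sides."""
    n = len(values)

    def source(i):
        # Map a position i (which may lie outside 0..n-1) to an index in values,
        # or None when the padded value should be black (0).
        if 0 <= i < n:
            return i
        if method == "clip":
            return None                                  # outside pixels are black
        if method == "wrap":
            return i % n                                 # image repeats periodically
        if method == "copy":
            return min(max(i, 0), n - 1)                 # nearest edge pixel
        if method == "reflect":
            return -i - 1 if i < 0 else 2 * n - i - 1    # mirror, edge pixel included
        raise ValueError(method)

    out = []
    for i in range(-p, n + p):
        j = source(i)
        out.append(0 if j is None else values[j])
    return out


def pad2d(image, p, method):
    """Pad every row, then every column, with the same method."""
    rows = [pad1d(r, p, method) for r in image]
    cols = [pad1d(list(c), p, method) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


row = [1, 2, 3]
for method in ("clip", "wrap", "copy", "reflect"):
    print(method, pad1d(row, 2, method))
# clip [0, 0, 1, 2, 3, 0, 0]
# wrap [2, 3, 1, 2, 3, 1, 2]
# copy [1, 1, 1, 2, 3, 3, 3]
# reflect [2, 1, 1, 2, 3, 3, 2]
```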

What does the following image filter do?

[ 0 0 0
0 1 0
0 0 0 ]

It gives back the original image.

What does the following image filter do?

[ 0 0 0
0 0 1
0 0 0 ]

It gives back the image shifted to the left by one pixel (when applied as cross-correlation; with convolution the kernel is flipped and the image shifts to the right).

What does the following image filter do?

[ 1 1 1
1 1 1 * 1/9
1 1 1 ]

It gives back a blurred version of the image.

What does the following image filter do?

[ 0 0 0           [ 1 1 1
  0 2 0   - 1/9 *   1 1 1
  0 0 0 ]           1 1 1 ]

It gives back a sharpened version of the image.

image + (image - blurred_image)
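
A sketch of this sharpening step on a grayscale image stored as a list of rows, using a 3x3 box blur with copy-edge boundary handling (an assumption; any of the boundary methods would do). The function names are illustrative, and on real 8-bit images the result would need clipping to [0, 255].

```python
def box_blur(image):
    """3x3 mean filter; pixels outside the image are replaced by the nearest edge pixel."""
    h, w = len(image), len(image[0])

    def px(y, x):
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
             for x in range(w)] for y in range(h)]


def sharpen(image):
    """image + (image - blurred_image), i.e. 2 * image - blurred_image."""
    blurred = box_blur(image)
    return [[2 * p - b for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]


step = [[10, 10, 10, 50, 50, 50] for _ in range(3)]
print([round(v, 1) for v in sharpen(step)[1]])
```

Flat regions are unchanged, while the edge gets an undershoot on the dark side and an overshoot on the bright side, which is what makes it look sharper.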

What non-linear filter could be used to remove salt and pepper noise?

A median filter could be used for this which iterates through the pixels, sorts the pixels around the current pixel and replaces the current pixel in the output image with the median.
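
A sketch of a 3x3 median filter on a grayscale image stored as a list of rows. Copying the nearest edge pixel at the borders is an assumption, and `median_filter` is an illustrative name.

```python
def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the neighbourhood, clamping coordinates at the borders.
            window = sorted(
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


img = [[100] * 5 for _ in range(5)]
img[1][1] = 255   # salt
img[3][3] = 0     # pepper
print(median_filter(img))
```

Both outliers disappear completely, whereas a mean filter would only smear them out over their neighbours.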

What are the four causes of edges in images?

Edges come from rapid changes in intensity in the image. The four causes are:

Surface normal discontinuity - changes in shape
Depth discontinuity - for example, contrast between the object and the background
Illumination discontinuity - shadows
Surface color discontinuity - changes in color

Formula for gradient direction.

Ø = arctan( df/dy / df/dx )

Formula for gradient magnitude.

|| ∇f || = sqrt ( (df/dx)^2 + (df/dy)^2 )
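
A sketch of both formulas for a single pixel, given its partial derivatives. It uses `math.atan2(dfdy, dfdx)` instead of arctan(df/dy / df/dx), which avoids division by zero when df/dx = 0 and keeps the full range (-π, π] of directions; the function name is illustrative.

```python
import math


def gradient_magnitude_direction(dfdx, dfdy):
    """Return (||grad f||, direction in radians) for one pixel."""
    magnitude = math.hypot(dfdx, dfdy)     # sqrt(dfdx**2 + dfdy**2)
    direction = math.atan2(dfdy, dfdx)     # quadrant-aware arctan(dfdy / dfdx)
    return magnitude, direction


print(gradient_magnitude_direction(3.0, 4.0))    # magnitude 5.0, direction about 0.927 rad
print(gradient_magnitude_direction(0.0, 2.0))    # vertical gradient, direction pi/2
print(gradient_magnitude_direction(-3.0, -4.0))  # opposite of the first gradient
```

For the last call, arctan((-4) / (-3)) alone would return the same angle as for (3, 4), pointing the wrong way; atan2 correctly returns an angle about π away.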

Define the Sobel operator.

Sx:
[ -1 0 1
  -2 0 2  * 1/8
  -1 0 1 ]

Sy:
[  1  2  1
   0  0  0  * 1/8
  -1 -2 -1 ]

+ Reduces noise at the same time as it differentiates.
+ Center pixels contribute more.
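
A sketch that applies the Sobel kernels Sx and Sy, including the 1/8 factor, at one interior pixel of a grayscale image stored as a list of rows. The kernels are applied as cross-correlation; a true convolution flips them and gives the opposite sign. `sobel_at` is an illustrative name.

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1],
           [0, 0, 0],
           [-1, -2, -1]]


def sobel_at(image, y, x, kernel):
    """Cross-correlate a 3x3 kernel with the neighbourhood of interior pixel (y, x),
    including the 1/8 normalisation."""
    total = sum(kernel[i][j] * image[y - 1 + i][x - 1 + j]
                for i in range(3) for j in range(3))
    return total / 8


ramp_x = [[x for x in range(5)] for _ in range(5)]   # intensity increases to the right
ramp_y = [[y] * 5 for y in range(5)]                  # intensity increases downwards
print(sobel_at(ramp_x, 2, 2, SOBEL_X), sobel_at(ramp_x, 2, 2, SOBEL_Y))  # 1.0 0.0
print(sobel_at(ramp_y, 2, 2, SOBEL_X), sobel_at(ramp_y, 2, 2, SOBEL_Y))  # 0.0 -1.0
```

With the 1/8 factor a ramp that rises by 1 per pixel gives a response of exactly 1. Sy as written responds positively when intensity increases towards the top of the image, so a ramp that increases downwards gives -1.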

Define the Prewitt operator.

Sx:
[ -1 0 1
  -1 0 1
  -1 0 1 ]

Sy:
[  1  1  1
   0  0  0
  -1 -1 -1 ]

Define Roberts operator.

Sx:
[  0 1
  -1 0 ]

Sy:
[ 1  0
  0 -1 ]

Define a Laplacian operator.

Two examples of Laplacian operators:

[  0 -1  0
  -1  4 -1
   0 -1  0 ]

[ -1 -1 -1
  -1  8 -1
  -1 -1 -1 ]

- Sensitive to noise.
+ Calculates the 2nd derivative in one pass.

How does the canny edge operator work?

1. Filter the image with the derivative of a Gaussian.
2. Find the magnitude and orientation of the gradients.
3. Non-maximum suppression: thin the multi-pixel-wide ridges down to a single pixel width.
4. Linking and thresholding (hysteresis): define a low and a high threshold; use the high threshold to start edge curves and the low threshold to continue them.
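
A sketch of step 4 only. It assumes the input is the gradient magnitude after non-maximum suppression, stored as a list of rows; the thresholds in the example and the choice of 8-connectivity are illustrative assumptions, and `hysteresis` is a hypothetical name.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Start edges at pixels >= high and grow them through 8-connected
    neighbours whose magnitude is >= low."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= high:        # strong pixel: starts an edge
                edges[y][x] = True
                queue.append((y, x))
    while queue:                               # breadth-first growth along weak pixels
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and magnitude[ny][nx] >= low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges


mag = [[0, 0, 0, 0, 0],
       [9, 5, 5, 0, 5],
       [0, 0, 0, 0, 0]]
for row in hysteresis(mag, low=4, high=8):
    print(row)
```

The weak pixels next to the strong one are kept, while the isolated weak pixel on the right is discarded even though it has the same magnitude.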

What is Hough transform used for and on what principle is it built?

Hough transform is used to find lines from a set of edge pixels. It is based on the principle of voting.

What does a line in image space correspond to in Hough space?

A line in image space corresponds to a point in Hough space.

What does a point in image space correspond to in Hough space?

A point in image space corresponds to a line in Hough space: every line y = mx + b through the point (x, y) satisfies b = -xm + y, which for fixed x and y is a line in (m, b) space.

What is the equation used to represent points from image space in Hough space? Why do we represent lines in this form? What is the result in Hough space?

Hesse normal form: d = x * cos(Ø) + y * sin(Ø)

d: the perpendicular distance from the line to the origin.
Ø: the angle that this perpendicular makes with the x-axis.

This form avoids the infinite slope that vertical lines have in the form y = mx + b; Ø is restricted to 0 ≤ Ø < π. The result in Hough space is a sinusoid segment.

Define the basic Hough transform algorithm.

1. Initialize the accumulator array H[d, Ø] = 0.
2. For each edge point E(x,y) in the image:
     For Ø = 0 to 180:
         d = x * cos(Ø) + y * sin(Ø)
         H[d, Ø] += 1
3. Find the values (d, Ø) where H[d, Ø] is maximum.
4. The detected line in the image is given by: d = x * cos(Ø) + y * sin(Ø)
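
A sketch of the voting loop using the Hesse normal form d = x * cos(Ø) + y * sin(Ø). Assumptions: Ø runs over whole degrees from 0 to 179, d is rounded to integer bins, and `d_max` must be at least the largest distance of an edge point from the origin (e.g. the image diagonal) so negative d values can be stored with an offset. The function names are illustrative.

```python
import math


def hough_lines(edge_points, d_max, theta_step_deg=1):
    """Accumulate votes H[d + d_max][t] for d = x cos(theta) + y sin(theta)."""
    thetas = list(range(0, 180, theta_step_deg))
    H = [[0] * len(thetas) for _ in range(2 * d_max + 1)]
    for x, y in edge_points:
        for t, theta_deg in enumerate(thetas):
            theta = math.radians(theta_deg)
            d = round(x * math.cos(theta) + y * math.sin(theta))
            H[d + d_max][t] += 1               # one vote per (point, angle)
    return H, thetas


def strongest_line(H, thetas, d_max):
    """Return (d, theta in degrees) of the accumulator cell with the most votes."""
    i, t = max(((i, t) for i in range(len(H)) for t in range(len(thetas))),
               key=lambda cell: H[cell[0]][cell[1]])
    return i - d_max, thetas[t]


vertical = [(5, y) for y in range(100)]      # edge points on the line x = 5
H, thetas = hough_lines(vertical, d_max=150)
print(strongest_line(H, thetas, 150))        # (5, 0)
```

Finding several lines would mean looking for multiple local maxima in H rather than only the single largest cell.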

Define the principle steps of SIFT.

1. Scale space peak selection (find potential features): Blur the image several times by applying a Gaussian filter with different, increasing sigma values (sigma is the standard deviation of the normal distribution in the kernel). The blurred images are stored in a stack, and the Difference of Gaussians (DoG) is computed between each pair of consecutive blurred images by subtracting one from the other. DoG is a fast approximation of the NLoG operator (Normalized Laplacian of Gaussian). Extract the extrema by sliding a 3x3x3 window over the stack of DoGs and keeping the points whose value is the maximum or minimum within the window.
2. Key point localization (get rid of the weak extrema): Remove the extrema that don't meet a chosen threshold to obtain the actual feature points.
3. Orientation assignment: Calculate the X and Y gradients using Sobel or a similar method. Once the gradients are calculated, the orientation of each pixel in the feature can be calculated using: v = arctan( dI/dy / dI/dx ). Each feature can then be assigned a principal orientation by selecting the most frequent angle in the feature.
4. Key point descriptor (create a signature of the feature): Create a signature of each point by splitting the feature region into cells (standard SIFT uses a 4x4 grid), creating a frequency array (histogram) of the angles for each cell and concatenating the arrays. This large multidimensional vector (128 dimensions in standard SIFT, with 8 orientation bins per cell) is the descriptor that is used in matching.
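
A sketch of the DoG part of step 1 and the thresholding of step 2, assuming the blurred images are already available as equally sized lists of rows. A point is kept when it is strictly larger or strictly smaller than all 26 neighbours in its 3x3x3 window and its absolute value exceeds `threshold`. Both function names and the `threshold` parameter are illustrative; the remaining SIFT steps are left out.

```python
def dog_stack(blurred):
    """Difference of Gaussians between each pair of consecutive blur levels."""
    return [[[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(lo, hi)]
            for lo, hi in zip(blurred, blurred[1:])]


def scale_space_extrema(dogs, threshold=0.0):
    """Return (scale, y, x) of 3x3x3 extrema whose |value| exceeds threshold."""
    found = []
    for s in range(1, len(dogs) - 1):
        h, w = len(dogs[s]), len(dogs[s][0])
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[s][y][x]
                if abs(v) <= threshold:            # step 2: weak response, discard
                    continue
                neighbours = [dogs[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1)
                              for dy in (-1, 0, 1)
                              for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if v > max(neighbours) or v < min(neighbours):
                    found.append((s, y, x))
    return found


dogs = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
dogs[1][2][2] = 1.0     # a maximum
dogs[1][1][3] = -0.5    # a weaker minimum
print(scale_space_extrema(dogs))                  # [(1, 1, 3), (1, 2, 2)]
print(scale_space_extrema(dogs, threshold=0.6))   # [(1, 2, 2)]
```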