09_convolutional neural networks and computer vision Flashcards

1
Q

Why are some layers called “hidden layers”?

A

because they lie between the input and output layers and it is difficult to tell exactly what happens in them

2
Q

What happens in the different stages of a fully connected neural network?

A

input layer

1) feature extraction

2) classification

output layer

-> end-to-end learning (no feature engineering needed)
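
A minimal sketch of such a fully connected network (PyTorch is my assumption; the deck names no framework, and the layer sizes and 10-class output are hypothetical):

```python
import torch
import torch.nn as nn

# Minimal sketch (PyTorch and all sizes assumed): the hidden layers act as the
# feature-extraction stage, the last linear layer as the classifier, and the whole
# stack is trained end to end without hand-crafted features.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # feature extraction
    nn.Linear(256, 128), nn.ReLU(),   # feature extraction
    nn.Linear(128, 10),               # classification into 10 hypothetical classes
)
x = torch.randn(1, 784)               # e.g. a flattened 28x28 image
print(model(x).shape)                 # torch.Size([1, 10])
```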

3
Q

What happens if we linearize/vectorize an input?

A

the network is unable to leverage spatial information from a linearized image

4
Q

What are convolutions?

A

a mathematical operation that operates on dense data (such as images)

we consider an image I and a kernel K; the convolution of I and K results
in an output feature map S

the size of S is constrained by the sizes of the input and the kernel

-> convolutional kernels act as feature filters and are sometimes called "filters" for this exact reason
(they produce high output signals where the local input pattern resembles the kernel)
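
A minimal sketch of the operation (NumPy is my assumption; like most deep-learning frameworks it does not flip the kernel, i.e. it is technically cross-correlation):

```python
import numpy as np

# Minimal sketch (NumPy assumed): a "valid" 2D convolution of an image I with a kernel K.
def conv2d(I, K):
    kh, kw = K.shape
    H, W = I.shape
    S = np.zeros((H - kh + 1, W - kw + 1))               # output size constrained by I and K
    for y in range(S.shape[0]):
        for x in range(S.shape[1]):
            S[y, x] = np.sum(I[y:y + kh, x:x + kw] * K)  # response of the kernel at this position
    return S

I = np.arange(16, dtype=float).reshape(4, 4)
K = np.ones((3, 3)) / 9.0                                # a simple averaging kernel
print(conv2d(I, K).shape)                                # (2, 2)
```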

5
Q

What is the Sobel filter?

A

a fixed kernel constructed to extract edges
(it responds only to the intensity changes that indicate edges)
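
A small illustration (NumPy/SciPy assumed; the 5x5 test image is made up) of the horizontal Sobel kernel responding to a vertical step edge:

```python
import numpy as np
from scipy.signal import convolve2d

# Minimal sketch (NumPy/SciPy assumed): the horizontal Sobel kernel responds to
# intensity changes along x, i.e. to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
image = np.zeros((5, 5))
image[:, 2:] = 1.0                               # a vertical step edge
print(convolve2d(image, sobel_x, mode="valid"))  # nonzero around the edge, zero in flat regions
```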

6
Q

On what parameters are convolutional layers based?

A
  • number of input channels
  • number of output channels

and, affecting the output feature map size:
  • kernel size
  • stride
  • padding
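
A minimal sketch of these parameters (PyTorch assumed; the concrete numbers are arbitrary):

```python
import torch
import torch.nn as nn

# Minimal sketch (PyTorch assumed): the parameters that define a convolutional layer.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)   # (batch, input channels, height, width)
print(conv(x).shape)            # torch.Size([1, 16, 32, 32])
```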

7
Q

What is a channel?

A

a two-dimensional feature map

8
Q

What is multi-channel data?

A

stacks of feature maps
(e.g. the red, green, and blue color channels of an image)

we use one kernel per input channel (the kernels don't have to be identical)
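
A quick check of the "one kernel per channel" idea (PyTorch assumed): the layer's weight tensor holds one KxK kernel per input channel for every output channel.

```python
import torch.nn as nn

# Minimal sketch (PyTorch assumed): a layer with 3 input channels stores one 3x3 kernel
# per input channel for each of its 8 output channels.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv.weight.shape)  # torch.Size([8, 3, 3, 3]) -> (C_out, C_in, K, K)
```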

9
Q

What does the kernel size define?

A

the spatial size of the kernel operator; it affects the size of the output of the convolution

10
Q

How can we calculate the output feature map size with regard to the kernel size?

A

O = I - K + 1

O = output feature map size
K = kernel size
I = input feature map size
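
A tiny worked example (the numbers are made up):

```python
# Hypothetical numbers: a 5x5 input and a 3x3 kernel
I, K = 5, 3
print(I - K + 1)  # 3 -> the output feature map is 3x3
```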

11
Q

What is the receptive field of a kernel?

A

the region of the input that a kernel can see; it depends on the kernel size and determines the size of the features the kernel is able to detect

-> large kernels are able to identify larger and more complex features, such as an eye: they have more CONTEXT

12
Q

What is a stride in CNNs?

A

step size of a kernel
(default is 1)

If we choose a stride >1, it reduces the size of the output feature map

-> O = (I - K) / S + 1
S = stride

13
Q

Why do we need padding and what is it?

A

convolutions shrink feature maps; padding counteracts this

-> the input map is padded with rows and columns of zeros

O = (I - K + 2P) / S + 1
P = padding
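
A small helper illustrating the formula (Python sketch; the example sizes are made up, and integer division models the usual flooring behaviour):

```python
# Hypothetical helper for O = (I - K + 2P) / S + 1 (stride S, padding P)
def conv_output_size(i, k, s=1, p=0):
    return (i - k + 2 * p) // s + 1

print(conv_output_size(28, 3))            # 26 -> no padding shrinks the map
print(conv_output_size(28, 3, p=1))       # 28 -> padding of 1 keeps the size
print(conv_output_size(28, 3, s=2, p=1))  # 14 -> stride 2 halves the resolution
```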

14
Q

What is a huge advantage of convolutional layers regarding parameters?

A

because the kernel parameters are learned by the model itself and shared over the entire input feature map,
the number of parameters of the network is vastly reduced

15
Q

How many parameters does a convolutional layer have?

A

K^2 * C(in) * C(out)

the number of parameters is therefore independent of the input feature map size, as opposed to linear layers
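
A quick check of the formula (PyTorch assumed; an optional bias adds C_out further parameters, which the card's formula does not count):

```python
import torch.nn as nn

# Minimal sketch (PyTorch assumed): K^2 * C_in * C_out weights, independent of the input size.
K, C_in, C_out = 3, 16, 32
conv = nn.Conv2d(C_in, C_out, kernel_size=K, bias=False)
print(conv.weight.numel(), K * K * C_in * C_out)  # 4608 4608
```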

16
Q

What does the combination of convolutional layers lead to?

A

a large receptive field: the information seen by a single element in the output feature map covers a much larger area in earlier feature maps
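
A hypothetical worked example of how stacking grows the receptive field (stride 1 assumed):

```python
# Hypothetical sketch: with stride 1, each extra KxK convolution grows the receptive field by K - 1.
def stacked_receptive_field(kernel_sizes):
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

print(stacked_receptive_field([3, 3]))     # 5 -> two 3x3 convs see a 5x5 input patch
print(stacked_receptive_field([3, 3, 3]))  # 7 -> three 3x3 convs see a 7x7 input patch
```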

17
Q

How is a CNN for image classification built?

A

Input

1) convolutional layers = feature extraction

(linearization in between)

2) linear layers = classifier

output

(non-linearities after each layer)

-> decreasing feature map size: decreasing resolution
-> increasing number of channels: increasing semantic information
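
A minimal sketch of this layout (PyTorch assumed; the channel counts, input size, and 10 classes are made up):

```python
import torch
import torch.nn as nn

# Minimal sketch (PyTorch assumed): convolutional feature extraction, linearization,
# then a linear classifier; strided convolutions shrink the maps while channels grow.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8
    nn.Flatten(),                  # linearization between the two stages
    nn.Linear(32 * 8 * 8, 10),     # classifier over 10 hypothetical classes
)
x = torch.randn(1, 3, 32, 32)
print(model(x).shape)              # torch.Size([1, 10])
```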

18
Q

How to implement a segmentation model?

A

1) replace the linear layers with convolutional layers (the output must be a feature map)

2) adjust the convolutions so that the final output has the same size as the input image (e.g. through padding)

3) apply SOFTMAX after the final layer to get a class probability for each pixel
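
A minimal fully convolutional sketch of these three steps (PyTorch assumed; the channel counts and 4 classes are made up):

```python
import torch
import torch.nn as nn

# Minimal sketch (PyTorch assumed): only convolutions, padding keeps the spatial size,
# and softmax over the channel dimension gives a class probability for every pixel.
num_classes = 4
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, num_classes, kernel_size=3, padding=1),
)
x = torch.randn(1, 3, 64, 64)
probs = torch.softmax(model(x), dim=1)             # (1, num_classes, 64, 64)
print(probs.shape, float(probs.sum(dim=1).max()))  # per-pixel probabilities sum to 1
```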

19
Q

Why do we use U-Nets for segmentation models?

A

keeping large feature maps after every layer is very expensive

20
Q

What does the U-Net do?

A

first downsampling - encode
then upsampling - decode

downsampling the feature maps greatly reduces their extent and improves the ability to generalize

-> much better performance with skip connections

21
Q

What are skip connections in U-Net?

A

supplement the semantically rich information from the layers near the bottleneck with high-resolution feature maps from the downsampling branch
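
A toy sketch of the idea (PyTorch assumed; this is far smaller than a real U-Net and all sizes are made up):

```python
import torch
import torch.nn as nn

# Toy sketch (PyTorch assumed, not the real U-Net): the high-resolution encoder map is
# concatenated with the upsampled decoder map, combining detail with semantic information.
class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 16, kernel_size=3, padding=1)           # high-res features
        self.down = nn.Conv2d(16, 32, kernel_size=2, stride=2)          # downsample (encode)
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)   # upsample (decode)
        self.head = nn.Conv2d(16 + 16, 4, kernel_size=1)                # 4 hypothetical classes

    def forward(self, x):
        skip = torch.relu(self.enc(x))
        bottleneck = torch.relu(self.down(skip))
        up = torch.relu(self.up(bottleneck))
        return self.head(torch.cat([up, skip], dim=1))                  # skip connection

print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 4, 64, 64])
```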

22
Q

How do we do upsampling?

A

transposed convolutions: conceptually convolutions with a stride < 1 (fractionally strided), which we can use as learnable upscaling operations
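
A minimal upsampling sketch (PyTorch assumed; the channel counts and input size are made up):

```python
import torch
import torch.nn as nn

# Minimal sketch (PyTorch assumed): a transposed convolution as a learnable 2x upscaling.
up = nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=2, stride=2)
x = torch.randn(1, 64, 16, 16)   # (batch, channels, height, width)
print(up(x).shape)               # torch.Size([1, 32, 32, 32]) -> spatial size doubled
```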

23
Q

What is panoptic segmentation?

A

the combination of semantic segmentation and instance segmentation

24
Q

What is object detection for images?

A

goal: find instances of a given class in an image and provide the class, approximate location, and extent (bounding box)

25
Q

What are complications with object detection? (5)

A
  • scale variability
  • objects might be occluded
  • lighting and shadows
  • perspective affects the appearance of an object
  • how do we define the location of the object?

26
Q

What are examples of multi-stage object detection?

A

region-based CNNs (R-CNN)
- the object detection process is split into different steps that require processing the image data multiple times

Fast R-CNN
- more efficient

Faster R-CNN
- proposal regions are extracted from the CNN feature map

27
Q

What is single-stage object detection?

A

YOLO (You Only Look Once) - the image is processed in a single pass that directly predicts classes and bounding boxes