09_convolutional neural networks and computer vision Flashcards
Why are some layers called “hidden layers”?
because it is difficult to tell exactly what happens in them: their activations are neither the input nor the output of the network
What happens in different stages in the layers of a fully connected neural network?
input layer
1) feature extraction (early layers)
2) classification (later layers)
output layer
–> end-to-end learning (no feature engineering needed)
What happens if we linearize/vectorize an input?
the network is unable to leverage the spatial information of a linearized image: neighboring pixels end up in unrelated positions of the vector
What are convolutions?
a mathematical operation that operates on dense data (such as images)
we consider an image I and a “Kernel” K. the “convolution” of I and K results
in an output feature map S
the size of S is constrained by the sizes of the input and the kernel
–> convolutional kernels act as feature filters and are sometimes called “filters” for this exact reason
(they produce high output signals where the input pattern resembles the kernel)
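A minimal sketch of this operation in NumPy (a “valid” convolution, implemented as cross-correlation, which is what deep-learning frameworks actually compute); the names `conv2d`, `I`, and `K` are illustrative, not from the lecture:

```python
import numpy as np

def conv2d(image, kernel):
    # slide the kernel over the image and sum the elementwise products
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1  # O = I - K + 1 per dimension
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
K = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
S = conv2d(I, K)
print(S.shape)  # (2, 2)
```

Note how the 4x4 input shrinks to a 2x2 feature map, matching O = I - K + 1.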
What is the sobel filter?
it is constructed to extract edges
(it responds only to the intensity changes that indicate edges)
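A small sketch of the horizontal Sobel kernel responding to a vertical edge; the toy image and the helper `conv2d` are illustrative assumptions:

```python
import numpy as np

# horizontal-gradient Sobel kernel (responds to vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((5, 5))
img[:, 3:] = 1.0

def conv2d(image, kernel):
    # valid convolution via explicit sliding window
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    return np.array([[np.sum(image[y:y+kh, x:x+kw] * kernel)
                      for x in range(ow)] for y in range(oh)])

edges = conv2d(img, sobel_x)
print(edges)  # zero in the flat region, strong response at the edge
```

The output is zero wherever the intensity is constant and large where the brightness jumps, which is exactly the edge-filter behavior described above.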
On what parameters are convolutional layers based?
- number of input channels
- number of output channels
(affect output feature map size)
- kernel size
- stride
- padding
What is a channel?
a two-dimensional feature map
What is multi-channel data?
stacks of feature maps
(eg with different colors)
we use one kernel per channel (the kernels don’t have to be identical)
What does the kernel size define?
the size of the kernel operator - affects the size of the output of the convolution
How can we calculate the output feature map in regards to the kernel size?
O = I - K + 1
O = output feature map size
K = kernel size
I = input feature map size
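The formula above can be checked with a one-line helper (the function name is an illustrative choice):

```python
def conv_out_size(i, k):
    # valid convolution, stride 1, no padding: O = I - K + 1
    return i - k + 1

print(conv_out_size(28, 3))  # 26
print(conv_out_size(28, 5))  # 24
```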
What is the receptive field of a kernel?
the region of the input a kernel can see; it depends on the kernel size, which determines the size of the features it is able to detect
–> large kernels are able to identify larger and more complex features such as an eye: they have more CONTEXT
What is a stride in CNNs?
step size of a kernel
(default is 1)
If we choose a stride >1, it reduces the size of the output feature map
–> O = (I - K) / S + 1 (rounded down)
S = stride
Why do we need padding and what is it?
convolutions shrink feature maps; padding counteracts this
–> padding the input map with rows and columns of zeros
O = (I - K + 2P) / S + 1
P = padding
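The general output-size formula with stride and padding, as a small sketch (names are illustrative; the floor division matches how frameworks round the result down):

```python
def conv_out_size(i, k, s=1, p=0):
    # O = floor((I - K + 2P) / S) + 1
    return (i - k + 2 * p) // s + 1

# "same" padding for an odd kernel at stride 1: P = (K - 1) // 2
print(conv_out_size(32, 3, s=1, p=1))  # 32: size preserved
print(conv_out_size(32, 3, s=2, p=1))  # 16: stride 2 halves the map
```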
What is a huge advantage of convolutional layers regarding parameters?
because the kernel parameters are learned by the model itself and shared over the entire input feature map,
the number of parameters of the network is vastly reduced
How many parameters does a convolutional layer have?
K^2 * C(in) * C(out)
(plus C(out) bias terms if biases are used)
the number of parameters is therefore independent of the input feature map size, as opposed to linear layers
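A quick sketch of this parameter count (the function name and the 64-to-128-channel example are illustrative):

```python
def conv_params(k, c_in, c_out, bias=False):
    # each output channel has its own K x K kernel per input channel
    n = k * k * c_in * c_out
    return n + (c_out if bias else 0)

# a 3x3 convolution from 64 to 128 channels:
print(conv_params(3, 64, 128))  # 73728, regardless of input resolution
```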