Part 1 Flashcards
Name a function that models the all-or-nothing response of biological neurons.
The threshold (step) function, e.g. the signum function.
Why is the signum function not used in deep learning?
It is not differentiable at x = 0, and its derivative is 0 everywhere else.
=> the gradient is either 0 or undefined → the weight updates would be 0 → no learning would occur, i.e. the network would never improve its performance on the training data.
What is Dropout?
- regularization technique
- prevents overfitting
- works by randomly dropping out units (hidden and visible) in a neural network during training
What is the objective function of Rosenblatt’s perceptron?
Find the weights that minimize the total distance of the misclassified samples to the decision boundary.
Classification is based on the sign of the (signed) distance to the decision boundary.
Why is it useful to learn a bias term in training?
It offsets the weighted sum that is fed into the activation function,
ensuring a non-zero output is possible even when all inputs are zero,
→ providing flexibility in shifting the activation function, thus improving the model's ability to fit the data and make more precise predictions.
What is the task of Softmax function as the last layer in a neural network for a classification task?
It produces a probability distribution over the classes for each input.
By:
- rescaling the outputs so that they sum up to 1
- producing non-negative outputs
= the normalized exponential function (see the sketch below)
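A minimal NumPy sketch of the normalized exponential (the function name and the max-subtraction for numerical stability are illustrative additions, not part of the card):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Normalized exponential: non-negative outputs that sum up to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))  # -> roughly [0.66, 0.24, 0.10], sums to 1
```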
What are the advantages of making a neural network deeper?
- Greater expressive power
- model can scale to large and complex datasets
Why?
+ more layers → more diverse paths through which information can flow
+ exponential feature reuse + hierarchical representations of the data → facilitates the extraction of increasingly abstract features
What does backpropagation do?
computes all gradients required for the optimization of the network.
What is the exploding gradient problem?
The gradients, and hence the weight updates in earlier layers, can grow increasingly large.
Large derivatives/weights multiplied across many layers (or a learning rate that is too high) → positive feedback → the loss grows without bound.
What is the vanishing gradient problem?
The gradients, and hence the weight updates in earlier layers, can become negligibly small.
Small derivatives (e.g. of saturating activations) multiplied across many layers → the gradient shrinks towards 0 → the earlier layers barely learn.
What is the standard loss function for classification?
Cross-entropy loss: assumes that the outputs can be interpreted as probabilities that the input belongs to each class
Specifically, it assumes that the data follow a Bernoulli (for binary classification) or Multinoulli/Categorical (for multi-class classification) distribution.
What is the standard loss function for regression?
L2-loss: assumes that the residuals (i.e. differences between the true and predicted values) follow a Gaussian distribution.
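A NumPy sketch of both standard losses for a single sample (the function names and toy numbers are made up for illustration):

```python
import numpy as np

def cross_entropy(probs: np.ndarray, target_class: int) -> float:
    """Negative log-probability of the true class (classification)."""
    return float(-np.log(probs[target_class]))

def l2_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Sum of squared residuals (regression)."""
    return float(np.sum((y_true - y_pred) ** 2))

print(cross_entropy(np.array([0.7, 0.2, 0.1]), target_class=0))  # ~0.36
print(l2_loss(np.array([1.0, 2.0]), np.array([1.5, 1.5])))       # 0.5
```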
What is Batch Gradient Descent (BGD)?
- Steepest GD
- computes the gradient of the cost function w.r.t the model parameters
- using the entire training dataset in each iteration
What is Stochastic (Online) Gradient Descent?
- computes the gradient of the cost function w.r.t the model parameters
- use 1 sample in each iteration
What is Mini-Batch SGD?
- computes the gradient of the cost function w.r.t the model parameters
- uses B ≪ M random samples in each iteration (B: mini-batch size, M: number of training samples), as in the sketch below
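A sketch contrasting the three variants on a toy linear least-squares problem; only the batch size differs (the data, model, and learning rate are made up):

```python
import numpy as np

def run_epoch(w, X, y, lr, batch_size):
    """One epoch of gradient descent on a linear model with squared loss."""
    M = len(X)
    order = np.random.permutation(M)
    for start in range(0, M, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of the mean squared error
        w = w - lr * grad
    return w

X, y = np.random.randn(100, 3), np.random.randn(100)
w = np.zeros(3)
w = run_epoch(w, X, y, lr=0.01, batch_size=100)  # batch GD: the entire training set
w = run_epoch(w, X, y, lr=0.01, batch_size=1)    # stochastic (online) GD: 1 sample
w = run_epoch(w, X, y, lr=0.01, batch_size=16)   # mini-batch SGD: B << M random samples
```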
What are the steps to train a neural network using the backpropagation algorithm and an optimizer like Stochastic Gradient Descent (SGD)?
- randomly initialize the weights and biases
- forward the input through the network and get the output
- compute the loss between the prediction and the ground truth
- tune the weights and biases of each neuron to minimize the loss
- iterate until the weights are optimized (see the sketch below)
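A minimal PyTorch sketch of this training loop (the toy data, architecture, and hyperparameters are made up for illustration):

```python
import torch
from torch import nn

torch.manual_seed(0)
X, y = torch.randn(64, 10), torch.randint(0, 3, (64,))   # toy data: 64 samples, 3 classes

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))  # random initialization
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                  # iterate until the weights are (roughly) optimized
    logits = model(X)                    # forward the input through the network
    loss = loss_fn(logits, y)            # loss between prediction and ground truth
    optimizer.zero_grad()
    loss.backward()                      # backpropagation: compute all gradients
    optimizer.step()                     # tune weights and biases to minimize the loss
```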
What is the idea of momentum-based learning?
Idea: Accelerate in directions with persistent gradients
- parameter updates are based on the current and past gradients, i.e. previous gradient directions are used to accelerate the training and to become more robust against local minima (see the sketch below)
What is the purpose of the Momentum used in different optimizers?
It stabilizes the training by computing the moving average over the previous gradients.
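A sketch of the classical momentum update on a toy quadratic (the hyperparameters lr and beta are illustrative):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """Update based on a moving average over the current and past gradients."""
    velocity = beta * velocity + grad     # accumulate persistent gradient directions
    w = w - lr * velocity                 # accelerate along them
    return w, velocity

w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(5):
    grad = 2 * w                          # gradient of f(w) = ||w||^2
    w, v = momentum_step(w, grad, v)
```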
What is the zero-centering problem?
the lack of zero-centered output when the sigmoid function is used as an activation function in training neural networks.
–> covariate shift of successive layers
How can the zero-centering problem be solved?
Batch normalization which standardizes the inputs to each layer to have zero mean and unit variance, reducing the amount of internal covariate shift.
What is the dying ReLUs problem?
- a situation where a neuron gets stuck in a state in which it outputs 0 for any input
- the neuron's weights get updated such that its pre-activation is negative for every input → it outputs 0 → the gradient for that neuron during backpropagation is also 0 → no more updates
What are the disadvantages of fully connected neural networks (applied to images) that motivate CNNs?
- size problem: too many trainable weights → too expensive
- pixels are a bad representation:
+ highly correlated input neurons (neighbouring pixels)
+ scale dependent → struggles with images of different sizes
+ sensitive to intensity variations
- the spatial relationships between pixels are not taken into account
What are the advantages of CNN in comparison with fully connected neural networks?
-Local connectivity
-Weight sharing
-translational invariance (recognize patterns irrespective of their position in the input)
- exploits the grid-like structure of images
What decides the choice of the function applied to the output of a CNN for classification problems?
- Multi-class classification: each instance belongs to exactly 1 class → the probabilities of the different classes should sum up to 1, and one probability should become significantly larger than the others, effectively deciding the class of the output => use the softmax function
- Binary / multi-label classification: each instance can belong to more than one class → output independent probabilities for each class => use the sigmoid function
What are the four essential building blocks of Convolutional Neural Networks?
- convolutional layer
- activation function
- pooling layer/ subsampling layer
- Fully connected layer
What is the function of the convolutional layer in CNN?
→ detects local features from the previous layer (exploiting local connectivity)
Slides a set of filters (or kernels) across the input image.
Each filter is responsible for learning some local feature within the image.
FEATURE LEARNING → produces a feature map (one per filter)
What is the function of the activation function in CNN?
provide nonlinearity to learn complex patterns
What is the function of the pooling layer?
→ compresses and aggregates information across spatial locations
- saves parameters and computation, reduces overfitting
- produces downsampled feature maps
What is the function of the fully connected layer in CNN?
-traditionally used at the end of CNNs for classification tasks.
- connects every neuron to every neuron in the previous and subsequent layers, allowing the network to make predictions based on the high-level features learned in earlier layers (all four building blocks are combined in the sketch below)
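A minimal PyTorch sketch combining the four building blocks in order (the layer sizes and class count are made up):

```python
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer: local feature detection
    nn.ReLU(),                                   # activation function: nonlinearity
    nn.MaxPool2d(2),                             # pooling layer: downsample the feature maps
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # fully connected layer: classification
)
logits = cnn(torch.randn(1, 3, 32, 32))          # -> shape (1, 10)
```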
How can we do backpropagation through a convolutional layer?
By convolving the output gradients with the filter flipped horizontally and vertically (i.e. rotated by 180°).
What is the purpose of convolution with 1x1 filter?
- bottleneck layer
- merges (or reduces) channels so that the size of the network decreases → fewer parameters, less computation
- also reduces overfitting
What are the benefits of 1x1 convolution?
- 1x1 convolutions simply compute inner products across the channels at each spatial position
- a simple and efficient method to decrease the size of a network
- learns a dimensionality reduction, e.g., can reduce redundancy in your feature maps (see the sketch below)
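A minimal PyTorch sketch of a 1x1 bottleneck convolution (the channel counts are made up):

```python
import torch
from torch import nn

bottleneck = nn.Conv2d(256, 64, kernel_size=1)   # merge 256 channels into 64 at each position
y = bottleneck(torch.randn(1, 256, 28, 28))
print(y.shape)   # torch.Size([1, 64, 28, 28]): same spatial size, fewer channels
# Parameters: 256 * 64 weights + 64 biases, far fewer than a spatially larger filter would need.
```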
How do we make the CNN architecture better?
Replace the final flatten + fully connected layers with a 1x1 (or NxN) convolution + global average pooling, as in the sketch below.
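A hypothetical fully convolutional classification head in PyTorch (the feature channels and class count are made up):

```python
import torch
from torch import nn

head = nn.Sequential(
    nn.Conv2d(128, 10, kernel_size=1),   # 1x1 convolution: map 128 feature channels to 10 class maps
    nn.AdaptiveAvgPool2d(1),             # global average pooling over the spatial dimensions
    nn.Flatten(),                        # -> (batch, 10) class scores
)
print(head(torch.randn(2, 128, 7, 7)).shape)   # torch.Size([2, 10])
```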
What is the bias-variance trade-off?
The balance between bias and variance in the performance of an ML model:
- bias: systematic error caused by overly simple assumptions about the data
- variance: the model's sensitivity to fluctuations in the training data
- simultaneously optimizing bias and variance is impossible in general
What do high/low bias and variance mean?
Low Bias, High Variance:
-overly complex, capture noise
- overfitting: well on training, poor on new unseen data
High Bias, low variance:
-too simple, underfitting
-perform poorly on both training + test data
Balanced / sensible model:
-capturing the underlying patterns without being too influenced by noise.
-It generalizes well to new, unseen data.
What is model capacity?
The capacity of a model describes the variety (range) of functions it can approximate.
It is related to the number of parameters.
How does the number of independent training samples affect the loss on the training and test set?
- starting with a small training set → low training loss but high test loss, i.e. overfitting
- more training data → reduces the variance and the test loss
- the model capacity should be matched to the size of the training set; a capacity that is far too high for the available data → severe overfitting
How does the model capacity affect the loss on the training and test set?
Increasing the model capacity → training and test loss decrease,
up to the overfitting point → beyond it, severe overfitting sets in: the test loss increases again
→ a decrease in bias is traded for an increase in variance
Techniques to address overfitting in a NN
- augment data
- adapt architecture
- adapt the training process
- preprocessing
- regularizer (in loss function)
- dropout
- use a validation set and keep the parameters with the minimum validation loss (early stopping)
How do we use data augmentation to avoid overfitting?
Ensure that the label is invariant to every transformation applied, e.g. a rotation.
1. random spatial transforms
2. pixel transformations (change the resolution, add random noise, change the pixel distribution) (see the sketch below)
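A hypothetical torchvision augmentation pipeline; the chosen transforms and parameters are only illustrative:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(degrees=15),                # random spatial transform (label-invariant)
    T.RandomHorizontalFlip(),
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.ColorJitter(brightness=0.2),               # pixel transformation
    T.ToTensor(),
])
# augmented = augment(pil_image)   # applied on the fly to every training image
```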
What is the main idea of regularization in the loss function?
add a penalty term to the loss function
What are ways for regularization in the loss function?
- enforce small norm: L2 norm
- enforce sparsity: L1 norm
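A PyTorch sketch of adding L1/L2 penalty terms to a task loss (the function name and coefficients are made up):

```python
import torch

def regularized_loss(task_loss, model, l2=1e-4, l1=0.0):
    """Task loss plus an L2 (small-norm) and/or L1 (sparsity) penalty on the weights."""
    penalty = 0.0
    for p in model.parameters():
        penalty = penalty + l2 * p.pow(2).sum() + l1 * p.abs().sum()
    return task_loss + penalty

model = torch.nn.Linear(4, 2)
loss = regularized_loss(torch.tensor(1.0), model, l2=1e-4, l1=1e-5)
```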
How do the weights behave when the network is trained with L1 and L2 regularization? in comparison with a network without regularization
- L1 norm:
+ shrinks the weights differently than L2: by a constant amount, independent of their magnitude
+ many weights become exactly 0, especially when lambda is large, i.e. sparse weights
- L2 norm:
+ weight decay (shrinkage proportional to the weight magnitude)
+ small weights, more spread-out or diffuse weight vectors
What is the purpose of data normalization?
standardizes the range of the independent variables or features of the data
→ prevents features with large value ranges from dominating the training
How should data normalization be used in training NN?
- compute the normalization statistics on the training data ONLY
- normalization of input data
- normalization within the network
What are some methods of data normalization?
-min/max
-z-score / variance normalization
- zero-centering / mean subtraction
- Batch normalization
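A minimal NumPy sketch of z-score normalization; the statistics come from the training data only (the toy data are made up):

```python
import numpy as np

X_train = np.random.randn(100, 5) * 3 + 7
X_test = np.random.randn(20, 5) * 3 + 7

mean, std = X_train.mean(axis=0), X_train.std(axis=0)   # computed on the training data ONLY
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std                     # test data reuses the training statistics
```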
What is the benefit of using Batch normalization?
- reduces Internal Covariate Shift, i.e. it normalizes the distribution of the input to the layer that follows the batch normalization layer
- improves training stability
What is the internal covariate shift problem?
refers to the change in the distribution of network activations caused by adjustments in network parameters during training
What are the reasons that lead to Internal Covariate Shift?
- ReLU is not zero-centered
- initialization and input distribution might not be normalized
- deeper nets → the effect is amplified across layers
What is a self-normalizing neural network?
- a method to address the stability problems of SGD training
- SELU activations + a specific weight initialization
- a special form of dropout (alpha dropout)
→ stable activations, stable training
What is the aim of the dropout technique?
- reduce co-adaptation (different neurons in the network become highly dependent on each other during training –> less robust model) –> independent features
What is the idea of dropout?
Randomly drop/deactivate a fraction of the neurons in the network at each update cycle, i.e. they are ignored during the forward pass.
How:
- set activations to 0 with probability (1 - p) → the dropout effect must be compensated at test time by multiplying the activations with the keep probability p (see the sketch below)
- DropConnect: drop individual connections instead (a less efficient implementation)
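A minimal NumPy sketch of classic dropout with keep probability p, following the card's description (frameworks usually use the "inverted" variant that rescales during training instead):

```python
import numpy as np

def dropout(activations: np.ndarray, p: float = 0.8, training: bool = True) -> np.ndarray:
    if training:
        mask = np.random.rand(*activations.shape) < p   # keep each unit with probability p
        return activations * mask                       # dropped units are set to 0
    return activations * p                              # test time: compensate by scaling with p
```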
How would initialization affect convex optimization problems?
- it does not matter for the solution, i.e. the global minimum is always reached
- but a bad initialization → slower convergence, more computational resources
How would initialization affect non-convex optimization problems?
- it does matter
- a NN with non-linearities is in general a non-convex problem → different initializations can end up in different (local) minima
How should biases be initialized?
- simply initialize them to 0
- when using ReLU, a small positive constant, e.g. 0.1, is better to avoid the dying ReLU issue
How should weights be initialized?
- randomly
- initializing all weights with 0 is bad (every neuron would compute the same output and receive the same update, so the symmetry is never broken)
- small uniform / Gaussian values
e.g. uniform random values in the range [0, 1]
What is the idea of Xavier initialization?
- calibrate the variances of the activations in the forward pass by initializing the weights with a zero-mean Gaussian
- its variance takes the number of input features into account
What is the idea of He initialization?
- a variant of Xavier initialization that is effective with activations such as ReLU, which are not zero-centered and zero out part of their input
- scales the weights by a factor that takes the number of input features into account → helps keep the variance of the activations roughly the same across layers (see the sketch below)
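A NumPy sketch of both schemes; Xavier has several common variants, the 1/n_in form is used here as an assumption:

```python
import numpy as np

def xavier_init(n_in: int, n_out: int) -> np.ndarray:
    """Zero-mean Gaussian with variance 1/n_in (one common Xavier/Glorot variant)."""
    return np.random.randn(n_in, n_out) * np.sqrt(1.0 / n_in)

def he_init(n_in: int, n_out: int) -> np.ndarray:
    """Zero-mean Gaussian with variance 2/n_in, suited to ReLU activations."""
    return np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)

W1 = xavier_init(784, 256)
W2 = he_init(256, 10)
```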
What is transfer learning?
reuse models / use a pre-trained model on a new problem
- for a different task on the same data
- on different data for the same task
- on different data for a different task
How does transfer learning work?
- weight transfer (different task)
e.g. a model pre-trained on image classification → target model for object detection / segmentation (see the sketch below)
- transfer between modalities (different data type)
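A hypothetical weight-transfer sketch using a torchvision backbone (the model choice, the frozen layers, and the 5-class head are illustrative assumptions):

```python
import torch
from torch import nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")     # source model pre-trained on ImageNet
for p in backbone.parameters():
    p.requires_grad = False                             # freeze the transferred weights (optional)
backbone.fc = nn.Linear(backbone.fc.in_features, 5)     # new head for the target task (5 classes)
logits = backbone(torch.randn(1, 3, 224, 224))          # only the new head will be trained
```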
What are the benefits of transfer learning?
- weight transfer:
+ capitalizes on features learned by the source model
+ speeds up training on the target task
- transfer between modalities:
+ benefits from the representation learning achieved in one modality when training on a different modality
+ useful when labeled data is scarce in the target modality
What is multi-task learning (MTL)?
train a network simultaneously on multiple related tasks
What is hard parameter sharing in MTL?
- several hidden layers are shared between all tasks (usually the feature extraction layers)
- MTL of N tasks → reduces the chance of overfitting by an order of N (see the sketch below)
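A minimal PyTorch sketch of hard parameter sharing: one shared feature extractor and one head per task (the sizes and tasks are made up):

```python
import torch
from torch import nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(10, 64), nn.ReLU())  # hidden layers shared by all tasks
        self.head_a = nn.Linear(64, 3)   # task A: e.g. 3-class classification
        self.head_b = nn.Linear(64, 1)   # task B: e.g. regression

    def forward(self, x):
        features = self.shared(x)
        return self.head_a(features), self.head_b(features)

out_a, out_b = MultiTaskNet()(torch.randn(8, 10))
```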
What is soft parameter sharing?
- each task has its own model with its own parameters
- instead of forcing the parameters to be equal, the distance between the models' parameters is regularized as part of the loss function
- options: e.g. L2-norm, trace norm, ...
What are auxiliary tasks for?
- additional tasks trained alongside the original task
- to create a more stable network (more robust shared features)
e.g. facial landmark detection + learning subtly related tasks such as face pose, smile/not smile, glasses/no glasses, gender