Quiz 3 Flashcards

1
Q

Optimization error

A

Even if your neural network can perfectly model the real world, your optimization algorithm may not be able to find the weights that model that function.

2
Q

Estimation error

A

Even if we do find the best hypothesis, this best set of weights or parameters for the neural network, that doesn’t mean that you will be able to generalize to the testing set.

3
Q

Modeling error

A

Given a particular neural network architecture, your actual model that represents the real world might not be in that space.

4
Q

Optimization error scenario

A

You were lazy and couldn’t/didn’t optimize to completion

5
Q

Estimation error scenario

A

You tried to learn the model with finite data

6
Q

Modeling error scenario

A

You approximated reality with a model

7
Q

More complex models lead to ______ modeling error

A

Smaller

8
Q

Transfer learning steps

A

1) Train on large-scale dataset
2) Take your custom data and initialize the network with weights trained in Step 1
3) Continue training on new dataset

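A minimal PyTorch sketch of these three steps, assuming torchvision is installed and using an ImageNet-pretrained ResNet-18 as the Step 1 network (num_classes is a placeholder; on newer torchvision versions the pretrained flag is spelled weights= instead):

    import torch.nn as nn
    import torchvision.models as models

    # Step 1: start from a network trained on a large-scale dataset (ImageNet)
    model = models.resnet18(pretrained=True)

    # Step 2: initialize a network for the custom data with the weights from Step 1,
    # swapping in a new output layer sized for the new dataset
    num_classes = 10  # placeholder
    model.fc = nn.Linear(model.fc.in_features, num_classes)

    # Step 3: continue training on the new dataset with an ordinary training loop
    # (forward pass, loss, backward pass, optimizer step)
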
9
Q

Ways to apply transfer learning

A

Finetune, freeze

10
Q

Finetune

A

Update all parameters

11
Q

Freeze (feature layer)

A

Update only the last layer’s weights (used when there is not enough data)

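A PyTorch sketch of the two options (assumes a torchvision ResNet-18 whose weights were initialized from a pretrained network, as in the transfer-learning card):

    import torch
    import torch.nn as nn
    import torchvision.models as models

    model = models.resnet18()                        # in practice, loaded with pretrained weights
    model.fc = nn.Linear(model.fc.in_features, 10)   # new last layer for the target task

    # Finetune: update all parameters
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    # Freeze: update only the last layer's weights (when there is not enough data)
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)
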
12
Q

When is transfer learning less effective?

A

If the source dataset you train on is very different from the target dataset

13
Q

As we add more data, what do we see in generalization error?

A

Error continues to get smaller / accuracy continues to improve

14
Q

LeNet architecture

A

Two sets of convolutional, activation, and pooling layers, followed by a fully-connected layer, an activation, another fully-connected layer, and finally a softmax classifier

15
Q

LeNet architecture shorthand

A

INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => RELU => FC

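One possible PyTorch rendering of this shorthand for 28x28 grayscale inputs (the channel and unit counts are illustrative, not necessarily the exact original LeNet-5 values):

    import torch.nn as nn

    lenet = nn.Sequential(                                       # INPUT: 1 x 28 x 28
        nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),    # CONV => RELU
        nn.MaxPool2d(2),                                         # POOL -> 6 x 14 x 14
        nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),              # CONV => RELU
        nn.MaxPool2d(2),                                         # POOL -> 16 x 5 x 5
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.ReLU(),                   # FC => RELU
        nn.Linear(120, 10),                                      # FC (softmax applied in the loss)
    )
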
16
Q

LeNet architecture good for

A

Number classification / MNIST dataset

17
Q

AlexNet architecture

A

Eight layers with learnable parameters: five convolutional layers (some followed by max pooling) and then three fully-connected layers, with ReLU activation in every layer except the output layer.

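A quick way to check the layer count, assuming torchvision's AlexNet implementation (which follows the same five-conv plus three-FC layout):

    import torch.nn as nn
    import torchvision.models as models

    alexnet = models.alexnet()   # 5 convolutional layers + 3 fully-connected layers
    learnable = [m for m in alexnet.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    print(len(learnable))        # 8 layers with learnable parameters
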
18
Q

AlexNet activation function

A

ReLU instead of sigmoid or tanh. The first to do so

19
Q

AlexNet data augmentation

A

PCA-based (principal component analysis)

20
Q

AlexNet regularization

A

Dropout (the first to use)

21
Q

AlexNet used ______ to combine predictions from multiple networks

A

Ensembling

22
Q

VGGNet used ____ modules / blocks of layers

A

repeated

23
Q

All convolutions in VGG are ___

A

3 x 3

24
Q

Most memory usage in VGG is in

A

Convolution layers

25
Q

Most of the parameters in VGG are

A

In the fully connected layer

26
Q

VGG max pooling

A

2x2, stride 2

27
Q

Number of parameters in VGG

A

Hundreds of millions

28
Q

Number of parameters in AlexNet

A

60 million

29
Q

Key idea behind Inception

A

Repeated blocks and multi-scale features

30
Q

Inception uses _____ filters

A

parallel

31
Q

Residual neural networks are easy to ___

A

optimize

32
Q

Equivariance

A

If the input undergoes a transformation, the output undergoes the same transformation: if f(x) = a, then f(T(x)) = T(a).

33
Q

Invariance

A

If the input undergoes a transformation, the output is unchanged: if f(x) = a, then f(T(x)) = a.

34
Q

We want equivariance in ________ layers

A

intermediate / convolutional

35
Q

We want invariance in _____ layers

A

output layers (and we also want invariance to rotation, not just translation)

36
Q

Convolution is ______ Equivariant

A

Translation

37
Q

Max pooling is invariant to ________

A

Permutation (of the values within its pooling window)

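A tiny NumPy check of the last two cards, using 1-D signals for simplicity (the same idea carries over to images):

    import numpy as np

    x = np.array([1.0, 4.0, 2.0, 3.0])
    k = np.array([1.0, -1.0])

    # Convolution is translation equivariant: shifting the input shifts the output
    shifted_x = np.concatenate(([0.0], x))            # x delayed by one sample
    lhs = np.convolve(shifted_x, k)                   # convolution of the shifted input
    rhs = np.concatenate(([0.0], np.convolve(x, k)))  # shifted convolution of the original input
    assert np.allclose(lhs, rhs)

    # Max pooling is invariant to permutation within its pooling window
    window = np.array([3.0, 1.0, 2.0])
    assert window.max() == np.random.permutation(window).max()
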
38
Q

Style transfer

A

1) Take the content image and compute its features
2) Take the generated image (starting from a zero or random image) and compute its features
3) Take the style image and compute its features
4) Update the generated image to minimize both losses (content and style) at the same time

39
Q

Gram matrix

A

Represents the correlations between feature channels within a given layer of the neural network

40
Q

How Gram matrix works

A

1) Take a particular layer in a CNN
2) Take a pair of channels within the feature map
3) Compute the correlation, or dot product, between the two channels’ feature maps
4) Repeat for every pair of channels to fill in the full Gram matrix

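A minimal NumPy sketch of these steps for one layer's feature map, assuming the feature map is stored as channels x height x width:

    import numpy as np

    def gram_matrix(features):
        """features: (C, H, W) feature map from one CNN layer."""
        C, H, W = features.shape
        F = features.reshape(C, H * W)   # each row is one channel, flattened
        return F @ F.T                   # (C, C): entry (i, j) is the dot product
                                         # between channel i and channel j

    features = np.random.rand(16, 8, 8)  # stand-in for a real feature map
    G = gram_matrix(features)
    print(G.shape)                       # (16, 16)
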
41
Q

Loss function in style transfer

A

Minimize the squared difference between the Gram matrices of the style image and the generated image (style loss), and the squared difference between the content image’s features and the generated image’s features (content loss). The total loss is the two losses combined with some weighting.

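A NumPy sketch of this total loss under the interpretation above (alpha and beta are the assumed weighting terms, and the feature maps are assumed to be channels x height x width):

    import numpy as np

    def gram(F):
        C = F.shape[0]
        return F.reshape(C, -1) @ F.reshape(C, -1).T

    def style_transfer_loss(content_feats, style_feats, generated_feats, alpha=1.0, beta=1e3):
        # content loss: squared difference between the content and generated feature maps
        content_loss = np.sum((generated_feats - content_feats) ** 2)
        # style loss: squared difference between the style and generated Gram matrices
        style_loss = np.sum((gram(generated_feats) - gram(style_feats)) ** 2)
        # total loss: the two losses combined with some weighting
        return alpha * content_loss + beta * style_loss
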
42
Q

Model with well-calibrated predictions

A

Logistic regression

43
Q

Model with poorly calibrated predictions (overconfident)

A

ResNet

44
Q

Group calibration

A

The scores for subgroups of interest are calibrated or equally miscalibrated

45
Q

A classifier is well-calibrated if

A

Among observations assigned a given probability score, the proportion that actually have the label equals that score

46
Q

Platt scaling requires

A

An additional validation dataset

47
Q

Platt scaling

A

Learn parameters a and b so that the calibrated probability is sigmoid(az + b), where z is the model’s uncalibrated score (logit); a and b are fit on the validation set

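A minimal sketch, assuming a and b have already been fit on the held-out validation set:

    import numpy as np

    def platt_calibrate(z, a, b):
        """z: uncalibrated model score (logit); a, b: parameters fit on a validation set."""
        return 1.0 / (1.0 + np.exp(-(a * z + b)))

    print(platt_calibrate(z=2.0, a=0.5, b=-0.1))  # calibrated probability in (0, 1)
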
48
Q

Difference between Platt scaling and temperature scaling

A

Temperature scaling applies the idea of Platt scaling to multi-class classification: the logits are divided by a single learned temperature before the softmax

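A sketch of temperature scaling, assuming a single temperature T has already been fit on the validation set:

    import numpy as np

    def temperature_scale(logits, T):
        """logits: uncalibrated multi-class scores; T: temperature fit on a validation set."""
        z = np.asarray(logits) / T
        e = np.exp(z - z.max())           # numerically stable softmax
        return e / e.sum()

    print(temperature_scale([2.0, 1.0, 0.1], T=1.5))  # calibrated class probabilities
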
49
Q

Limitations of calibration

A

It is group-based (which characteristics define the groups?), and there are inherent tradeoffs with calibration

50
Q

The Fairness Impossibility Theorems

A

It is impossible for a classifier to achieve both equal calibration and error rates between groups, if there is a difference in prevalence between the groups and the classifier is not perfect

51
Q

Positive Predictive Value

A

PPV = TP/(TP + FP)

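Worked example: with TP = 30 and FP = 10, PPV = 30 / (30 + 10) = 0.75.
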
52
Q

What does an impossibility theorem obtain?

A

For any three (or more) measures of model performance derived from the confusion matrix, writing them as a system of equations determines the prevalence p uniquely: so if groups have different prevalences, these quantities cannot all be equal across the groups

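A small numeric check of one standard confusion-matrix identity behind results of this type, FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR) (an assumption about the form of the argument, not necessarily the exact derivation used in the course; the counts are made up):

    import math

    TP, FN, FP, TN = 40, 10, 20, 30   # made-up confusion matrix counts
    P, N = TP + FN, FP + TN           # actual positives and negatives
    p = P / (P + N)                   # prevalence

    PPV = TP / (TP + FP)
    FPR = FP / N
    FNR = FN / P

    # FPR is pinned down by PPV, FNR, and the prevalence p, so groups with
    # different prevalences cannot match on all of these quantities at once
    assert math.isclose(FPR, (p / (1 - p)) * ((1 - PPV) / PPV) * (1 - FNR))
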
53
Q

Transposed convolution

A

Take each input pixel, multiply the learnable kernel by its value, and “stamp” the result onto the output (overlapping stamps are summed)

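A NumPy sketch of the “stamp” picture in 1-D, with stride 1 (overlapping stamps are summed):

    import numpy as np

    def transposed_conv1d(x, kernel):
        """For each input value, scale the kernel by it and 'stamp' it onto the output."""
        out = np.zeros(len(x) + len(kernel) - 1)
        for i, value in enumerate(x):
            out[i:i + len(kernel)] += value * kernel   # overlapping stamps add up
        return out

    print(transposed_conv1d(np.array([1.0, 2.0]), np.array([1.0, 1.0, 1.0])))
    # [1. 3. 3. 2.]
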
54
Q

Large sensitivity of loss implies what

A

Important pixels

55
Q

What do you have to find in saliency maps?

A

The gradient of the classifier score (pre-softmax) with respect to the input image. Take the absolute value of the gradient and sum across all channels

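A minimal PyTorch sketch, assuming some trained torch.nn.Module classifier `model` that returns pre-softmax scores (shown for a single image and target class):

    import torch

    def saliency_map(model, image, target_class):
        """image: (1, C, H, W) tensor; returns an (H, W) map of pixel importance."""
        image = image.detach().clone().requires_grad_(True)
        score = model(image)[0, target_class]      # pre-softmax classifier score
        score.backward()                           # gradient of the score w.r.t. the input
        return image.grad.abs().sum(dim=1)[0]      # |gradient| summed across channels
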
56
Q

What gets zeroed out in guided backprop?

A

Negative gradients: only positive gradients are passed back, on top of the ReLU already zeroing out negative values in the forward pass

57
Q

Gradient ascent

A

Compute the gradient of the score for a particular class that we care about with respect to the input image. Rather than subtracting the learning rate times the gradient, we’ll add the learning rate times the gradient

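A PyTorch sketch, again assuming a trained torch.nn.Module with pre-softmax outputs (learning rate, step count, and image shape are arbitrary):

    import torch

    def class_visualization(model, target_class, lr=1.0, steps=100, shape=(1, 3, 224, 224)):
        image = torch.zeros(shape, requires_grad=True)     # start from a zero image
        for _ in range(steps):
            score = model(image)[0, target_class]          # score for the class we care about
            model.zero_grad()
            if image.grad is not None:
                image.grad.zero_()
            score.backward()
            with torch.no_grad():
                image += lr * image.grad                   # add (not subtract) lr * gradient
        return image.detach()
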
58
Q

Defenses against adversarial attacks

A

Training with adversarial examples; perturbing, adding noise to, or re-encoding inputs to disrupt attacks

59
Q

Cross entropy

A

Easy examples incur a non-negligible loss which, in aggregate, masks out the harder, rare examples

60
Q

Focal loss

A

Down weights easy examples to give more attention to difficult examples

61
Q

Focal loss formula

A

FL(p_t) = -(1 - p_t)^γ log(p_t)

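A minimal NumPy version of the formula, where p_t is the model's probability for the true class and gamma is the focusing parameter:

    import numpy as np

    def focal_loss(p_t, gamma=2.0):
        """FL(p_t) = -(1 - p_t)^gamma * log(p_t)."""
        return -((1.0 - p_t) ** gamma) * np.log(p_t)

    # an easy example (p_t near 1) is down-weighted far more than a hard one
    print(focal_loss(0.9), focal_loss(0.1))
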
62
Q

Focal loss is used to

A

Address the issue of the class imbalance problem

63
Q

Receptive field defines

A

The set of input pixels in the original image that affects the value of a given node or activation deep inside the network

64
Q

As you get deeper into the neural network, the receptive field

A

continues to increase: each successive layer sees a larger region of the original input
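A small worked example of that growth, assuming a stack of 3x3, stride-1 convolutions (each layer adds kernel_size - 1 = 2 input pixels to the receptive field):

    # receptive field after stacking 3x3, stride-1 convolutions
    rf = 1
    for layer in range(1, 6):
        rf += 3 - 1               # each extra layer lets the node see 2 more input pixels
        print(layer, rf)          # 1 -> 3, 2 -> 5, 3 -> 7, 4 -> 9, 5 -> 11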