Mentimeter Flashcards
A CNN filter is applied to
all channels of the input layer
Stride is
the step size with which the filter is applied
Padding
increases the spatial size of the input
Pooling
- combines feature values within a region
- downsamples feature maps
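A minimal numpy sketch of the last three cards (illustrative and loop-based, not how frameworks implement it): the stride is the filter's step, padding enlarges the input, and max pooling combines values in each region, downsampling the map.

```python
import numpy as np

def conv2d(x, w, stride=1, pad=0):
    """Single-filter cross-correlation; padding enlarges the input,
    stride is the step with which the filter moves."""
    x = np.pad(x, pad)
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * w).sum()
    return out

def max_pool(x, k=2):
    """Combines feature values within each k-by-k region (downsampling)."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
y = conv2d(x, w, stride=1, pad=1)   # padding of 1 keeps the 4x4 size
p = max_pool(y)                     # downsampled to 2x2
```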
CNN activation is applied to
each channel
Hyperparameters can be learned with
a validation set
FC layer is typically used
close to the output side of the network
Typical loss for multiclass classification
- cross entropy
- softmax
- negative log likelihood
ReLU can be applied
before or after max-pooling
Learning rate is
the step size of the weight update
Weights are not updated once per
epoch (they are updated once per iteration, i.e. per mini-batch)
All training data is used to update weights in one
epoch
Averaging updates over iterations is called
momentum
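The learning-rate and momentum cards can be sketched as a plain SGD loop on a toy quadratic (values are illustrative; momentum here is the classic heavy-ball running average of updates):

```python
# Minimise f(w) = w^2 (gradient 2w) with SGD plus heavy-ball momentum.
w, v = 5.0, 0.0
lr, beta = 0.1, 0.9              # learning rate = step size of the update
for _ in range(300):             # one weight update per iteration
    grad = 2 * w
    v = beta * v + grad          # momentum: running average of updates
    w = w - lr * v
```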
First and second order moments of gradients are used in
- Adadelta
- RMSProp
- Adam
- Adagrad
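Adam, for example, tracks both an exponential average of the gradients (first moment) and of their squares (second moment). A minimal sketch of the update, with illustrative constants and a toy quadratic objective:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: m is the first moment (mean) and v the second
    moment (uncentred variance) of the gradients, with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t)   # gradient of f(w) = w^2
```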
Batch normalisation is applied to
channels
Dropout is an effective regularisation of
fully connected layers
L2 regularisation of weights is called
weight decay
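As a sketch (toy values, and the data gradient set to zero so only the penalty acts): the L2 term lam * ||w||^2 adds 2*lam*w to the gradient, so each step multiplies the weights by (1 - 2*lr*lam), i.e. they decay:

```python
import numpy as np

rng = np.random.default_rng(0)
w0 = rng.normal(size=5)
w = w0.copy()
lr, lam = 0.1, 0.01
for _ in range(100):
    grad = 2 * lam * w          # gradient of the penalty lam * ||w||^2
    w = w - lr * grad           # multiplies w by (1 - 2*lr*lam) each step
```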
fine-tuning is the process of
updating parameters pretrained on another dataset
data augmentation consists of
generating new samples from existing ones
hard negative is a
negative example which is similar to a positive one
hard positive is a
positive sample which is dissimilar to other positive ones
to debug a model
overfit on a small dataset
bias in a dataset is
confounding noise introduced during data collection
VGG uses
3x3 filters and max pooling
VGG is widely used because of its
effective feature representation
efficiency of 1x1 filters was exploited in
Inception
Inception block uses
parallel filters with concatenated outputs
skip connections are used in
ResNet
skip connections in ResNet
do not change data
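A sketch of why the skip path leaves the data unchanged: the block computes y = x + F(x), so if the residual branch F outputs zero, the block is exactly the identity (toy two-layer branch, assumed shapes):

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = x + F(x): the skip connection adds the input unchanged."""
    h = np.maximum(0, x @ w1)    # residual branch: ReLU(x W1)
    return x + h @ w2

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
w1 = rng.normal(size=(4, 4))
w2 = np.zeros((4, 4))            # zero out the residual branch...
y = residual_block(x, w1, w2)    # ...and the block is exactly the identity
```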
best performing word embedding is
BERT
Which unit is least effective in remembering sequences
RNN
Gating mechanism uses
sigmoid
in GRU hidden state and input are
concatenated
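A minimal GRU-style cell illustrating both cards: each gate is a sigmoid of a linear map of the concatenated previous hidden state and input. Dimensions and the exact gate convention are illustrative; references differ in details.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def gru_cell(x, h, Wz, Wr, Wh):
    """One GRU step: gates are sigmoids over the concatenation
    of the previous hidden state h and the current input x."""
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)                         # update gate
    r = sigmoid(Wr @ hx)                         # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d_h, d_x = 3, 2
Wz, Wr, Wh = (rng.normal(size=(d_h, d_h + d_x)) for _ in range(3))
h = gru_cell(rng.normal(size=d_x), np.zeros(d_h), Wz, Wr, Wh)
```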
Language modelling uses architecture type
many to many
Transformer self-attention uses
linear projections
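A sketch of single-head self-attention: queries, keys and values are linear projections of the same input sequence, combined by a scaled row-wise softmax (shapes are illustrative):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: Q, K and V are linear
    projections of the same input sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)        # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                      # 5 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```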
What is the goal of reinforcement learning?
maximise expected return
The discount factor in value function is used to
weigh immediate and future rewards
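A sketch of the discounted return: gamma close to 0 emphasises immediate rewards, gamma close to 1 weighs future rewards almost equally (reward values are made up):

```python
def discounted_return(rewards, gamma):
    """G = r0 + gamma*r1 + gamma^2*r2 + ...; gamma weighs
    immediate rewards against future ones."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)   # 1 + 0.5 + 0.25
```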
which behaviour is exploration in game playing
play an experimental move
what is the main drawback of the Monte Carlo sampling approach to RL
needs to run an entire episode before updating
what is the main difference between Q-learning and SARSA
SARSA uses the epsilon-greedy policy in its update (it is on-policy)
main problem of Q-learning
not scalable (the Q-table grows with the state-action space)
in deep Q-learning ‘deep’ is mainly used to
approximate Q function
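For contrast with the scalability card, here is tabular Q-learning on a toy 3-state chain, i.e. the table that deep Q-learning replaces with a network approximating Q (the environment and constants are made up):

```python
import numpy as np

# Toy 3-state chain: action 1 moves right, action 0 stays;
# reaching state 2 pays reward 1 and ends the episode.
Q = np.zeros((3, 2))
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(0)

for _ in range(200):
    s = 0
    while s != 2:
        a = int(rng.integers(2))                 # random behaviour policy
        s2 = min(s + 1, 2) if a == 1 else s
        r = 1.0 if s2 == 2 else 0.0
        target = r if s2 == 2 else r + gamma * Q[s2].max()   # off-policy max
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
```

The `Q[s2].max()` in the target is what makes this off-policy: it bootstraps from the greedy action rather than the action the random behaviour policy actually takes.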
in policy-based methods, do we select actions according to a value function
no
policy optimisation can only be performed using gradient-based methods
false
REINFORCE is based on
Monte Carlo
in REINFORCE with baseline the baseline is used to
reduce variance
which method is not designed to reduce variance
REINFORCE
in actor-critic methods, the critic is similar to which part of a GAN
discriminator
compared to value-based methods, policy-based methods can handle continuous action spaces more easily?
true
which parameters are not hyperparameters
weights of convolutional kernel
which hyperparameter optimisation method is more efficient
random search
in successive halving, the number of configurations n indicates
exploration
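A sketch of successive halving with a made-up noisy evaluation function: a larger initial n explores more configurations, and the best half survive each round with a doubled budget:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(config, budget):
    """Made-up stand-in for training a model: a noisy estimate of a
    configuration's quality, with noise shrinking as budget grows."""
    return config + rng.normal(scale=1.0 / budget)

def successive_halving(configs, budget=1):
    configs = list(configs)
    while len(configs) > 1:
        scores = [evaluate(c, budget) for c in configs]
        keep = np.argsort(scores)[::-1][: len(configs) // 2]
        configs = [configs[i] for i in keep]     # keep the best half
        budget *= 2                              # survivors get more budget
    return configs[0]

best = successive_halving(rng.uniform(size=16))  # larger n = more exploration
```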
in meta-learning, only training tasks contain a training set and a test set
false
in meta-learning the total loss is computed using
test examples
meta-learning and multi-task learning are the same
false
which colour representation can be used to compute colour similarities
RGB colour space
unsupervised representation learning can’t be used for
learning a mapping function from data to labels
autoencoder is an
unsupervised method
in an autoencoder the decoder must be symmetric to the encoder
false
as long as an autoencoder can reconstruct the input, this autoencoder can learn useful representations of the input
false
what objective function is used to train an autoencoder
reconstruction loss
which is not an autoencoder?
disruptive
which autoencoder can be used to perform dimensionality reduction
undercomplete autoencoders
in autoencoders which technique is used for anomaly detection
reconstruction error
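A sketch tying the last few cards together, using the standard observation that an undercomplete linear autoencoder is equivalent to PCA: encode synthetic 2-D data into a 1-D bottleneck, decode back, and flag the point with the largest reconstruction error as the anomaly:

```python
import numpy as np

rng = np.random.default_rng(0)
# Normal data lies near a 1-D line in 2-D; point 0 is placed off the line.
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t]) + 0.01 * rng.normal(size=(100, 2))
X[0] = [3.0, -3.0]                            # the anomaly

# Undercomplete linear "autoencoder" via the top principal component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
code = Xc @ Vt[:1].T                          # encoder: 2-D -> 1-D bottleneck
recon = code @ Vt[:1]                         # decoder: back to 2-D
errors = np.linalg.norm(Xc - recon, axis=1)   # reconstruction error
anomaly = int(np.argmax(errors))              # worst-reconstructed point
```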
which autoencoder should be used to recover noisy data
denoising autoencoder
an image classification model is a
discriminative model
VAEs are
explicit methods
how are VAEs trained
maximising a lower bound on the likelihood (the ELBO)
the reparameterisation trick in VAEs is used for
training (it lets gradients backpropagate through the sampling step)
GANs are
implicit methods
which loss is better for training the generator of GANs
non-saturating heuristic
what do GANs and VAEs have in common
both are generative models
Are VAEs easier to train but generate less sharp images?
yes