Quiz 5 - Module 4 Flashcards

1
Q

GANs involve __ density modeling

A

implicit

  • generate samples from the model p(x)
2
Q

GAN input

A
  • Generator
    • Vector of random numbers, normal (mu, sigma)
  • Discriminator
    • minibatch
      • p(x) fake image
      • real image
3
Q

GAN output

A
  • Discriminator
    • real or fake
  • Generator
    • a generated image (a sample from the model p(x))
4
Q

Generator role

A
  • update weights to improve realism of generated images
5
Q

Discriminator role

A
  • update weights to better discriminate real images from generated (fake) images
6
Q

Game theory problem for GANs

A
  • Mini-max Two Player Game
7
Q

GAN Objective

A

min_G max_D E[log D(x)] + E[log (1 - D(G(z)))]

  • The discriminator maximizes the objective over D; the generator minimizes it over G (written out in full below)
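
With the sampling distributions made explicit, the same objective (consistent with the E[log D(x)] + E[log (1 - D(G(z)))] expression used on cards 20-21) reads:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
            + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```
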
8
Q

GAN Generator Objective

A

min_G E[log (1 - D(G(z)))]

  • The generator only affects the second term of the full objective: it tries to make D(G(z)) large, i.e. to fool the discriminator
9
Q

GAN Discriminator Objective

A

max_D E[log D(x)] + E[log (1 - D(G(z)))]

  • The discriminator tries to assign high probability to real images and low probability to generated images
10
Q

The ___ part of the GAN objective does not have good gradient properties

A

Generator

  • High gradient when D(G(z)) is high (i.e. the discriminator is wrong)
  • We want the generator to improve when samples are bad (discriminator is right), but that is exactly where the gradient is small (saturated)
11
Q

Alternate Objective for GAN Max-Max Game

A

max_G E[log D(G(z))]

  • Instead of minimizing the probability that the discriminator is right, maximize the probability that it is wrong
  • Gives strong gradients when samples are bad (D(G(z)) near 0), which is where the generator needs to improve (see the comparison below)
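
A worked comparison of the two generator losses as functions of d = D(G(z)); this is the standard analysis behind cards 10 and 11, included as a reference rather than recovered from the original card:

```latex
\frac{\partial}{\partial d}\log(1 - d) = -\frac{1}{1-d}
\quad\Rightarrow\quad \text{small near } d = 0 \text{ (bad samples), large only as } d \to 1

\frac{\partial}{\partial d}\big[-\log d\big] = -\frac{1}{d}
\quad\Rightarrow\quad \text{large near } d = 0 \text{ (bad samples), small as } d \to 1
```
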
12
Q

GAN Drawbacks

A
  • No explicit model for distribution
  • training can be unstable
  • High-fidelity generation is computationally heavy to train
13
Q

VAEs involve __ density modeling

A

explicit

14
Q

VAE input

A
  • Encoder
    • Input is image X
  • Decoder
    • sample Z from simple distribution
15
Q

VAE Output

A
  • Encoder
    • Parameters of a probability distribution (Z)
      • mu and sigma
  • Decoder
    • Parameters of a probability distribution
      • Mu and sigma of Gaussian
      • For multi-dimensional version, output diagonal covariance
16
Q

VAE Optimization

A
  • Maximize the variational lower bound (ELBO), which has two parts (written out below)
    • KL divergence between the encoder distribution q(z|x) and the prior
    • Reconstruction loss
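
For reference, the standard form of the variational lower bound this card refers to (not spelled out on the original card):

```latex
% first term: reconstruction; second term: KL to the prior N(0, I)
\log p(x) \;\ge\; \mathbb{E}_{z \sim q(z|x)}\big[\log p(x|z)\big]
              \;-\; D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big)
```
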
17
Q

T/F: Variational Autoencoders are differentiable

A

True - with caveat

  • The sampling operation is stochastic and therefore not differentiable
  • Need the reparameterization trick: move the stochastic sampling into a separate variable (epsilon) that sits outside the backprop path (see the sketch below)
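
A minimal PyTorch-style sketch of the reparameterization trick described on this card; the function and variable names are illustrative, not taken from the course materials.

```python
import torch

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) in a differentiable way: the randomness lives
    entirely in eps, which backprop treats as a constant, so gradients flow
    through mu and log_var."""
    sigma = torch.exp(0.5 * log_var)   # sigma = exp(log_var / 2)
    eps = torch.randn_like(sigma)      # eps ~ N(0, I), outside the graph
    return mu + sigma * eps

# Example: encoder outputs for a batch of 4 images with a 2-D latent
mu = torch.zeros(4, 2, requires_grad=True)
log_var = torch.zeros(4, 2, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()                     # gradients reach mu and log_var
```
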
18
Q

VAE Reconstruction Loss

A

The expected log-likelihood of the input under the decoder, E_{z ~ q(z|x)}[log p(x|z)]

  • For a Gaussian decoder this reduces to a (weighted) mean-squared error between the input and its reconstruction
19
Q

VAE Distribution Loss

A

The KL-divergence loss that penalizes the encoder's distribution q(z|x) for diverging from the standard normal prior (mu = 0, sigma = 1)

20
Q

GAN Discriminator wants to ___ (minimize/maximize)

E[log D(x)] + E[log (1 - D(G(z)))]

A

maximize

  • The discriminator wants D(G(z)) = 0, indicating that the generated image is fake (0), not real (1)
21
Q

GAN Generator wants to _____ (minimize/maximize)

E[log D(x)] + E[log (1 - D(G(z)))]

A

minimize

  • The generator wants the discriminator to be wrong
  • i.e. it wants the discriminator to classify G(z) as real, outputting D(G(z)) = 1
22
Q

The ___ part of the GAN objective does not have good gradient properties

A

Generator

  • High gradient when D(G(z)) is high (discriminator is wrong)
  • We want it to improve when samples are bad (discriminator is right)
23
Q

Semi-supervised learning data type

A
  • Small amount of labeled data
  • Larger amount of unlabeled data
24
Q

Different ideas for training in semi-supervised environment

A
  • simple idea
    • learn a model on the small labeled dataset
    • make predictions on unlabeled data, add as new training, repeat
  • co-training
    • prediction across multiple views
25
Q

FixMatch (pseudo-labeling)

A
  • Unlabeled data example
    • Weakly augment
      • make a prediction, generate a pseudo-label
      • throw out cases below a confidence threshold
    • Strongly augment
      • make a prediction, use the pseudo-label as ground truth
26
Q

Pseudo-labeling (in practice)

A
  • Labeled examples (feed directly to the model)
  • Unlabeled examples (combined into the same batch)
    • Weakly augment
    • Strongly augment
  • Losses (see the sketch below)
    • Cross entropy (labeled data)
    • Cross entropy (strongly augmented unlabeled data, using the weakly augmented pseudo-label as ground truth)
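
A rough PyTorch-style sketch of the combined loss above (a FixMatch-style recipe). The augmentation functions, confidence threshold, and unlabeled-loss weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_lab, y_lab, x_unlab,
                         weak_aug, strong_aug,
                         threshold=0.95, lambda_u=1.0):
    # Supervised cross-entropy on labeled data
    loss_sup = F.cross_entropy(model(x_lab), y_lab)

    # Pseudo-labels from the weakly augmented view (no gradient)
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_unlab)), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()    # keep only confident pseudo-labels

    # Train the strongly augmented view to match the pseudo-labels
    logits_strong = model(strong_aug(x_unlab))
    per_example = F.cross_entropy(logits_strong, pseudo, reduction='none')
    loss_unsup = (per_example * mask).mean()

    return loss_sup + lambda_u * loss_unsup
```
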
27
Q

Label propagation

A

Learn a feature extractor and apply it to the unlabeled data. Label each unlabeled point according to the labeled points it clusters near in feature space (similar to KNN).
28
Q

Few-shot learning data

A
  • Base set of data
    • lots of labels
  • New set
    • very few labels (1-5 examples per category) in new categories
    • a transfer learning setting
29
Q

Approaches to few-shot learning

A
  • Fine-tuning
    • train a classifier on the base classes
    • freeze the feature extractor
    • learn a classifier for the new classes (at "query" time)
  • Simulate N-Way K-Shot tasks (meta-training)
    • better at making training reflect what will happen during test
30
Q

Classifier useful in the few-shot fine-tuning case

A
  • Cosine (similarity-based) classifier instead of a linear layer (see the sketch below)
    • compares unit-normalized vectors: (A · B) / (||A|| ||B||)
    • a normalized (unit-norm) comparison may discriminate a small number of classes better since it focuses on angular differences
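
A small sketch of a cosine-similarity classifier head of the kind described above; the feature dimension, class count, and scale factor are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class CosineClassifier(torch.nn.Module):
    """Replaces a linear layer with a comparison of unit-normalized vectors."""

    def __init__(self, feat_dim, num_classes, scale=10.0):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale                    # temperature on the cosine scores

    def forward(self, features):
        f = F.normalize(features, dim=1)      # unit-norm features
        w = F.normalize(self.weight, dim=1)   # unit-norm class weights
        return self.scale * f @ w.t()         # cosine similarity per class

# Example: 8 query features of dimension 64, 5 novel classes
logits = CosineClassifier(feat_dim=64, num_classes=5)(torch.randn(8, 64))
```
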
31
Q

Meta-Training

A
  • useful for few-shot learning
  • makes training better reflect test conditions (simulate smaller tasks)
  • N-Way K-Shot tasks
    • N - number of categories
    • K - examples per category
  • can pre-train features on held-out base classes
32
Q

Meta-Learner methods

A
  • Meta-Learner LSTM
    • want to learn gradient descent itself
      • update rules
      • parameter initialization
    • adaptive learning rate and weight decay to reduce overfitting
    • the gradient descent update looks like an LSTM update
  • Model-Agnostic Meta-Learning (MAML)
    • want to learn a parameter initialization
    • uses normal gradient descent (see the sketch below)
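
A compressed sketch of the MAML inner/outer loop mentioned above, restricted to a single nn.Linear model so the adapted forward pass stays short; the task format, step sizes, and loss are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def maml_step(model, tasks, inner_lr=0.01, meta_lr=0.001):
    """One meta-update: adapt to each task with one inner SGD step, then update
    the shared initialization from the post-adaptation (query-set) losses."""
    meta_opt = torch.optim.SGD(model.parameters(), lr=meta_lr)
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        # Inner step: fast weights, keeping the graph for second-order gradients
        loss = F.mse_loss(model(support_x), support_y)
        w_grad, b_grad = torch.autograd.grad(
            loss, [model.weight, model.bias], create_graph=True)
        w_fast = model.weight - inner_lr * w_grad
        b_fast = model.bias - inner_lr * b_grad

        # Outer objective: loss of the adapted weights on the query set
        meta_loss = meta_loss + F.mse_loss(F.linear(query_x, w_fast, b_fast), query_y)

    meta_opt.zero_grad()
    meta_loss.backward()      # gradients w.r.t. the shared initialization
    meta_opt.step()

# Usage sketch: model = torch.nn.Linear(4, 1); tasks = [(xs, ys, xq, yq), ...]
```
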
33
Q

Self-supervised data

A
  • no labels at all
34
Q

Autoencoders

A
  • low-dimensional embedding between an encoder and a decoder
  • Loss
    • minimize the reconstruction difference (MSE)
35
Q

Surrogate tasks for self-supervised learning

A
  • reconstruction
  • rotate image (rotation prediction)
  • colorization
  • relative image patch location (jigsaw)
  • video: next frame prediction
  • instance discrimination
36
Q

Colorization

A
  • self-supervised task
  • input: grayscale image
  • output: color image
  • loss: MSE
37
Q

Jigsaw puzzle

A
  • self-supervised task
  • input: image patches
  • output: prediction of the discrete image patch location relative to the center patch
  • loss: cross-entropy classification (which position)
38
Q

Rotation prediction

A
  • input: image with various rotations
  • output: predicted rotation amount
  • objective: cross-entropy classification
39
Q

Evaluation of self-supervised learning

A
  • train the model with the surrogate task
  • extract the ConvNet (encoder part)
  • transfer to the actual task
    • use it to initialize the model of another supervised learning task
    • use it to extract features for learning a separate classifier (NN, SVM)
    • often the classifier is limited to a linear layer and the features are frozen
40
Q

Instance discrimination

A
  • Positive example
    • 2 augmentations of the same image
  • Negative examples
    • augmentations of other images
  • Feed positive and negative examples to the classifier CNN
  • Loss (see the sketch below)
    • contrastive loss
    • dot product (similarity) between augmentation 1 and the positive and negative examples
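
A minimal sketch of a contrastive (InfoNCE-style) loss for the setup above, where each image's second augmentation is its positive and the other images in the batch act as negatives; the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: [N, D] embeddings of two augmentations of the same N images.
    Row i of z1 should be most similar to row i of z2 (its positive); the
    other rows of z2 serve as negatives."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # [N, N] similarity matrix
    targets = torch.arange(z1.size(0))       # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce_loss(torch.randn(32, 128), torch.randn(32, 128))
```
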
41
Q

Contrastive loss types

A
  • End-to-end
    • use all other examples in the mini-batch as negatives
  • Memory bank
    • store negatives across iterations (a queue) from previous mini-batches
    • no extra feature extraction needed (the features are stored)
  • Momentum encoder
    • exponential moving average of the encoder weights
    • helps avoid the stale-feature issue of the memory bank (features computed with outdated weights)
42
Q

Reinforcement Learning

A

Sequential decision making in an environment with evaluative feedback
43
Q

Signature challenges in reinforcement learning

A
  • evaluative feedback
    • need trial and error to find the right action
  • delayed feedback
    • actions may not lead to immediate reward
  • non-stationarity
    • the data distribution of visited states changes when the policy changes
  • fleeting nature of time and online data
44
Q

Markov decision process

A

(S, A, R, T, gamma)

  • S - states
  • A - actions
  • R(s, a, s') - distribution of rewards
  • T(s, a, s') - transition probability
  • gamma - discount factor
45
Q

Markov property

A

The current state completely characterizes the state of the environment. Assume the most recent observation is a sufficient statistic of the history.
46
Q

What do we assume is unknown about an MDP in RL?

A
  • Transition probability distribution
  • Reward distribution
47
Q

Value Iteration

A
  • initialize V_0(s) arbitrarily (e.g. all zeros)
  • repeatedly apply the Bellman optimality backup until convergence (see the sketch below)
    • V_{k+1}(s) = max_a sum_s' T(s, a, s') [ R(s, a, s') + gamma V_k(s') ]
48
Q

Bellman Optimality Equation (value)

A

V*(s) = max_a sum_s' T(s, a, s') [ R(s, a, s') + gamma V*(s') ]

  • the optimal value of a state is the expected return from taking the best action and then acting optimally thereafter
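
A small NumPy sketch of tabular value iteration matching the update on card 47; the transition-tensor layout T[s, a, s'] and reward layout R[s, a, s'] are assumptions taken from the MDP notation on card 44.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-6):
    """T[s, a, s'] - transition probabilities, R[s, a, s'] - rewards.
    Returns the optimal state values V*(s)."""
    V = np.zeros(T.shape[0])
    while True:
        # Q[s, a] = sum_s' T[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
        Q = np.sum(T * (R + gamma * V[None, None, :]), axis=2)
        V_new = Q.max(axis=1)                 # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```
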
49
Q

Q-Iteration is the same as value iteration except ___

A

it loops over actions as well as states (the backup is written out below)
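
For reference, the corresponding Q-iteration backup in the MDP notation of card 44 (standard form, not spelled out on the original card):

```latex
Q_{k+1}(s, a) \;=\; \sum_{s'} T(s, a, s')\,\Big[\,R(s, a, s') + \gamma \max_{a'} Q_k(s', a')\,\Big]
```
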
50
Q

Parts of policy iteration

A
  • Policy evaluation
    • compute V_pi (similar to value iteration)
  • Policy refinement
    • greedily change actions as per V_pi at next steps
51
Q

Why choose policy iteration over value iteration?

A

pi often converges to pi* much sooner than V converges to V*
52
Q

Deep Q-Learning

A
  • Parameterized Q-function learned from data {(s, a, s', r)} for N data points
  • Linear function approximator example
    • Q(s, a; w, b) = w_a^T s + b_a
  • Loss: MSE (see the sketch below)
    • (Q_new(s, a) - (r + gamma * max_a' Q_old(s', a')))^2
    • Q_new - predicted Q-value
    • Q_old - target Q-value
  • For stability
    • freeze Q_old and update only the Q_new parameters
    • set Q_old to Q_new at regular intervals
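
A minimal PyTorch-style sketch of the loss on this card with a frozen target network; the network names, batch shapes, and sync interval are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_new, q_old, s, a, r, s_next, gamma=0.99):
    """q_new: online Q-network (updated); q_old: frozen target Q-network.
    s, s_next: [N, state_dim], a: [N] long, r: [N]."""
    # Predicted Q-value of the action actually taken
    q_pred = q_new(s).gather(1, a.unsqueeze(1)).squeeze(1)

    # Target: r + gamma * max_a' Q_old(s', a'), with no gradient into q_old
    with torch.no_grad():
        q_target = r + gamma * q_old(s_next).max(dim=1).values

    return F.mse_loss(q_pred, q_target)

# Periodically (e.g. every few thousand steps):
# q_old.load_state_dict(q_new.state_dict())
```
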
53
Q

Deep Q-Learning - Correlated Data Problem

A
  • Samples are correlated -> high-variance gradients -> inefficient learning
  • The current Q-network parameters determine the next training sample -> can lead to bad feedback loops
  • Resolution: a replay buffer that stores transitions (see the sketch below)
    • update the replay buffer as game (experience) episodes are played; discard older samples
    • train the Q-network on random minibatches of transitions from the replay memory instead of consecutive samples
    • the larger the buffer, the lower the correlation
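
A minimal replay-buffer sketch matching the resolution described above; the capacity and batch size are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s, a, r, s_next, done) transitions and samples random minibatches,
    breaking the correlation between consecutive samples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # older samples are discarded

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)
        return map(list, zip(*batch))          # lists of s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)
```
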
54
Q

What are the key steps of the Deep Q-Learning algorithm?

A
  • Epsilon-greedy action selection
    • select a random action with probability epsilon, otherwise the max-Q* action
  • Experience replay
    • store the transition in the replay buffer
    • sample a random minibatch of transitions from the buffer
  • Q update
    • perform gradient descent
55
Q

Derive the policy gradient

A
  • Objective: maximize the expected return J(theta) = E_{tau ~ p_theta(tau)}[R(tau)]
  • Use the log-derivative trick (grad p = p * grad log p) to move the gradient inside the expectation
  • The transition dynamics do not depend on theta, so they drop out of grad log p_theta(tau)
  • Result: grad_theta J(theta) = E_tau[ (sum_t grad_theta log pi_theta(a_t | s_t)) R(tau) ] (derivation sketched below)
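
A compact version of the standard REINFORCE derivation, included as a reference since the original card has no recorded answer:

```latex
J(\theta) = \mathbb{E}_{\tau \sim p_\theta(\tau)}\big[R(\tau)\big]
          = \int p_\theta(\tau)\, R(\tau)\, d\tau

\nabla_\theta J(\theta)
  = \int \nabla_\theta p_\theta(\tau)\, R(\tau)\, d\tau
  = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau
  = \mathbb{E}_{\tau}\big[\nabla_\theta \log p_\theta(\tau)\, R(\tau)\big]

% p_\theta(\tau) = p(s_0) \prod_t \pi_\theta(a_t \mid s_t)\, T(s_{t+1} \mid s_t, a_t),
% and the dynamics terms do not depend on \theta, so:
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau}\Big[\Big(\textstyle\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big) R(\tau)\Big]
```
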
56