Quiz 5 - Module 4 Flashcards
GANs involve __ density modeling
implicit
- generate samples from the model p(x)
GAN input
- Generator
  - Vector of random numbers sampled from a normal distribution (mu, sigma)
- Discriminator
  - Minibatch of fake images sampled from the model p(x)
  - Minibatch of real images
GAN output
- Discriminator
  - Real or fake
- Generator
  - A sample from p(x) (a generated image)
Generator role
- update weights to improve realism of generated images
Discriminator role
- update weights to better discriminate real from fake images
Game theory problem for GANs
- Mini-max Two Player Game
GAN Objective
- min_G max_D E[log D(x)] + E[log (1 - D(G(z)))]
GAN Generator Objective
- min_G E[log (1 - D(G(z)))]
GAN Discriminator Objective
- max_D E[log D(x)] + E[log (1 - D(G(z)))]
The ___ part of the GAN objective does not have good gradient properties
Generator
- Gradient is high only when D(G(z)) is high (i.e., the discriminator is wrong), and near zero when D(G(z)) is low
- We want the generator to improve when samples are bad, which is exactly where the gradient vanishes
Alternate Objective for GAN Max-Max Game
- Instead of minimizing E[log (1 - D(G(z)))], the generator maximizes E[log D(G(z))]
- This non-saturating objective gives strong gradients when D(G(z)) is low (i.e., when samples are bad)
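As a concrete illustration of the two objectives above, here is a minimal PyTorch-style sketch of one training step; the networks G and D, their optimizers, and the noise dimension are placeholder assumptions, not anything specified in these cards.

```python
import torch

# Minimal sketch of one GAN training step, assuming D outputs a probability in (0, 1)
# and G maps noise z ~ N(0, I) to an image. G, D, opt_g, opt_d are placeholders.
def gan_step(G, D, real_images, opt_g, opt_d, z_dim=100):
    z = torch.randn(real_images.size(0), z_dim)

    # Discriminator: maximize E[log D(x)] + E[log(1 - D(G(z)))]  (so minimize the negative)
    fake = G(z).detach()
    d_loss = -(torch.log(D(real_images)).mean() + torch.log(1 - D(fake)).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator, non-saturating form: maximize E[log D(G(z))] instead of
    # minimizing E[log(1 - D(G(z)))], which has vanishing gradients when samples are bad
    g_loss = -torch.log(D(G(z))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```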
GAN Drawbacks
- No explicit model for distribution
- training can be unstable
- High-fidelity generation is computationally heavy to train
VAEs involve __ density modeling
explicit
VAE input
- Encoder
  - Input is an image X
- Decoder
  - Input is a sample Z drawn from a simple distribution
VAE Output
- Encoder
  - Parameters of a probability distribution over Z
  - mu and sigma
- Decoder
  - Parameters of a probability distribution
  - mu and sigma of a Gaussian
  - For the multi-dimensional version, output a diagonal covariance
VAE Optimization
- Maximize the variational lower bound (ELBO)
- Two parts
  - Reconstruction Loss
  - KL Divergence
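In standard VAE notation (not copied from these cards), the bound being maximized is:

```latex
\log p_\theta(x) \;\ge\; \underbrace{\mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}} \;-\; \underbrace{D_{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)}_{\text{KL divergence}} \;=\; \text{ELBO}
```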
T/F: Variational AutoEncoders are differentiable
True - with caveat
- Sampling action is not differentiable (stochastic)
- Need to use reparameterization trick to put stochastic sampling into a separate variable (epsilon) that is not in backprop.
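A minimal sketch of the reparameterization trick, assuming the encoder outputs mu and the log-variance of a diagonal Gaussian:

```python
import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * epsilon, epsilon ~ N(0, I): the randomness lives in epsilon,
    # so gradients flow through mu and log_var but not through the sampling itself
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps
```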
VAE Reconstruction Loss
- How well the decoder reconstructs the input, E[log p(x|z)] (e.g., MSE for a Gaussian decoder)
VAE Distribution Loss
The loss penalizing the encoder's distribution q(z|x) for diverging from the standard normal prior (mu = 0, sigma = 1), measured by KL divergence
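For a diagonal Gaussian encoder and a standard normal prior, this term has the standard closed form:

```latex
D_{KL}\big(\mathcal{N}(\mu, \sigma^2)\,\|\,\mathcal{N}(0, 1)\big) \;=\; -\tfrac{1}{2}\sum_{j}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```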
GAN Discriminator wants ___ (minimize/maximize)
E[log D(x)] + E[log (1 - D(G(z)))]
maximize
- Discriminator wants to output a 0 for D(G(z)) to indicate that the generated image is fake (0) not real (1)
GAN Generator wants _____ (minimize/maximize)
E[log D(x)] + E[log (1 - D(G(z)))]
minimize
- The generator wants the discriminator to be wrong
- I.e., wants the discriminator to classify D(G(z)) as 1 (real)
The ___ part of the GAN objective does not have good gradient properties
Generator
- High gradient when D(G(z)) is high (i.e., the discriminator is wrong)
- We want it to improve when samples are bad (discriminator is right)
Semi-supervised learning data type
- Small amount of labeled data
- Larger amount of unlabeled data
Different ideas for training in semi-supervised environment
- Simple idea
  - Learn a model on the small labeled dataset
  - Make predictions on the unlabeled data, add them as new training data, repeat
- Co-training
  - Make predictions across multiple views of the data
FixMatch (pseudo-labeling)
- Take an unlabeled example
  - Weakly augment it
    - Make a prediction, generate a pseudo-label
    - Throw out cases below a confidence threshold
  - Strongly augment it
    - Make a prediction, use the pseudo-label from the weakly augmented view as ground truth
Pseudo-labeling (in practice)
- Labeled examples (feed directly to the model)
- Unlabeled examples
  - Weakly augment
  - Strongly augment
- Combine labeled and unlabeled batches
- Losses
  - Cross entropy (labeled data)
  - Cross entropy (strongly augmented unlabeled data, using the weakly augmented pseudo-label as ground truth)
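A minimal sketch of the unlabeled-data part of this loss; the augmentation functions and the 0.95 confidence threshold are illustrative assumptions, not values from these cards.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_unlabeled, weak_aug, strong_aug, threshold=0.95):
    # Pseudo-label from the weakly augmented view (no gradient through it)
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_unlabeled)), dim=1)
        confidence, pseudo_label = probs.max(dim=1)
        mask = (confidence >= threshold).float()   # drop low-confidence cases

    # Cross entropy on the strongly augmented view, pseudo-label as ground truth
    logits_strong = model(strong_aug(x_unlabeled))
    loss = F.cross_entropy(logits_strong, pseudo_label, reduction="none")
    return (loss * mask).mean()
```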
Label propagation
Learn feature extractors and apply them to unlabeled data; assign labels to unlabeled points based on nearby labeled points in feature space (similar to KNN)
Few shot learning data
- Base set
  - Lots of labeled examples
- New set
  - Very few labels (1-5 examples per category) in new categories
- A form of transfer learning
Approaches to few shot learning
- Fine-tuning
  - Train a classifier on the base classes
  - Freeze the feature extractor
  - Learn a classifier for the new classes (at "query" time)
- Meta-training: simulate N-way K-shot tasks
  - Makes training better reflect what will happen at test time
Classifier useful in the few-shot fine-tuning case
- Cosine (similarity-based) classifier instead of a linear layer
- Unit-norm comparison: (A . B) / (||A|| ||B||)
- The normalized (unit-norm) comparison may discriminate a small number of classes better since it focuses on angular differences
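A minimal sketch of such a cosine classifier layer; the feature dimension, class count, and temperature scale are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    # Replaces the usual linear layer: scores are cosine similarities between
    # unit-normalized features and unit-normalized class weight vectors
    def __init__(self, feat_dim, num_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale  # optional temperature, an assumption rather than part of the card

    def forward(self, features):
        features = F.normalize(features, dim=1)   # (A . B) / (||A|| ||B||) via normalization
        weight = F.normalize(self.weight, dim=1)
        return self.scale * features @ weight.t()
```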
Meta-Training
- useful for few-shot learning
- makes training better reflect test (simulate smaller tasks)
- N-Way K-Shot Tasks
- N - number of categories
- K - examples per category
- Can pre-train features on held-out base classes
Meta-Learner methods
- Meta-Learner LSTM
  - Wants to learn gradient descent itself
    - Update rules
    - Parameter initialization
    - Adaptive LR, weight decay to reduce overfitting
  - The gradient descent update looks like an LSTM update
- Model-Agnostic Meta-Learning (MAML)
  - Wants to learn only the parameter initialization
  - Uses normal gradient descent
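A toy sketch of the MAML idea in its first-order form, on made-up 1-D linear-regression tasks; the task distribution, model, and step sizes are all illustrative assumptions.

```python
import numpy as np

# Toy first-order MAML sketch on synthetic 1-D linear-regression tasks (y = a*x + b)
rng = np.random.default_rng(0)

def sample_task():
    # Each task is a random line; return (support set, query set)
    a, b = rng.uniform(-2, 2), rng.uniform(-1, 1)
    x_s, x_q = rng.normal(size=10), rng.normal(size=10)
    return (x_s, a * x_s + b), (x_q, a * x_q + b)

def grad_mse(w, x, y):
    # Gradient of 0.5 * mean squared error of the linear model w[0]*x + w[1]
    err = w[0] * x + w[1] - y
    return np.array([np.mean(err * x), np.mean(err)])

w = np.zeros(2)                  # meta-learned initialization
inner_lr, outer_lr = 0.1, 0.01
for _ in range(2000):
    (xs, ys), (xq, yq) = sample_task()
    w_adapted = w - inner_lr * grad_mse(w, xs, ys)    # inner loop: adapt on the support set
    w -= outer_lr * grad_mse(w_adapted, xq, yq)       # outer loop (first-order): update the initialization
```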
self-supervised data
- no labels at all
Autoencoders
- Low dimensional embedding between an encoder and a decoder
- Loss
  - Minimize the difference between the input and its reconstruction (MSE)
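A minimal autoencoder sketch with an MSE reconstruction loss; the layer sizes and the dummy batch are arbitrary choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal autoencoder: low-dimensional bottleneck between encoder and decoder,
# trained to minimize MSE between the input and its reconstruction
class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, hidden=128, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)                 # dummy batch of flattened images
loss = F.mse_loss(model(x), x)          # reconstruction loss
loss.backward()
```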
Surrogate tasks for self-supervised learning
- reconstruction
- rotate image
- colorization
- relative image patch location (jigsaw)
- video: next frame prediction
- instance discrimination
Colorization
- self-supervised task
- input
- grayscale
- output
- color
- loss
- MSE
jigsaw puzzle
- self-supervising task
- input: image patches
- output: prediction of the discrete image patch location relative to the center patch
- loss: cross-entropy classification (which position)
rotation prediction
- input: image with various rotations
- output: predicted rotation amount
- objective: cross-entropy classification
Evaluation of self-supervised learning
- Train the model on the surrogate task
- Extract the ConvNet (encoder part)
- Transfer to the actual task
  - Use it to initialize the model for another supervised learning task
  - Use it to extract features for learning a separate classifier (NN, SVM)
  - Often the classifier is limited to a linear layer and the features are frozen
Instance discrimination
- Positive example
  - Two augmentations of the same image
- Negative example
  - An augmentation of a different image
- Feed positive and negative examples through the CNN
- Loss
  - Contrastive loss
  - Dot product (similarity) between augmentation 1 and the positive and negative examples
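A minimal sketch of an InfoNCE-style contrastive loss over one positive and a set of negatives, assuming features have already been extracted; the temperature value is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.07):
    # anchor, positive: (batch, dim); negatives: (num_neg, dim)
    # Dot-product similarities are treated as logits of a (1 + num_neg)-way
    # classification where the positive example is class 0
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    negatives = F.normalize(negatives, dim=1)

    pos_logit = (anchor * positive).sum(dim=1, keepdim=True)   # (batch, 1)
    neg_logits = anchor @ negatives.t()                        # (batch, num_neg)
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long)     # positive is index 0
    return F.cross_entropy(logits, labels)
```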
Contrastive loss types
- End-to-end
  - Use all other examples in the mini-batch as negatives
- Memory bank
  - Store negatives across iterations in a queue (from previous mini-batches)
  - No extra feature extraction needed (features are stored)
- Momentum encoder
  - Exponential moving average of the encoder weights
  - Helps avoid the stale-representation issue of the memory bank (features computed with outdated weights)
Reinforcement Learning
Sequential decision making in an environment with evaluative feedback
Signature challenges in reinforcement learning
- evaluative feedback
- need trial/error to find the right action
- delayed feedback
- actions may not lead to immediate reward
- non-stationary
- data distribution of visited states changes when the policy changes
- fleeting nature of time and online data
Markov decision process
(S, A, R, T, gamma)
- state
- action
- distribution of rewards R(s, a, s’)
- transition probability T(s, a, s’)
- gamma discount factor
Markov property
Current state completely characterizes state of the environment. Assume most recent observation is a sufficient statistic of history
What do we assume is unknown about an MDP in RL?
- Transition probability distribution
- Reward distribution
Value Iteration
- Iteratively apply the Bellman backup until convergence: V_{k+1}(s) = max_a sum_{s'} T(s, a, s') [ R(s, a, s') + gamma * V_k(s') ]
Bellman Optimality Equation (value)
- V*(s) = max_a sum_{s'} T(s, a, s') [ R(s, a, s') + gamma * V*(s') ]
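A minimal tabular value-iteration sketch, assuming the transition probabilities T and rewards R are known arrays; the shapes and iteration count are illustrative.

```python
import numpy as np

def value_iteration(T, R, gamma=0.99, iters=1000):
    # T[s, a, s2]: transition probability, R[s, a, s2]: reward, V[s]: state value
    num_states = T.shape[0]
    V = np.zeros(num_states)
    for _ in range(iters):
        # Bellman backup: Q(s, a) = sum_s2 T(s, a, s2) * (R(s, a, s2) + gamma * V(s2))
        Q = np.einsum("ijk,ijk->ij", T, R + gamma * V[None, None, :])
        V = Q.max(axis=1)   # V(s) = max_a Q(s, a)
    return V
```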
Q-Iteration is the same as value iteration except ___
it loops over actions as well as states
Parts of policy iteration
- Policy Evaluation
- Compute V_pi (similar to value iteration)
- Policy Refinement
- Greedily change actions as per V_pi at next steps
Why choose policy iteration over value iteration?
Pi often converges to Pi* much sooner than V converges to V*
Deep Q-Learning
- Parameterized Q-function from data {(s, a, s’, r)} for N data points
- Linear function approximators
  - Q(s, a; w, b) = w_a^T s + b_a
- Loss
  - MSE: (Q_new(s, a) - (r + gamma * max_a' Q_old(s', a')))^2
  - Q_new: predicted Q-value
  - Q_old: target Q-value
- For stability
  - Freeze Q_old and update only the Q_new parameters
  - Set Q_old to Q_new at regular intervals
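A minimal sketch of this loss with a frozen target network; the networks and the transition batch tensors are placeholders, and terminal-state handling is omitted.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_new, q_old, states, actions, rewards, next_states, gamma=0.99):
    # Predicted Q-values for the actions that were actually taken
    q_pred = q_new(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target: r + gamma * max_a' Q_old(s', a'), with Q_old frozen for stability
    with torch.no_grad():
        q_target = rewards + gamma * q_old(next_states).max(dim=1).values

    return F.mse_loss(q_pred, q_target)

# Periodically: q_old.load_state_dict(q_new.state_dict())
```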
Deep Q-Learning - Correlated Data Problem
- Samples are correlated -> high-variance gradients -> inefficient learning
- Current Q-network parameters determine the next training sample -> can lead to bad feedback loops
- Resolution?
  - Replay buffer that stores transitions
  - Update the replay buffer as game (experience) episodes are played; discard older samples
  - Train the Q-network on random minibatches of transitions from the replay memory, instead of consecutive samples
  - The larger the buffer, the lower the correlation
What are the key steps of the Deep Q-Learning Algorithm
- Epsilon greedy action selection
- with probability epsilon select a random action, otherwise select the max Q* action
- Experience replay
- store transition in replay buffer
- sample random minibatch of transitions from buffer
- Q update
- perform gradient descent
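A minimal sketch of the epsilon-greedy selection and replay-buffer sampling steps; the buffer capacity, epsilon, and batch size are arbitrary illustrative values.

```python
import random
from collections import deque
import torch

buffer = deque(maxlen=100_000)   # replay buffer; old transitions fall off as new ones arrive

def select_action(q_net, state, num_actions, epsilon=0.1):
    # Epsilon-greedy: random action with probability epsilon, otherwise argmax Q
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def sample_minibatch(batch_size=32):
    # Random minibatch of stored transitions (s, a, r, s') breaks up correlation
    return random.sample(buffer, batch_size)
```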
Derive the policy gradient
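A standard derivation sketch using the log-derivative (REINFORCE) trick, with notation following the MDP card above:

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \int \pi_\theta(\tau)\, R(\tau)\, d\tau \\
\nabla_\theta J(\theta) &= \int \nabla_\theta \pi_\theta(\tau)\, R(\tau)\, d\tau
  = \int \pi_\theta(\tau)\, \nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\, d\tau
  = \mathbb{E}_{\tau}\big[\nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\big] \\
\log \pi_\theta(\tau) &= \log p(s_0) + \sum_t \log \pi_\theta(a_t \mid s_t) + \sum_t \log T(s_{t+1} \mid s_t, a_t) \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\tau}\Big[\Big(\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big) R(\tau)\Big]
  \quad \text{(the initial-state and transition terms do not depend on } \theta\text{)}
\end{aligned}
```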