Quiz 5 - Module 4 Flashcards

Question

Fixed match (pseudo-labeling)

Answer 1

* Unlabeled data example * Weakly augment * Make prediction, generate pseudo-label * throw out cases below threshold * Strongly augment * Make prediction, use pseudo-label as ground truth

Answer 2

* Labeled examples (feed directly to model) * Unlabeled examples * Combined * Weakly augment * Strongly augment * Losses * Cross Entropy (Labeled Data) * Cross Entropy (strongly augmented unlabeled data using weakly augmented pseudo-label as ground truth)

Answer 3

Learn feature extractors and apply to unlabeled data. Get unlabeled data labeled similarly to labeled data in clusters (like KNN)

Answer 4

* Base set of data * lots of labels * New set * very few labels (1 - 5 examples per category) in new categories * transfer learning

Answer 5

* Fine-tuning * train classifier on base classes * freeze feature extractor * learn classifier for new classes (during "query" time) * Simulate (N-Way K-Shot Tasks) * Meta-training * Better at making train reflect what will happen during test

Answer 6

* cosine (similarity-based) * instead of linear layer * unit comparison (A.B) / (norm (A) norm(B) ) * normalized (unit-norm) comparison may discriminate a small number of classes better since it focuses on an angular difference

Answer 7

* useful for few-shot learning * makes training better reflect test (simulate smaller tasks) * N-Way K-Shot Tasks * N - number of categories * K - examples per category * Can pre-train features on held-out base classes

Answer 8

* Meta-Learner LSTM * want to learn gradient descent * update rules * param initialization * adaptive LR, weight decay to reduce overfit * gradient descent update looks like LSTM update * Model-agnostic meta learning (MAML) * want to learn parameter initialization * normal gradient descent

Answer 9

* no labels at all

Answer 10

* Low dimensional embedding between an encoder and a decoder * Loss * Minimize difference (MSE)

Answer 11

* reconstruction * rotate image * colorization * relative image patch location (jigsaw) * video: next frame prediction * instance prediction

Answer 12

* self-supervised task * input * grayscale * output * color * loss * MSE

Answer 13

* self-supervising task * input: image patches * output: prediction of discrete image pach location relative to center * loss: cross-entropy classification (which position)

Answer 14

* input: image with various rotations * output: prediction rotation amount * objective: cross-entropy classification

Answer 15

* train the model with surrogate task * extrack the convnet (encoder part) * transfer to actual task * use to initiaize model of another supervised learning task * use it to extract featrures for learning a separate classifier (NN, SVM) * often classifier is limited to linear layer and features are frozen

Answer 16

* Positive Example * 2 augmentations (same image) * Negative Example * Augmented * Feed positive and negative examples to classifier CNN * Loss * contrastive loss * dot product (similarity) between augmentation 1 and positive and negative examples

Answer 17

* end-to-end * use all other examples as negatives in mini batch * memory bank * store negatives across iterations (Queue) * from previous mini-batches * don't have to redo feature extraction * no extra feature extraction needed (stored) * Momentum encoder * exponential average of moving weights * helps avoid stale weights issue in memory bank

Answer 18

Sequential decision making in an environment with evaluative feedback

Answer 19

* evaluative feedback * need trial/error to find the right action * delayed feedback * actions may not lead to immediate reward * non-stationary * data distribution of visisted states changes when the policy changes * fleeting nature of time and online data

Answer 20

(S, A, R, T, gamma) * state * action * distribution of rewards R(s, a, s') * transition probability T(s, a, s') * gamma discount factor

Answer 21

Current state completely characterizes state of the environment. Assume most recent observation is a sufficient statistic of history

Answer 22

* Transition probability distribution * Reward distribution

Answer 23

it loops over actions as well as states

Answer 24

* Policy Evaluation * Compute V\_pi (similar to value iteration) * Policy Refinement * Greedily change actions as per V\_pi at next steps

Answer 25

Pi often converges to Pi\* much sooner than V to V\*

Answer 26

* Parameterized Q-function from data {(s, a, s', r)} for N data points * Linear function approximators * Q(s, a; w, b) = w_a^Ts + b_a * Loss * MSE * (Q_new(s,a) - (r + gamma \* max_aQ_old(s', a)) )² * Q_new- predicted Q-value * Q_old - target Q-value * For stability * Freeze Q_oldand update Q_new parameters * Set Q_old to Q_new at regular intervals

Answer 27

* Samples are correlated -\> high variance gradients -\> inefficient learning * Current Q-Network parameters determines next training sample -\> can lead to bad feedback loops * Resolution? * replay buffer that stores transitions * update replay buffer as game (Experience) epuisodes are played, older samples discarded * Train Q-network on random minibatches of transitions from the replay memory, instead of consecutive samples * large the buffer, lower the correlation

Answer 28

* Epsilon greedy action selection * select random action with probability epsilon or max Q\* action * Experience replay * store transition in replay buffer * sample random minibatch of transitions from buffer * Q update * perform gradient descent