Lesson 4 Flashcards

1
Q

How does NLP transfer learning work?

A

1) Fit a language model; it predicts the next word of a sentence. This is hard! You need to know a lot about English and a lot about the world! E.g., fit on the WikiText-103 dataset: most of the largest articles on Wikipedia, about 1bn tokens. This is the pre-trained model. 2) Transfer learning: fine-tune it to predict the next word of your domain, aka the target corpus (e.g., movie reviews). You don't need any labels at all: this is "self-supervised" learning. 3) Fine-tune a classifier with labels on a smaller labeled set.

2
Q

What's the trick in creating the language model?

A

Use all the text, from both the train and test sets, to train the language model! No labels are involved, so this isn't cheating.

3
Q

What is the process to fit and fine-tune the language model?

A

language_model_learner(…) creates an RNN; drop_mult=0.3 is a multiplier applied to all the dropout amounts.
lr_find, then fit_one_cycle.
unfreeze, then learn.fit_one_cycle(10, …).
0.30 accuracy is great (so ~1/3 of the time you can predict the exact next word!).
** This training could take over a day. **
Use learn.predict(…) to check it is sensible: you are generating sentences. 26:30
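
A minimal sketch of that recipe in fastai 1.x syntax (the path and CSV name are placeholders, and exact signatures shifted across fastai versions; AWD_LSTM is the WikiText-103 pre-trained model):

from fastai.text import *

# DataBunch for language modeling: the label is simply the next word,
# so no manual labels are needed.
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')

# drop_mult scales all of the model's dropout probabilities.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

learn.lr_find()                # pick a learning rate from the plot
learn.fit_one_cycle(1, 1e-2)   # train the new parts first
learn.unfreeze()               # then fine-tune the whole network
learn.fit_one_cycle(10, 1e-3)

# Sanity-check by generating text:
learn.predict("I liked this movie because", 40)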

4
Q

How do you go from language model to classifier?

A

Save the encoder (you don't need the decoder, which is the generator).
Need to ensure you use the SAME VOCABULARY as the language model.
learn = text_classifier_learner(clf, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.freeze()
lr_find
learn.fit_one_cycle(…)
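
A sketch of that hand-off in fastai 1.x (the encoder name 'fine_tuned_enc' follows the card; the CSV name and vocab plumbing are placeholders):

from fastai.text import *

# After fine-tuning the language model, save only the encoder; the
# decoder (the next-word generator head) is discarded.
learn_lm.save_encoder('fine_tuned_enc')

# The classifier's DataBunch must reuse the language model's vocabulary
# so token ids line up with the saved encoder's embeddings.
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.vocab)

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.freeze()               # train only the new classifier head first
learn.lr_find()
learn.fit_one_cycle(1, 2e-2)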

5
Q

What does learn.freeze_to(-2) mean?

A

Unfreeze just the last two layer groups; everything earlier stays frozen.

6
Q

What is the process to fine-tune the classifier?

A

fit_one_cycle()
learn.freeze_to(-2); fit_one_cycle()
learn.freeze_to(-3); fit_one_cycle()
Lastly, unfreeze the entire thing and fit_one_cycle() again.
It helps with text classification to unfreeze one layer group at a time; see the sketch below. 31:29
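
A sketch of that gradual unfreezing (the learning rates are illustrative, not the lesson's exact values):

learn.fit_one_cycle(1, 2e-2)   # stage 1: only the classifier head is trainable

learn.freeze_to(-2)            # stage 2: also unfreeze the last layer group
learn.fit_one_cycle(1, 1e-2)

learn.freeze_to(-3)            # stage 3: unfreeze one more layer group
learn.fit_one_cycle(1, 5e-3)

learn.unfreeze()               # stage 4: train the whole network
learn.fit_one_cycle(2, 1e-3)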

7
Q

Discriminative learning rate

A

How much you decrease the learning rate as you move from layer to layer: earlier layers get smaller learning rates than later ones.

8
Q

What is 2.6?

A

35:4- The factor you divide the learning rate by as you move back one layer group when using discriminative learning rates (see the sketch below). Stephen Merity, Frank Hutter: how you can use a random forest to find optimal hyperparameters. Like AutoML.
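
A one-liner showing where 2.6 typically appears in lesson-style code (the base rate 1e-2 is a placeholder):

# Discriminative learning rates: the last layer group trains at lr,
# and each earlier group is divided by a further factor of 2.6
# (2.6**4 spans the model's layer groups).
lr = 1e-2
learn.fit_one_cycle(1, slice(lr / (2.6 ** 4), lr))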

9
Q

What do you use embeddings for?

A

Categorical data is converted to embeddings. Continuous data is fed in as is.

10
Q

How do we deal with missing data?

A

Replace with the median, and add a binary is_missing column (see the sketch below).
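
A minimal pandas sketch of that preprocessing (the column name age is a made-up example; fastai's FillMissing proc does this automatically):

import pandas as pd

df = pd.DataFrame({"age": [25, None, 40, None, 31]})

# Flag which rows were missing, then fill them with the median.
df["age_is_missing"] = df["age"].isna()
df["age"] = df["age"].fillna(df["age"].median())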

11
Q

How can you make a validation set with contiguous periods in fastai 1.x?

A

TabularList.from_df(…).split_by_idx(valid_idx), where valid_idx is the row indices of a contiguous block (e.g., the most recent dates); see the pipeline sketch under the next card.

12
Q

How do you make the tabular learner?

A

get_tabular_learner(data, layers=[200,100], metrics=…) (later renamed tabular_learner in fastai 1.x); layers gives the sizes of the hidden fully connected layers.
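
A sketch of the full tabular pipeline in fastai 1.x, covering this card and the previous one (df, cat_names, cont_names, dep_var, and the validation range are placeholders; tabular_learner is the later name of get_tabular_learner):

from fastai.tabular import *

procs = [FillMissing, Categorify, Normalize]

# A contiguous block of rows (e.g., the most recent dates) as validation set.
valid_idx = range(len(df) - 2000, len(df))

data = (TabularList.from_df(df, path=path, cat_names=cat_names,
                            cont_names=cont_names, procs=procs)
        .split_by_idx(valid_idx)
        .label_from_df(cols=dep_var)
        .databunch())

learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
learn.fit_one_cycle(1, 1e-2)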

13
Q

What is collab filtering?

A

Recommender systems: a bunch of users, and who likes what. The most simple dataset: userId, movieId, numberOfStars. Think of it as a big sparse matrix with movies on one axis, users on the other, and the rating as the value.

14
Q

What is the cold start problem?

A

You have no ratings yet for new users or new movies. Fixes: have a second, metadata-driven model for new users or new movies; or, like the Netflix UX, ask users a bunch of questions when they sign up.

15
Q

Should you use an RNN for tabular time series?

A

Jeremy says not to use an RNN when there are other features you can use (store open? promotion? weather? day of week? etc.).

16
Q

How does collab filtering work?

A

Lesson 4, 1:09: it's a matrix completion problem. M is the userId × movieId matrix. M ≈ AB, where A is a (num users × 5) matrix and B is a (5 × num movies) matrix. A and B are initialized randomly.

It's not really a matrix mult: it's an embedding lookup of vectors. The dot product of a user vector and a movie vector -> a scalar in the M matrix.

The loss function is the difference between each given rating and the corresponding entry of M, squared, then added up (only over the ratings we actually have).

Use gradient descent to make the loss smaller.

This is a single linear layer :-)
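
A minimal PyTorch sketch of that model, including the per-user/per-movie biases and rating-range squashing from the next two cards (the factor size 5 and the (0, 5.5) range follow the lesson's example; the names are illustrative):

import torch
import torch.nn as nn

class DotProductBias(nn.Module):
    # Collaborative filtering as two embedding matrices plus biases.
    def __init__(self, n_users, n_movies, n_factors=5, y_range=(0.0, 5.5)):
        super().__init__()
        self.u_emb = nn.Embedding(n_users, n_factors)   # matrix A: one vector per user
        self.m_emb = nn.Embedding(n_movies, n_factors)  # matrix B: one vector per movie
        self.u_bias = nn.Embedding(n_users, 1)
        self.m_bias = nn.Embedding(n_movies, 1)
        self.y_range = y_range

    def forward(self, user_ids, movie_ids):
        # Dot product of the looked-up vectors -> one scalar per (user, movie) pair.
        dot = (self.u_emb(user_ids) * self.m_emb(movie_ids)).sum(dim=1)
        res = dot + self.u_bias(user_ids).squeeze(1) + self.m_bias(movie_ids).squeeze(1)
        lo, hi = self.y_range
        return torch.sigmoid(res) * (hi - lo) + lo  # force into the rating range

# Squared-error loss over the known ratings only, e.g.:
# loss = ((model(users, movies) - ratings) ** 2).mean()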

17
Q

What is an embedding?

A

A matrix of weights.

A matrix of weights which you can look up into and grab one vector out of: designed as something you can index into as an array.

Collab filtering has two embedding matrices: user and movie.

Then you need to add a bias per user and per movie.
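
A tiny sketch of that "array lookup" view (the sizes are made up):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)  # a 10x4 matrix of weights

idx = torch.tensor([3])
vec = emb(idx)  # grabs row 3 of the weight matrix

# Equivalent to multiplying by a one-hot vector:
one_hot = torch.zeros(1, 10)
one_hot[0, 3] = 1.0
assert torch.allclose(vec, one_hot @ emb.weight)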

18
Q

How do you force a continuous value into a range?

A

sigmoid(res)*(max-min)+min: sigmoid squashes res into (0, 1), then you rescale that to (min, max).
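
As a function (a sketch; the name sigmoid_range is illustrative, and the lesson sets the max slightly above the top rating, e.g., 5.5, so a full 5-star prediction is reachable):

import torch

def sigmoid_range(res, lo, hi):
    # sigmoid gives (0, 1); rescale to (lo, hi).
    return torch.sigmoid(res) * (hi - lo) + lo

print(sigmoid_range(torch.tensor([-3.0, 0.0, 4.0]), 0.0, 5.5))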

19
Q

Inputs

Weights/parameters

Activations

Output

Loss

Metric

Cross-entropy

Softmax

Fine-tuning

A

In PyTorch, weights are called parameters (a parameter could be a weight or a bias).

input @ weights = activations

activation_function(activations) is also called activations.

An activation is the result of either a matrix multiplication or an activation function.

The last layer is likely to be a sigmoid because you want something between two values.

Softmax turns the final activations into positive numbers that sum to 1 (probabilities); cross-entropy is the loss that penalizes a low predicted probability on the correct class.
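
A tiny PyTorch sketch of those definitions (shapes and numbers are made up):

import torch
import torch.nn.functional as F

x = torch.randn(1, 4)                 # inputs
w = torch.randn(4, 3)                 # weights (parameters)

act = x @ w                           # activations from a matrix multiply
act = torch.relu(act)                 # activations from an activation function

probs = torch.softmax(act, dim=1)     # positive, sums to 1
target = torch.tensor([2])
loss = F.cross_entropy(act, target)   # cross-entropy (applies log-softmax internally)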