Lesson 4 Flashcards
How does NLP transfer learning work?
1) Fit a language model: it predicts the next word of a sentence. This is hard! You need to know a lot about English and a lot about the world! E.g., fit on the WikiText-103 dataset (most of the largest articles on Wikipedia, ~103M tokens). This is the pre-trained model.
2) Transfer learning: fine-tune it to predict the next word of your domain, aka the target corpus (e.g., movie reviews). You don't need any labels at all! This is a "self-supervised" model.
3) Fine-tune a classifier with labels on the smaller labeled set.
What's the trick in creating the language model?
Use all the text you have (train and test sets alike) to train the language model! Next-word prediction needs no labels, so this isn't cheating.
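A minimal sketch of building that language-model data in fastai v1, loosely following the lesson's IMDb notebook (the batch size and 10% holdout are assumptions):

    from fastai.text import *

    path = untar_data(URLs.IMDB)  # the lesson's movie-review corpus

    # Pool every piece of text we have (train, test, unlabeled) for the LM;
    # next-word prediction needs no labels, so using test-set text is not cheating.
    data_lm = (TextList.from_folder(path)
               .filter_by_folder(include=['train', 'test', 'unsup'])
               .split_by_rand_pct(0.1)   # small holdout just to track LM accuracy
               .label_for_lm()           # the "labels" are the texts themselves
               .databunch(bs=48))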
What is process to fit and fine tune language model?
language_model_learner(…) creates an RNN; drop_mult=0.3 scales all the dropout amounts in the model. Then: lr_find, fit_one_cycle, unfreeze, learn.fit_one_cycle(10, …). 0.30 accuracy is great (so ~1/3 of the time you can predict the exact next word!). ** This training could take over a day ** Use learn.predict(…) to check it is sensible; you are generating sentences. 26:30
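A hedged sketch of that sequence in the fastai v1 API, continuing from the data_lm sketch above (the learning rates and epoch counts are illustrative, not the lesson's exact values):

    from fastai.text import *

    # AWD_LSTM is the WikiText-103 pretrained RNN; drop_mult scales
    # every dropout probability in the model by 0.3.
    learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

    learn.lr_find()               # find a sensible learning rate
    learn.fit_one_cycle(1, 1e-2)  # train the new head first
    learn.unfreeze()              # then fine-tune the whole RNN
    learn.fit_one_cycle(10, 1e-3) # ~0.30 next-word accuracy is great

    # Sanity check: generate some text and see if it reads like English.
    learn.predict("I liked this movie because", 40)

    learn.save_encoder('fine_tuned_enc')  # keep the encoder for the classifier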
How do you go from language model to classifier?
Save the encoder (you don't need the decoder, which is the generator part). You must use the SAME VOCABULARY as the language model. Then: learn = text_classifier_learner(clf, drop_mult=0.5); learn.load_encoder('fine_tuned_enc'); learn.freeze(); lr_find; learn.fit_one_cycle(…)
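Roughly, in fastai v1 (data_clas is an assumed TextClasDataBunch built with vocab=data_lm.vocab so the token ids line up with the encoder):

    from fastai.text import *

    # data_clas: labeled classification data, built with vocab=data_lm.vocab
    learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
    learn.load_encoder('fine_tuned_enc')  # reuse the fine-tuned LM encoder
    learn.freeze()                        # train only the new classifier head first
    learn.lr_find()
    learn.fit_one_cycle(1, 2e-2)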
What does learn.freeze_to(-2) mean?
Unfreeze just the last two layer groups (everything earlier stays frozen).
What is process to fine tune classifier?
fit_one_cycle(); learn.freeze_to(-2); fit_one_cycle(); learn.freeze_to(-3); fit_one_cycle(); lastly, unfreeze the entire thing and fit_one_cycle() again. With text classification it helps to unfreeze one layer group at a time. 31:29
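A sketch of that schedule, following the lesson's IMDb notebook (the exact learning rates are illustrative; the slice(…/2.6**4, …) pattern is the discriminative-learning-rate trick from the next card):

    learn.fit_one_cycle(1, 2e-2)                        # head only
    learn.freeze_to(-2)                                 # unfreeze last two layer groups
    learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))
    learn.freeze_to(-3)                                 # one more layer group
    learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3))
    learn.unfreeze()                                    # finally, the whole model
    learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))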
Discriminative learning rates?
How much you decrease the learning rate as you move from layer to layer (earlier layers get smaller rates).
What is the 2.6?
35:4- The ratio between the learning rates of adjacent layer groups, found empirically: Stephen Merity ran experiments, and (following Frank Hutter's work) you can use a random forest to find optimal hyperparameters. Like AutoML.
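In code, the 2.6 shows up as the spread in a learning-rate slice (a sketch; the base rate is illustrative):

    # slice(lo, hi) spreads learning rates across the layer groups:
    # the first group gets lo, the last gets hi, so with lo = hi/2.6**4
    # each successive group's rate is ~2.6x the previous one.
    lr = 1e-2
    learn.fit_one_cycle(1, slice(lr/(2.6**4), lr))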
What do you use embeddings for?
Categorical variables are converted to embeddings; continuous variables are fed in as-is.
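A toy PyTorch sketch of that split (the variables and sizes are made up for illustration):

    import torch
    import torch.nn as nn

    # e.g. day-of-week: 7 categories -> 4-dim embedding vectors
    emb = nn.Embedding(num_embeddings=7, embedding_dim=4)

    day = torch.tensor([2, 5])              # categorical: indices into the embedding
    temp = torch.tensor([[18.5], [21.0]])   # continuous: fed in as-is

    x = torch.cat([emb(day), temp], dim=1)  # input to the linear layers
    print(x.shape)                          # torch.Size([2, 5])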
How do we deal with missing data?
Replace it with the median, and add a binary is_missing column.
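In plain pandas this looks roughly like the following (fastai v1's FillMissing proc does it for you; the column name is made up):

    import pandas as pd

    df = pd.DataFrame({'age': [24.0, None, 31.0, None]})

    # Record which rows were missing, then fill them with the median.
    df['age_is_missing'] = df['age'].isna()
    df['age'] = df['age'].fillna(df['age'].median())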
How can you make a validation set with contiguous periods in fastai 1.x?
TabularList.from_df(…).split_by_idx()
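A sketch in fastai v1, assuming the last 2000 rows form the most recent contiguous period (df, the column lists, and dep_var are placeholders):

    from fastai.tabular import *

    procs = [FillMissing, Categorify, Normalize]

    # Validate on a contiguous block of recent rows rather than a random split.
    valid_idx = list(range(len(df) - 2000, len(df)))

    data = (TabularList.from_df(df, path=path, cat_names=cat_names,
                                cont_names=cont_names, procs=procs)
            .split_by_idx(valid_idx)
            .label_from_df(cols=dep_var)
            .databunch())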
How do you make the tabular learner?
get_tabular_learner(data, layers=[200,100], metrics=…); layers gives the size of each hidden layer. (In released fastai 1.x this was renamed tabular_learner.)
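A sketch with the released v1 name (the metric choice is an assumption):

    from fastai.tabular import *

    # Two hidden layers with 200 and 100 activations; fastai builds the
    # categorical embeddings automatically from the data.
    learn = tabular_learner(data, layers=[200, 100], metrics=accuracy)
    learn.fit_one_cycle(1, 1e-2)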
What is collab filtering?
A recommender system: users and who likes what. The most basic dataset: userid, movieid, number of stars. Think of it as a big sparse matrix with movies on one axis, users on the other, and the rating as the value.
Cold start problem?
Have a second, metadata-driven model for new users or new movies; or do what Netflix's UX does: when you sign up, they ask you a bunch of questions.
For tabular time series?
Jeremy says not to use an RNN when there are other features you can use (store open? promotion? weather? day of week? etc.).
How does collab filtering work?
Lesson 4, 1:09. It's a matrix-completion problem: M is the userid × movieid ratings matrix. Approximate M ≈ AB, where A is a (num users × 5) matrix and B is a (5 × num movies) matrix; A and B are initialized randomly.
It's not really one big matrix mult: it's embedding lookups of vectors. The dot product of each user vector with each movie vector gives one scalar in M.
The loss function is the difference between the given ratings and the corresponding entries of M, squared and added up.
Use gradient descent to make the loss smaller.
This is a single linear layer :-)
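A toy PyTorch sketch of that factorization trained by gradient descent (the sizes, fake ratings, and learning rate are all made up):

    import torch

    n_users, n_movies, n_factors = 10, 15, 5

    # Fake (user, movie, rating) triples standing in for the sparse matrix.
    users = torch.randint(0, n_users, (100,))
    movies = torch.randint(0, n_movies, (100,))
    ratings = torch.randint(1, 6, (100,)).float()

    A = torch.randn(n_users, n_factors, requires_grad=True)   # user embeddings
    B = torch.randn(n_movies, n_factors, requires_grad=True)  # movie embeddings

    opt = torch.optim.SGD([A, B], lr=0.05)
    for _ in range(200):
        pred = (A[users] * B[movies]).sum(dim=1)   # dot product per (user, movie)
        loss = ((pred - ratings) ** 2).mean()      # squared error (lesson sums it; mean just rescales)
        opt.zero_grad()
        loss.backward()
        opt.step()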
What is an embedding?
A matrix of weights.
More precisely: a matrix of weights which you can look up into and grab one vector out of.
It's designed as something you can index into as an array and grab one vector out of.
Collab filtering has two embedding matrices: user and movie
Then you need to add a bias per user and per movie.
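A two-line illustration of "index in, vector out" (the sizes are made up):

    import torch
    import torch.nn as nn

    emb = nn.Embedding(10, 3)   # a 10 x 3 matrix of weights
    v = emb(torch.tensor([4]))  # index in, grab row 4: one 3-dim vector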
How do you force a continuous value into a range?
sigmoid(res)*(max-min)+min
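For example, squashing predicted ratings into the 0-5 star range (a sketch; fastai's collab models apply the same idea via a y_range argument):

    import torch

    def scaled_sigmoid(res, y_min=0.0, y_max=5.0):
        # sigmoid gives (0, 1); rescale it into (y_min, y_max)
        return torch.sigmoid(res) * (y_max - y_min) + y_min

    print(scaled_sigmoid(torch.tensor([-3.0, 0.0, 3.0])))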
Jargon to review from the end of the lesson:
Inputs
Weights/parameters
Activations
Output
Loss
Metric
Cross-entropy
Softmax
Fine-tuning
In PyTorch, weights are called parameters (a parameter can be a weight or a bias).
input @ weights = activations
activation_function(activations) is also called activations
So an "activation" is the result of either a matrix multiplication or an activation function.
The last layer is likely to be a sigmoid, because you want an output squashed between two values.
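A tiny PyTorch sketch of that vocabulary (all shapes made up):

    import torch

    x = torch.randn(1, 4)       # inputs
    w = torch.randn(4, 3)       # weights (parameters)

    a1 = x @ w                  # matrix mult -> activations
    a2 = torch.relu(a1)         # activation function -> also called activations
    out = torch.sigmoid(a2)     # last layer: squashed between 0 and 1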