Troubleshooting Flashcards
Why is your model worse than the authors'?
Implementation bug
Hyperparameter choices - the model could be extremely sensitive to them
Data/model fit - your data differs from the data used in the paper
Dataset construction - in industry, most of the time is spent on datasets, not models
Broadly, what should the strategy be?
Pessimism.
Because debugging is hard, start with the very simple things and then increase the complexity
Broadly, what is the process of building a model?
Start simple - choose a simple model and simple data
Implement and debug the model
Evaluate the results
Tune the hyperparameters
Improve the data/model
When starting simple - what architecture to choose?
For images, start with a LeNet-like architecture, then move to ResNet
For sequences, start with a Transformer/attention model, then move to a WaveNet-like model
For anything else, start with a simple fully connected NN
For multiple inputs (say, a picture with a phrase), start by mapping each input into a lower-dimensional feature space: for example, run the picture through a ConvNet and flatten the result, and run the sequence through an LSTM and keep the final vector. Then concatenate everything and pass it through a fully connected layer
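The multi-input recipe above can be sketched as a data flow in plain Python. This is only an illustration under loud assumptions: `encode_image` uses average pooling as a stand-in for a ConvNet, `encode_sequence` uses a plain tanh recurrence as a stand-in for an LSTM, and all names, sizes, and random weights are hypothetical. A real model would use a deep learning library instead.

```python
import math
import random

random.seed(0)

def random_matrix(rows, cols):
    """Small random weight matrix (hypothetical, untrained weights)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def encode_image(image, pool=2):
    """Stand-in for ConvNet + flatten: average-pool pool x pool patches, then flatten."""
    features = []
    for r in range(0, len(image), pool):
        for c in range(0, len(image[0]), pool):
            patch = [image[r + i][c + j] for i in range(pool) for j in range(pool)]
            features.append(sum(patch) / len(patch))
    return features

def encode_sequence(tokens, w_in, w_rec):
    """Stand-in for an LSTM: simple tanh recurrence, keep only the final hidden vector."""
    hidden = [0.0] * len(w_rec)
    for token in tokens:
        pre = [a + b for a, b in zip(linear(w_in, token), linear(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden

# Hypothetical sizes chosen only for the example.
IMAGE_SIDE, TOKEN_DIM, HIDDEN_DIM, NUM_OUTPUTS = 4, 3, 5, 2
IMAGE_FEATURES = (IMAGE_SIDE // 2) ** 2

w_in = random_matrix(HIDDEN_DIM, TOKEN_DIM)
w_rec = random_matrix(HIDDEN_DIM, HIDDEN_DIM)
w_out = random_matrix(NUM_OUTPUTS, IMAGE_FEATURES + HIDDEN_DIM)

def forward(image, tokens):
    image_features = encode_image(image)                  # picture -> small vector
    text_features = encode_sequence(tokens, w_in, w_rec)  # phrase -> final vector
    fused = image_features + text_features                # concatenate both vectors
    return linear(w_out, fused)                           # fully connected layer

image = [[random.random() for _ in range(IMAGE_SIDE)] for _ in range(IMAGE_SIDE)]
tokens = [[random.random() for _ in range(TOKEN_DIM)] for _ in range(6)]
outputs = forward(image, tokens)
```

The point of the sketch is the shape bookkeeping: each input is reduced to a fixed-length vector before the two are joined, so the final layer sees one vector whose length is the sum of the two feature sizes.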
Optimizer defaults
Adam with a learning rate of 3e-4
Activations - ReLU for CNNs, tanh for LSTMs
Regularization - none
Normalization - none (this means layers such as batch normalization, not normalization of the input data)
Both start as none because they are a common source of bugs
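One way to keep these defaults in a single place is a small config object. The class and field names below are hypothetical and not tied to any framework; only the values come from the card above.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass(frozen=True)
class BaselineDefaults:
    """Starting defaults for a first, simple model (field names are illustrative)."""
    optimizer: str = "adam"
    learning_rate: float = 3e-4
    cnn_activation: str = "relu"
    lstm_activation: str = "tanh"
    regularization: Optional[str] = None        # start with none
    normalization_layers: Optional[str] = None  # e.g. batch norm; start with none

defaults = BaselineDefaults()
config = asdict(defaults)  # plain dict, convenient for logging a run
```

Freezing the dataclass makes it harder to change a default by accident while debugging. Regularization or normalization layers can be turned on later as a deliberate, separate step.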
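Should I normalize the input data?
Yes!
Make sure to do it!
And make sure it is not already being done automatically (for example by the data-loading library), so that you do not normalize twice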
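A minimal sketch of input normalization, assuming tabular data stored as rows of floats. The helper names are hypothetical. The statistics are fitted on the training set only, and `looks_standardized` is a rough check for whether the data was already normalized upstream.

```python
from statistics import fmean, pstdev

def fit_standardizer(train_rows):
    """Compute per-feature mean and standard deviation from the training set."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0 for constant features
    return means, stds

def standardize(rows, means, stds):
    """Shift and scale every feature with the given statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def looks_standardized(rows, tolerance=1e-6):
    """Rough check: every feature has mean ~0 and std ~1 (or is constant)."""
    for col in zip(*rows):
        if abs(fmean(col)) > tolerance:
            return False
        spread = pstdev(col)
        if spread > tolerance and abs(spread - 1.0) > tolerance:
            return False
    return True

train_rows = [[0.0, 10.0], [2.0, 30.0], [4.0, 50.0]]
means, stds = fit_standardizer(train_rows)
train_normalized = standardize(train_rows, means, stds)
```

Running `looks_standardized` on the raw data before normalizing is a cheap way to notice that some earlier step already did the work.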
How to simplify the problem so we can start easy?
Use a small training set (fewer than 10,000 examples)
Use fewer classes/objects, smaller pictures, etc.
Create a simple synthetic training set
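The simplifications above can be sketched as two small helpers, assuming the dataset is a list of `(features, label)` pairs. The function names, the cap of 5,000 examples, and the two-blob synthetic task are illustrative choices, not part of the original advice.

```python
import random

def simplify_dataset(examples, keep_labels, max_examples=5_000, seed=0):
    """Keep only a few classes and cap the size well below 10,000 examples."""
    kept = [example for example in examples if example[1] in keep_labels]
    rng = random.Random(seed)
    rng.shuffle(kept)  # shuffle so the cap does not keep only the first examples
    return kept[:max_examples]

def make_synthetic_dataset(num_examples=1_000, seed=0):
    """Two well-separated 2-D Gaussian blobs with labels 0 and 1."""
    rng = random.Random(seed)
    data = []
    for i in range(num_examples):
        label = i % 2
        center = -2.0 if label == 0 else 2.0
        point = (rng.gauss(center, 0.5), rng.gauss(center, 0.5))
        data.append((point, label))
    return data

# Hypothetical 10-class dataset with 50,000 examples.
full_dataset = [((float(i),), i % 10) for i in range(50_000)]
small_dataset = simplify_dataset(full_dataset, keep_labels={0, 1})
synthetic_dataset = make_synthetic_dataset()
```

A model that cannot fit the synthetic blobs almost certainly has an implementation bug, which is easier to find here than on the full problem.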
3 general pieces of advice for implementing the model
- Lightweight implementation
Fewer than 200 lines of code…
- Off-the-shelf components
- Build complicated pipelines later