Last Quiz Flashcards
Why are continuous value functions harder?
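You lose the ability to track the outcome of each action individually (as a lookup table would); values must be approximated, so learning about one state or action also shifts the estimates for others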
Name the three big nuisance factors
Random seed
Hyperparameters
Network architecture
Explain why random seed is a nuisance factor
The initial policy weights and the subsequent exploration actions are randomized, so some agents might “get lucky.”
Explain why hyperparameters are a nuisance factor
There’s no systematic way to choose them yet (e.g., learning rate, reward scaling), but they have a big effect on an algorithm’s success
Explain why network architecture is a nuisance factor
People running the same code on different setups get different results.
What are three ML “cheats”?
1) Reporting the max over many trials without the mean and standard deviation
2) Cherry-picking a favorable random seed
3) Using a small sample size (too few trials)
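The honest alternative to cheats 1 and 2 is to run several seeds and report the spread. A minimal sketch, assuming a hypothetical toy_training_run function that stands in for a real training run (a fixed "true" score plus seed-dependent noise); only the reporting pattern matters here:

```python
import random
import statistics

def toy_training_run(seed: int) -> float:
    """Hypothetical stand-in for one RL training run; returns a final score.

    The score is a fixed "true" performance plus seed-dependent noise,
    mimicking lucky or unlucky initial weights and exploration.
    """
    rng = random.Random(seed)
    true_performance = 100.0
    return true_performance + rng.gauss(0.0, 15.0)

def summarize(scores: list[float]) -> dict[str, float]:
    """Summarize runs with mean and sample standard deviation; max kept only for reference."""
    return {
        "n": float(len(scores)),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),
        "max": max(scores),
    }

scores = [toy_training_run(seed) for seed in range(10)]
report = summarize(scores)
print(f"mean = {report['mean']:.1f} ± {report['std']:.1f} over {report['n']:.0f} seeds (max = {report['max']:.1f})")
```

Reporting only report["max"] would hide the seed-to-seed variance that the mean and standard deviation expose.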
What did Henderson do?
Created a “reproducibility checklist” that ML algorithms must pass before their results can be considered statistically significant
What’s the problem with his checklist?
People claim to follow it, but often don’t, and compliance is hard to check
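Why does Q-learning overestimate?
Scott’s story: continuously training on the same set creates overconfidence. More precisely, the update target takes a max over noisy Q-value estimates, and because the same estimates both select and evaluate the best action, positive noise tends to get picked, biasing the target upward (maximization bias)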
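The textbook explanation for Q-learning's overestimation is maximization bias: the max of several unbiased but noisy estimates is biased upward. A minimal sketch, assuming five actions that are all truly worth 0 and Gaussian estimation noise (both illustrative choices, not taken from any specific experiment):

```python
import random
import statistics

def estimated_max(true_values: list[float], noise_std: float, rng: random.Random) -> float:
    """Add zero-mean noise to each action value, then take the max,
    as a Q-learning target does with its current estimates."""
    noisy = [v + rng.gauss(0.0, noise_std) for v in true_values]
    return max(noisy)

rng = random.Random(0)
true_values = [0.0] * 5  # five actions, all truly worth 0, so the true max is 0
samples = [estimated_max(true_values, 1.0, rng) for _ in range(10_000)]
print(f"true max = 0.0, average estimated max = {statistics.mean(samples):.2f}")
```

Each individual noisy estimate is unbiased, yet the average of the maxes lands well above 0.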
How does ORB-SLAM work?
It recognizes ORB features across frames and uses their parallax to estimate where the camera is in the world (localization) while building a map of it
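Parallax is what turns matched features into depth. A minimal sketch, assuming the simplest case of two rectified views with the same focal length, where a matched feature shifts only horizontally; ORB-SLAM itself triangulates from general camera poses, so this formula (depth = focal length × baseline / disparity) is only an illustration:

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen in two rectified views.

    Assumes both views share a focal length (in pixels) and are rectified,
    so a matched feature differs only by a horizontal shift (the disparity).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero parallax means the point is at infinity")
    return focal_px * baseline_m / disparity_px

# A feature shifted 20 px between views 0.1 m apart, with f = 500 px
print(depth_from_disparity(500.0, 0.1, 20.0))  # depth ≈ 2.5 m
```

Larger parallax means a closer point; as the disparity shrinks toward zero the depth estimate becomes unreliable.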
What is feature mapping?
Creating feature descriptors and comparing the feature vectors between frames to assess motion
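ORB descriptors are binary strings, so "comparing feature vectors" usually means Hamming distance (the number of differing bits). A minimal sketch of brute-force nearest-neighbour matching, assuming toy 8-bit descriptors stored as ints (real ORB descriptors are 256 bits) and a hypothetical max_distance threshold; estimating the motion from the matched positions is not shown:

```python
def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(desc_prev: list[int], desc_curr: list[int], max_distance: int) -> list[tuple[int, int, int]]:
    """Brute-force nearest-neighbour matching by Hamming distance.

    Returns (index_in_prev, index_in_curr, distance) for every descriptor in
    desc_prev whose closest descriptor in desc_curr is within max_distance bits.
    """
    if not desc_curr:
        return []
    matches = []
    for i, d_prev in enumerate(desc_prev):
        j, dist = min(
            ((k, hamming(d_prev, d_curr)) for k, d_curr in enumerate(desc_curr)),
            key=lambda pair: pair[1],
        )
        if dist <= max_distance:
            matches.append((i, j, dist))
    return matches

# Toy 8-bit descriptors from two consecutive frames
prev_frame = [0b10110010, 0b01001101]
curr_frame = [0b01001100, 0b10110011]
print(match_descriptors(prev_frame, curr_frame, max_distance=2))  # [(0, 1, 1), (1, 0, 1)]
```

Each feature from the previous frame is paired with its closest descriptor in the current frame; the change in those features' pixel positions is what the motion estimate is built from.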
What is the pinhole camera model?
All rays of light from the scene pass through a single point (the pinhole, or center of projection) before landing on the image plane.
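Because every ray passes through the pinhole, similar triangles give the projection x = f·X/Z, y = f·Y/Z. A minimal sketch, assuming camera-frame coordinates with the pinhole at the origin, the image plane placed in front of the pinhole (so the image is not inverted), and no principal-point offset or lens distortion:

```python
def project(point: tuple[float, float, float], focal_length: float) -> tuple[float, float]:
    """Project a 3-D camera-frame point (X, Y, Z) onto the image plane.

    Every ray passes through the pinhole at the origin, so by similar
    triangles x = f * X / Z and y = f * Y / Z.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)

print(project((1.0, 2.0, 4.0), 2.0))  # (0.5, 1.0)
print(project((2.0, 4.0, 8.0), 2.0))  # (0.5, 1.0)
```

Both points lie on the same ray through the pinhole, so they land on the same image point; depth is lost in a single view, which is why SLAM needs parallax from multiple views.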
What is Goodhart’s law?
When a measure becomes a target, it ceases to be a good measure
What is double Q-learning?
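Two Q-value estimators that check each other: one picks the greedy next action and the other evaluates it (the roles swap at random), which reduces Q-learning’s overestimation bias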
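A minimal sketch of the standard tabular double Q-learning update, assuming dictionary-backed tables keyed by (state, action), a fixed action list, and no terminal-state handling; behaviour-policy action selection (e.g., ε-greedy on the sum of both tables) is left out:

```python
import random
from collections import defaultdict

def double_q_update(q_a, q_b, state, action, reward, next_state, actions, alpha, gamma, rng):
    """One tabular double Q-learning step.

    With probability 0.5 update table A, otherwise table B. The table being
    updated picks the greedy next action; the other table evaluates it.
    """
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    best_next = max(actions, key=lambda a: select[(next_state, a)])
    target = reward + gamma * evaluate[(next_state, best_next)]
    select[(state, action)] += alpha * (target - select[(state, action)])

q_a = defaultdict(float)
q_b = defaultdict(float)
rng = random.Random(0)
double_q_update(q_a, q_b, "s0", "left", 1.0, "s1", ["left", "right"], alpha=0.5, gamma=0.9, rng=rng)
```

Decoupling selection from evaluation means noise that inflates one table's choice is not automatically reflected in the value used for the target, which counters the maximization bias.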