Hessian Approximation Flashcards
1
Q
If a large language model had 1 trillion parameters, how many entries would its Hessian have?
A
1 trillion × 1 trillion = 10^12 × 10^12 = 10^24 entries (one septillion), because the Hessian has one row and one column per parameter.
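As a quick check of the count, here is a minimal sketch. The helper name `hessian_entries` is made up for illustration, and the only assumption is that the Hessian is a dense n-by-n matrix.

```python
def hessian_entries(num_params: int) -> int:
    """Entries in a dense Hessian: one row and one column per parameter."""
    return num_params * num_params


n = 10**12  # 1 trillion parameters
entries = hessian_entries(n)
print(entries == 10**24)  # compares the count against one septillion
```

Python integers have arbitrary precision, so the product is exact rather than a floating-point estimate.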
2
Q
How much memory would you need to store 1 septillion 2-byte entries?
A
2 yottabytes (10^24 entries × 2 bytes = 2 × 10^24 bytes)
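The same arithmetic as a small sketch. The function name `storage_yottabytes` is hypothetical, and it assumes the SI definition of a yottabyte (10^24 bytes) and dense storage with no compression.

```python
BYTES_PER_YOTTABYTE = 10**24  # SI definition, assumed here


def storage_yottabytes(num_entries: int, bytes_per_entry: int = 2) -> float:
    """Memory needed to store num_entries values, in yottabytes."""
    return num_entries * bytes_per_entry / BYTES_PER_YOTTABYTE


# 10^24 Hessian entries at 2 bytes each (e.g. a 16-bit float format)
print(storage_yottabytes(10**24))
```

Passing `bytes_per_entry=4` gives the figure for 32-bit entries instead.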