Sarathi Flashcards
1
Q
What percent of compute time during LLM inference is spent on attention?
A
5-10%
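Why attention is such a small slice: a back-of-the-envelope FLOP count, sketched below in Python. The hidden size, FFN expansion factor, and context lengths are illustrative assumptions, not figures from the Sarathi paper.

# Rough per-token, per-layer FLOP estimate for a decoder-only transformer
# during decode. Assumptions (illustrative): hidden size d, FFN expansion
# factor 4, context length L, one multiply-accumulate = 2 FLOPs.

def per_layer_flops(d: int, L: int) -> tuple[int, int]:
    linear = 2 * (3 * d * d      # attn_pre_proj: fused Q, K, V projections
                  + d * d        # attn_post_proj: output projection
                  + 4 * d * d    # ffn_ln1: first FFN linear (d -> 4d)
                  + 4 * d * d)   # ffn_ln2: second FFN linear (4d -> d)
    attention = 2 * (L * d       # Q @ K^T against L cached keys
                     + L * d)    # attention weights @ V
    return attention, linear

for L in (512, 1024, 2048, 4096):
    attn, lin = per_layer_flops(d=4096, L=L)
    print(f"L={L:5d}: attention is {attn / (attn + lin):5.1%} of compute")

Under these assumptions the attention fraction lands between roughly 2% and 14% across these context lengths, consistent with the 5-10% figure above for typical sequence lengths.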
2
Q
ffn_ln1 stands for
A
Feed-forward network linear layer 1 (the first of the FFN's two linear layers)
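A minimal sketch of where ffn_ln1 sits in a transformer block, assuming the common two-linear-layer FFN with a 4x expansion. This is illustrative PyTorch, not the paper's implementation; the class name, activation choice, and sizes are assumptions.

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Transformer FFN block: two linear layers around a nonlinearity."""

    def __init__(self, d_model: int = 4096, expansion: int = 4):
        super().__init__()
        # ffn_ln1: first FFN linear layer, expands d_model -> expansion * d_model
        self.ffn_ln1 = nn.Linear(d_model, expansion * d_model)
        # ffn_ln2: second FFN linear layer, projects back down to d_model
        self.ffn_ln2 = nn.Linear(expansion * d_model, d_model)
        self.act = nn.GELU()  # activation choice is an assumption

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ffn_ln2(self.act(self.ffn_ln1(x)))

ffn = FeedForward(d_model=512)      # small size to keep the demo cheap
out = ffn(torch.randn(1, 8, 512))   # (batch, tokens, d_model)
print(out.shape)                    # torch.Size([1, 8, 512])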