Sarathi Flashcards
1
Q
What percent of compute time during LLM inference is spent on attention?
A
5-10%
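Why attention is such a small slice: a back-of-the-envelope FLOP count, sketched below in Python. The hidden size, FFN expansion factor, and context lengths are illustrative assumptions, not figures from the Sarathi paper.

# Rough per-token, per-layer FLOP estimate for a decoder-only transformer
# during decode. Assumptions (illustrative): hidden size d, FFN expansion
# factor 4, context length L, one multiply-accumulate = 2 FLOPs.

def per_layer_flops(d: int, L: int) -> tuple[int, int]:
    linear = 2 * (3 * d * d      # attn_pre_proj: fused Q, K, V projections
                  + d * d        # attn_post_proj: output projection
                  + 4 * d * d    # ffn_ln1: first FFN linear (d -> 4d)
                  + 4 * d * d)   # ffn_ln2: second FFN linear (4d -> d)
    attention = 2 * (L * d       # Q @ K^T against L cached keys
                     + L * d)    # attention weights @ V
    return attention, linear

for L in (512, 1024, 2048, 4096):
    attn, lin = per_layer_flops(d=4096, L=L)
    print(f"L={L:5d}: attention is {attn / (attn + lin):5.1%} of compute")

Under these assumptions the attention fraction lands between roughly 2% and 14% across these context lengths, consistent with the 5-10% figure above for typical sequence lengths.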
2
Q
ffn_ln1 stands for
A
Feed-forward network linear layer 1 (the first of the FFN's two linear layers)
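A minimal sketch of where ffn_ln1 sits in a transformer block, assuming the common two-linear-layer FFN with a 4x expansion. This is illustrative PyTorch, not the paper's implementation; the class name, activation choice, and sizes are assumptions.

import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Transformer FFN block: two linear layers around a nonlinearity."""

    def __init__(self, d_model: int = 4096, expansion: int = 4):
        super().__init__()
        # ffn_ln1: first FFN linear layer, expands d_model -> expansion * d_model
        self.ffn_ln1 = nn.Linear(d_model, expansion * d_model)
        # ffn_ln2: second FFN linear layer, projects back down to d_model
        self.ffn_ln2 = nn.Linear(expansion * d_model, d_model)
        self.act = nn.GELU()  # activation choice is an assumption

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ffn_ln2(self.act(self.ffn_ln1(x)))

ffn = FeedForward(d_model=512)      # small size to keep the demo cheap
out = ffn(torch.randn(1, 8, 512))   # (batch, tokens, d_model)
print(out.shape)                    # torch.Size([1, 8, 512])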