Understanding Activation Memory in Mixture of Experts Models – Frank Denneman
Explains how activation memory behaves in Mixture of Experts models and why long-context and agentic inference introduce unpredictable activation peaks during prefill phases.