Sparse MoE for Efficiency

Cameron Rohn · Category: frameworks_and_exercises

Moonshot’s trillion-parameter Kimi K2 model uses a sparse mixture-of-experts (MoE) design that activates only about 32 billion parameters per token, demonstrating how sparse MoE delivers large model capacity at a fraction of the dense compute cost.
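The core mechanism behind this sparsity is top-k expert routing: a learned gate scores every expert per token, and only the k highest-scoring experts actually run. Moonshot has not published this exact code; below is a minimal NumPy sketch of the idea, assuming a simple linear softmax gate (`gate_w`, `experts`, and `moe_forward` are illustrative names, not Kimi internals):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts; only those experts compute."""
    logits = x @ gate_w                                # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=-1)[:, -k:]         # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)    # their logits
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))  # softmax over selected only
    w /= w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                        # per token: run only k experts
        for j in range(k):
            e = topk[t, j]
            out[t] += w[t, j] * experts[e](x[t])
    return out

rng = np.random.default_rng(0)
d, n_experts = 4, 8
x = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n_experts))
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in mats]         # toy linear "experts"
y = moe_forward(x, gate_w, experts, k=2)
```

With k=2 of 8 experts here, each token touches only a quarter of the expert parameters; at Kimi K2's scale the same principle keeps ~32B of ~1T parameters active per token.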