Ultra-scale Playbook - DeepSpeed ZeRO

Notes on training LLMs using sharding strategies

June 21, 2025 · 8 min · 1519 words