Ultra-scale Playbook - Data Parallelism

Notes on training LLMs using data parallelism strategy

May 17, 2025 · 5 min · 940 words