ZeRO Sharding Revisited - Implementations using PyTorch

Implement ZeRO sharding strategies using PyTorch

February 28, 2026 · 23 min · 4762 words

Data Parallelism Revisited - Implementations using PyTorch

Implement various data parallelism strategies using PyTorch

December 28, 2025 · 21 min · 4273 words

Ultrascale Playbook - Expert Parallelism

Notes on training LLMs using expert parallelism

December 13, 2025 · 8 min · 1506 words

Ultrascale Playbook - Context Parallelism

Notes on training LLMs using context parallelism

November 22, 2025 · 11 min · 2284 words

Ultrascale Playbook - Tensor and Sequence Parallelism

Notes on training LLMs using tensor and sequence parallelism

November 11, 2025 · 12 min · 2428 words