Where Does vLLM Cold-Start Time Go on GKE?

Measuring vLLM cold-start bottlenecks on GKE and evaluating ways to reduce time to first request.

April 11, 2026 · 12 min · 2519 words

Pipeline Parallelism Revisited - Implementations using PyTorch

Implementing and profiling pipeline parallelism techniques using PyTorch

March 13, 2026 · 26 min · 5519 words

ZeRO Sharding Revisited - Implementations using PyTorch

Implement ZeRO sharding strategies using PyTorch

February 28, 2026 · 23 min · 4762 words

Data Parallelism Revisited - Implementations using PyTorch

Implement various data parallelism strategies using PyTorch

December 28, 2025 · 21 min · 4273 words

Ultrascale Playbook - Expert Parallelism

Notes on training LLMs using expert parallelism

December 13, 2025 · 8 min · 1506 words