Posts

Peeling Back the Stack: A Toy Model of Stack Frames

Learning programming languages is fun. My usual path is to spend a couple of years writing code and building projects in a particular language to get comfortable with it. That includes learning the best practices and understanding the different ways something could have been implemented. Then, occasionally, I get curious about what is happening under the hood, sort of like peeling back the layers of an onion. Peel back enough layers and you eventually reach the compiler, and finally the assembly or machine code that the CPU executes. ...

Where Does vLLM Cold-Start Time Go on GKE?

Measuring vLLM cold-start bottlenecks on GKE and evaluating ways to reduce time to first request.

Docker and Pals

Architectural components of Docker

Pipeline Parallelism Revisited - Implementations using PyTorch

Implementing and profiling pipeline parallelism techniques using PyTorch

TIL - Cloud VMs, microVMs, Unikernels and Hypervisors

Anatomy of the different layers of abstraction in cloud compute