ZeRO Sharding Revisited - Implementations using PyTorch
Implement ZeRO sharding strategies using PyTorch
Patterns in Python code design
Implement various data parallelism strategies using PyTorch
Notes on training LLMs using expert parallelism
Notes on training LLMs using context parallelism