Ultra-scale Playbook - ZeRO Sharding
Notes on training LLMs using sharding strategies
Notes on training LLMs using data parallelism strategies
Notes on Ultra-scale Playbook - training LLMs on a single GPU