Ultra-scale Playbook - Data Parallelism
Notes on training LLMs using a data parallelism strategy
Notes on Ultra-scale Playbook - training LLM on a single GPU
Interesting things I learnt this week
Introduction and motivation