Choosing a batch size and provider for LLM training
Notes on choosing appropriate batch size and compute for training LLMs
Notes on training LLMs using sharding strategies
Notes on training LLMs using the data parallelism strategy
Notes on the Ultra-Scale Playbook - training an LLM on a single GPU