PyData MCR talk on training LLMs
My talk on training LLMs at PyData MCR
Observability platform for Python applications
Introduction to collective communication operations used for distributed training.
Introduction to distributed communication for GPUs.
How to authenticate EKS workloads with AWS services?