underfit.ai
Posts
A Thousand Ways to Write ZeRO-2: Setting a New Modded-NanoGPT Record
PyTorch Profiling 101 with Modded-NanoGPT
ZeRO One: Sharding the Optimizer
First Steps With Distributed Data Parallelism and PyTorch Profiling