Notes on FSDP

References

  1. Meta AI, Fully Sharded Data Parallel: faster AI training with fewer GPUs
  2. NVIDIA, Collective Operations