ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

References

  1. Rajbhandari, Rasley et al. (2020), ZeRO: Memory Optimizations Toward Training Trillion Parameter Models