Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

References

  1. Narayanan et al. (2021), Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM