Abhimanyu Talwar's website

Paper Summaries

These are my takeaways from Machine Learning papers that I read. Summarizing helps me understand things better. All credit is due to the original authors.

29 Jul 2024: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

12 Jul 2024: FP8 Quantization: The Power of the Exponent

25 Jun 2024: Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

24 Jun 2024: Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis

20 Jun 2024: Nemotron-4 Technical Report