Paper Summaries
These are my takeaways from machine learning papers that I have read. Summarizing helps me understand things better. All credit is due to the original authors.
29 Jul 2024: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
12 Jul 2024: FP8 Quantization: The Power of the Exponent
25 Jun 2024: Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
24 Jun 2024: Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis
20 Jun 2024: Nemotron-4 Technical Report