Abhimanyu's Notes

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

References

Dao et al. (2022), FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Intel's note on Tiling, Loop Optimizations Where Blocks are Required