FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
References
Dao et al. (2022), FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Intel's note on tiling, Loop Optimizations Where Blocks are Required