FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

References

  1. Dao et al. (2022), FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  2. Intel, note on tiling, Loop Optimizations Where Blocks are Required