Flash Attention
Feb 2026
The Queue
A quick proof of concept: click the button to add balls to a queue, and watch them get consumed by the server.
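The demo above is a classic producer/consumer queue: clicks produce items, the server consumes them in FIFO order. A minimal sketch of the same idea in Python (the names `serve` and `ball-*` are illustrative, not from the demo's actual code):

```python
import queue
import threading

q = queue.Queue()
served = []

def serve():
    # The "server": drain balls one at a time until a shutdown sentinel arrives.
    while True:
        ball = q.get()
        if ball is None:
            break
        served.append(ball)
        q.task_done()

server = threading.Thread(target=serve)
server.start()

for i in range(5):       # five button clicks, each enqueuing a ball
    q.put(f"ball-{i}")

q.put(None)              # tell the server to stop
server.join()

print(served)            # balls come out in the order they went in
```

Because `queue.Queue` is thread-safe, the producer never has to coordinate with the consumer beyond the sentinel, which is exactly what makes the click-and-watch demo feel instantaneous.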
The Memory Hierarchy Bottleneck
The key insight behind Flash Attention: most of the runtime goes to moving data between HBM (large but slow) and SRAM (small but fast), not to the matrix multiplications themselves. Adjust the sequence length and hit “Run Naive” to see the bottleneck in action.
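A back-of-the-envelope model makes the bottleneck concrete. Naive attention materializes the full N×N score matrix in HBM (and reads it back for the softmax and the final matmul), so its traffic grows as O(N²); Flash Attention keeps those tiles in SRAM, so its HBM traffic stays roughly O(N·d). The constants below are simplifying assumptions (fp16, single head, head dimension 64, and tile re-reads ignored), not measurements:

```python
# Rough HBM traffic model, assuming fp16 (2 bytes), one head, Q/K/V of shape (N, d).
BYTES = 2
d = 64

def naive_hbm_bytes(N):
    qkv = 3 * N * d * BYTES      # read Q, K, V
    scores = 2 * N * N * BYTES   # write N x N scores, read them back for softmax
    softmax = 2 * N * N * BYTES  # write softmax probs, read them for the V matmul
    out = N * d * BYTES          # write the output
    return qkv + scores + softmax + out

def flash_hbm_bytes(N):
    # Flash Attention streams K/V tiles through SRAM and never writes the
    # N x N matrix to HBM, so traffic is ~linear in N (re-reads ignored here).
    return (3 * N * d + N * d) * BYTES

for N in (1024, 4096, 16384):
    print(N, naive_hbm_bytes(N) / flash_hbm_bytes(N))
```

Even under this crude model, the naive/flash traffic ratio grows linearly with sequence length, which is why the “Run Naive” demo slows down so sharply as you crank N up.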