Flash Attention

Feb 2026

The Queue

A quick proof of concept: click the button to add balls to a queue, and watch them get consumed by the server.
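The demo is the classic producer/consumer pattern: button clicks enqueue work, and a server drains the queue at its own pace. A minimal sketch of that pattern in Python (the names `server` and `consumed` are illustrative, not part of the demo's actual code):

```python
import queue
import threading

q = queue.Queue()

def server(q, n_items, consumed):
    # The "server": consume balls until n_items have been processed.
    for _ in range(n_items):
        ball = q.get()       # blocks until a ball is available
        consumed.append(ball)
        q.task_done()

consumed = []
t = threading.Thread(target=server, args=(q, 5, consumed))
t.start()

for i in range(5):           # each button click enqueues one ball
    q.put(i)

q.join()                     # wait until every ball has been consumed
t.join()
print(consumed)              # balls come out in FIFO order
```

Because `queue.Queue` is FIFO and there is a single consumer, the balls are consumed in the order they were clicked.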

The Memory Hierarchy Bottleneck

The key insight behind Flash Attention: most of the time is spent moving data between HBM (large but slow) and SRAM (small but fast), not on the arithmetic itself. Naive attention materializes the full N × N score matrix in HBM and reads it back for the softmax, so memory traffic grows quadratically with sequence length. Adjust the sequence length and hit “Run Naive” to see the bottleneck in action.
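A back-of-the-envelope way to see the bottleneck is to compare arithmetic intensity (FLOPs per byte of HBM traffic) against what the hardware can sustain. The sketch below counts the HBM reads and writes of a naive attention pass under simple assumptions (fp16 storage, score and softmax matrices round-tripped through HBM, rough A100-class figures of ~312 TFLOP/s fp16 and ~1.5 TB/s HBM bandwidth):

```python
def naive_attention_hbm_bytes(n, d, bytes_per_el=2):
    """Estimate HBM traffic for one naive attention head (fp16)."""
    elems = 3 * n * d      # read Q, K, V
    elems += 2 * n * n     # write scores S = QK^T, read S back for softmax
    elems += 2 * n * n     # write softmax P, read P back for P @ V
    elems += n * d         # write output O
    return elems * bytes_per_el

def attention_flops(n, d):
    # Two N x N x d matmuls (QK^T and P @ V), 2*n*n*d FLOPs each.
    return 4 * n * n * d

n, d = 4096, 64
intensity = attention_flops(n, d) / naive_attention_hbm_bytes(n, d)
machine_balance = 312e12 / 1.5e12   # FLOPs the GPU can do per byte it can load

print(f"arithmetic intensity: {intensity:.1f} FLOP/byte")
print(f"machine balance:      {machine_balance:.1f} FLOP/byte")
print("memory-bound" if intensity < machine_balance else "compute-bound")
```

With these numbers the intensity comes out around 30 FLOP/byte against a machine balance of roughly 200, so the GPU sits idle waiting on HBM most of the time; the quadratic `n * n` traffic terms are exactly what Flash Attention's tiling keeps in SRAM instead.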