A deep dive comparing standard softmax attention, linear attention, and Flash Attention: their math, …