Tag

#attention

2 posts found

Attention Residuals: How Kimi Rethinks Depth-Wise Information Flow in LLMs

Kimi's Attention Residuals paper proposes replacing fixed residual connections with learned softmax …

March 20, 2026 · 7 min

Attention Mechanisms Compared: Standard, Linear, and Flash

A deep dive comparing standard softmax attention, linear attention, and Flash Attention: their math, …

March 11, 2026 · 5 min