[Gluon][Tutorial] Persistent attention by Mogball · Pull Request #7298 · triton-lang/triton
GitHub Daily Trend - A podcast by VoiceFeed

https://github.com/triton-lang/triton/pull/7298 Rewrites the attention kernel to be persistent. This improves performance at low context lengths. However, fp16 performance at large context lengths has regressed slightly due to a ptxas instruction-scheduling issue in the so...
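The PR itself contains the Gluon/Triton kernel, but the core idea of a "persistent" kernel can be sketched in plain Python: instead of launching one program per output tile, a fixed number of programs (roughly one per SM) are launched, and each loops over many tiles. The function below is a hypothetical illustration of that work distribution, not code from the PR.

```python
def persistent_schedule(num_tiles, num_programs):
    """Return, for each persistent program, the list of tile indices it
    processes under a simple grid-stride distribution."""
    schedule = [[] for _ in range(num_programs)]
    for pid in range(num_programs):
        # Grid-stride loop: program `pid` handles tiles
        # pid, pid + P, pid + 2P, ... where P = num_programs.
        tile = pid
        while tile < num_tiles:
            schedule[pid].append(tile)
            tile += num_programs
    return schedule

# Example: 10 attention tiles distributed over 4 persistent programs.
print(persistent_schedule(10, 4))
# [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Because the programs stay resident for the whole launch, per-tile launch overhead is amortized, which is why the persistent form tends to help most at small context lengths where that overhead dominates.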