Tag: hbm

  • Kernel Case Study: Flash Attention

    Kernel Case Study: Flash Attention The attention mechanism is at the core of modern day transformers. But scaling the context window of these transformers was a major challenge, and it still is even though we are in the era of a million tokens + context window (Qwen 2.5 [1]). There are both considerable compute and memory…