FlashMLA

Faster LLM Inference on Hopper GPUs

FlashMLA, from DeepSeek, is an efficient MLA (Multi-head Latent Attention) decoding kernel for Hopper GPUs, optimized for variable-length sequences. It achieves up to 3000 GB/s memory bandwidth and 580 TFLOPS of compute.

Launched on February 24th, 2025