FlashMLA

Faster LLM Inference on Hopper GPUs

FlashMLA, from DeepSeek, is an efficient MLA (Multi-head Latent Attention) decoding kernel for Hopper GPUs, optimized for variable-length sequences. It achieves up to 3000 GB/s memory bandwidth and 580 TFLOPS of compute.

Launched on February 24th, 2025