FlashMLA
Faster LLM Inference on Hopper GPUs
FlashMLA, from DeepSeek, is an efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences. It achieves up to 3000 GB/s of memory bandwidth in memory-bound configurations and 580 TFLOPS in compute-bound configurations.
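
The kernel is exposed through a small Python wrapper. Below is a minimal sketch of the decoding call pattern from the FlashMLA repository README, assuming the flash_mla package has been built and installed; the shapes, dtypes, and sequence lengths are illustrative assumptions, not requirements.

    import torch
    from flash_mla import get_mla_metadata, flash_mla_with_kvcache

    # Illustrative sizes: batch, query length, Q heads, KV heads,
    # K head dim, V head dim, and paged-KV block size.
    b, s_q, h_q, h_kv = 4, 1, 128, 1
    d, dv, block_size = 576, 512, 64
    max_seqlen = 1024

    # Per-sequence KV-cache lengths; FlashMLA handles variable lengths,
    # a uniform length is used here only to keep the sketch short.
    cache_seqlens = torch.full((b,), max_seqlen, dtype=torch.int32, device="cuda")
    max_num_blocks = max_seqlen // block_size
    block_table = torch.arange(
        b * max_num_blocks, dtype=torch.int32, device="cuda"
    ).view(b, max_num_blocks)

    # Query and paged KV cache (num_blocks, block_size, h_kv, d).
    q = torch.randn(b, s_q, h_q, d, dtype=torch.bfloat16, device="cuda")
    kvcache = torch.randn(
        b * max_num_blocks, block_size, h_kv, d,
        dtype=torch.bfloat16, device="cuda",
    )

    # Precompute tile-scheduling metadata once per set of sequence lengths,
    # then run the fused MLA decoding kernel.
    tile_scheduler_metadata, num_splits = get_mla_metadata(
        cache_seqlens, s_q * h_q // h_kv, h_kv
    )
    o, lse = flash_mla_with_kvcache(
        q, kvcache, block_table, cache_seqlens, dv,
        tile_scheduler_metadata, num_splits, causal=True,
    )

In a real decoder, get_mla_metadata runs once per step and the flash_mla_with_kvcache call is repeated across layers with each layer's query and KV cache.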
