DeepGEMM
Unlock Maximum FP8 Performance on Hopper GPUs
DeepGEMM, from DeepSeek, is an open-source library of highly optimized FP8 GEMM kernels for NVIDIA Hopper GPUs. The core kernel is a clean ~300 lines of code, JIT-compiled at runtime, with no heavy dependencies.
Flowtica Scribe
Hi everyone!
Sharing DeepGEMM, a new open-source library from DeepSeek. This is definitely for the hardcore GPU optimization crowd, but it's a great example of how much performance can be squeezed out of specialized hardware.
DeepGEMM provides highly optimized FP8 GEMM (General Matrix Multiplication) kernels specifically for NVIDIA Hopper GPUs. It's designed to power both dense models and Mixture-of-Experts (MoE) models.
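FP8 GEMM kernels like these rely on fine-grained per-block scaling to keep values within FP8's narrow dynamic range. As a rough illustration of that arithmetic only, here is a NumPy sketch that simulates per-block scaling and a dequantized matmul. This is not DeepGEMM's actual API; the function names, the block size, and the omission of FP8 mantissa rounding are all simplifying assumptions.

```python
import numpy as np

# Largest finite value representable in FP8 e4m3 (a known property of
# the format); real FP8 also rounds the mantissa, which we skip here.
FP8_E4M3_MAX = 448.0

def quantize_blockwise(x, block=128):
    """Hypothetical helper: give each (1 x block) chunk along the last
    axis its own scale so the chunk fits in the e4m3 range."""
    m, k = x.shape
    assert k % block == 0
    xb = x.reshape(m, k // block, block)
    scales = np.abs(xb).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scales = np.maximum(scales, 1e-12)  # avoid division by zero
    q = (xb / scales).reshape(m, k)     # values now fit in e4m3 range
    return q.astype(np.float32), scales.squeeze(-1)

def dequantize_blockwise(q, scales, block=128):
    """Undo the per-block scaling."""
    m, k = q.shape
    qb = q.reshape(m, k // block, block)
    return (qb * scales[..., None]).reshape(m, k)

def fp8_style_gemm(a, b, block=128):
    """Simulated scaled GEMM: quantize both operands blockwise,
    dequantize, and multiply (C = A @ B^T)."""
    qa, sa = quantize_blockwise(a, block)
    qb, sb = quantize_blockwise(b, block)
    return dequantize_blockwise(qa, sa, block) @ dequantize_blockwise(qb, sb, block).T
```

A real kernel fuses the scale multiplication into the matmul accumulation on tensor cores rather than dequantizing first; this sketch only shows why per-block scales preserve accuracy despite FP8's limited range.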
This is about getting the absolute maximum performance out of Hopper GPUs for a core operation in deep learning. It's inspired by CUTLASS and CuTe, but aims for a much cleaner, simpler implementation. This is the tech powering DeepSeek-V3 and R1.
And it's the third release of DeepSeek's OpenSourceWeek.