DeepGEMM
Unlock Maximum FP8 Performance on Hopper GPUs
DeepGEMM, from DeepSeek, is an open-source library of highly optimized FP8 GEMM kernels for NVIDIA Hopper GPUs. The core kernel is a clean ~300 lines of code, JIT-compiled at runtime, with no heavy dependencies.
Flowtica Scribe
Hi everyone!
Sharing DeepGEMM, a new open-source library from DeepSeek. This is definitely for the hardcore GPU optimization crowd, but it's a great example of how much performance can be squeezed out of specialized hardware.
DeepGEMM provides highly optimized FP8 GEMM (General Matrix Multiplication) kernels specifically for NVIDIA Hopper GPUs. It's designed to power both dense models and Mixture-of-Experts (MoE) models.
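FP8 GEMM kernels like these rely on fine-grained per-block scaling to keep values within FP8's narrow dynamic range. As a rough illustration of that arithmetic only, here is a NumPy sketch that simulates per-block scaling and a dequantized matmul. This is not DeepGEMM's actual API; the function names, the block size, and the omission of FP8 mantissa rounding are all simplifying assumptions.

```python
import numpy as np

# Largest finite value representable in FP8 e4m3 (a known property of
# the format); real FP8 also rounds the mantissa, which we skip here.
FP8_E4M3_MAX = 448.0

def quantize_blockwise(x, block=128):
    """Hypothetical helper: give each (1 x block) chunk along the last
    axis its own scale so the chunk fits in the e4m3 range."""
    m, k = x.shape
    assert k % block == 0
    xb = x.reshape(m, k // block, block)
    scales = np.abs(xb).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scales = np.maximum(scales, 1e-12)  # avoid division by zero
    q = (xb / scales).reshape(m, k)     # values now fit in e4m3 range
    return q.astype(np.float32), scales.squeeze(-1)

def dequantize_blockwise(q, scales, block=128):
    """Undo the per-block scaling."""
    m, k = q.shape
    qb = q.reshape(m, k // block, block)
    return (qb * scales[..., None]).reshape(m, k)

def fp8_style_gemm(a, b, block=128):
    """Simulated scaled GEMM: quantize both operands blockwise,
    dequantize, and multiply (C = A @ B^T)."""
    qa, sa = quantize_blockwise(a, block)
    qb, sb = quantize_blockwise(b, block)
    return dequantize_blockwise(qa, sa, block) @ dequantize_blockwise(qb, sb, block).T
```

A real kernel fuses the scale multiplication into the matmul accumulation on tensor cores rather than dequantizing first; this sketch only shows why per-block scales preserve accuracy despite FP8's limited range.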
This is about getting the absolute maximum performance out of Hopper GPUs for a core operation in deep learning. It's inspired by CUTLASS and CuTe, but aims for a much cleaner, simpler implementation. This is the tech powering DeepSeek-V3 and R1.
And it's the third release of DeepSeek's OpenSourceWeek.