Zac Zuo

Transformers v5 - The backbone of modern AI, re-engineered

The biggest update in 5 years. v5 brings a modular design, first-class quantization, and a new OpenAI-compatible serving API. Optimized for PyTorch and fully interoperable with the modern AI stack (vLLM, llama.cpp, GGUF).

Zac Zuo

Hi everyone!

It’s hard to believe, but Transformers v4 was released back in November 2020. Think about that: v4 predates ChatGPT, Stable Diffusion, and the entire generative AI boom. Today, with 3M+ daily installs and 1.2B+ total downloads, it has become the undeniable "operating system" of modern AI.

v5 is a maturity milestone. While v4 was about exploding growth (from 40 to 400+ architectures), v5 is about standardization and interoperability.

Big shifts in this release:

  • Interoperability is Key: v5 is built to play nice with the entire ecosystem—seamlessly connecting with vLLM, SGLang, and llama.cpp. You can even load GGUF files directly now.

  • Production Ready: They introduced transformers serve, an OpenAI-compatible server for easy deployment and testing.

  • Quantization First: No longer an afterthought. Low-precision formats (4-bit/8-bit) are now first-class citizens with cleaner APIs.

  • PyTorch Focus: They are going all in on PyTorch as the primary backend to ensure peak performance, while maintaining compatibility with JAX/Flax.
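To make the GGUF point concrete, here's a minimal sketch of loading a GGUF checkpoint directly. The `gguf_file` argument is part of the transformers API, but the repo and file names here are placeholders, and the import is deferred so the sketch stands on its own:

```python
def load_gguf_model(repo_id: str, gguf_file: str):
    """Sketch: load a GGUF quantized checkpoint directly with transformers.

    `repo_id` and `gguf_file` are placeholder arguments, e.g. a Hub repo
    hosting .gguf files. The weights are dequantized into a regular
    PyTorch model on load, so the usual transformers APIs apply afterward.
    """
    # Deferred import: keeps the sketch importable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
    model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
    return tokenizer, model
```

Once loaded, the model behaves like any other PyTorch-backed transformers model, so you can fine-tune or re-export it.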
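And because `transformers serve` speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can talk to it. A stdlib-only sketch of the client side (the port, path, and model name are assumptions; check `transformers serve --help` for the actual defaults):

```python
import json
from urllib import request


def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Sketch: call an OpenAI-compatible endpoint like `transformers serve`.

    `base_url` and the model name below are assumptions for illustration.
    """
    payload = json.dumps({
        "model": "Qwen/Qwen2.5-0.5B-Instruct",  # any model the server can load
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = request.Request(
        base_url + "/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Standard OpenAI response shape: choices[0].message.content
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same function works unchanged against vLLM or any other OpenAI-compatible server, which is the whole point of standardizing on that protocol.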
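For the quantization-first claim, the cleaner API boils down to passing a quantization config at load time. A hedged sketch using the bitsandbytes 4-bit path (the class and argument names are real transformers APIs, but treat the exact options as a sketch, and note this needs a CUDA-capable setup to actually run):

```python
def load_4bit(repo_id: str):
    """Sketch: load a model in 4-bit via a quantization config.

    `repo_id` is a placeholder for any causal-LM checkpoint on the Hub.
    """
    # Deferred import: keeps the sketch importable without transformers installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_4bit=True)
    return AutoModelForCausalLM.from_pretrained(
        repo_id,
        quantization_config=quant_config,
    )
```

Other backends (GPTQ, AWQ, etc.) plug into the same `quantization_config` slot, which is what makes low-precision loading feel first-class rather than bolted on.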

For the community, Transformers remains the "Source of Truth" for model definitions. If a paper comes out, the code usually lands here first.

Huge congrats to the @Hugging Face team and all the contributors who made this happen. The past 5 years have been unforgettable, and the next 5 look even more exciting! 🔥

Mykyta Semenov 🇺🇦🇳🇱

Cool! Congratulations on the new launch. We’re also building an AI startup right now, but unfortunately, it’s not open-source yet :)