The biggest update in 5 years. v5 brings a modular design, first-class quantization, and a new OpenAI-compatible serving API. Optimized for PyTorch and fully interoperable with the modern AI stack (vLLM, llama.cpp, GGUF).
It’s hard to believe, but Transformers v4 was released back in November 2020. Think about that: v4 predates ChatGPT, Stable Diffusion, and the entire generative AI boom. Today, with 3M+ daily installs and 1.2B+ total downloads, it has become the undeniable "operating system" of modern AI.
v5 is a maturity milestone. While v4 was about exploding growth (from 40 to 400+ architectures), v5 is about standardization and interoperability.
Big shifts in this release:
Interoperability is Key: v5 is built to play nice with the entire ecosystem—seamlessly connecting with vLLM, SGLang, and llama.cpp. You can even load GGUF files directly now.
Production Ready: They introduced transformers serve, an OpenAI-compatible server for easy deployment and testing.
Quantization First: No longer an afterthought. Low-precision formats (4-bit/8-bit) are now first-class citizens with cleaner APIs.
PyTorch Focus: They are going all in on PyTorch as the primary backend to ensure peak performance, while maintaining compatibility with JAX/Flax.
For the community, Transformers remains the "Source of Truth" for model definitions. If a paper comes out, the code usually lands here first.
Huge congrats to the @Hugging Face team and the all the contributors who made this happen. The past 5 years have been unforgettable, and the next 5 look even more exciting!🔥
Report
Cool! Congratulations on the new launch. We’re also building an AI startup right now, but unfortunately, it’s not open-source yet :)
Replies
Flowtica Scribe
Hi everyone!
It’s hard to believe, but Transformers v4 was released back in November 2020. Think about that: v4 predates ChatGPT, Stable Diffusion, and the entire generative AI boom. Today, with 3M+ daily installs and 1.2B+ total downloads, it has become the undeniable "operating system" of modern AI.
v5 is a maturity milestone. While v4 was about exploding growth (from 40 to 400+ architectures), v5 is about standardization and interoperability.
Big shifts in this release:
Interoperability is Key: v5 is built to play nice with the entire ecosystem—seamlessly connecting with vLLM, SGLang, and llama.cpp. You can even load GGUF files directly now.
Production Ready: They introduced transformers serve, an OpenAI-compatible server for easy deployment and testing.
Quantization First: No longer an afterthought. Low-precision formats (4-bit/8-bit) are now first-class citizens with cleaner APIs.
PyTorch Focus: They are going all in on PyTorch as the primary backend to ensure peak performance, while maintaining compatibility with JAX/Flax.
For the community, Transformers remains the "Source of Truth" for model definitions. If a paper comes out, the code usually lands here first.
Huge congrats to the @Hugging Face team and the all the contributors who made this happen. The past 5 years have been unforgettable, and the next 5 look even more exciting!🔥
Cool! Congratulations on the new launch. We’re also building an AI startup right now, but unfortunately, it’s not open-source yet :)