It is coming.
The most requested feature since we open-sourced oneinfer-edge.
We are adding a "Quantization Playbook" as a standalone feature inside the oneinfer-edge open source repo.
Right now, quantizing a model for local deployment means researching the right format for your hardware, running separate tools, debugging format incompatibilities with your serving library, and then starting the deployment process from scratch. It is a multi-hour process that sits completely outside your inference control plane.
With the Quantization Playbook inside oneinfer-edge, your hardware is already scanned. Your serving library is already detected. Your model is already known. Quantization becomes the natural step before deployment - not a separate workflow you manage outside the app.
Pick your model. oneinfer-edge handles the quantization. Deploy directly from the same control plane.
And before you deploy - you see exactly what you are trading. Perplexity. Token accuracy. Quality delta across quantization levels. So the decision is not a guess. It is a tradeoff you make with full visibility.
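For anyone wondering what those numbers mean, here is a rough sketch of how they are commonly computed. This is not the oneinfer-edge implementation. The function names, the input format (per-token log-probabilities and predicted token ids), and the reading of "token accuracy" as agreement with the unquantized model are all assumptions for illustration.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def token_agreement(baseline_tokens, quantized_tokens):
    """Fraction of positions where the quantized model predicts the same
    token as the baseline (one possible reading of 'token accuracy')."""
    if len(baseline_tokens) != len(quantized_tokens) or not baseline_tokens:
        raise ValueError("token sequences must be non-empty and equal length")
    matches = sum(b == q for b, q in zip(baseline_tokens, quantized_tokens))
    return matches / len(baseline_tokens)

def quality_delta(baseline_logprobs, quantized_logprobs):
    """Compare perplexity of a quantized model against its baseline
    on the same evaluation text."""
    base = perplexity(baseline_logprobs)
    quant = perplexity(quantized_logprobs)
    return {
        "baseline_ppl": base,
        "quantized_ppl": quant,
        # Positive means the quantized model is worse (higher perplexity).
        "relative_increase": (quant - base) / base,
    }
```

Lower perplexity is better, so a small relative increase at a given quantization level suggests little was lost on that evaluation text.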
This is part of our commitment to making oneinfer-edge the only open source tool a developer needs to go from raw model to running inference - local, cloud, or both.
More details dropping soon.
Repo in the comments below.
Star the repo. You will not want to miss this one.

Replies
@rapata_pavankumar - Repo: https://github.com/oneinfer/oneinfer-edge