oneInfer-edge Brings Infrastructure Intelligence to Local AI Deployment
We just shipped the first feature for oneInfer-edge, and it's open source.
Ever copy a Hugging Face model ID, spend 2 hours setting things up, and then watch it fail because your VRAM was off by a few GB? Yeah. We've all been there.
oneInfer-edge now tells you whether your machine can run a given Hugging Face model before you deploy it.
Paste a model ID and it scans your GPU, VRAM, OS, and serving libraries, then gives you a Hardware Ready verdict and a full memory breakdown (weights + KV cache + serving overhead).
No surprises at runtime.
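For a feel of what a weights + KV cache + serving overhead breakdown involves, here is a rough sketch. It is not oneInfer-edge's code or API: the names (ModelSpec, estimate_memory_gib, hardware_ready) are hypothetical, and it assumes fp16 weights and KV cache, a standard per-layer K and V cache, and a made-up flat 10% serving overhead.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class ModelSpec:
    """Hypothetical model description; real values come from a model's config."""
    params: float                 # total parameter count
    num_layers: int
    num_kv_heads: int             # KV heads (fewer than query heads under GQA)
    head_dim: int
    bytes_per_param: float = 2.0  # assumption: fp16/bf16 weights
    kv_bytes: float = 2.0         # assumption: fp16 KV cache entries


def estimate_memory_gib(spec, context_len, batch=1, overhead_frac=0.10):
    """Return a weights / KV cache / overhead / total breakdown in GiB."""
    weights = spec.params * spec.bytes_per_param
    # One K and one V vector per layer, per KV head, per token in the context.
    kv_cache = (2 * spec.num_layers * spec.num_kv_heads * spec.head_dim
                * context_len * batch * spec.kv_bytes)
    # Assumption: serving overhead modeled as a flat fraction of the rest.
    overhead = overhead_frac * (weights + kv_cache)
    total = weights + kv_cache + overhead
    return {
        "weights": weights / GIB,
        "kv_cache": kv_cache / GIB,
        "overhead": overhead / GIB,
        "total": total / GIB,
    }


def hardware_ready(spec, vram_gib, context_len, batch=1):
    """Return (ready, breakdown, reasons) so a 'no' comes with a 'why'."""
    breakdown = estimate_memory_gib(spec, context_len, batch)
    reasons = []
    if breakdown["total"] > vram_gib:
        reasons.append(
            f"needs ~{breakdown['total']:.1f} GiB but only {vram_gib:.1f} GiB of VRAM is available"
        )
    return (not reasons, breakdown, reasons)


# Illustrative numbers: an 8B-parameter model, 32 layers, 8 KV heads, head_dim 128.
spec = ModelSpec(params=8.0e9, num_layers=32, num_kv_heads=8, head_dim=128)
ready, breakdown, reasons = hardware_ready(spec, vram_gib=16.0, context_len=8192)
print(ready, {k: round(v, 2) for k, v in breakdown.items()}, reasons)
```

With these assumptions the 8K-token KV cache alone comes to exactly 1 GiB and the total lands around 17.5 GiB, so a 16 GiB card fails with a reason attached while a 24 GiB card passes. Real tools also have to account for quantization, framework-specific allocators, and activation memory, which this sketch deliberately ignores.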
It supports Apple Silicon (M1 to M5), NVIDIA (CUDA), and AMD (ROCm), plus serving libraries including Ollama, llama.cpp, SGLang, TensorRT-LLM, and PyTorch, with many more coming.
It tells you why something won't work, not just that it won't.
CPU support is something we're actively working through; feedback and contributions on that front are very welcome.
oneInfer-edge is part of the broader oneinfer.ai inference control plane, a platform built for teams shipping multimodal AI products at scale.
oneInfer-edge brings that same infrastructure intelligence to your local machine so self-hosting is a genuine alternative to managed cloud inference, not a debugging exercise.
We built this in the open because self-hosted AI infrastructure should belong to the community that runs it.
Star the repo: https://github.com/oneinfer/oneinfer-edge
Report issues or request features: https://github.com/oneinfer/oneinfer-edge/issues
Learn more: https://oneinfer.ai/platform/oneinfer-edge
Drop us a star if this looks useful, and PRs are wide open. We're just getting started.
