Hi PH! 👋 I’m Jeremy, founder of Noumena Labs // Sipp :)

We built Sipp to be a simple, unified API for running AI models locally, through providers, or via self-hosted gateways. It’s one zero-dependency library that enables fast in-browser inference, reaching up to 3x the tok/s of popular alternatives in our benchmarks.

Our goal with Sipp is to make embedded and local AI more practical without sacrificing the utility of running larger models through provider APIs. Local AI opens up a lot of powerful use cases like continual monitoring, decision-making, chat and help bots, games, and so much more. Our mission is to help developers build things that start to feel possible when tokens are essentially “free.”

Here’s what makes Sipp different today:

🚀 Blazing-Fast WebGPU: Run models right in the browser with no installs and built-in model caching support. In our benchmarks, Sipp reached 3x-5x faster time-to-first-token and decode speed compared to other browser runtimes.
🔀 One Unified API: Write your code once. Switch or split traffic seamlessly between local browser execution, cloud gateways, and remote providers without rewriting your application.
🌍 Start Local, Scale Anywhere: While our initial focus is the browser, the same client API is available in Node, Rust, and Python, and runs on bare metal or your own server infrastructure. We currently support CUDA, Vulkan, and Metal backends, and plan to add more as we flesh the library out.
🧃 Zero Dependencies: 100% open-source, type-safe, and built in Rust / C++.
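
For a rough sense of what "write once, then switch or split traffic" can look like in code, here's a minimal TypeScript sketch. To be clear, it's illustrative only: `Backend`, `pickBackend`, `makeRouter`, and the two stand-in backends are hypothetical names made up for this example, not Sipp's actual API.

```typescript
// Hypothetical sketch: none of these names come from Sipp's actual API.
interface Backend {
  name: string;
  generate(prompt: string): Promise<string>;
}

// Stand-in backends. A real one would run in-browser inference or call a provider.
const localBackend: Backend = {
  name: "local",
  generate: async (prompt) => `[local] ${prompt}`,
};
const remoteBackend: Backend = {
  name: "remote",
  generate: async (prompt) => `[remote] ${prompt}`,
};

// Pick the local backend for roughly `localShare` of requests (0 = never, 1 = always).
// `rng` is injectable so the choice can be made deterministic in tests.
function pickBackend(
  local: Backend,
  remote: Backend,
  localShare: number,
  rng: () => number = Math.random,
): Backend {
  return rng() < localShare ? local : remote;
}

// Wrap both backends behind the same interface, so calling code never changes.
function makeRouter(
  local: Backend,
  remote: Backend,
  localShare: number,
  rng: () => number = Math.random,
): Backend {
  return {
    name: "router",
    generate: (prompt) =>
      pickBackend(local, remote, localShare, rng).generate(prompt),
  };
}
```

The application only ever sees a `Backend`, so moving from fully local to a 50/50 split to fully remote is a change to one number rather than a rewrite.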

We have a live demo right on our site where you can pick a model and watch it run 100% on-device in your browser. We also provide a benchmarking tool, so you can run your own tests and directly compare results.

I’ll be hanging out here all day! Happy to go deep on the technical details, implementation, or code. And if you have specific use cases you’d like to explore, I’d love to hear about those too.

Sipp

Run AI in the browser 3x faster. Zero dependencies.
