
nCompass
Debug your system's performance 10x faster
168 followers
nCompass helps you debug and improve the performance of your system, 10x faster. Our AI agent can natively run profiles and analyze the resulting trace files to provide insights guided by runtime data. The goal is to save the days of work normally spent battling a fragmented set of tools and large dumps of raw trace data.

nCompass
Hello Product Hunt! We're excited to launch the nCompass AI inference platform for reliable, scalable and fast inference of any available HuggingFace model! We're looking forward to having you build your AI apps on top of our system.
We're launching three connected products today. Below is an overview of each:
=====
Public Inference API with no enforced rate limits
Currently we have two state-of-the-art multimodal models set up: Gemma 3 27B and Llama 4 Maverick. The interface is fully OpenAI compatible and self-serve. This means all you have to do is change your API key, model name and base URL in your existing stack, and you'll be able to capitalize on open-source models that are potentially up to 18x cheaper and 2x faster than their closed-source equivalents. View your usage and performance metrics live via our dashboard. Every sign-up gets some free credits to try out the system, so give it a go right now by signing up here (https://app.ncompass.tech).
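To make that switch concrete, here's a minimal sketch using the official openai Python SDK. The base URL and model identifier below are illustrative assumptions, so copy the real values from your nCompass dashboard.

```python
# Minimal sketch: pointing an existing OpenAI-style stack at nCompass.
# NOTE: base_url and model below are illustrative assumptions;
# use the values shown in your nCompass dashboard.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_NCOMPASS_API_KEY",          # from your dashboard
    base_url="https://api.ncompass.tech/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="google/gemma-3-27b-it",  # assumed identifier for Gemma 3 27B
    messages=[{"role": "user", "content": "What does an inference engine do?"}],
)
print(response.choices[0].message.content)
```

Because the interface is OpenAI compatible, those three values (API key, model name, base URL) are the only things that change in any stack built on the OpenAI client.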
=====
The next two products are currently not self-serve, but they're ready to go; you just have to speak to us, as there are some manual steps involved in the onboarding process.
=====
Managed Inference Platform
Pick any HuggingFace model you want and we deploy it on dedicated hardware that we manage. We handle picking the best inference engine and hardware for the deployment, and we give you separate dev and prod clusters. Deploying the model to the dev cluster is one click, and promoting to production once you're ready is just one more. It really is that simple.
The best part about all of this is that we package each model you want to deploy with our custom optimized inference engine. If there's a model we don't yet have optimizations for, we'll build the GPU kernels needed to ensure you can run as many requests as possible on the minimum number of GPUs.
Why? Because we want AI to be cost-effective. We're not looking to sell GPUs; we want to bring you fast and scalable inference.
=====
White-labelled AI Inference Stack
This is essentially the entire previous offering, but deployed on your infrastructure, under your branding, with extra admin console views so you can manage and monitor your users.
=====
So why did we start this? Well, all of us on the nCompass team are experts in hardware acceleration and wanted to apply that expertise to improving AI inference performance. We believe AI model use is going to be ubiquitous, and there's still plenty of performance to eke out of GPUs to make using AI models at scale and in production both reliable and cost-effective. We don't believe you need 72-GPU clusters, so we're ensuring you can meet your AI inference requirements on existing infrastructure.
We'd love for you to sign up and try out our API. Alternatively, if you're looking for a dedicated, no-queue production AI inference deployment, please reach out by booking a call or emailing us at hello@ncompass.tech.
If you’ve tried it out, we'd love to hear your feedback on what did or didn't work. We're constantly trying to improve our offering :)
@aditya_rajagopal awesome work!!
nCompass
@chris_parsonson1 Thank you!
Not Diamond
Huge nCompass fan—congrats guys!
Congrats on launching nCompass AI! The combination of custom GPU kernels, reliable uptime, and built-in performance monitoring is a solid foundation for any AI deployment. This will definitely make running AI models at scale smoother and more efficient.
Are you looking to go beyond HuggingFace too?
nCompass
@manu_goel2 Absolutely! Our stack works for HuggingFace models but this isn't a requirement.
If you have a model you'd like optimised/hosted, please don't hesitate to reach out.
Congratulations on the launch of nCompass Tech! Your platform addresses the critical need for reliable and efficient deployment of AI models in production environments.
What specific monitoring tools does nCompass Tech provide to ensure optimal performance and health of deployed HuggingFace models?
nCompass
@jepri_kasuma thank you! We use a combination of the following:
Health:
- Sentry for real-time error monitoring and alerts
- Kubernetes Dashboard for ad-hoc service checks, live logs, etc.
- Grafana for system health
- Custom-written services that check actual uptime and trigger recovery processes if required
Performance:
- In-house services that monitor queue length and request-level metrics (e.g. TTFT) and trigger auto-scaling accordingly
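For anyone curious, TTFT (time to first token) can also be measured client-side with a streaming request. A minimal sketch, again assuming the endpoint and model identifier used above:

```python
# Minimal sketch: measuring time-to-first-token (TTFT) from the client side
# via a streaming request. Endpoint and model are assumptions, as above.
import time
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_NCOMPASS_API_KEY",
    base_url="https://api.ncompass.tech/v1",  # assumed endpoint
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="google/gemma-3-27b-it",  # assumed model identifier
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # The first chunks may carry only role metadata; wait for real content.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"TTFT: {time.perf_counter() - start:.3f}s")
        break
```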
nCompass is one of the best inference providers out there! This team is at the cutting edge of the latest innovations in driving inference gains and ships crazy crazy fast.
Congratulations on the launch!!