Justin Jincaid

Inference Engine by GMI Cloud - Fast multimodal-native inference at scale

GMI Inference Engine is a multimodal-native inference platform that runs text, image, video and audio in one unified pipeline. Get enterprise-grade scaling, observability, model versioning, and 5–6× faster inference so your multimodal apps run in real time.


Replies

Jim Engine

Okay, a lot of information on the website. But where is it hosted? Is it wrapped around Google Cloud or AWS, or do you have your own data centers with the GPUs? I don't really get it.

Justin Jincaid

@jim_engine I am not affiliated with GMI Cloud, but it looks like they operate their own data centers.

News reference: https://www.reuters.com/world/asia-pacific/gmi-cloud-build-500-million-ai-data-centre-in-taiwan-with-nvidia-chips-2025-11-17/

Maybe @louisa_guo can add some more information shortly.

Nicole Gong

@jim_engine GMI runs its own GPU infrastructure; we're not on AWS or Google Cloud.
We operate dedicated GPU clusters in multiple data centers, which gives us much better performance control, predictable pricing, and the ability to offer features like scale-to-zero and smart GPU pooling.

On top of that hardware layer, we built our own inference engine that handles scheduling, autoscaling, acceleration, and deployment.

So in short: we manage the GPUs ourselves and provide the inference layer on top — not a wrapper around another cloud. Happy to share more details if you’re curious!

Jim Engine

@nicole_gong2 Thanks for providing that information. I think you should make it more visible on your landing page; it's crucial information for most people here. Just some feedback.

CKMo

@nicole_gong2  @jim_engine 

Thanks for the feedback, Jim! We'll get on that posthaste!

Andrii Kpyto
👍
Yaroslav Chuykov
Congrats on the launch!
Saul Fleischman

Can you give us a video with less flash and sizzle that actually shows how we will make decisions on AI models, how your technology will aid this, and so on? The closest you come to this is at the end of the video, but it's still far from showing a solution for anything. Thanks; I'm eager to understand so I can use this.

Louisa Guo
@osakasaul Hi Saul, I'm happy to do a quick demo so you can explore the full scope of our inference engine! Could you share your email so I can reach out?
Lavana Cricko

Just to confirm, all applications running on your inference platform will be on dedicated nodes, right?

Then what about pricing? Will it be much more expensive than shared platforms?

Nicole Gong

@lavana_cricko Yes, all workloads on GMI run on dedicated GPU nodes (not shared/oversubscribed machines). This gives you stable performance, predictable latency, and none of the “noisy neighbor” issues you typically see on shared platforms.

And surprisingly, it’s not more expensive.
Because we operate our own GPU clusters and use aggressive optimizations like scale-to-zero, smart pooling, and low-overhead scheduling, we can keep costs at or below typical shared-inference providers — while giving you dedicated performance.

You essentially get:

  • dedicated GPUs when you need them

  • zero cost when idle

  • optimized throughput when scaling up

So you get higher reliability without the premium pricing usually associated with dedicated hardware.

Happy to break down pricing examples if that helps!

Diyako

Congrats on the launch, really strong work overall.

One quick thought that could make the page even better:

Right now the hero phrase “Build AI Without Limits” + list of offerings communicates ambition, but it’s a bit broad. Consider tightening the headline or sub-headline to clearly show the core benefit for your main user segment (for example: “Get enterprise-grade GPU access and deploy your models in minutes, no DevOps needed”)

CKMo

@dksnpz Appreciate the feedback - we'll see what we can integrate!

Alex Cloudstar

Been fighting GPU quotas lately—if the console hides the usual pain (SSH, firewall spaghetti), that’s a win. Curious what GPUs you’ve got on tap and how burst pricing works. Bare metal + containers in one place sounds handy, esp. for multi-region stuff.

CKMo

@alexcloudstar H100s and H200s initially for on-demand containers!

If you want something different (Blackwells, 5090s), you can reach out using our contact form!

Burst pricing depends on what you have in mind. Definitely reach out!

Paul Tseluyko

It seems to be more about creative generation at scale; which part is about apps?

CKMo

@pasha_tseluyko AI app builders can get a dedicated endpoint (if they're really custom-focused) or just the API to unlock all the top models on the market for multimodal inference.

Long story short? No need to have an amazing AI app concept and then also have to figure out the DevOps of getting your inference set up. It's already done for you, at one of the best price/performance ratios out there.

We've found the right balance: any cheaper than this and quality starts to suffer.
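
If it helps, here's a rough sketch of what the API path can look like, assuming an OpenAI-compatible chat completions endpoint. The base URL, model id, and key below are placeholders, not documented values, so check the actual docs before using it:

# Sketch only: base URL, model id, and API key are placeholders.
import requests

API_KEY = "YOUR_GMI_API_KEY"                 # placeholder key
BASE_URL = "https://api.example-gmi.com/v1"  # placeholder base URL

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "deepseek-v3",  # placeholder model id
        "messages": [
            {"role": "user", "content": "Summarize this clip in one sentence."},
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

Swapping the model id is how you'd reach different models; a dedicated endpoint would presumably be the same call pointed at your own deployment URL.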

Polman Trudo

You mentioned "5–6× faster inference" in your description, so what is this being compared to?

CKMo

@polman_trudo Other popular inference engines such as Fireworks, Together.ai, and more!

Note that this specifically applies to some of the most commonly used models on our website (usually DeepSeek, Qwen, and MiniMax).

Nann

Do you plan to support the latest high-performance GPU types (e.g., NVIDIA H100/A100) to cater to compute-intensive multimodal workloads? Also, will the Inference Engine integrate with popular model hubs (e.g., Hugging Face) to simplify deploying pre-trained multimodal models?
