Reviews praise Nexa SDK for fast local setup, a smooth “build & ship” flow, and strong hardware flexibility across CPU/GPU/NPU with Apple and Qualcomm support. Users highlight privacy, low latency, and reliable performance for text, vision, audio, and image tasks, plus broad model compatibility (GGUF and MLX formats; models such as Gemma3n and PaddleOCR). Notably, the makers of
Nexa AI emphasize unifying fragmented backends and future-proofing across devices. Feedback notes excellent docs, minimal configuration, and consistent performance from prototyping to production, making it a dependable choice for on-device AI.
Octoverse
Hello Product Hunters! 👋
I’m Alex, CEO and founder of NEXA AI, and I’m excited to share Nexa SDK: the easiest on-device AI toolkit for developers to run AI models on CPU, GPU, and NPU.
At NEXA AI, we’ve always believed AI should be fast, private, and available anywhere — not locked to the cloud. But developers today face cloud latency, rising costs, and privacy concerns. That inspired us to build Nexa SDK, a developer-first toolkit for running multimodal AI fully on-device.
🚨 The Problem We're Solving
Developers today are stuck with a painful choice:
- Cloud APIs: Expensive, slow (200-500ms latency), and leak your sensitive data
- On-device solutions: Complex setup, limited hardware support, fragmented tooling
- Privacy concerns: Your users' data traveling to third-party servers
💡 How We Solve It
With Nexa SDK, you can:
- Run models like LLaMA, Qwen, Gemma, Parakeet, Stable Diffusion locally
- Get acceleration across CPU, GPU (CUDA, Metal, Vulkan), and NPU (Qualcomm, Apple, Intel)
- Build multimodal (text, vision, audio) apps in minutes
- Use an OpenAI-compatible API for seamless integration
- Choose from flexible formats: GGUF, MLX
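To show what “OpenAI-compatible” means in practice, here is a minimal sketch of a client-side request. The endpoint path, port, and model name are illustrative assumptions, not documented Nexa SDK values; any OpenAI-style client could target a local server like this.

```python
import json

# Hypothetical local endpoint; an OpenAI-compatible server accepts
# the same chat-completions payload the cloud API does.
BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(model, messages, stream=False):
    """Build a standard OpenAI-style chat-completions payload.

    The model name passed in below is an assumed example, not a
    confirmed Nexa SDK identifier.
    """
    return {"model": model, "messages": messages, "stream": stream}

payload = build_chat_request(
    model="qwen2.5-1.5b-instruct",  # assumed local model name
    messages=[{"role": "user", "content": "Hello!"}],
)
body = json.dumps(payload)  # this JSON would be POSTed to BASE_URL
```

Because the payload shape is unchanged, existing OpenAI client code can typically be pointed at the local server just by overriding its base URL.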
📈 Our GitHub community has already grown to 4.9k+ stars, with developers building assistants, ASR/TTS pipelines, and vision-language tools. Now we’re opening it up to the wider Product Hunt community.
Best,
Alex
@alexchen4ai Super exciting launch! 🚀 On-device AI that’s fast and private is exactly what a lot of devs have been waiting for. Love that you’re making it easier to tap into GPU/NPU acceleration without the usual complexity. Congrats on bringing this to the PH community!
NexaSDK for Mobile
@alexchen4ai @lluisrovirale Thank you for your warm words! We are working on more features for developers; next steps include MCP client support, AMD NPU support, and more.
Our goal is to make on-device AI friction-free!
@alexchen4ai This is really exciting, love the launch! Congrats to you and your team.
I think our subscribers would be super excited to hear more about this. Not sure you're familiar with TLDR, but we have an audience of 6M+, highly engaged tech professionals, developers and enterprise decision-makers (41–48% open rates).
Would love to chat more if you're interested! Congrats again
remio - Your Personal ChatGPT
@alexchen4ai Congratulations on your launch! It’s impressive how you’ve made on-device AI more accessible and efficient across multiple hardware types. What do you see as the biggest advantage of Nexa SDK compared to other on-device AI toolkits?🤔
@alexchen4ai Impressive team! Impressive work!
Triforce Todos
Congrats on the launch, Zack and Alex!
Just wondering if Nexa SDK could integrate with WebGPU for browser apps?
@abod_rehman Many thanks for your warm words! Yes, we can; we have a server solution and Java bindings. Could you please send an email to zack@nexa.ai, and I will follow up on your integration?
@abod_rehman Please feel free to join our discord community: https://discord.com/invite/nexa-ai. We will help you step by step!
@abod_rehman Are you a certified broker?
Mom Clock
Congrats on the launch, Zack and Alex!
Just wondering, how does Nexa handle memory management when running large models like LLaMA or Stable Diffusion on local devices?
@justin2025 Hi Justin, thanks! Nexa SDK offers many quantization options. For larger models, you can use more aggressive quantization such as 4-bit or 2-bit so the model fits in your machine's memory. We also provide model recommendations so users can easily find an appropriate model to run on-device.
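As a back-of-envelope illustration of the quantization trade-off described above (this sketch is my own, not part of Nexa SDK), weight memory scales roughly with parameter count times bits per weight:

```python
def approx_weight_size_gib(n_params: float, bits_per_weight: int) -> float:
    """Rough weight-memory footprint: parameters * bits / 8, in GiB.

    Ignores KV cache and runtime overhead, so real usage is higher.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

# A 7B-parameter model at different quantization levels (illustrative):
fp16 = approx_weight_size_gib(7e9, 16)  # ~13.0 GiB
q4   = approx_weight_size_gib(7e9, 4)   # ~3.3 GiB
q2   = approx_weight_size_gib(7e9, 2)   # ~1.6 GiB
```

This is why a model that overflows a 16 GB laptop at fp16 can still run comfortably at 4-bit, at some cost in output quality.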
Congrats on the launch, Alex! Love how you’re making on-device AI actually practical — the latency + privacy trade-off with cloud APIs is a real pain point.
The OpenAI-compatible API is a smart move too, since it lowers the switching cost for developers. Curious — have you seen more traction so far with folks building assistants, or with multimodal apps (like ASR/TTS and vision)?
Excited to see how Nexa SDK evolves!
@trgiangpham Your support means a lot to us. Yes, indeed: ASR/TTS and CV models are seeing faster and broader adoption, especially in IoT devices.
@trgiangpham Thank you! Yes, multimodal AI is in high demand right now. ASR and vision both capture richer context, helping the AI assistant understand you better!
This is great! We’ll be using it for CoreViz!
@wassgha Huge thanks, this means a lot to us! We would like to provide more engineering support. Could you please send me an email at zack@nexa4ai.com so we can work closely with you?
@wassgha Awesome! Please let us know if you have any feedback!
This is pretty cool. What is the largest model that is supported, and how do you work around the different memory and compute available for different devices?
@tarun_pasumarthi The largest model depends on your device's RAM. Most laptops have less than 64 GB, which is enough for SDXL and SD 3.5 image-generation models.
@tarun_pasumarthi It can support any model as long as you have enough RAM
Congrats on the launch, Zack and Alex! 🎉
Quick question — is the Nexa SDK able to integrate with WebGPU for browser-based apps?