Building a multi-platform desktop app in public using Claude Code

I am building Navam Sentinel in public as a reference AI project, with the source available in the official GitHub repo. The problem I am addressing is multi-agent regression testing across quality, capabilities, efficiency, and other criteria that matter. I want to do this with the least possible cognitive load on the end user: a busy developer, engineer, or scientist in an AI lab. The project is a reference in two ways: 1) how to build a multi-agent AI system that automates AI ops using visual primitives, and 2) how to context-engineer AI code generation for a complex multi-platform desktop AI app spanning tens of thousands of lines of code, hundreds of tests, and multiple releases per day.

I am sharing this thread both to share what I have learned and to get feedback from the awesome PH community on the product, its features, and the development approach. Here are a few open questions at this stage of the project:

What are the key challenges that AI labs or teams face when building complex multi-agent AI systems?
How do experienced or new AI teams handle agent testing and regressions? Is there a difference in approach, workflow, or tooling?
What is the right UI for multi-agent testing: code-first, visual-first, hybrid, or something else?
