How are you building AI that takes actions — not just answers?

Amarsia

3mo ago

We've been getting the same request over and over from our users: "My AI gives great answers, but it can't actually do anything."

It got us thinking — most AI integrations today are still essentially fancy search boxes. The AI talks, the human acts. But the real unlock is when the AI can close the loop itself — query the database, send the email, update the record — without a human in the middle.

The hard part isn't the action itself. It's the non-determinism. How do you build a system where the AI decides when to act, which action to take, and what parameters to pass — based purely on context — without it going off the rails?

A few things we've learned building this:

Intent detection is the core problem. The AI needs to understand not just what the user said, but what they actually need done. "Check if John is on a paid plan" should trigger a database lookup, not a paragraph explaining what a paid plan is.
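Here is a minimal TypeScript sketch of the act-vs-explain split, assuming a tool-use-capable model. The tool's description tells the model when a lookup is the right response. The application then routes any tool-use block to an executor instead of showing it as text. The block shape loosely mirrors Anthropic's Messages API (`text` and `tool_use` content blocks) and is redeclared here. `lookup_plan`, `lookupPlan`, and the in-memory plan table are hypothetical stand-ins for a real database lookup.

```typescript
// Content blocks returned by a tool-use-capable model. The shape loosely mirrors
// Anthropic's Messages API; it is redeclared here so the sketch is self-contained.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
};
type ContentBlock = TextBlock | ToolUseBlock;

// Hypothetical tool definition sent to the model. The description does the
// intent-detection work: it says *when* a lookup is the right response.
const lookupPlanTool = {
  name: "lookup_plan",
  description:
    "Look up a specific customer's subscription plan. Use this whenever the user " +
    "asks about a named customer's plan status; do not explain plans in general.",
  input_schema: {
    type: "object",
    properties: { customer_name: { type: "string" } },
    required: ["customer_name"],
  },
};

// Hypothetical data source standing in for a real database.
const plans = new Map<string, string>([
  ["John", "paid"],
  ["Maria", "free"],
]);

function lookupPlan(input: Record<string, unknown>): string {
  const name = String(input.customer_name ?? "");
  return plans.get(name) ?? "unknown customer";
}

type Outcome =
  | { kind: "answer"; text: string }
  | { kind: "action"; tool: string; result: string };

// Text blocks are shown to the user; tool_use blocks are executed as actions.
function route(blocks: ContentBlock[]): Outcome[] {
  return blocks.map((block): Outcome => {
    if (block.type === "text") return { kind: "answer", text: block.text };
    if (block.name === lookupPlanTool.name) {
      return { kind: "action", tool: block.name, result: lookupPlan(block.input) };
    }
    throw new Error(`No executor registered for tool: ${block.name}`);
  });
}
```

For "Check if John is on a paid plan", the desired model output is a single tool-use block. Passing `{ type: "tool_use", id: "t1", name: "lookup_plan", input: { customer_name: "John" } }` through `route` yields an action outcome with result `"paid"`. Whether the model actually emits that block still depends on the model; the description only steers it.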

Isolation matters more than you think. Each action needs to be stateless and sandboxed. When the AI is calling functions autonomously, you need a guaranteed, bounded blast radius: one bad call shouldn't affect anything else.
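Here is one way to bound the blast radius in application code, sketched in TypeScript. It assumes action parameters are plain JSON data. Each call gets its own deep copy of its input, and every call is settled independently, so a throw in one becomes an error result for that call alone. `runIsolated`, `ActionCall`, and the `Action` type are hypothetical. Process-level sandboxing or scoped credentials would go further than this.

```typescript
// Hypothetical action type: receives a read-only copy of its input and
// touches no shared mutable state.
type Action = (input: Readonly<Record<string, unknown>>) => Promise<unknown>;

interface ActionCall {
  name: string;
  action: Action;
  input: Record<string, unknown>;
}

type ActionResult =
  | { name: string; ok: true; value: unknown }
  | { name: string; ok: false; error: string };

// Deep-copy the input (JSON round-trip, so plain data only) and freeze the top
// level, so no action can mutate an object another call also sees.
function isolatedInput(input: Record<string, unknown>): Readonly<Record<string, unknown>> {
  return Object.freeze(JSON.parse(JSON.stringify(input)));
}

// Settle every call independently: a throw or rejection in one call becomes an
// error result for that call only and never aborts the others.
async function runIsolated(calls: ActionCall[]): Promise<ActionResult[]> {
  const settled = await Promise.allSettled(
    calls.map((c) => Promise.resolve().then(() => c.action(isolatedInput(c.input)))),
  );
  return settled.map((s, i): ActionResult =>
    s.status === "fulfilled"
      ? { name: calls[i].name, ok: true, value: s.value }
      : { name: calls[i].name, ok: false, error: String(s.reason) },
  );
}
```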

Logging is non-negotiable. Non-deterministic systems are hard to debug. Every action call needs its full params and response logged so you can understand exactly what the AI did and why.
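Here is a minimal TypeScript sketch of that logging rule. A wrapper records the full params, the response or error, and the timing for every call, whether it succeeds or fails. `withLogging`, `ActionLogEntry`, and the in-memory `actionLog` array are hypothetical. In practice the entries would go to a log store or tracing backend.

```typescript
// One structured record per action call: enough to reconstruct what the AI did.
interface ActionLogEntry {
  callId: string;
  tool: string;
  params: unknown;
  response?: unknown;
  error?: string;
  startedAt: string; // ISO timestamp
  durationMs: number;
}

// Hypothetical in-memory sink standing in for a real log store.
const actionLog: ActionLogEntry[] = [];

// Wrap an executor so every call records full params and the response (or
// error), and is logged even when it throws.
function withLogging<P, R>(tool: string, exec: (params: P) => Promise<R>) {
  let counter = 0;
  return async (params: P): Promise<R> => {
    const callId = `${tool}-${++counter}`;
    const started = Date.now();
    const entry: ActionLogEntry = {
      callId,
      tool,
      params,
      startedAt: new Date(started).toISOString(),
      durationMs: 0,
    };
    try {
      const response = await exec(params);
      entry.response = response;
      return response;
    } catch (err) {
      entry.error = err instanceof Error ? err.message : String(err);
      throw err;
    } finally {
      entry.durationMs = Date.now() - started;
      actionLog.push(entry);
    }
  };
}
```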

Curious how others are thinking about this:

How are you deciding which actions to expose to your AI vs. which ones stay human-controlled?
Are you prompting heavily to guide action selection, or letting the model figure it out?
What's broken for you that current tool-use implementations don't solve?

We just shipped AI Actions on Amarsia ahead of our upcoming PH launch, and would love to hear how you're approaching this.

Replies

A few patterns from shipping AI tool use in production:

Read-only vs write splits are the easiest exposure decision. The agent in my product can search the web, surface threads, and generate copy. It can't write to the user's project state, mutate channel selections, or hit billing. Anything with persistence consequences is human-triggered. This keeps the blast radius manageable without needing complex permission scopes.
Heavy prompting beats letting the model figure it out, in my experience. I put explicit constraint rules in the system prompt: "if user said no video skills, eliminate TikTok/YouTube/Reels." The model still occasionally ignores them, so each rule needs a matching post-generation check that catches the violation programmatically. AI is non-deterministic; the safety net should be deterministic.
What's broken for me with current tool use: tool calls that hang silently. If the underlying API times out or returns nothing useful, the AI stalls mid-stream with no error surfaced to the user. The Anthropic SDK doesn't give you a clean tool-call timeout primitive out of the box, so I have to wrap tool executors in a race against an AbortSignal.
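The deterministic safety net for prompt rules can be as simple as re-applying the rule in code after generation. Here is a minimal TypeScript sketch. It assumes the model returns a list of channel names and that the user's answers were captured in a `hasVideoSkills` flag (both hypothetical, as is `enforceVideoRule`).

```typescript
// Channels that require video skills, per the constraint rule quoted above.
const VIDEO_CHANNELS: ReadonlySet<string> = new Set(["TikTok", "YouTube", "Reels"]);

interface UserProfile {
  hasVideoSkills: boolean; // hypothetical flag captured from the user's answers
}

interface CheckResult {
  channels: string[]; // recommendations that pass the rule
  violations: string[]; // channels the model suggested despite the rule
}

// Deterministic safety net: re-apply the system-prompt rule in code after
// generation, whatever the model actually returned.
function enforceVideoRule(profile: UserProfile, suggested: string[]): CheckResult {
  if (profile.hasVideoSkills) return { channels: suggested.slice(), violations: [] };
  return {
    channels: suggested.filter((c) => !VIDEO_CHANNELS.has(c)),
    violations: suggested.filter((c) => VIDEO_CHANNELS.has(c)),
  };
}
```

Anything in `violations` can be logged as a prompt-compliance failure or used to trigger a regeneration. `channels` is what the user actually sees.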
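Here is a sketch of the race-with-AbortSignal workaround, written in TypeScript around your own tool executor rather than any SDK API. The executor races a timer. On timeout, the `AbortController` is aborted so a cooperative executor can cancel its work, and the caller gets a hard error it can surface to the user. `withTimeout`, `ToolExecutor`, and `ToolTimeoutError` are hypothetical names.

```typescript
// Hypothetical executor signature: takes the tool input plus an AbortSignal it
// should honor (for example by passing it on to fetch).
type ToolExecutor<I, O> = (input: I, signal: AbortSignal) => Promise<O>;

class ToolTimeoutError extends Error {
  constructor(tool: string, ms: number) {
    super(`Tool "${tool}" timed out after ${ms} ms`);
    this.name = "ToolTimeoutError";
  }
}

// Race the executor against a timer. On timeout, abort the controller so a
// cooperative executor can stop its work, and reject with a hard error that
// the caller can surface instead of stalling silently.
function withTimeout<I, O>(tool: string, exec: ToolExecutor<I, O>, ms: number) {
  return (input: I): Promise<O> => {
    const controller = new AbortController();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        controller.abort();
        reject(new ToolTimeoutError(tool, ms));
      }, ms);
    });
    // Promise.resolve().then(...) turns a synchronous throw into a rejection,
    // so the timer below is always cleared.
    const work = Promise.resolve().then(() => exec(input, controller.signal));
    return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
  };
}
```

Passing `signal` through to `fetch` (which accepts a `signal` option) lets the underlying HTTP request be cancelled rather than just abandoned.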

The closest thing to what I'd want from tool-use libs: a stricter mode that errors hard if the model calls a tool not on the whitelist. Today my whitelist enforcement happens at the executor level (function not registered = error), but I'd want it earlier in the chain.
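An earlier check can be done in application code today. Here is a minimal TypeScript sketch, assuming tool-use blocks shaped loosely like Anthropic's Messages API. The allowlist is derived from a registry that tags each tool as read or write (the split described in the first point). The model's entire response is validated before any executor runs. The tool names, the registry, and `assertAllowed` are hypothetical.

```typescript
// Each tool declares whether it has persistence consequences.
interface ToolSpec {
  name: string;
  effect: "read" | "write";
}

// Hypothetical registry following the read/write split in the first point.
const registry: ToolSpec[] = [
  { name: "search_web", effect: "read" },
  { name: "surface_threads", effect: "read" },
  { name: "generate_copy", effect: "read" },
  { name: "update_project_state", effect: "write" },
  { name: "update_channel_selection", effect: "write" },
  { name: "update_billing", effect: "write" },
];

// The model's allowlist is derived from the registry, so a write tool cannot
// be exposed by accident.
const agentAllowlist: ReadonlySet<string> = new Set(
  registry.filter((t) => t.effect === "read").map((t) => t.name),
);

// Tool-use blocks as the model returns them (shape assumed, loosely Anthropic-style).
type ToolCall = { type: "tool_use"; id: string; name: string; input: unknown };
type Block = { type: "text"; text: string } | ToolCall;

// Validate the whole response before any executor runs. One off-list call
// rejects the turn outright, so allowed calls never run alongside it.
function assertAllowed(blocks: Block[], allowlist: ReadonlySet<string>): ToolCall[] {
  const calls = blocks.filter((b): b is ToolCall => b.type === "tool_use");
  const rejected = calls.filter((c) => !allowlist.has(c.name)).map((c) => c.name);
  if (rejected.length > 0) {
    throw new Error(`Model called non-allowlisted tool(s): ${rejected.join(", ")}`);
  }
  return calls;
}
```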

3mo ago

@channelscout What’s been tricky for me is deciding how much freedom to give the AI. Too little, and it’s useless. Too much, and it’s risky.

3mo ago

@channelscout @asheer_ahmad when things go wrong, I rely heavily on detailed traces to understand the AI’s reasoning.

3mo ago

This is exactly where I think AI products get interesting. I’m building Traction, and this is the line we keep coming back to: AI that only answers questions is helpful, but AI that understands the business context and can move work forward is where the real value starts. For us, the goal is not just “write me a caption.” It’s more like: look at my content, leads, follow-up, visibility, and revenue, then tell me what needs attention and help me take the next step.

But I completely agree with your point about non-determinism. The scary part is not whether AI can take an action. It’s whether it knows when it should, when it should ask for approval, and when it should stay out of the way. I think the safest path is probably action layers: suggest first, draft second, approve third, automate only after trust is earned.
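The four layers can be encoded as a per-action trust level that decides what happens to a proposed action. Here is a minimal TypeScript sketch. The promotion rule in `nextLevel` is a hypothetical policy that the comment does not specify: a streak threshold of 20, with any rejection dropping back to "suggest".

```typescript
// The four layers, from least to most autonomy.
const LEVELS = ["suggest", "draft", "approve", "automate"] as const;
type TrustLevel = (typeof LEVELS)[number];

// What the system does with a proposed action at each level.
type Disposition =
  | "show_suggestion" // describe the next step; the human does it
  | "show_draft" // prepare the artifact; the human sends it
  | "await_approval" // ready to run, but only after explicit approval
  | "execute"; // run without a human in the loop

function dispositionFor(level: TrustLevel): Disposition {
  switch (level) {
    case "suggest":
      return "show_suggestion";
    case "draft":
      return "show_draft";
    case "approve":
      return "await_approval";
    case "automate":
      return "execute";
  }
}

// Hypothetical promotion policy: move up one level after `threshold`
// consecutive human-accepted outcomes; any rejection drops back to "suggest".
function nextLevel(
  level: TrustLevel,
  acceptedStreak: number,
  rejected: boolean,
  threshold = 20,
): TrustLevel {
  if (rejected) return "suggest";
  const i = LEVELS.indexOf(level);
  return acceptedStreak >= threshold && i < LEVELS.length - 1 ? LEVELS[i + 1] : level;
}
```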

3mo ago

The framing I keep coming back to: the question is not which actions to give the AI, but which actions to permanently take away.

For my product I made one hard architectural decision early: the AI is never allowed to touch the financial math. Hard numbers come directly from structured data feeds. The model only reads the text. It cannot query, compute, or estimate a number. That action is permanently off the table.

The result is that the non-determinism problem you describe is contained. The AI does one thing: extract and cite verbatim sentences from the source document. If it cannot find the source sentence, it drops the claim entirely.
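The cite-verbatim-or-drop rule can be enforced deterministically. Here is a minimal TypeScript sketch, assuming exact substring matching against the source text (the comment doesn't say how matching is done). `Claim`, `CitedClaim`, and `keepVerbatimClaims` are hypothetical. The returned offset is what an output would link back to.

```typescript
// A claim the model extracted, with the sentence it says supports it.
interface Claim {
  text: string;
  quote: string; // must appear verbatim in the source document
}

interface CitedClaim extends Claim {
  offset: number; // character offset of the quote in the source, for linking back
}

// Keep only claims whose quote is found verbatim in the source and drop the
// rest. The model never gets to "repair" a quote that isn't there.
function keepVerbatimClaims(source: string, claims: Claim[]): CitedClaim[] {
  const kept: CitedClaim[] = [];
  for (const claim of claims) {
    const quote = claim.quote.trim();
    if (quote.length === 0) continue; // an empty quote cites nothing
    const offset = source.indexOf(quote);
    if (offset !== -1) kept.push({ ...claim, offset });
  }
  return kept;
}
```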

Your point on logging is exactly right. Every output links back to the source document. That is not just a UX feature, it is the audit trail that lets me debug what the model actually did versus what it was supposed to do.

The lesson: the trust ceiling for AI actions rises dramatically when you define the blast radius before you define the capability. Isolation first, then scope creep.

3mo ago