How do you stay aware of what your AI coding agents are doing?

Pushary

•8d ago

I've been running Claude Code, Cursor, and Codex pretty heavily for the last few months and I keep hitting the same loop:

1. Start a task in one agent

2. Switch to something else (Slack, Twitter, another terminal)

3. Come back 30-40 minutes later

4. Agent finished 35 minutes ago. Or worse, it's been waiting for my approval the entire time.

The more agents I run, the worse it gets. There's no unified way to know what's happening across them.

Curious what other people's setups look like:

- Do you just keep terminals visible and check manually?

- Built any custom notification scripts?

- Use something like ntfy or Pushover?

- Just... accept the wasted time?

I've been building something in this space (push notifications + approval flows for AI agents) and I'm trying to understand if everyone's workflow is as janky as mine, or if some of you have figured out something clever.

Would love to hear what's working and what's not.

Replies

The minimum I’d want is not a fancy dashboard, just a boring handoff trail: current task, last file touched, last command run, whether it is waiting on approval, and what it plans to do next.

For multiple agents, the useful signal is “stuck / waiting / changed code / tests failed / ready for review”. Anything more detailed can usually stay in logs.
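A minimal sketch of that handoff trail as a data structure, assuming one record per agent. The field names, the `Signal` enum, and the `one_line` format are illustrative choices, not any tool's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    """The coarse states named above; anything more detailed stays in logs."""
    STUCK = "stuck"
    WAITING = "waiting"
    CHANGED_CODE = "changed code"
    TESTS_FAILED = "tests failed"
    READY_FOR_REVIEW = "ready for review"


@dataclass
class HandoffEntry:
    agent: str                 # hypothetical label, e.g. "claude-code"
    current_task: str
    last_file: str
    last_command: str
    waiting_on_approval: bool
    next_step: str             # what the agent plans to do next
    signal: Signal

    def one_line(self) -> str:
        """Render the whole entry as a single status line."""
        flag = " [needs approval]" if self.waiting_on_approval else ""
        return (
            f"{self.agent}: {self.signal.value}{flag} | task: {self.current_task} | "
            f"file: {self.last_file} | cmd: {self.last_command} | next: {self.next_step}"
        )
```

One line per agent is enough to scan six agents at a glance; the full transcript is still in the logs if you need it.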

1d ago

Pushary

@kevinzrzgg

You've now described the spec three times from three angles and it's converged on the same thing, which is honestly the best validation a spec can get. "Not a fancy dashboard, a boring handoff trail" is exactly right, and I'll say it against my own interest: the dashboard is a trap. The second it has charts and a hero number, it's optimizing to be looked at, when the entire goal is something you almost never look at until it taps you.

Your minimum trail, current task, last file, last command, waiting-on-approval, plans-next, is the actual product, and the multi-agent signal set, stuck / waiting / changed / tests-failed / ready-for-review, is the whole vocabulary. Everything past that stays in the logs where it belongs. The discipline is in what you refuse to surface, not what you add.

So Pushary's job is to be the boring trail, not the fancy dashboard. Six agents, each showing one line of state and a "next," and a ping only when one crosses into a state that needs you. If it ever tempts you to sit and watch it, I've built the wrong thing.
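As a rough illustration of "a ping only when one crosses into a state that needs you", here is a sketch that compares two snapshots of agent states and returns only the agents that just changed into an attention state. The state names and the `NEEDS_HUMAN` set are assumptions for the example, not Pushary's implementation.

```python
# Hypothetical set of states that justify an interruption.
NEEDS_HUMAN = {"waiting", "stuck", "tests failed", "ready for review"}


def pings_for(previous: dict, current: dict) -> list:
    """Return agents whose state changed into one that needs a human.

    An agent that was already in the same attention state is not pinged
    again, so a long wait produces one notification, not a stream.
    """
    pings = []
    for agent, state in current.items():
        if state in NEEDS_HUMAN and previous.get(agent) != state:
            pings.append(agent)
    return pings
```

Note that a move between two attention states (say, "waiting" to "stuck") still pings in this sketch; whether that is desirable is a design choice.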

The one field I'd argue is doing the most work is "plans to do next," because that's the cheapest possible preview of a bad decision before it happens, you can kill a wrong turn at the plan instead of the diff. Curious if you actually trust that field, do you read "plans next" and intervene early, or do you ignore it and just judge the result once it's done?

22h ago

This is a real pain point once you run more than one agent at a time. For me the useful alert is not just “the job finished,” but what state it finished in: completed cleanly, waiting for approval, blocked by an error, or made a change that needs review.

The part I’d care about most is the summary quality. A notification that says “done” is less useful than “changed 4 files, tests failed here, needs approval for X.” That turns the alert from noise into a decision point.
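To make the difference concrete, a small sketch of a summary builder that turns raw facts into a decision-ready line. The function name, its parameters, and the separator are assumptions for illustration.

```python
from typing import List, Optional


def summarize(files_changed: List[str],
              failed_tests: List[str],
              approval_for: Optional[str] = None) -> str:
    """Build a one-line notification body from what the agent actually did."""
    parts = []
    if files_changed:
        n = len(files_changed)
        parts.append(f"changed {n} file{'s' if n != 1 else ''}")
    if failed_tests:
        parts.append("tests failed: " + ", ".join(failed_tests))
    if approval_for:
        parts.append(f"needs approval for {approval_for}")
    # Fall back to a plain message only when there is nothing to decide.
    return "; ".join(parts) if parts else "done, no changes"
```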

Curious if you’re thinking of notifications as the main product, or as the first layer of a bigger agent activity/audit dashboard?

1d ago

Pushary

@grace_lee26

Straight answer to your straight question: notifications are the wedge, not the destination. The ping is just the most acute version of the pain, the thing that gets someone to install it on a Tuesday. Underneath it is the real product, an agent activity and audit layer: what changed, what was reviewed, where a human approved, across whatever tools you're running. The notification is how it earns attention. The trail is why it stays useful.

One caveat I'd add, because smart people in this same thread pushed me on it: "dashboard" is a slightly dangerous word. The audit layer has to stay a boring handoff trail, not a charts-and-vanity-metrics thing you sit and stare at. The goal is something you almost never open until it taps you, and when you do, it hands you the decision instantly. Audit trail, yes. Dashboard you babysit, no.

And you put your finger on the actual moat: summary quality. "Done" is noise. "Changed 4 files, tests failed here, needs approval for X" is a decision point, and that gap is the entire product. Anyone can fire a notification. Making the payload good enough that you can rule on it without opening the terminal is the hard, valuable part, and it's exactly where most "we added notifications" features stop.

Curious which half of a good summary you'd trust an agent to get right, the factual part (files, tests, commands) is mechanical, but "needs approval for X" requires the agent to correctly know X is risky. Do you trust it to flag the right things, or would you rather define the risky paths yourself up front?
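For the "define the risky paths yourself up front" option, one possible sketch: match the files an agent touched against glob patterns declared ahead of time. The patterns and the function name are hypothetical; only `fnmatch.fnmatchcase` is a real standard-library call.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns a team might declare up front.
RISKY_PATTERNS = ["*/auth/*", "*/payments/*", "*.sql", "migrations/*"]


def approvals_needed(touched_files, patterns=RISKY_PATTERNS):
    """Return the touched files that match any declared risky pattern."""
    return sorted(
        path for path in touched_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )
```

This keeps the factual half mechanical: the agent reports files, and the rule set, not the agent's judgment, decides what needs sign-off.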

22h ago

Honestly, I think this is becoming a real problem as people run multiple AI agents in parallel. Keeping terminals visible works for one agent, but it breaks down quickly when you're juggling Claude Code, Cursor, Codex, and other tools.

The most effective setups I've seen use notifications (desktop, mobile, Slack, Discord, ntfy, Pushover, etc.) for three events:

- Task completed
- Agent needs approval/input
- Agent encountered an error

Without that, you end up wasting time either waiting on agents or having agents wait on you.
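A minimal sketch of those three events sent through ntfy, assuming the public `https://ntfy.sh` server and a topic name you choose yourself. ntfy accepts a plain HTTP POST with the message as the body and optional `Title`, `Priority`, and `Tags` headers; the event names and the priority mapping below are assumptions.

```python
import urllib.request

# Hypothetical mapping from event to (ntfy priority, ntfy tag).
EVENTS = {
    "completed": ("default", "white_check_mark"),
    "needs_input": ("high", "bell"),
    "error": ("urgent", "warning"),
}


def build_ntfy_request(topic, agent, event, detail, server="https://ntfy.sh"):
    """Build the POST request for one agent event without sending it."""
    priority, tag = EVENTS[event]
    return urllib.request.Request(
        f"{server}/{topic}",
        data=detail.encode("utf-8"),
        headers={"Title": f"{agent}: {event}", "Priority": priority, "Tags": tag},
        method="POST",
    )


def notify(topic, agent, event, detail):
    """Send the notification and return the HTTP status code."""
    request = build_ntfy_request(topic, agent, event, detail)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling something like `notify("my-agents", "codex", "needs_input", "approve schema change?")` from a hook or wrapper script would cover the second event; the topic name here is a placeholder.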

What seems to be missing is a unified "agent inbox" that shows the status of all running agents in one place. If you're building push notifications and approval flows, you're solving a real pain point. The challenge will be making it work seamlessly across different agent ecosystems rather than becoming yet another dashboard people forget to check.

20h ago

Pushary

@audi_sport_cars

"Agent inbox" is the right name, and you also wrote the warning that should be tattooed on the project: don't become another dashboard people forget to check. That's the failure mode that kills this entire category. The moment your tool requires the user to remember to look at it, you've just rebuilt the terminal grid with extra steps. A dashboard you have to check is a dashboard that has already lost.

The escape from that is making it push-first, not pull-first. You never check it. It checks you. The three events you listed, completed, needs-input, error, are exactly the only reasons it should ever interrupt, and the rest of the time it stays dark. The inbox isn't a place you visit, it's a thing that taps your shoulder and is otherwise silent. Get that polarity right and "forgot to check it" stops being possible, because checking was never the job.

The other half, seamless across ecosystems, is the genuinely hard engineering and I won't pretend otherwise. Per-tool notifications already exist, Claude Code pings, Warp pings, the problem is they each ping in their own silo with their own definition of done. The whole value is one inbox speaking one language across Claude Code, Cursor, Codex, and whatever's next, so you're not maintaining three notification systems and a script graveyard. That unification is the moat, and it's also the part that's hard enough to be worth doing.

Curious where you'd want it to draw the line on errors specifically, every error, or only the ones it can't self-recover from? An agent that retries and fixes its own typo shouldn't page you, but I keep wrestling with where "it'll handle it" ends and "you need to know" begins.
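One possible way to draw that line, sketched under the assumption that "self-recoverable" means "succeeds within a few retries": retry quietly and page a human only when the attempts run out. The function name, the retry count, and the `page` callback are illustrative.

```python
def run_with_escalation(step, max_attempts=3, page=print):
    """Run `step`, retrying on failure; page a human only if every attempt fails."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return step()
        except Exception as exc:  # treated as recoverable until attempts run out
            last_error = exc
    page(f"blocked after {max_attempts} attempts: {last_error}")
    return None
```

A fixed retry count is a blunt rule; it will not catch an agent that "succeeds" at the wrong thing, which still needs a review state rather than an error page.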

19h ago

My setup ended up being pretty janky too — tmux splits with a watch command tailing logs on each agent session, plus a few custom aliases to grep for 'waiting for input' strings. It works but it's manual polling dressed up as automation.

The part I haven't solved is knowing which agent finished *successfully* vs which one quietly stopped because it hit an ambiguous decision and didn't know what to do. Those look identical from the terminal. You come back, nothing's running, and you have to read through the whole context to figure out why.

The visibility gap feels especially real when you're learning AI coding — you're still building intuition for what the agent will or won't surface on its own.

17h ago

Pushary

@grace_lee26

"Manual polling dressed up as automation" is the most honest sentence in this entire thread. The grep-for-'waiting-for-input' aliases are clever and also a confession: you're pattern-matching on the agent's prose, which means the day it phrases a question slightly differently, your automation goes blind and you don't even know it went blind.

But you've put your finger on the genuinely hard one, and it's deeper than it looks. Finished-clean and quietly-stopped-on-ambiguity look identical from the terminal because "stopped" isn't a state the agent declared, it's the absence of output, and you cannot grep for an absence. A silent terminal is the same whether the agent succeeded, got confused, or crashed. That's the root bug: terminal-watching infers state from activity, and the most important state, "I hit a decision I can't make and gave up," produces no activity at all.

The only real fix is flipping it: the agent has to announce its terminal state explicitly, "done, here's the diff" versus "blocked, here's the ambiguous call I couldn't resolve," so you're reading a declaration instead of interpreting a silence. That's exactly the gap Pushary goes after, make the agent say which of those two it is, so the two stop being indistinguishable and you never again come back to a dead terminal and have to autopsy the whole context to learn why it died.
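A sketch of what "announce the terminal state explicitly" could look like at its simplest: the agent wrapper writes a small declaration file as its last act, and the reader treats a missing file as its own state instead of guessing. The file format and function names are assumptions, not how Pushary or any agent actually does it.

```python
import json
from pathlib import Path

DECLARED_STATES = ("done", "blocked")


def declare(path, state, detail):
    """Called by the agent wrapper as its final step."""
    if state not in DECLARED_STATES:
        raise ValueError(f"unknown terminal state: {state}")
    Path(path).write_text(json.dumps({"state": state, "detail": detail}))


def read_state(path):
    """Return (state, detail); silence becomes an explicit 'undeclared' state."""
    file = Path(path)
    if not file.exists():
        return ("undeclared", "agent stopped without declaring a state")
    data = json.loads(file.read_text())
    return (data["state"], data["detail"])
```

The point of the third state is that a crash or a silent give-up no longer looks like success: nothing declared means something went wrong.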

The learning-AI-coding angle is the part I find most motivating, honestly. When you're still building intuition for what an agent will and won't surface, a tool that consistently shows you "here's what it flagged, here's what it silently decided" is also teaching you the agent's blind spots. You stop guessing what it'll handle and start knowing. Curious, in your tmux setup, roughly how often is a stopped agent actually blocked versus actually done, do the silent-ambiguous stops happen enough to be the main tax, or is it the rarer-but-brutal case?

17h ago

Pushary

Launch is live guys - would love some support https://www.producthunt.com/posts/pushary-3

7d ago

The "waiting for approval" loop is the absolute silent killer of AI productivity! 🛑 Right now, it's mostly manual terminal-watching or messy custom scripts. A unified notification layer for agent execution is a brilliant move—definitely a tool the space needs.

6d ago

Pushary

@veer_singh14

"Silent killer" is the perfect phrase for it 🎯 — the agent finishes its actual work in 5 minutes and then just... sits there waiting, and you don't find out for half an hour. The work was fast; the waiting on you to notice is what's slow.

Appreciate the read on it being a unified layer rather than another one-off script. That's the bet exactly — everyone's already solved this badly with their own custom ntfy/Pushover hacks, but nobody wants to maintain that glue across Claude Code, Cursor, and Codex. Are you currently running any custom notification scripts yourself, or still in the manual terminal-watching phase? Trying to gauge how many people have already duct-taped a solution vs. just living with it.

6d ago

@aadilghani Couldn't agree more—maintaining that custom glue code across different AI tools is a massive hidden time-sink for engineering teams.

We've built some of our own notification workarounds for our dev workflows to dodge the terminal-watching trap, so I completely validate the pain point you're solving here. Centralizing this into a single, reliable layer is a game-changer for team efficiency. Really looking forward to seeing how Pushary handles the multi-tool ecosystem!

6d ago

This is a real problem, especially once you’re running more than one coding agent at the same time.

What has helped me is treating AI coding agents less like autocomplete and more like junior engineers that need task boundaries, check-ins, and review points.

My workflow is usually:

1. I break tasks into very small tickets before sending anything to Claude Code/Cursor/Codex.
2. I define the expected output upfront: files to touch, files not to touch, acceptance criteria, and what “done” means.
3. For any critical change, I ask the agent to update the project .md file with what changed, why it changed, files touched, assumptions made, and any follow-up risks.
4. I ask for a summary before implementation when the task is risky.
5. I review diffs before accepting changes.
6. For longer tasks, I use checkpoints: “stop after planning,” “stop after backend changes,” “stop before modifying auth/payment/database logic.”

That .md file has been surprisingly important because it becomes a running memory and audit trail for the project. When something breaks later, I’m not trying to reverse-engineer what the agent did from vibes.
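For anyone who wants the same trail without relying on the agent's prose, a sketch of a helper that appends one structured entry to the project .md file. The heading layout, the file name in the usage note, and the function signature are assumptions; the fields are the ones listed above.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_change_entry(md_path, what, why, files, assumptions, risks, now=None):
    """Append one audit entry: what changed, why, files, assumptions, risks."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d %H:%M UTC")
    lines = [
        f"## {stamp}",
        f"- What changed: {what}",
        f"- Why: {why}",
        "- Files touched: " + ", ".join(files),
        "- Assumptions: " + ("; ".join(assumptions) or "none"),
        "- Follow-up risks: " + ("; ".join(risks) or "none"),
        "",
    ]
    # Append-only, so earlier entries are never rewritten.
    with Path(md_path).open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
```

For example, `append_change_entry("CHANGES.md", "added rate limit", "abuse reports", ["api/limits.py"], [], ["tune threshold"])` adds one dated section to a hypothetical `CHANGES.md`.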

For me, the bigger issue is not just notification visibility. It’s workflow observability.

I don’t only want to know “the agent is done.” I want to know what changed, why it changed, what assumptions it made, what files were touched, and what needs human approval before moving forward.

That’s where I think the real opportunity is: status, approvals, logs, summaries, risk flags, and handoff points across agentic coding tools.

Because once you’re building production software, the real question becomes: how do I stay in control while AI moves faster than I can manually monitor?

5d ago

Pushary

@toch_aria

You basically wrote my product spec, so I'm either flattered or out of a job.

You're right: "done" is the shallow version. What changed, why, what it touched, what needs sign-off, that's the actual signal. The notification is just the courier. The .md-as-audit-trail move is the smart part, because future-you debugging at 2am does not accept "vibes" as a commit message.

That control-while-it-moves-faster-than-you-can-watch problem is the whole reason pushary.com exists. Think of it as a control panel for your agents: status, approvals, and risk flags across Claude Code, Cursor, and Codex, so you stay in the loop without babysitting six terminals.

Quick one back at you: do you prompt for that .md update every time, or have you wired it into a skill so it's automatic? That reliability is load-bearing for the whole thing.

5d ago

@aadilghani Glad to hear that, I'll take a look at the product link. At first I prompted for it every time, which can be a token killer. I've since switched to automating it with a skill.

4d ago

There are hundreds of ways to handle this, I believe.
In my case, the most efficient way was giving my agent access to my Slack and instructing it to notify whichever team member is relevant to the question, so they can answer in Slack and the agent can read the response.
You can also have the agent wait X minutes and check back to see whether it got an answer.
That has been quite smooth with my cofounder. They went to sleep while the agent was running and building stuff; when it got stuck, the agent just pinged me on Slack, I gave it the solution or fixed the problem it was facing, then it got back to work and confirmed with me, and back and forth that way.

Of course, there's still the first step of planning the work, which needs attention, and you can't really step away from the computer for it. But while it's building, anyone from a company should be able to help it get the work done.

4d ago

Pushary

@florent_duthoit

This is genuinely one of the better setups in the thread, and the insight buried in it is the real gold: "anyone from a company should be able to help it get the work done." That's the part most people miss. Once the agent can route a question to whoever can actually answer it, you've turned a solo bottleneck into a team that happens to be asleep half the time.

The Slack relay is smart, and it's basically Pushary's thesis built by hand. The two places it gets fragile: you had to wire it up per agent, give it Slack access, and format the prompts just so, and the "wait X minutes and check back" loop is polling sneaking back in through the side door. Works great with a cofounder who knows the dance. Gets messy across six agents and a bigger team where you also want to know what it changed, not just answer its question.

Pushary is that pattern as a product instead of a custom integration: structured handoffs with risk flags routed to the right person, no per-agent plumbing, and the relay handles the wait so the agent isn't burning cycles checking back.

You nailed the one honest limitation too. Planning still chains you to the desk. Agreed, and I think that's correct for now. You should be in the room for the plan. It's the babysitting after the plan that shouldn't require a human at all. Curious, does your agent ever ping the wrong person, or have you got the routing dialed in?

4d ago

@aadilghani You're right about the two fragile places, but we have an easy solution to counter that.
Let me give you more context about our team workflow.

We train every team member on Opencode: sales, customer support, delivery, developers, finance, whoever. We have 30+ MCPs connected (a built-in MCP server is used as a gateway) that our team can use in Opencode across all the company services, and each person connects their own company account to it.
It has been the best productivity tool we've found, and by far the best-performing one.
Setup is an easy script that runs on any device and configures everything properly with no technical knowledge required. It's just a one-line command to run (it could become an executable).

So, to answer the two fragile places:
- No need to format the prompt; global instructions are set up so that it knows what to do.
- We don't really care how long it waits, and the wait can be incremental. In the end, we just want the work done, however long it takes. If it has waited too long, it pings the person concerned again to follow up. I wish there was some way to wake up an agent on a specific event (Slack message received and so on); that could be the perfect improvement/solution long term.
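A rough sketch of that incremental wait with follow-up pings, assuming the agent can call some `check_reply` and `ping_again` functions (both hypothetical, standing in for Slack access through an MCP). The doubling schedule and the cap are illustrative defaults.

```python
def follow_up_delays(first_minutes=5, factor=2, cap_minutes=120, count=6):
    """Return an increasing list of wait times in minutes, capped at `cap_minutes`."""
    delays = []
    delay = first_minutes
    for _ in range(count):
        delays.append(min(delay, cap_minutes))
        delay *= factor
    return delays


def wait_for_answer(check_reply, ping_again, sleep, delays):
    """Wait, check for a reply, and re-ping the person after each empty check."""
    for minutes in delays:
        sleep(minutes * 60)          # `sleep` takes seconds, e.g. time.sleep
        reply = check_reply()
        if reply is not None:
            return reply
        ping_again()
    return None                      # schedule exhausted without an answer
```

With the defaults, `follow_up_delays()` gives waits of 5, 10, 20, 40, 80, and 120 minutes. This is still polling, just politer polling; an event-driven wake-up would remove the loop entirely.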

As for pinging the wrong person, we use agent memory, of course. It is not shared across the team yet, but we may set that up soon. Right now, everyone has their own personal memory, and the agent can remember who to talk to depending on the task we're dealing with at the moment.

I think the problem most people face is big at the beginning, and every team needs to find an approach that matches its mindset.
We are evolving our ecosystem as we discover things and relying more and more on agents. With this setup we have already automated most of it:
- Development
- New workflows
- Creating/updating documentation
- & much more.

The most underrated winning MCP has been Playwright MCP. There are no more limits with this one, and it's far more efficient than any built-in Claude/GPT/Perplexity browsing agent system.

I feel your platform is interesting, but if I have to keep being on my phone confirming notifications, it will drive me crazy.
We need to give the agent a dedicated environment where it can fail, so the basic question/authorization is not required anymore.

4d ago

Pushary

@florent_duthoit

You just argued me into a better version of my own product, so thank you for that.

You're right, and I'll go further than you expect: if Pushary is pinging your phone for "can I create a file," it has already failed. That class of approval shouldn't exist. A sandboxed environment where the agent is free to fail and self-correct kills 90% of interruptions, and I'm fully on board with that being the goal. Anyone building approval flows for reversible actions is just adding friction with extra steps.

But there's a second class that no sandbox removes, because it's not "did it fail," it's "which way did you want this." Touch the payment logic or not. Ship to prod now or wait. The spec was ambiguous and there are two reasonable interpretations. In those, the human isn't a safety net, the human is the missing input. The agent can't fail its way to your intent. That irreducible slice is the only thing Pushary should ever surface, and if it surfaces more than that, I'm doing it wrong.

And here's the fun part: you already designed the rest of it. "I wish there was some way to wake up an agent on a specific event, Slack message received" is exactly the return path Pushary is built around. The agent sleeps, the event fires, it wakes and keeps going. You're not arguing against the platform, you're spec'ing its core loop. We just disagree on whether you want to be the device that confirms, and I think the answer is "only for the handful that actually need your brain."
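The sleep-until-event loop can be sketched with nothing more than a blocking queue, assuming some handler (for example a Slack webhook receiver, not shown) puts the human's answer on it. This is an illustration of the pattern, not Pushary's actual mechanism; the function names are hypothetical.

```python
import queue
import threading


def agent_worker(inbox, log):
    """Block on the inbox instead of polling; resume when an answer arrives."""
    log.append("blocked: waiting for an answer")
    answer = inbox.get()             # sleeps here until an event is delivered
    log.append(f"resumed with: {answer}")


def demo():
    inbox = queue.Queue()
    log = []
    worker = threading.Thread(target=agent_worker, args=(inbox, log))
    worker.start()
    # Stands in for the event handler delivering the human's reply.
    inbox.put("ship to staging first")
    worker.join(timeout=5)
    return log
```

The worker burns no cycles while it waits, which is the difference from a "wait X minutes and check back" loop.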

Separately, your Opencode plus 30+ MCP gateway setup with non-technical teammates running one-line installs is one of the more impressive org-wide agent rollouts I've heard described, and the shared-memory routing is the obvious next unlock. Big co-sign on Playwright MCP too, the built-in browsing agents aren't close. Quick one: when the agent hits a genuine judgment call, not a failure, who in that 30-person setup ends up being the input, and how does it know?

4d ago

@aadilghani we believe more in agents than in god itself 😂

Mostly, when the agent has designed everything, humans tend to guide it the wrong way.
If the agent comes back with a blocker, we just need to let it investigate and come up with choices that match what we want. The human is just there to give insights or information it doesn't have, but never to make a decision.

I remember Jensen saying: "I'm not asking it to think for me. I'm asking it to teach me things that I don't know."

This happens during plan mode, which is why the focus and waiting (reading, actually) during this phase is important: you learn every minute.
We know what we want, but we don't know how to achieve it in a proper and quick way.

4d ago

The notification piece you're describing solves half of it — knowing when the agent finished. The half that kept biting us was knowing whether what it did was any good, and being able to see it later. The agent runs, produces output, and the context evaporates — you can't see what it knew going in or whether the result held up.

We ended up wiring memory + execution + review into one layer so the answer doesn't disappear when the terminal closes. The thing that finally made it legible was cost-per-approved-output instead of raw token spend — turned out a handful of people were producing most of what actually shipped, and you can't see that from notifications alone.
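The comment does not define the metric precisely, so the following is only one plausible reading: total spend per person divided by the number of their outputs that were approved. The tuple layout and the function name are assumptions.

```python
def cost_per_approved_output(runs):
    """Compute spend per approved output for each author.

    `runs` is an iterable of (author, cost_usd, approved) tuples. Authors
    with no approved output get None rather than a misleading number.
    """
    totals = {}
    for author, cost, approved in runs:
        spent, shipped = totals.get(author, (0.0, 0))
        totals[author] = (spent + cost, shipped + (1 if approved else 0))
    return {
        author: (spent / shipped if shipped else None)
        for author, (spent, shipped) in totals.items()
    }
```

Rejected runs still count toward the numerator, which is what makes this differ from raw token spend: two people can spend the same amount and ship very different amounts.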

How are you thinking about the "was it good" layer in Pushary, or are you keeping it tightly scoped to the notify-on-finish problem?

13h ago

Completely relate to the "half your brain monitoring" thing. You think you're doing deep work but part of your attention is always reserved for checking the terminal.

13h ago
