How do you tell your coding agent which button you actually mean?

SlimSnap

8h ago

On a dense dashboard with five buttons that look alike, my coding agent reads the screenshot, guesses which one I meant, and edits the wrong one. Then I re-explain, it tries again, and another round is gone.

What fixed it for me: I stopped sending the raw image. I mark the exact element, write what should change, and hand over a small structured file instead. The agent edits the right thing on the first pass, and the handoff is around 700 tokens instead of several thousand for the picture.
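To make "small structured file" concrete, here is a minimal sketch of what such a handoff could look like. The field names (`selector`, `label`, `bbox`, `change`), the example page and selector are all made up for illustration; this is not SlimSnap's actual format, just one plausible shape.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ElementNote:
    """One marked element plus the requested change (hypothetical schema)."""
    selector: str                     # assumed: a CSS selector that pins down the element
    label: str                        # visible text, to tell look-alike buttons apart
    bbox: tuple[int, int, int, int]   # x, y, width, height in screenshot pixels
    change: str                       # plain-language description of the edit


def build_handoff(page: str, notes: list[ElementNote]) -> str:
    """Serialize the annotations to JSON text the agent reads instead of an image."""
    payload = {"page": page, "annotations": [asdict(n) for n in notes]}
    return json.dumps(payload, indent=2)


# Illustrative values only.
handoff = build_handoff(
    "/dashboard",
    [
        ElementNote(
            selector="#export-csv",
            label="Export",
            bbox=(412, 88, 96, 32),
            change="Rename the label to 'Export CSV'",
        )
    ],
)
print(handoff)
```

The point is that the agent gets an unambiguous reference (selector plus label) and never has to guess from pixels which of the five buttons was meant.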

Curious how the makers here deal with the "which element" problem. Prompt harder? Crop tighter? Something smarter?

Replies

I used to keep drawing arrows on screenshots in Figma. A tiny structured file with element references feels much cleaner and is probably easier to version.

6h ago

SlimSnap

@advin_jadis The versioning angle is the part most people skip. An annotated PNG is a dead end: you can't diff it or reuse the reference. With a small structured file you can, which is basically why I ended up building SlimSnap; the arrows became fields. Were you versioning those against design history or the actual code change?
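A rough sketch of the "you can diff it" point, using only Python's standard library. The two annotation dicts and their field names are hypothetical, not a real SlimSnap file:

```python
import difflib
import json


def diff_handoffs(old: dict, new: dict) -> str:
    """Return a unified diff between two versions of a structured annotation."""
    # sort_keys keeps key order stable so only real changes show up in the diff.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(old_lines, new_lines, "v1", "v2", lineterm="")
    )


# Illustrative values only: same element, revised instruction.
v1 = {"selector": "#export-csv", "change": "Make it blue"}
v2 = {"selector": "#export-csv", "change": "Make it green"}

diff = diff_handoffs(v1, v2)
print(diff)
```

The diff shows only the `change` line as removed and re-added while the `selector` stays as context, which is the kind of history an annotated PNG can't give you.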

5h ago

Honestly

I vibe coded my own screenshot tool that goes straight into screenshot markup mode where I can draw boxes and type.

5h ago

SlimSnap

@kalvinio Ha, same instinct, I built one too (SlimSnap). Since you went further than most: does yours output the boxes and text as structured data the agent reads, or does it still hand over the marked-up image? The jump from "prettier screenshot" to "structured reference the model doesn't have to parse" is the part that actually moved my results.

5h ago

Draw a big red circle around it in Paint or GIMP.

3h ago