How do you tell your coding agent which button you actually mean?
On a dense dashboard with five buttons that look alike, my coding agent reads the screenshot, guesses which one I meant, and edits the wrong one. Then I re-explain, it tries again, another round gone.
What fixed it for me: I stopped sending the raw image. I mark the exact element, write what should change, and hand over a small structured file instead. The agent edits the right thing on the first pass, and the handoff is around 700 tokens instead of several thousand for the picture.
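For anyone wondering what "a small structured file" could look like, here is a minimal sketch. It is not the actual format of any tool mentioned in this thread: the field names (`screen`, `target`, `selector`, `bbox`, `change`), the selector, and the coordinates are all made up for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass
class ElementRef:
    """Hypothetical reference to the one UI element being marked."""
    label: str                       # visible text on the element
    selector: str                    # CSS selector (assumed to be known)
    bbox: Tuple[int, int, int, int]  # x, y, width, height in screenshot pixels


@dataclass
class Handoff:
    """Hypothetical handoff: which screen, which element, what should change."""
    screen: str
    target: ElementRef
    change: str


def to_handoff_json(handoff: Handoff) -> str:
    """Serialize the handoff to JSON text the agent can read instead of an image."""
    return json.dumps(asdict(handoff), indent=2)


# Illustrative values only.
handoff = Handoff(
    screen="billing-dashboard",
    target=ElementRef(
        label="Export",
        selector="#invoices-toolbar button[data-action='export']",
        bbox=(912, 148, 96, 32),
    ),
    change="Disable this button while an export is already running.",
)

handoff_json = to_handoff_json(handoff)
print(handoff_json)
```

The point of the sketch is that the agent gets one unambiguous target plus the requested change as text, so it does not have to work out from pixels which of five similar buttons was meant.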
Curious how the makers here deal with the "which element" problem. Prompt harder? Crop tighter? Something smarter?


Replies
I used to keep drawing arrows on screenshots in Figma. A tiny structured file with element references feels much cleaner and probably easier to version.
SlimSnap
@advin_jadis The versioning angle is the part most people skip. An annotated PNG is a dead end: you can't diff it or reuse the reference. With a small structured file you can, which is basically why I ended up building SlimSnap: the arrows became fields. Were you versioning those against design history or the actual code change?
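To make the "you can diff it" point concrete, here is a rough sketch using Python's standard `difflib`. The two handoff dictionaries and the file names `handoff_v1.json` / `handoff_v2.json` are hypothetical examples, not output from any real tool.

```python
import difflib
import json


def diff_handoffs(old: dict, new: dict) -> str:
    """Return a unified diff between two handoff files rendered as sorted JSON."""
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="handoff_v1.json",  # hypothetical file names
            tofile="handoff_v2.json",
            lineterm="",
        )
    )


# Two made-up revisions of the same request: same element, different change.
v1 = {
    "target": {"selector": "#export-btn", "label": "Export"},
    "change": "Disable while an export is running.",
}
v2 = {
    "target": {"selector": "#export-btn", "label": "Export"},
    "change": "Hide while an export is running.",
}

diff_text = diff_handoffs(v1, v2)
print(diff_text)
```

Only the `change` line shows up as modified while the element reference stays put, which is the kind of history an annotated PNG cannot give you.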
Honestly
I vibe coded my own screenshot tool that goes straight into screenshot markup mode where I can draw boxes and type.
SlimSnap
@kalvinio Ha, same instinct, I built one too (SlimSnap). Since you went further than most: does yours output the boxes and text as structured data the agent reads, or does it still hand over the marked-up image? The jump from "prettier screenshot" to "structured reference the model doesn't have to parse" is the part that actually moved my results.
Draw a big red circle around it in Paint or GIMP.