SlimSnap

Your AI doesn't know which button you mean

483 followers

Screenshots and screen recording apps

The AI reads your screenshot as a pixel blob and guesses which button you meant. SlimSnap converts the screenshot plus your annotation into structured JSON: every element has coordinates, an ID, and your arrow points at a specific one. Around 700 tokens vs 1,568 raw on Sonnet. Free Mac app. Schema and Claude Code skill are open MIT. Runs entirely on-device.

Free

Launch tags: Design Tools • Productivity • Artificial Intelligence

SlimSnap

Maker

📌

The day I shipped this started with me yelling at Claude Code for the fifth time. I'd pasted a screenshot of a misaligned form. I'd typed "fix this." Claude moved the wrong input. I retyped. Claude moved a different wrong input. I gave up and fixed it manually.

The reason it kept guessing: it was reading raw pixels. It had no way to know which rectangle was the input I meant, so it picked one that looked plausible.

SlimSnap converts the screenshot into a spec the AI can parse element by element. Each element has coordinates, OCR text, color values, and (if you drew an arrow on it) a target reference saying "this one." It also happens to be ~700 tokens versus the 1,568 a raw screenshot costs on Sonnet (up to 4,784 on Opus 4.7+). That part is just a bonus.

Open: the JSON schema (MIT, github.com/bickov/slimsnap-schema) and a Claude Code skill that auto-loads your latest capture (MIT, github.com/bickov/slimsnap-skill). The Mac app is closed but free.

Other tools (Cursor, Lovable, bolt.new, Replit, ChatGPT Vision): the spec works, but you paste the JSON into chat yourself. Cleaner than raw images. Not as smooth as the Claude Code auto-loader. Someone with time on their hands could write the equivalent skill for any of them.

A real question: which AI tool do you reach for most when you need to point at something specific on screen? That tells me where to build the next auto-loader.
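To make the idea of an element-by-element spec concrete, here is a rough Python sketch of what such a capture could look like and how a consumer might resolve the arrow to one element. Only the annotation.target_ref field and the e_button_3 style of ID appear in this thread; every other field name (elements, bbox, text, color, type), the [x, y, width, height] box convention, and the resolve_target helper are assumptions made for illustration, not the published slimsnap-schema.

```python
import json

# Hypothetical capture. Only annotation.target_ref and the ID style come from
# the thread; the remaining field names and values are illustrative guesses.
capture = {
    "elements": [
        {"id": "e_input_1", "bbox": [40, 120, 320, 32], "text": "Email", "color": "#FFFFFF"},
        {"id": "e_input_2", "bbox": [48, 168, 320, 32], "text": "Password", "color": "#FFFFFF"},
        {"id": "e_button_3", "bbox": [40, 220, 96, 36], "text": "Submit", "color": "#2563EB"},
    ],
    # The arrow drawn by the user points at exactly one element ID.
    "annotation": {"type": "arrow", "target_ref": "e_input_2"},
}


def resolve_target(spec):
    """Return the element the annotation points at, or None if there is none."""
    ref = spec.get("annotation", {}).get("target_ref")
    return next((e for e in spec["elements"] if e["id"] == ref), None)


target = resolve_target(capture)
print(json.dumps(target, indent=2))
```

With a structure like this, "fix this" stops being a guess: the consumer looks up one ID instead of choosing among rectangles that look alike.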

2mo ago

This is a real pain with Claude Code and Cursor. The agent usually understands the general UI, but still touches the wrong element. Does SlimSnap keep enough context when there are multiple similar buttons or inputs on the same screen?

2mo ago

SlimSnap

Maker

@farrukh_butt1 Yes, exactly the case the schema was built for. Each element gets a unique ID regardless of how visually similar it is to others. OCR text + bbox coordinates + (if present) parent context disambiguate the duplicates. So if there are five "Submit" buttons on the screen, they show up as e_button_5, e_button_8, e_button_11 (or whatever IDs they get), and your arrow annotation points at exactly one of them.

The edge case where it still struggles: identical floating elements with no surrounding container or distinguishing text (rare but possible in canvas-based apps). For 95% of UI work, the ID + bbox + annotation combo holds up.

What kind of UI are you hitting this with most? Cursor with React forms? Claude Code with admin dashboards? Useful for prioritizing where to harden the schema.
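The duplicate-button case described above can be sketched in a few lines of Python. The element IDs are the ones mentioned in the reply; the field names (id, bbox, text), the coordinates, and both helper functions are hypothetical and only show why a lookup by ID stays unambiguous where a lookup by visible text does not.

```python
# Three visually identical buttons. IDs follow the style quoted in the reply;
# field names and coordinates are assumptions for illustration.
elements = [
    {"id": "e_button_5", "bbox": [24, 100, 88, 32], "text": "Submit"},
    {"id": "e_button_8", "bbox": [24, 260, 88, 32], "text": "Submit"},
    {"id": "e_button_11", "bbox": [24, 420, 88, 32], "text": "Submit"},
]


def find_by_text(items, text):
    """Every element whose OCR text matches: ambiguous for duplicates."""
    return [e for e in items if e["text"] == text]


def find_by_id(items, element_id):
    """Exactly one element per ID, otherwise fail loudly."""
    matches = [e for e in items if e["id"] == element_id]
    if len(matches) != 1:
        raise LookupError(f"expected one element for {element_id!r}, got {len(matches)}")
    return matches[0]


print(len(find_by_text(elements, "Submit")))      # several candidates
print(find_by_id(elements, "e_button_8")["bbox"])  # a single box
```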

2mo ago

Walk me through what happens if I take 5 screenshots in a session. Does the skill grab all of them, or just the most recent one?

2mo ago

SlimSnap

Maker

@dhirendra_singh10 Good question. By default it grabs the most recent one. Each screenshot is saved to your SlimSnap folder, and the skill reads the latest capture on its own. So in a 5-shot session it uses the newest. The other four stay in the folder, nothing is lost.

The case you're hinting at is a good one though. Mark up several app screens, then tell the agent to fix all of them in one pass. That batch flow is worth supporting, putting it on my list.
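For readers wondering what "reads the latest capture" could look like, here is a minimal Python sketch that picks the newest JSON file in a folder by modification time. It is not the actual skill: the .json extension, the file names, and the latest_capture helper are assumptions, and the demo uses a temporary directory rather than the real SlimSnap folder.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_capture(folder: Path) -> Optional[Path]:
    """Return the most recently modified .json file in folder, or None."""
    captures = list(folder.glob("*.json"))
    if not captures:
        return None
    return max(captures, key=lambda p: p.stat().st_mtime)


# Demo: simulate a five-shot session with explicit, increasing timestamps.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for i in range(1, 6):
        path = folder / f"capture_{i}.json"
        path.write_text("{}")
        os.utime(path, (1_700_000_000 + i, 1_700_000_000 + i))
    newest_name = latest_capture(folder).name
    remaining = len(list(folder.glob("*.json")))

print(newest_name, remaining)
```

Selecting by modification time leaves the older captures untouched on disk, which matches the "nothing is lost" behavior described in the reply. A batch flow could iterate over the sorted list instead of taking the maximum.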

1mo ago

Would love to see a Windows version!

2mo ago

SlimSnap

Maker

@umberto_abbatantuono Hearing this a lot today. Windows port isn't in the short-term roadmap (OCR layer is Mac-native, needs a different pipeline), but if there's enough signal it moves up the list. If anyone else here is on Windows and would actually use this, reply to this comment or email hi@slimsnap.ai. That's how I'll prioritize.

2mo ago

SlimSnap

Maker

One follow-up question for anyone scrolling: when you paste a screenshot into your AI tool (ChatGPT, Claude, Cursor, Lovable, whatever), what's the #1 thing the AI gets wrong about it? Trying to figure out which gap to close next.

2mo ago

@bickov I tend to find that sometimes it wants to change too much and then I have to backtrack. Modifying other elements or changing the layout of the thing I’m talking about are what I find the most annoying.

2mo ago

SlimSnap

Maker

@montverde That's the exact failure mode the target_ref field tries to address. When you annotate the misaligned button and the agent sees annotation.target_ref = e_button_3, it has a stronger anchor for what to touch and what to leave alone. Doesn't eliminate scope creep entirely (the agent still decides whether layout shifts are necessary), but it shifts the default from "rewrite the whole component" toward "fix the specific element referenced."

The backtracking compounds in longer sessions. Which AI tool is this happening most for you? Different agents handle scope differently and that helps me figure out which auto-loader to build next.
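One way to picture how a target reference can narrow an agent's scope is to turn the spec into an explicit instruction that names the element to change and the ones to leave alone. The Python sketch below is only that, a picture: annotation.target_ref and e_button_3 come from the reply above, while the elements list, the prompt wording, and the scoped_instruction helper are hypothetical and do not describe how SlimSnap or the Claude Code skill actually build prompts.

```python
# Hypothetical spec: only annotation.target_ref is a field named in the thread.
spec = {
    "elements": [
        {"id": "e_label_1", "text": "Email"},
        {"id": "e_input_2", "text": ""},
        {"id": "e_button_3", "text": "Save"},
    ],
    "annotation": {"target_ref": "e_button_3"},
}


def scoped_instruction(spec, request):
    """Build a prompt that anchors the request to the annotated element."""
    ref = spec["annotation"]["target_ref"]
    others = [e["id"] for e in spec["elements"] if e["id"] != ref]
    return (
        f"{request}\n"
        f"Change only: {ref}\n"
        f"Leave unchanged unless strictly necessary: {', '.join(others)}"
    )


print(scoped_instruction(spec, "Fix the misaligned button."))
```

The agent can still decide that a layout shift is required, but the default starts from one named element instead of the whole component.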

2mo ago

@bickov I think that’s super helpful, definitely a time saver. For me, OpenAI was worse for unwanted changes. I use Claude the majority of the time and it still happens but not to the same degree.

2mo ago

SlimSnap

Maker

@montverde Yeah, that matches what I've seen. Claude tends to respect the "change only this" intent better than GPT does, even before SlimSnap. With the Claude Code skill the loop gets tighter still: it auto-loads the latest capture so you don't even paste the JSON, just type "fix what I marked" and the agent reads the spec.

Curious if you're on Claude Code specifically or claude.ai / API. If it's Claude Code, the skill is at github.com/bickov/slimsnap-skill, MIT, install instructions in the README.

2mo ago

@bickov That sounds like it works a lot better then. I use Claude.ai and Cursor mostly; I prefer them over Claude Code.

2mo ago

SlimSnap

Maker

@montverde The auto-loader skill is Claude Code only right now. For Cursor or claude.ai it's a manual paste step. SlimSnap exports the JSON, you drop it into chat with your prompt. Element refs still work, the agent just doesn't auto-grab the latest capture for you.

Cursor-native skill is on the wishlist if demand shows up. What makes Cursor + Claude.ai your default over Claude Code? That answer shapes which auto-loader I build next.

2mo ago

@bickov Got it, thanks!

1mo ago

The underlying problem is real: Claude guessing the wrong element from a raw screenshot is a genuine frustration. But the demo might be selling it short: changing a button color is exactly the case where anyone would just open DevTools. The pitch lands harder on complex layouts with 40 overlapping components where "the second input in the third card" means nothing to a pixel reader. Would love to see a demo on a gnarly real-world UI rather than a clean form :)

1mo ago

SlimSnap

Maker

@keirodev Yeah fair. The form demo is way too clean. Anyone'd just open DevTools for that. Real wedge is exactly your example: 40 overlapping components where "second input in the third card" is the only useful way to point at it. Picked the form because it fits in one screenshot. Wrong asset for selling the real case.

Redoing the demo on something messier is on the list. If you've got a real dashboard you'd want me to throw it at, send a screenshot and I'll post what the JSON comes out as.

1mo ago

Forum Threads

p/slimsnap • 16d ago

Agents hand you five working versions. How do you pick the one that ships?

Any coding agent gives you a working version of a feature in minutes. Ask again and you get another one, also working, slightly different. Working stopped being the filter.

My rule is simple: I ship the version with the fewest moving parts, because I'm the one debugging it at 2am. Speed of writing means nothing against speed of fixing.

What's your filter? Curious if anyone has a better rule than "simplest one wins."

p/slimsnap • 23d ago

Founders, be honest: are you building an idea AI gave you?

No judgment, genuinely curious. How many of us asked ChatGPT for app ideas at some point, then ended up building one of them, or half-building it? The line between "my idea" and "the model's idea" got blurry this year. Where did your current project actually come from?

p/slimsnap • 5d ago

AI can write my entire app but it can't get Microsoft to approve it

My Windows build has been in Microsoft's certification queue for three weeks. Three new models shipped in that time, two of them can supposedly do my whole job, and not one of them can get a human at Microsoft to press approve.

What's the longest you've been stuck on something you couldn't code your way out of?