Launching today

Trainer
Train AI agents by recording your screen
69 followers
Trainer lets you train AI agents by simply recording a task once. No prompts, no labeled data. It captures every click, keystroke, and intent as you work, then turns that workflow into a reusable agent that can repeat the process reliably. Built for automating real-world tasks through demonstration instead of manual configuration, making AI agents practical and accessible for everyday work.





Archimyst
@hritvik_gupta1 Congrats on the launch, Hritvik. How do you deal with noise or interruptions during a recording (for example, something else taking over the screen)? What's your recording limit, and how do you deal with multi-stage processes?
Archimyst
@zolani_matebese There’s no recording limit; the longer you record, the more context the AI can use. For multi-stage processes, you can add multiple recordings. Any noise or interruptions are handled by our agents, so they won’t be an issue.
Screen recording is definitely the most intuitive way to train an agent on custom workflows. The real challenge is handling dynamic UI changes when a website updates its layout. How robust is the underlying model at generalizing the task when buttons move?
Archimyst
@rivra_dev Trainer doesn’t rely on fixed coordinates or pixel positions. During recording, it builds a semantic understanding of the UI: what the element is (button, input, menu), the surrounding context, labels, and the intent behind the action.
So if a button moves, the layout changes, or spacing shifts, the agent can still find the right element based on meaning rather than position.
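As a rough illustration of matching by meaning rather than position (the thread does not describe Trainer's actual implementation, so this is only a sketch): the `Element` and `Target` types, the 0.7/0.3 weights, and the 0.5 threshold below are all hypothetical. The scorer looks only at role, label, and surrounding context, and never reads the stored coordinates.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Element:
    role: str            # e.g. "button", "input", "menu"
    label: str           # visible text or accessible name
    context: frozenset   # nearby labels or section names
    x: int = 0           # position is stored but never used for matching
    y: int = 0


@dataclass(frozen=True)
class Target:
    """Hypothetical description of a recorded action's target."""
    role: str
    label: str
    context: frozenset


def score(target: Target, el: Element) -> float:
    """Score an element on role, label similarity, and context overlap only."""
    if el.role != target.role:
        return 0.0
    label_sim = SequenceMatcher(None, target.label.lower(), el.label.lower()).ratio()
    union = target.context | el.context
    ctx_sim = len(target.context & el.context) / len(union) if union else 0.0
    return 0.7 * label_sim + 0.3 * ctx_sim  # weights are an arbitrary assumption


def resolve(target: Target, elements: list, threshold: float = 0.5) -> Optional[Element]:
    """Return the best-scoring element, or None if nothing is close enough."""
    best = max(elements, key=lambda el: score(target, el), default=None)
    if best is None or score(target, best) < threshold:
        return None
    return best


target = Target("button", "Submit order", frozenset({"checkout", "payment"}))

# Layout at recording time.
old_page = [
    Element("button", "Submit order", frozenset({"checkout", "payment"}), 100, 400),
    Element("button", "Cancel", frozenset({"checkout"}), 200, 400),
]

# After a redesign: the button moved, was recapitalized, and gained a neighbor.
new_page = [
    Element("button", "Cancel", frozenset({"checkout"}), 50, 90),
    Element("button", "Submit Order", frozenset({"checkout", "payment", "summary"}), 640, 720),
]

old_hit = resolve(target, old_page)
new_hit = resolve(target, new_page)
```

In this toy setup both lookups land on the submit button even though its coordinates changed, because position never enters the score. A production system would presumably use much richer signals than string similarity.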
Really interesting launch — curious how adaptive the trained agents are in practice. If the original workflow changes a bit, like different input data or an extra confirmation step, can the agent reason through it, or does it need to be retrained from a new recording?
Archimyst
@uceniikot Trainer doesn’t treat your recording as a rigid script. It learns the intent behind each step and builds a semantic map of the workflow: what you’re trying to do, what kind of element you’re interacting with, and the context around it.
How would it deal with a task that has a different UI and flow depending on whether it's on my phone or on my laptop?
Archimyst
@emily_xu2 Phone and laptop UIs often have different layouts and even slightly different flows, but the goal of the task is still the same. During recording, Trainer captures the semantic intent of each action (what you’re trying to do, what type of element it is, and the context around it), not just where it appears on the screen.
Congrats on the launch! How does it deal with tasks that may slightly vary based on scenarios? Does it need the user to record each scenario or is it enough to include instructions for different scenarios in the voice recording?
Archimyst
@ferdi_sigona This is exactly the kind of real-world complexity we’re designing for.
You don’t need to record every scenario. Trainer works at the level of intent, not rigid steps. During recording, it captures what you’re trying to achieve and the context around each action, so at runtime it can adapt when small variations appear in the flow.
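One way to picture "intent, not rigid steps" is a runner that replays recorded intents but tolerates an extra screen, such as a confirmation dialog, that was not in the recording. This is a hedged sketch only: `FakeApp`, `run`, and the `INTERSTITIALS` allowlist are hypothetical names, and nothing in the thread says Trainer works this way.

```python
from dataclasses import dataclass, field

# Hypothetical intents that are assumed safe to handle even if never recorded.
INTERSTITIALS = {"confirm", "dismiss banner"}


@dataclass
class FakeApp:
    """Toy app: each screen is the set of intents it can satisfy right now."""
    screens: list
    index: int = 0
    log: list = field(default_factory=list)

    def available(self) -> set:
        if self.index >= len(self.screens):
            return set()
        return set(self.screens[self.index])

    def perform(self, intent: str) -> None:
        self.log.append(intent)
        self.index += 1  # every action advances to the next screen


def run(recorded_intents: list, app: FakeApp) -> list:
    """Replay intents, clearing unrecorded interstitial screens along the way."""
    for intent in recorded_intents:
        while intent not in app.available():
            extra = INTERSTITIALS & app.available()
            if not extra:
                raise RuntimeError(f"cannot reach intent: {intent}")
            app.perform(sorted(extra)[0])
        app.perform(intent)
    return app.log


recorded = ["fill form", "submit"]

# The live app now shows a confirmation step that the recording never saw.
app = FakeApp(screens=[{"fill form"}, {"confirm"}, {"submit"}])
actions = run(recorded, app)
```

Here the runner performs the unrecorded "confirm" step between the two recorded ones, and raises an error instead of guessing when the screen offers nothing it recognizes.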
"Demonstrate once and automate forever" sounds great until the app you recorded pushes an update and moves a button 10 pixels to the left. How does it handle that?
Archimyst
@tina_chhabra If a UI changes slightly (like a button moving a few pixels, spacing shifting, or minor layout updates), Trainer doesn’t depend on pixel positions, so that kind of change typically doesn’t break anything.
Instead, it finds elements based on semantic signals: things like the button’s label, surrounding context, type of action, and what the step means in the workflow.
Archimyst
@tim_350life Good question. Web apps with complex rendering (dynamic DOM updates, infinite scroll, canvas-based UIs, etc.) are exactly where most automation tools tend to struggle. Trainer doesn’t depend on fixed coordinates or static selectors. During recording, it captures the semantic structure of the UI: what each element represents, its surrounding context, and the intent behind the action. At runtime, it re-evaluates the page and resolves elements dynamically instead of replaying rigid steps.
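To make "re-evaluates the page and resolves elements dynamically" concrete, here is a minimal sketch of a resolver that takes a fresh snapshot on every attempt instead of trusting a stored selector. The `resolve_live` function, the dictionary element shape, and the `LazyPage` stand-in for an infinite-scroll page are all assumptions made for illustration, not Trainer's API.

```python
import time
from typing import Callable, Iterable, Optional

Element = dict  # hypothetical shape: {"role": ..., "label": ...}


def resolve_live(
    matches: Callable[[Element], bool],
    take_snapshot: Callable[[], Iterable[Element]],
    attempts: int = 5,
    delay_s: float = 0.0,
) -> Optional[Element]:
    """Re-read the page on each attempt and return the first matching element."""
    for attempt in range(attempts):
        for el in take_snapshot():
            if matches(el):
                return el
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give the page time to render more content
    return None


class LazyPage:
    """Toy page that reveals one more element per snapshot, like infinite scroll."""

    def __init__(self, elements):
        self._elements = list(elements)
        self._visible = 1

    def snapshot(self):
        visible = self._elements[: self._visible]
        self._visible = min(self._visible + 1, len(self._elements))
        return visible


page = LazyPage([
    {"role": "link", "label": "Home"},
    {"role": "button", "label": "Load more"},
    {"role": "button", "label": "Export report"},
])

# The target only becomes visible on the third snapshot.
found = resolve_live(
    lambda el: el["role"] == "button" and "export" in el["label"].lower(),
    page.snapshot,
)
```

The point of the design is that the match predicate describes what the element is, while the snapshot function is called again on every attempt, so content that renders late can still be found.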