
Tessl
Optimize agent skills, ship 3× better code.
330 followers
Tessl helps developers evaluate and optimize agent skills, so you focus on building with smarter AI agents instead of fixing bugs and hallucinations - no signup required ➡️ tessl.io/registry/skills/submit

Tessl
Hey Product Hunt! 👋
Guypo here, founder of Tessl (previously founded Snyk).
Today, I’m excited to announce that you can evaluate your skills and optimize them on Tessl. This means you can stop debugging agent output and start shipping quality code, faster: https://tessl.io/registry/skills/submit
Agent skills help agents use your products, build in your codebase and enforce your policies.
They're the new unit of software for devs - but most are still treated like simple Markdown files copied between repos with no versioning, no quality signal, no updates.
Without AI evaluations, you can’t tell if a skill helps, provides minimal uplift, or even degrades functionality. You spend your time course-correcting agents instead of shipping.
Tessl is a development platform and package manager for agent skills. With Tessl, we were able to evaluate and optimize ElevenLabs' skills, doubling their agents' success in using their APIs.
If you are building a personal project, maintaining an OSS library, or developing with AI at work, you can now evaluate your skill and optimize it to help agents use it properly.
What skills are you working on, and what's your use case for them?
I'm an absolute fan of @guypod and the @Tessl team.
They're pioneers in the AI industry, and active contributors by maintaining AINativeDev and organizing the AI Native DevCon. So, when the team reached out for this launch, I was super pumped.
@Tessl is a package manager for agent skills. It helps you find, install, and evaluate capabilities for your coding agents. It's the right direction. In a recent thread, [1] we discussed best practices to get the most out of @Claude Code. Above all? Run more agents in parallel. @Tessl teaches them coding best practices, raising the quality of the outputs.
The timing is perfect.
Go to tessl.io/registry/skills/submit and start shipping better, secure code at scale.
S/O to @guypod and team, keep up the inspiring work 👏👏
[1]: How many Claude Codes do you run in parallel?
Tessl
@fmerian incredible writeup - thank you for hunting us and for framing it so well.
The parallel agents point is a great angle - running multiple Claude Code instances is becoming the norm for serious teams, but the quality bottleneck shifts fast when you scale agents horizontally.
That's exactly where skills and evals become essential - one poorly written skill degrades output across every parallel session. With evals and optimizations, folks can skip the serious time spent debugging bugs, hallucinations, and API misuse, and focus on shipping quality code.
Appreciate the support from day one! 🧡
Raycast
Had a great time speaking at Tessl's DevCon last fall. Their approach to agents and skills is super compelling, and their commitment to open source is as well.
I'm definitely going to incorporate this into my rapidly evolving agentic development process!
Tessl
@chrismessina Appreciate your comment Chris! It was great having you at DevCon ⭐️
The eval-driven approach makes sense. Most teams copy skill files across projects and hope they still work after a model update - there's no feedback loop telling you the context degraded. Having structured evals that catch regression before it hits production is the missing piece.
Curious about the version compatibility matrix. When a new model version drops (say Claude Opus to Sonnet), how granular is the eval detection? Does it flag per-skill degradation or just overall task completion changes? The 1.8-2X performance numbers are compelling but I'd want to know which skills contributed most vs which ones were noise.
Tessl
@zzunkie Excellent question. Whenever a new model drops, we rerun our skill evaluations. That lets us flag per-skill regressions across every scenario. As you can see below, we can clearly measure the uplift - or lack of it - from adding extra context, based on task evaluations for the content-strategy skill (https://tessl.io/registry/skills/github/coreyhaines31/marketingskills/content-strategy/evals). It’s also useful when a skill doesn’t help much: users can see they’re better off running without it for this particular task.
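The per-skill regression flagging described above can be sketched roughly like this. Purely illustrative: the skill names, scores, and the 0.05 drop threshold are made up for the example and are not Tessl's actual data or implementation.

```python
# Hypothetical sketch: flag skills whose eval score dropped after a model swap.
# Scores and threshold are invented for illustration, not Tessl's real numbers.

def flag_regressions(old: dict, new: dict, max_drop: float = 0.05) -> dict:
    """Return skills whose score dropped by more than max_drop between runs."""
    return {
        skill: (old[skill], new[skill])
        for skill in old.keys() & new.keys()
        if old[skill] - new[skill] > max_drop
    }

# e.g. per-skill eval scores under the old and new model
opus_scores = {"content-strategy": 0.81, "insights": 0.79, "api-usage": 0.90}
sonnet_scores = {"content-strategy": 0.83, "insights": 0.66, "api-usage": 0.88}

print(flag_regressions(opus_scores, sonnet_scores))
# only "insights" dropped by more than the threshold
```

The key design point is that the comparison is per skill and per scenario run, not a single aggregate number, so a regression in one skill is not masked by improvements elsewhere.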
@baptiste_fernandez1 Per-skill eval with visible results is exactly what teams need before switching models. Nice.
Skillkit
Tessl
@rohit_ghumare Great to hear! What’s one thing you wish the eval workflow did better - debugging failures, comparing versions, etc? We’re iterating fast based on comments like yours. :)
Skillkit
Tessl
@rohit_ghumare Hey Rohit! I ran all your skills through the Tessl review machinery and sent you a pull request.
Tessl
Excellent point, @rohit_ghumare - you can find the recommendation directly in each skill.
And as for improvements, head over to "Optimize this skill"! @_popey_ has already used it to improve your skill, and you can merge the changes into your repo!
I see you have the `insights` skill at ~80% performance - give it a go and let me know if you encounter any issues.
How are you validating real user behavior at Tessl right now?
Tessl
@danilpond Two evaluation methods today.
First, skill reviews - when you submit a skill, it gets scored against structure and best practice criteria established by Anthropic, combining validation checks with LLM-judged quality. This tells you immediately whether your skill is well-constructed.
Second, task-based evaluations - scenario-based evals where you run end-to-end tasks and track results against real agent behavior. Teams submit a skill, see their scores, iterate, and resubmit - and we can measure the delta between versions. That second approach is where we validate evaluation scenarios.
We're also working on new approaches beyond these two, more to share in the coming weeks. Keen to hear if this is what you had in mind, and whether you've spotted an opportunity for improvement?
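The first method described above - scoring a skill against structural criteria plus an LLM-judged quality score - might look roughly like the following. This is an illustrative sketch only: the check names, weights, and judge score are invented for the example and are not Tessl's actual review criteria.

```python
# Illustrative only: the checks, weights, and judge score below are made up,
# not Tessl's actual review machinery.

def structure_score(skill_md: str) -> float:
    """Score deterministic best-practice checks on a skill's Markdown file."""
    checks = {
        "has_title": skill_md.lstrip().startswith("#"),
        "has_examples": "```" in skill_md,
        "reasonable_length": 200 <= len(skill_md) <= 20_000,
    }
    # fraction of checks passed, in [0, 1]
    return sum(checks.values()) / len(checks)

def review_score(skill_md: str, judge_score: float,
                 judge_weight: float = 0.6) -> float:
    """Blend structural checks with an LLM-judged quality score in [0, 1]."""
    return judge_weight * judge_score + (1 - judge_weight) * structure_score(skill_md)
```

The deterministic half catches construction problems immediately, while the judged half covers the quality signals a checklist can't express; the blend gives a single submittable score.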
Really strong launch. The "package manager for agent skills" framing is exactly where teams are heading as multi-agent workflows get real.
What stood out to me is the eval + optimization loop: most teams can feel output drift but can’t isolate whether the issue is model choice, prompt context, or skill quality. If Tessl can make that diagnosis explicit (before/after score deltas per skill revision), that’s high leverage for shipping faster with fewer hallucination regressions.
Curious if you’re planning CI hooks so teams can gate skill changes on eval thresholds the same way we gate tests/lint in code pipelines.
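The gating idea above could be sketched as a small CI step that fails the build when any eval scenario falls below a threshold. Everything here is hypothetical - the results format, scenario names, and scores are invented for illustration and are not Tessl's actual output schema or CLI.

```python
import json
import sys

# Hypothetical eval-results payload; a real tool's schema would differ.
SAMPLE_RESULTS = """
{
  "skill": "content-strategy",
  "scenarios": [
    {"name": "blog-outline", "score": 0.92},
    {"name": "api-misuse-check", "score": 0.78},
    {"name": "tone-guidelines", "score": 0.61}
  ]
}
"""

def gate(results: dict, threshold: float) -> tuple[bool, list[str]]:
    """Return (passed, failures): fail if any scenario scores below threshold."""
    failures = [s["name"] for s in results["scenarios"] if s["score"] < threshold]
    return (not failures, failures)

if __name__ == "__main__":
    results = json.loads(SAMPLE_RESULTS)
    passed, failures = gate(results, threshold=0.7)
    if not passed:
        print(f"Eval gate failed for: {', '.join(failures)}")
        sys.exit(1)  # non-zero exit fails the CI step, like a failing test
    print("All scenarios above threshold")
```

Run as a step in the pipeline, a non-zero exit blocks the merge the same way a failing lint or test step would, which is exactly the tests/lint parallel the comment draws.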
Tessl
@danielsinewe Spot on about the diagnostic gap - isolating whether drift is coming from the model, prompt context, or skill quality is exactly what the eval loop surfaces. Before/after score deltas per skill revision are live today - perhaps we need to surface them better?
The CI hooks idea is really interesting, and we've been thinking a lot about it. I want to make sure I'm tracking what you're imagining though - are you thinking gating at the PR level, deployment level, or something else? Keen to get your thoughts on this!