We built Agent-Corex after hitting 'context bloat hell' with 200+ tools
Hey everyone!
We just shipped Agent-Corex, and I want to share the story of why we built it.
The Problem We Faced:
Six months ago, we were building an LLM agent system that had access to ~200 different tools. We did what seemed logical: we dumped all of them into the system prompt.
It was a disaster.
Our API costs exploded (30K tokens per request 😱)
Inference was slow (2.3 seconds per response)
The LLM kept getting confused about which tool to use
We were burning through context windows like crazy
We realized we had a problem: how do you intelligently select which tools to include without manually curating for every scenario?
The Solution:
We built a hybrid ranking system that:
Matches keywords in your query against tool names and descriptions (<1 ms)
Uses embeddings to find semantically related tools (50-100 ms)
Scores every tool with a weighted blend (30% keyword + 70% semantic)
Result? Only 5-10 tools per query instead of 200.
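For anyone curious what that blend looks like, here is a rough, dependency-free sketch of the idea. It is not the Agent-Corex source: `hybrid_rank`, `keyword_score`, and `toy_embed` are hypothetical names, and the character-trigram "embedding" is only a stand-in for a real embedding model so the example runs anywhere.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, tool_text):
    """Fraction of query tokens that also appear in the tool's name/description."""
    q, t = set(tokenize(query)), set(tokenize(tool_text))
    return len(q & t) / len(q) if q else 0.0


def toy_embed(text):
    """ASSUMPTION: character-trigram counts standing in for a real embedding model."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_rank(query, tools, top_k=5, kw_weight=0.3, sem_weight=0.7, embed=toy_embed):
    """Return the top_k (score, tool_name) pairs, highest score first."""
    q_vec = embed(query)
    scored = []
    for tool in tools:
        text = f"{tool['name']} {tool['description']}"
        score = kw_weight * keyword_score(query, text) + sem_weight * cosine(q_vec, embed(text))
        scored.append((score, tool["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up tools for illustration only.
tools = [
    {"name": "send_email", "description": "Send an email message to a recipient"},
    {"name": "read_file", "description": "Read the contents of a file from disk"},
    {"name": "get_weather", "description": "Look up the weather forecast for a city"},
]

for score, name in hybrid_rank("send an email to Alice", tools, top_k=2):
    print(f"{name}: {score:.3f}")
```

With a real embedding model you would pass your own function as `embed` and keep the 30/70 weighting the same.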
The impact:
✅ 68% reduction in API costs
✅ 4.6x faster inference
✅ Same capability (the LLM can still reach every tool; we just pick the relevant ones per query)
✅ 95%+ test coverage, production ready
Why Open Source:
We realized this is a problem every team building LLM agents faces. So we open-sourced it (MIT license) with zero dependencies for basic usage.
What We're Looking For:
Early adopters - Try it, break it, tell us what sucks
Use cases - How are you using it? What edge cases are we missing?
Contributions - Better ranking algorithms? Different embedding models? We're all ears
Feedback - Before we build the enterprise version, what features would actually help?
Quick Start:
pip install agent-corex
Then:
from agent_core import rank_tools

# One call to get smart tool selection
relevant_tools = rank_tools(
    query="your task here",
    tools=all_your_tools,
    method="hybrid",
    top_k=5,
)
We're at v1.0.1 and this is just the beginning. Would love to hear what you think, especially if you're already dealing with tool selection headaches.
Ask us anything:
How does it compare to your current approach?
Are there use cases we're not thinking about?
What would make this 10x better for your workflow?
Looking forward to building this with the community!
