What should I test first for a checkable AI-work stack?
Project Telos is scheduled to launch here on June 26. I am building it around one rule: if AI work matters, the person and the system should both be looking at the same checkable state rather than trusting the model's self-report.
The public lineup is five flagships:
- gather: witnessed intake and provenance receipts
- index: rerunnable workspace maps and MATCH / DRIFT / UNVERIFIABLE certificates
- forum: accountable multi-agent ledgers
- crucible: public GitHub pre-1.0 judgment and refinement
- the telos engine: shared human-AI perceive-and-make work
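To make the MATCH / DRIFT / UNVERIFIABLE idea concrete, here is a minimal Python sketch of how such a verdict could be derived from a recorded file digest. It is an illustration under my own assumptions, not the code in the index repo: `sha256_of`, `certify`, and the verdict rules in the docstring are hypothetical names and choices.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def certify(path: Path, recorded_digest: Optional[str]) -> str:
    """Compare a file against a previously recorded digest.

    Hypothetical verdict rules, not taken from the index repo:
      UNVERIFIABLE - nothing was recorded, or the file is not there to read
      MATCH        - the current digest equals the recorded one
      DRIFT        - the file is readable but its digest has changed
    """
    if recorded_digest is None or not path.is_file():
        return "UNVERIFIABLE"
    return "MATCH" if sha256_of(path) == recorded_digest else "DRIFT"
```

The point of the third state is that "I could not check" is reported separately from "I checked and it changed", so a missing file never silently counts as a pass or a failure.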
The honest stage: solo, independent, pre-revenue, pre-proof on the largest thesis. I am looking for verification and testing against real workflows, technical pushback, early traction from people willing to inspect the receipts, and possibly modest grassroots research funding to keep hardening the checkable-state pieces.
Main site: https://harperz9.github.io
GitHub: https://github.com/HarperZ9
Repos: https://github.com/HarperZ9/gather - https://github.com/HarperZ9/index - https://github.com/HarperZ9/forum - https://github.com/HarperZ9/crucible - https://github.com/HarperZ9/telos
Replies
Small update: I opened an upstream PR to add Project Telos to ai-boost/awesome-harness-engineering's Demo Harnesses list: https://github.com/ai-boost/awesome-harness-engineering/pull/89
The specific test I want is not "does this feel useful?" It is: can you replay what the agent saw, what changed, and why a check passed, drifted, or stayed unverifiable?
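As a rough picture of what a replayable receipt might hold, here is a small Python sketch. The `Receipt` fields, `seal`, and `replay` are hypothetical names chosen for this example and are not the actual schema of any of the five repos. The receipt records what the agent saw, what it changed, and which check produced which verdict, and `replay` re-derives a per-input status from whatever bytes are available now.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict


def digest(data: bytes) -> str:
    """Hex SHA-256 of raw bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Receipt:
    saw: Dict[str, str]      # input name -> digest of the bytes the agent read
    changed: Dict[str, str]  # output name -> digest of the bytes the agent wrote
    check: str               # name of the check that was run
    verdict: str             # "MATCH", "DRIFT", or "UNVERIFIABLE"
    reason: str              # short human-readable explanation of the verdict

    def seal(self) -> str:
        """Digest of the canonical JSON form, so later edits to the receipt show up."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return digest(canonical.encode("utf-8"))


def replay(receipt: Receipt, inputs: Dict[str, bytes]) -> Dict[str, str]:
    """Recompute a status for every recorded input from the bytes supplied now."""
    status = {}
    for name, recorded in receipt.saw.items():
        if name not in inputs:
            status[name] = "UNVERIFIABLE"  # nothing to compare against
        elif digest(inputs[name]) == recorded:
            status[name] = "MATCH"
        else:
            status[name] = "DRIFT"
    return status
```

A reviewer who holds the receipt and the current inputs can run `replay` without asking the model anything, which is the property I am asking people to test.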
If you build with AI agents, pick any one repo or workflow and tell me what receipt would make the result credible. The project is still solo, pre-revenue, author-tested, and not independently audited.
Current-state update after launch: the five flagship repos are public, and today's local dogfood loop is passing against gather-engine 1.5.0, index-graph 2.8.0, forum-engine 1.12.0, crucible-bench 1.1.0, and the telos source demo with a 25-tool five-flagship catalog.
The test I care about most now is simple: take one messy real workflow, run it through source intake -> workspace map -> route ledger -> claim verdict -> shared state, and tell me where the receipt stops being useful. The best feedback is a concrete breakage report, not a general reaction.
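One way to picture "where the receipt stops being useful" is a hash chain across the five stages, where each stage's receipt commits to the one before it. The Python sketch below is an assumption-laden illustration, not how the repos implement it: `STAGES`, `link_digest`, `build_chain`, and `first_break` are hypothetical names, and the payloads are placeholders.

```python
import hashlib
import json
from typing import Dict, List, Optional

# Stage names taken from the workflow described above.
STAGES = ["source intake", "workspace map", "route ledger", "claim verdict", "shared state"]


def link_digest(prev: str, stage: str, payload: Dict) -> str:
    """Digest that binds a stage's payload to the previous link's digest."""
    body = json.dumps({"prev": prev, "stage": stage, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads: List[Dict]) -> List[Dict]:
    """Build one chained receipt per stage, in order."""
    chain, prev = [], ""
    for stage, payload in zip(STAGES, payloads):
        d = link_digest(prev, stage, payload)
        chain.append({"stage": stage, "payload": payload, "prev": prev, "digest": d})
        prev = d
    return chain


def first_break(chain: List[Dict]) -> Optional[str]:
    """Return the first stage whose receipt no longer verifies, or None if all do."""
    prev = ""
    for link in chain:
        expected = link_digest(prev, link["stage"], link["payload"])
        if link["prev"] != prev or link["digest"] != expected:
            return link["stage"]
        prev = link["digest"]
    return None
```

A check like this only catches altered or disconnected receipts. Whether an intact receipt is still informative at a given stage is a human judgment, and that is the part I need breakage reports for.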