robots.txt is 30 years old and we never built the next layer

Every major crawler respects robots.txt. It was first proposed in 1994, had no formal standard until RFC 9309 in 2022, and tells bots one thing: where not to go.

That's it. That's the entire vocabulary the web has for communicating with automated agents.
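To make that concrete, here is a minimal sketch of the whole conversation a crawler can have with robots.txt, using Python's standard `urllib.robotparser`. The rules, the paths, and the `ExampleBot` agent name are made up for illustration. The only question a bot can ask is whether it may fetch a path.

```python
from urllib.robotparser import RobotFileParser

# Example rules, invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(may_fetch("/drafts/post.html"))
    print(may_fetch("/blog/post.html"))
```

The answer is a yes or a no per path. Nothing in it says what the page is or how it should be treated once fetched.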

In 2026, those agents are archiving pages, training AI models, extracting authorship, and building knowledge graphs — and the only signal a page can send is "don't crawl me at all" or silence.

There's no way to say:

  • "This is a draft — don't cite it as my position"

  • "This is ephemeral — don't archive it"

  • "This was written by multiple people — don't assign a single author"

  • "This is AI-generated — flag it in your lineage records"

I spent the last few months writing that vocabulary. It's called the Untitled Protocol — one meta tag, six states, MIT licensed.

Not looking to replace robots.txt. Looking to add the layer that should have come after it.
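As a rough sketch of what reading a single page-level tag could look like on the crawler side: this post does not give the tag name or enumerate the six states, so the `page-state` name and the four values below are placeholders taken from the examples above, not the protocol's actual syntax.

```python
from html.parser import HTMLParser

# Placeholder names: hypothetical, not taken from the protocol itself.
META_NAME = "page-state"
KNOWN_STATES = {"draft", "ephemeral", "multi-author", "ai-generated"}


class StateReader(HTMLParser):
    """Records the first recognised state declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.state = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except <meta>, and stop once a state is found.
        if tag != "meta" or self.state is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == META_NAME:
            value = (attr.get("content") or "").strip().lower()
            if value in KNOWN_STATES:
                self.state = value


def read_state(html: str):
    """Return the declared state, or None if the page declares none."""
    reader = StateReader()
    reader.feed(html)
    return reader.state


if __name__ == "__main__":
    page = '<html><head><meta name="page-state" content="draft"></head></html>'
    print(read_state(page))
```

A page with no tag, or with a value the reader does not recognise, comes back as `None`, so an agent can fall back to whatever it does today.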

What would you add to the vocabulary?
