Kimmo Ihanus

AI Search Index - Track which AI bots crawl your website

AI Search Index is the easiest way to see exactly which AI bots are reading your content, which pages they visit most, and how that traffic is growing over time.

What you get:

  • Track 50+ AI crawlers by name (GPTBot, PerplexityBot, ClaudeBot, and more)

  • See which pages AI agents find most valuable

  • Understand the split between AI Search vs AI Training traffic

  • Chat with your data using our AI assistant

All in one line of code.

Kimmo Ihanus
Hey PH! 👋 Kimmo here.

Quick backstory: I've been working in AI search for a while, and one thing kept bugging me: it was way too hard to know if AI bots were actually visiting my sites. Most web analytics? Useless for this. Server logs? A nightmare to parse. I was deep in the AI world but couldn't answer a basic question: "Is ChatGPT reading my content?"

So we built AI Search Index. How it works:

  • Lightweight pixel (1 line of JS, no cookies)

  • User-agent detection for 50+ AI crawlers

  • IP range verification against published AI company ranges

  • Real-time data via ClickHouse

Nothing fancy. Just bot detection that actually works.

After monitoring our own data, it turns out GPTBot is everywhere, but I guess most site owners have no idea.

Would love your feedback: what would make this more useful for you?

- Kimmo
Austin Heaton

@ihmissuti congrats on the launch! Do you also provide content suggestions based on AI crawler activity?

Kimmo Ihanus
@austin_heaton thanks! Not at the moment, but that is one potential area for expanding the product, for example schema optimization and technical improvements. So stay tuned :)
mostafa kh

Interesting concept! One question though: most AI crawlers don't execute JavaScript, they just fetch raw HTML. How does a JS-based pixel detect them? Wouldn't server-side log analysis be more reliable for bot detection?

Kimmo Ihanus

@topfuelauto Yes, you're absolutely correct that most AI crawlers don't execute JavaScript. The app uses a “hybrid approach”: it also supports server-side log analysis, which is a more robust method for capturing non-JS bots. The app has multiple server-side log ingestion endpoints that bypass JS entirely and receive raw HTTP access logs from CDNs and hosting providers. The bot detection logic then analyzes:

  • User-Agent strings - 78+ LLM training bot patterns, 67+ LLM search bot patterns

  • IP ranges - Known CIDR blocks for OpenAI, Anthropic, Perplexity, Mistral

  • HTTP signatures - Signature-Agent header for signed ChatGPT agents
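
As a rough illustration of the first two checks (not the product's actual implementation), a minimal Python sketch might look like the following. The pattern table, the training/search labels, the function names, and the CIDR blocks are all assumptions made for the example; the ranges are reserved documentation addresses, not ranges published by any AI company. Signature-Agent verification is left out entirely.

```python
import ipaddress
import re

# Hypothetical pattern table: lowercase User-Agent token -> (bot name, category).
# The category labels are illustrative assumptions.
UA_PATTERNS = {
    "gptbot": ("GPTBot", "training"),
    "claudebot": ("ClaudeBot", "training"),
    "perplexitybot": ("PerplexityBot", "search"),
}

# Placeholder CIDR blocks (reserved documentation ranges, NOT real crawler IPs).
# A real deployment would load the ranges each company publishes.
KNOWN_RANGES = {
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],
    "ClaudeBot": [ipaddress.ip_network("198.51.100.0/24")],
    "PerplexityBot": [ipaddress.ip_network("203.0.113.0/24")],
}

# Matches one line of the common "combined" access log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def classify(ip, user_agent):
    """Return bot info if the User-Agent matches a known token, else None."""
    ua = user_agent.lower()
    for token, (name, category) in UA_PATTERNS.items():
        if token in ua:
            try:
                addr = ipaddress.ip_address(ip)
                verified = any(addr in net for net in KNOWN_RANGES.get(name, []))
            except ValueError:
                # Not a parseable IP address, so it cannot be verified.
                verified = False
            return {"bot": name, "category": category, "ip_verified": verified}
    return None


def classify_log_line(line):
    """Parse one combined-format log line and classify the request."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    result = classify(match["ip"], match["ua"])
    if result is not None:
        result["path"] = match["path"]
    return result
```

The User-Agent match alone is easy to spoof, which is why the sketch reports `ip_verified` separately: a request claiming to be GPTBot from an address outside the listed ranges is still returned, but flagged as unverified.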

But we’ve learned that sometimes a simple solution is good enough to start capturing data, and that is where the JS pixel comes in. It primarily catches things like:

  • ChatGPT web browsing, which actually executes JS via a headless browser

  • Other AI agents that browse like humans (Claude's computer use, etc.)

  • Human visitors - for comparison metrics

The "critical inline pixel" is designed for "fast bots" but honestly won't help with 100% of the crawlers.

The JS pixel is essentially a fallback for:

  1. Users who can't/won't set up log forwarding

  2. Catching the subset of bots that do execute JS

And of course some website platforms make server-side log access impossible or at least difficult (ping Webflow?)