PDFCanon - Sanitize, normalize, and hash any PDF — deterministically

PDFs are structurally chaotic: same document, different bytes; hidden JS; broken audit trails. PDFCanon normalizes any PDF deterministically: it strips active content, collapses structural noise, and returns a hardened document with a stable SHA-256 hash. Same input → same bytes → same hash. Every time. Free tier: 100 normalizations/month. SDKs for .NET, Node, Python, Java, Go.
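To make the "same document, different bytes" problem concrete, here is a toy sketch. It is not PDFCanon's pipeline or API. The function `naive_canonicalize` and the sample fragments are hypothetical. It shows how two saves of identical content that differ only in volatile fields (`/CreationDate`, the trailer `/ID`) get different raw SHA-256 hashes, and how removing those fields makes the hashes match. A real normalizer also has to rewrite object offsets and the xref table, handle streams, and much more. This sketch operates on text fragments, not valid PDF files.

```python
import hashlib
import re

# Toy illustration only: NOT PDFCanon's implementation or API.
# Patterns for a few volatile fields that change on every save.
VOLATILE = [
    re.compile(rb"/CreationDate\s*\([^)]*\)"),
    re.compile(rb"/ModDate\s*\([^)]*\)"),
    re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
]


def naive_canonicalize(pdf: bytes) -> bytes:
    """Hypothetical, drastically simplified normalization: drop volatile fields.

    A real canonicalizer must also rebuild the xref table and offsets;
    this works only on fragments for demonstration purposes.
    """
    for pattern in VOLATILE:
        pdf = pattern.sub(b"", pdf)
    return pdf


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# Two fragments with the same content but different save-time metadata.
a = b"<< /Title (Invoice) /CreationDate (D:20240101120000Z) >> trailer << /ID [<AAAA><BBBB>] >>"
b = b"<< /Title (Invoice) /CreationDate (D:20240315093000Z) >> trailer << /ID [<CCCC><DDDD>] >>"

if __name__ == "__main__":
    # Raw bytes differ, so the raw hashes differ.
    print(sha256_hex(a) == sha256_hex(b))  # False
    # After stripping the volatile fields, the bytes and hashes match.
    print(sha256_hex(naive_canonicalize(a)) == sha256_hex(naive_canonicalize(b)))  # True
```

The design point is that the hash is taken over the canonical bytes, not the uploaded bytes, so equality of hashes tracks equality of content rather than equality of save history.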

Hey PH! I'm the co-founder of PDFCanon, and this one's been a long time coming.

The idea came from a problem we kept running into: every SaaS platform eventually has to accept PDF uploads, and almost none of them handle it well. They store the raw file, hope for the best, and find out months later that two "identical" documents have different hashes, that someone slipped JavaScript into a submission, or that an audit trail is meaningless because the documents weren't normalized before storage. There was no API for this. No infrastructure layer. Just a pile of ad-hoc scripts and prayers.

So we built PDFCanon, a deterministic PDF normalization and canonical hashing API. The core guarantee is dead simple: same input, same bytes out, same SHA-256 hash. Every time. The pipeline strips dangerous content, collapses structural noise, and produces a document you can actually rely on.

We've just launched with a free tier (100 normalizations/month) so you can kick the tires with zero commitment. We'd love to hear from anyone building document intake workflows, and we're happy to answer questions. If you've hit this problem before and solved it differently, I'm genuinely curious how.

Thanks for checking us out! 🙏
