SchemaFit - CI linter for LLM structured-output schemas

SchemaFit is an MIT-licensed CI linter for LLM structured-output schemas. It checks JSON Schema, tool definitions, and response_format schemas against provider-specific constraints before runtime. It catches unsupported keywords, nesting issues, required/optional mismatches, and portability problems across OpenAI, Anthropic, Gemini, Mistral, and Cohere, so teams can fail PRs instead of production calls.

I built SchemaFit after running into a recurring structured-output problem: a schema can be valid JSON Schema, pass local checks, and still fail once it reaches an LLM provider API.

The issue is that provider-ready structured output is not the same thing as generic schema validity. OpenAI, Anthropic, Gemini, Mistral, Cohere, and other APIs can have different constraints around supported keywords, nesting, object shape, required fields, and portability.

The workflow I wanted was simple:

1. define the schema
2. check it before runtime
3. catch provider-specific failures early
4. fail in CI/preflight instead of during production calls

SchemaFit is my attempt to make structured-output schemas behave more like production contracts. It is not meant to replace frameworks like Instructor, BAML, LiteLLM, Vercel AI SDK, Pydantic, or Zod. Those are runtime/client/framework layers. SchemaFit is meant to sit before the model call as a static compatibility gate.

I also ran it across 50 public structured-output/tool schemas from cookbooks, agent frameworks, and MCP servers. 44 of 50 would be rejected by at least one major provider’s constraints, and only 3 passed all five provider profiles. That is the reason I think this belongs in CI instead of runtime debugging.

Benchmarks:
GitHub:

I’d be interested to hear how others are handling this today: are you validating structured-output schemas before runtime, or mostly finding issues when provider APIs reject them?

To see how bad it actually is, I ran SchemaFit over 50 real, public schemas from OpenAI/Anthropic cookbooks, agent frameworks, and official MCP servers, each provenance-linked to its source. 44 of 50 (88%) would be rejected by at least one major provider. Only 3 of 50 were clean across all five. The single biggest culprit was additionalProperties, with 134 flags, more than every other keyword combined.

One real example: an MCP get_channel_history tool with just two properties breaks OpenAI strict mode three ways at once (no additionalProperties:false, limit not in required, and limit has a default), yet passes Anthropic, Gemini, and Cohere. That's the portability problem in one schema.
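To make those three failures concrete, here is a minimal sketch of how such checks could be written as a static pass over a schema. It is not SchemaFit's rule pack or API: the function name `check_openai_strict` is made up for illustration, the schema is a hypothetical reconstruction of the two-property tool (the `channel_id` property and the default value are assumptions), and it covers only the three conditions named above.

```python
from typing import Any


def check_openai_strict(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Collect findings for the three conditions described above.

    Hypothetical helper, not part of SchemaFit. Recurses into nested
    object properties so the same checks apply at every level.
    """
    findings: list[str] = []
    if schema.get("type") != "object":
        return findings

    # Condition 1: the object must explicitly set additionalProperties to false.
    if schema.get("additionalProperties") is not False:
        findings.append(f"{path}: additionalProperties must be false")

    required = set(schema.get("required", []))
    for name, prop in schema.get("properties", {}).items():
        # Condition 2: every declared property must be listed in required.
        if name not in required:
            findings.append(f"{path}.{name}: not listed in required")
        if not isinstance(prop, dict):
            continue  # boolean subschemas carry no keywords to inspect
        # Condition 3: a property must not declare a default.
        if "default" in prop:
            findings.append(f"{path}.{name}: has a default")
        findings.extend(check_openai_strict(prop, f"{path}.{name}"))
    return findings


# Assumed shape of the tool's input schema, for illustration only.
channel_history = {
    "type": "object",
    "properties": {
        "channel_id": {"type": "string"},
        "limit": {"type": "number", "default": 10},
    },
    "required": ["channel_id"],
}

findings = check_openai_strict(channel_history)
for finding in findings:
    print(finding)
```

Run against this schema, the sketch reports all three problems; a version with `additionalProperties: false`, `limit` added to `required`, and the default removed comes back clean.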

Honest caveat: the rule packs vary in firmness. OpenAI's constraints are documented and firm; Mistral's are a conservative reading of its docs; Gemini's are warnings, not hard errors. The corpus is provider-mixed by design, so 88% is a floor, not a cherry-pick.
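One way to picture the error/warning split in a CI gate is sketched below. This is an assumption about how such a gate could work, not SchemaFit's actual data model: `Severity`, `Finding`, and `exit_code` are hypothetical names, and the finding messages are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # firm, documented constraint: should fail the build
    WARNING = "warning"  # softer reading of the docs: report, but do not fail


@dataclass(frozen=True)
class Finding:
    provider: str
    severity: Severity
    message: str


def exit_code(findings: list[Finding]) -> int:
    """Return 1 if any finding is a hard error, else 0. Warnings never fail the build."""
    return 1 if any(f.severity is Severity.ERROR for f in findings) else 0


# Placeholder findings, for illustration only.
findings = [
    Finding("openai", Severity.ERROR, "additionalProperties must be false"),
    Finding("gemini", Severity.WARNING, "placeholder warning message"),
]

print(exit_code(findings))        # the OpenAI error fails the gate
print(exit_code([findings[1]]))   # a Gemini warning alone does not
```

Keeping severity on each finding, instead of on the provider as a whole, would let a single rule pack mix firm and tentative rules.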

Corpus, repro script, and provenance:
Repo:

It's MIT-licensed and pure Python (pip install schemafit): static, offline, no API key, zero runtime deps. I'd genuinely like your eyes on the rule packs. Tell me where they're too strict, too loose, or just wrong for a provider you use.