Santosh Arron

Dropstone 1.5 - 2× Claude Code Pro's usage at $15/mo

Dropstone 1.5 is our monthly re-baseline release. We test the top AI coding models every month and rebuild the runtime on whichever wins. This month: DeepSeek V4 Flash, V4 Pro, and Moonshot Kimi K2.6, all hosted in the US, with nothing stored on our side. $15/month gets you about 450 deep coding sessions a week, roughly twice what Claude Code Pro delivers at $20. Full technical report with the cost math and benchmarks: blankline.org/research/dropstone-1-5

Santosh Arron

Dropstone 1.5 is the first release from our new monthly cycle. Every month my team at Blankline tests the strongest open AI coding models, rebuilds the runtime around whichever wins, and ships.

This cycle we focused on two things. Cost, and safety.

On cost. Dropstone Pro and Heavy 1.5 run on trillion-parameter class open-weight models, the same scale that closed labs like Anthropic charge a premium for. We spent the month measuring what each coding session actually costs us, then squeezing it. $15 a month gets you about 450 deep coding sessions a week. Claude Code Pro gives you 150 to 225 for $20. On capability, Dropstone Pro 1.5 trades blows with Claude Opus 4.7. We match or beat it on most other coding work at a fraction of the price.
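
The usage comparison above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in this post; the 52/12 weeks-per-month conversion and the function name are assumptions made for illustration, not something taken from the report.

```python
# Figures quoted in the post. WEEKS_PER_MONTH is an assumed conversion factor.
WEEKS_PER_MONTH = 52 / 12

def cost_per_session(price_per_month: float, sessions_per_week: float) -> float:
    """Monthly price divided by the sessions available in an average month."""
    return price_per_month / (sessions_per_week * WEEKS_PER_MONTH)

dropstone = cost_per_session(15, 450)
claude_best = cost_per_session(20, 225)   # top of the quoted 150 to 225 range
claude_worst = cost_per_session(20, 150)  # bottom of the quoted range

# Ratio of weekly sessions, Dropstone versus Claude Code Pro
usage_ratio_min = 450 / 225
usage_ratio_max = 450 / 150

print(f"Dropstone: ${dropstone:.4f} per session")
print(f"Claude Code Pro: ${claude_best:.4f} to ${claude_worst:.4f} per session")
print(f"Usage ratio: {usage_ratio_min:.1f}x to {usage_ratio_max:.1f}x")
```

On these numbers the "2×" headline corresponds to the top of Claude Code Pro's quoted range; against the bottom of that range the session count is 3×.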

On safety. We built Dropstone safe enough to use on our own internal codebase first. Every file write, every shell command, every network call asks before it runs. Everything runs on US servers. Nothing is stored anywhere. That same safety boundary ships to every tier, Free, Pro, and Heavy. Using DeepSeek or Kimi through Dropstone is meaningfully safer than reaching for them directly.

Full math, benchmarks, and the honest losses are in our report: https://blankline.org/research/d...

Zolani Matebese

@santosharron Congrats on the launch, Santosh. The monthly model re-baseline idea is interesting, but I'd want to understand how you manage coding behaviours across models. They can be very different, and for refactors etc. that might be a risk?

André J

But Kimi is like 2% of Claude's cost, right? For devs that need massive amounts of coding power, monthly subscriptions aren't the solution. All you can eat at 2% is what we need!!! 🙏 😬

Sarrah

Dropstone seems cool and caught my attention with the “2x Claude Code Pro’s usage at $15/mo”. But after reading the site and FAQs, I am a bit confused. Wanna make sure I'm reading it right before I form an opinion. Tbh I need the ELI5 version lol. Here are my questions:

1. “2x Claude Code Pro usage” is a price comparison, not a Claude integration, right? The actual models (this month?) are DeepSeek and Kimi. What does Dropstone add over just using them through DeepSeek's API? (My guess is US hosting + the IDE, but the site also positions the monthly re-evals as the value prop.) Why should folks use Dropstone? I’m having a hard time answering that and would love your input.

2. The two benchmark cards use almost entirely different tests. The two that show up on both (SWE-bench Verified and Pro) are the ones where you trail Claude, and the cheaper Pro tier actually beats the premium Heavy tier on them. Could you share one table with the same benchmarks across all the models?

Hope my questions were clear, and do forgive me if I misunderstood any parts.

Ada Johnsen

The monthly re-baseline idea makes sense. Coding models change so fast that the best option can shift quickly. How do you decide when a new model is stable enough to become the default for Dropstone?

Santosh Arron

@ada_johnsen Our eval team runs a fixed harness monthly across capability, cost-of-service, and safety-of-integration, and whichever model wins the composite score for that tier becomes the default.
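
A minimal sketch of what a composite-score selection like this could look like. Everything here is hypothetical: the weights, the 0 to 1 normalization, the `EvalResult` fields, and the placeholder model names and scores are illustrative assumptions, not the actual harness or its results.

```python
from dataclasses import dataclass

# Hypothetical weights; the real weighting is not stated in this thread.
WEIGHTS = {"capability": 0.5, "cost": 0.3, "safety": 0.2}

@dataclass(frozen=True)
class EvalResult:
    model: str
    capability: float  # assumed normalized to 0..1, higher is better
    cost: float        # assumed normalized to 0..1, higher means cheaper to serve
    safety: float      # assumed normalized to 0..1, higher is better

def composite(result: EvalResult) -> float:
    """Weighted sum of the three axes named in the reply above."""
    return (WEIGHTS["capability"] * result.capability
            + WEIGHTS["cost"] * result.cost
            + WEIGHTS["safety"] * result.safety)

def pick_default(results: list[EvalResult]) -> EvalResult:
    """Return the candidate with the highest composite score for a tier."""
    if not results:
        raise ValueError("no candidates to evaluate")
    return max(results, key=composite)

# Placeholder candidates with made-up scores, for illustration only.
candidates = [
    EvalResult("model-a", capability=0.80, cost=0.60, safety=0.90),
    EvalResult("model-b", capability=0.70, cost=0.90, safety=0.80),
]
winner = pick_default(candidates)
print(winner.model, round(composite(winner), 2))
```

With these placeholder numbers the cheaper candidate wins despite a lower capability score, which shows how much the outcome depends on the chosen weights.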

Felix Li

The monthly rebaseline is the part I’d want to understand before switching. If Heavy moves off Kimi in 1.6, can I pin a repo to the 1.5 behavior for a while, or does the CLI always follow the current winner?

Santosh Arron

@novamaker01 Yes, you can always switch back and use Heavy 1.5. When our eval team performs a monthly rebaseline and updates the baseline model, it doesn't remove your ability to access older supported versions, and you are not forced to follow the current version if you prefer to stick with what you know works for your codebase.

Karim Ben

Can teams bench host if they need stricter compliance?

Mateusz Gierlach

2x Claude Code's usage at $15/mo: Auto-benchmarking open-weight models monthly and rebuilding around the best one is a bold operating model. How do you keep the agent's behavior/context consistent for users when the underlying model swaps each month? Continuity across model changes seems like the tricky part.