Launched this week

Laguna by Poolside
Foundation models for agentic coding and long-horizon work
161 followers
Poolside is a foundation model company bringing intelligence everywhere work gets done. Their mission is to drive abundance for humanity by creating artificial general intelligence.


Tabstack by Mozilla
Open weights at the frontier! This week @Poolside released Laguna M.1, their most capable model to date, with 23B active params and a 256K context window, now licensed under Apache 2.0 on both checkpoints.
Run it on your own infra, evaluate it in your own harnesses, fine-tune it, and build on it directly.
H/O to founders @eisokant and @jasoncwarner. OSS ftw!
Poolside has been building quietly for a while and this is the payoff. The 256K context window is the real story for agentic coding - that's where most code agents fall apart, when context fills up halfway through a refactor and the model starts losing track of what it already changed. 23B active params on Apache 2.0 is a strong combo for anyone who can't send proprietary code to closed APIs. Curious how it holds up on actual multi-file editing tasks vs synthetic benchmarks - that's usually where the gap between lab numbers and real workflows shows up. Nice launch.
The 256K context window at 23B active params is a strong architectural bet. Handling long-horizon agentic tasks without chunking is where most models fall apart. We've been navigating the inference infra tradeoff for our own agent layer, and self-hosted open weights change the calculus significantly. How does Laguna handle attention at max context? Any sparse attention or positional tricks that keep it tractable?
Open weights with a 256K context window at 23B active params is a big deal for agentic coding. That should really help on long-horizon refactors where context runs out fast. Curious how Laguna M.1 holds up inside Cursor or Claude Code style loops. Congrats on shipping.
StartupBase
Apache 2.0 on both checkpoints is the real unlock here. Most frontier-level models stay closed exactly at the point where they become useful for production workloads.
Curious what fine-tuning looks like for teams that want to specialize it on a specific codebase or domain. Is that straightforward with the current weights?
Long-horizon coding needs evaluation beyond task completion. How do you measure recovery after bad edits, context drift across long runs, unnecessary tool calls, and whether a human can reconstruct why the agent made each decision?
The open weights are what caught my attention. Being able to run and tune a serious coding model on your own infrastructure is a big plus.