Foresight by Lightning Rod - Predict anything with AI

by
Foresight by Lightning Rod is an OpenAI-compatible forecasting API for developers building agents, prediction-market bots, and decision tools. Ask a question about a future event and get a scored, calibrated forecast back. Unlike general-purpose LLMs, Foresight is trained and evaluated on real-world outcomes, with benchmark-verified accuracy, cheaper inference, and a drop-in API for forecasting workflows.

Add a comment

Replies

Best

Hey Product Hunt — Ben here, founder of .

Frontier AI is powerful, but it is not built for forecasting. Frontier models are trained to produce plausible text, not well-calibrated probabilities about what will actually happen. They are also expensive to run inside agentic workflows, where bots may need to forecast thousands of markets, events, or decisions.

We trained Foresight to make better predictions at lower inference cost.

Foresight is an AI forecasting API with better accuracy at a lower inference cost. It is trained using our Future-as-Label method ( at the ICML 2026 AI Forecasting Workshop), which uses real-world outcomes over time for training. Instead of hand-labeling datasets or imitating generic text, Foresight learns from what actually happened.

Foresight beats frontier models 100x larger on live prediction benchmarks, like ProphetArena and ForecastBench, with a particularly large lead in prediction market categories like Sports & Politics.

Our API is OpenAI-compatible, so developers can easily swap it into existing workflows.

Better accuracy. Cheaper inference. OpenAI-compatible API.

Use code PHFORESIGHT for $50 free API credits this month.

We'd love feedback from builders working on forecasting agents, prediction tools, or any workflow where better forecasts matter.

 Congrats on the launch, Ben. A forecasting API actually benchmarked on real-world outcomes is rare — most AI tools claim predictive power with no calibration to back it. Purpose-building it for agents and prediction markets, accurate and cost-effective, is a sharp wedge.

One idea while it's live: your PH launch is still editable, and a video in the gallery holds attention better than screenshots. So I made you one from your site, free, and it's whitelabel — yours to post as your own:

I build FoxPlug — it turns your real product updates into videos, posts, and GIFs automatically: guided feature tours, GIFs, short posts for X, long-form for a blog, and full product walkthroughs. — building in public, loudly.

agreed! We think forecasting is a powerful benchmark for AI generally–it requires a deep understanding of how the real world works, and it's impossible to hack since the outcomes haven't happened yet.

Cool video!

 Congrats on the launch, Ben. Really interesting angle. I like that you’re treating forecasting as its own capability, not just another LLM feature.

Curious where you’re seeing the strongest early use case so far: prediction-market bots, agents making decisions under uncertainty, or internal business forecasting?

 thanks Marie! The strongest pull so far is developers building forecasting bots and agents of all kinds. A decent chunk is prediction markets, but a lot of it is broader: any agentic workflow that needs to make calibrated estimates about the future, where we provide a more accurate and economical solution than calling frontier models.

Would love any feedback if you give it a try!

 Congrats. For teams already using agentic workflows or prediction markets, what’s the easiest way to validate Foresight’s calibration on our live signals before fully switching over? Would you recommend a lightweight A/B test, an offline benchmark, or a hybrid approach; and do you have any sample metrics or thresholds you use to decide when to migrate?"

 absolutely – very easy to do a lightweight A/B test by swapping in our endpoint in any OpenAI-compatible workflows, and we always do benchmarks for our clients on their datasets before shipping anything. Default metrics tend to be Brier Score & ECE (expected calibration error) - you can see some example of how we measure this in our case studies:


 One good way to evaluate this if you already have a live forecasting process set up would be to test foresight-v4 on the same historical questions with the context that was gathered at that time, or even the output of a different model, and measure brier score and calibration of foresight vs the original predictions.

Happy to chat through your use case or provide more examples if that would be helpful -

can developers fine-tune forecasts for specific domains like fiance or healthcare?

 great question. We do offer fine-tuning via our SDK, although its in private beta at the moment – please reach out if you're curious in giving it a shot.
One big advantage of our Future-as-Label methodology () is that we can fine-tune models using the messy, unstructured operational data that companies already have (like docs, reports, patient records, claims, etc) instead of needing labeled datasets to train on. So we regularly train custom models () for clients and we're working on making this more out-of-the-box.

 We work 1-1 with enterprise clients for training these types of models. Most organizations sit on mountains of unstructured data full of predictive signal that can we used to forecast verifiable outcomes. Think patient notes, investment decks, CRM content... anything with a timestamp. Shoot me a message if you have a specific usecase in mind

One thing I'm curious about: if a lot of forecasting agents end up using the same underlying model, their predictions naturally become more correlated. That's fine for a single application, but it changes things in systems that rely on independent signals. Has that come up with customers using Foresight at scale? Congrats on the launch!

 That is true, but its also worth pointing out that a model choice is only half of the story - the other half is the prediction context you provide to the model as a prompt. That context becomes your "secret sauce" that can give you an edge over other forecasters, even when using the same underlying model.

But if you don't want to build your own custom context aggregation pipeline, we recommed trying out our "research mode" that does it for you:

 great question — yes, there’s definitely some correlation if everyone is using the same underlying model.

That being said, many of the best builders don’t use Foresight as one static oracle. They ensemble prompts, vary the context, reason from different assumptions, compare against market prices / their own signals, and aggregate.

That’s one reason cheaper inference matters: you can actually run complex / multi-step forecasting workflows at scale.

 Yes, that is definitely a factor to consider when applying this. Foresight users are often using it alongside other signals. In addition to the model, having diverse sources of context to reason over is really critical. But ensembling a variety of models and context sources can lead to better results. We think foresight is a strong independent signal! Do you have a particular system or application in mind?

Love the focus on forecasting agents. This feels like a missing piece for agentic workflows. Congratulations!

 thanks, we agree!

Really interesting product. Do you see Foresight being used for cybersecurity risk prioritization... for example forecasting whether a vulnerability or exposed service is likely to be exploited within 30/60/90 days based on threat intel, EPSS/KEV, asset criticality, and exposure context? Curious what inputs improve calibration most, and how you handle high-consequence cases where a ‘low probability’ event still needs action.

 Super interesting use case! This would be a great example of a custom model – if we had historical documentation / records of these breaches I suspect we'd learn alot with some custom training here.
We do see strong results for high-consequence, low-likelihood use cases– a great recent example is our case study on predicting supply chain disruptions:

 That is a very interesting potential application, and it sounds like the sort of thing that could benefit from training a custom model. If you have that sort of data you can read about how we train custom models here, and schedule a call to talk it through:

The calibration angle is the part that actually matters, and the part most forecasting tools skip. When we plugged raw LLM probabilities into a decision loop, the point estimates were fine but the confidence was wildly off at the tails, so the expected-value math downstream was garbage. Two things I'd want to know: do you return a calibration band or just a point probability, and how does calibration hold up under regime shift, when the future stops resembling the outcomes you were scored on?

 Calibration is actually where we see some of the biggest gains from training. We've demonstrated this across a few domains (papers here: ), and we often find that out-of-the-box LLMs are often wildly miscalibrated.
For high-stakes use cases we can also ensemble to get confidence bands around the probabilities themselves.

On regime shift: these are reasoning models, so what we're really training is better thinking, not a fixed mapping from inputs to probabilities. That typically holds up surprisingly well under regime change. I'd love to hear any feedback if you try it out!

Have you compared it against prediction markets directly, or only against LLMs?

 Yes — we regularly compete on 3rd party benchmarks like ForecastBench and ProphetArena, where we beat frontier models and are evaluated against live market outcomes. We do see an edge in some categories like Sports and Politics. That said, a base forecasting model is just one layer — signals gathering, market timing, bet selection, position sizing all matter too. Foresight gives you a strong calibrated foundation, but sustained alpha in prediction markets requires a differentiated approach on top, like proprietary data sources or novel strategies that can't be easily replicated.

Interesting idea. is there a public demo where developers can try a few forecasts before integrating the API?

 Yes! Log in at and you can run forecasts directly in the UI.
We give new users free credits to play around with, and PHFORESIGHT gets you $50 more this month.

 As Ben mentioned, you can access our dashboard from

Here is a quick demo of our chat playground interface, where you can try the model and see how to integrate it with the API:

The OpenAI-compatible interface is the right call. It means teams can drop this into existing agent pipelines without touching their orchestration layer. We've hit the same problem with general LLMs hallucinating probabilities. They'll say '70% chance' with no calibration behind it. How do you handle calibration drift as events resolve? Is accuracy validated continuously against a live benchmark, or is it a periodic evaluation cycle?

 We do run continuous benchmarks (including using 3rd parties like ForecastBench & ProphetArena) and haven't seen meaningful drift over time. One big reason is that the model anchors on the context you give it rather than leaning only on its weights, so it learns how to reason over whatever info is gathered, not just what it knows.
Also, the gains we get in training tend to show up in reasoning process itself, which makes it far less sensitive to regime shifts than a model that's just memorized patterns.
Would love any feedback if you try it!

Love that this is trained on real-world outcomes rather than just text patterns, making it a purpose-built forecasting layer that general LLMs simply cannot replicate.

 100%! We find that general LLMs are pretty bad at forecasting out of the box. Training specifically for forecasting massively improves results and lets us use a smaller model to offer much cheaper / more efficient inference.

12
Next
Last