Context and Correctness

The two conditions behind every applied AI breakout — and the bill that comes due for everyone else.

May 28, 2026

Ninety-five percent of enterprise AI pilots generate no return. Fifty-six percent of CEOs say they’ve seen no financial gains from AI at all.

The easy read is that the technology isn’t ready. It is. The bottleneck isn’t capability — it’s structural. And once you see the structure, the failure rate stops being mysterious.

I gave a version of this at SaaStr AI Annual this spring. Over the next four weeks I’m writing up the operating manual behind it — what makes applied AI actually work, who you hire to build it, how you measure it, and what Customer Success becomes. This is part one of four.

Every AI deployment that actually works meets two conditions. Every one that stalls is missing at least one of them.

Condition one: high context. The model has to know enough about the customer’s business to do the work — the data, the workflow, the constraints, the edge cases, the things nobody wrote down.

Condition two: verifiable correctness. Once the model produces something, someone — or something — has to be able to tell whether it’s right.

That’s it. Context and correctness. When both are present, AI produces value. When either is missing, it stalls — no matter how good the model is.

Code had both for free.

Ask why code was the first place AI produced real, undeniable outcomes, and you’ll usually hear “because engineers were early adopters.” That’s not it.

Code was first because it’s the one domain where both conditions came pre-installed.

Context is centralized and versioned — the codebase lives in one place, the syntax is standardized, and the history of every decision is sitting in commits, issues, and docs. And correctness is verifiable by definition. The code compiles or it doesn’t. The tests pass or they don’t. No judgment call required.

Cursor, Cognition, Replit, Lovable — they didn’t win because code is easy. They won because code is the rare domain where context is free and correctness is automatic. If any category was going to go vertical first, it was always going to be this one.

Most domains aren’t so lucky.

Step outside code and the conditions start to fray.

Tier-1 support and lead qualification are close — and not coincidentally, they’re where applied AI took hold next. Support has documentation, help centers, and ticket history (context), measured against resolution and escalation rates (correctness). Lead-qual has firmographic data and intent signals (context), measured against meetings booked and conversion (correctness). Both map cleanly to the two conditions. Both got real, fast.

But most business problems don’t arrive with either condition met. The context is trapped in someone’s head, scattered across a dozen tools, or simply never documented. And correctness isn’t a passing test — it’s a judgment call that only someone who’s done the work can make.

This is the same gap I described in Customer Success is a Wrapper, seen from the other side. AI closes the old wrapper gaps — products get easier to build, customers more self-sufficient. But it opens a new and more acute one: getting a non-deterministic system to actually work in a specific customer’s environment.

Someone has to engineer the conditions.

If a domain doesn’t hand you context and correctness for free — and almost none of them do — then someone has to build them. Capture the context. Define what right looks like. Make a probabilistic system reliable enough to trust in production.

That work doesn’t happen on its own. It isn’t a setting you toggle on. It’s a job. And it has quietly become the vendor’s job — the thing standing between a model that could work and a customer who’s actually getting value.

That’s the 95%. Not models that can’t perform — deployments where nobody engineered the conditions for them to.

Don’t take my word for it. Watch where the labs put their money.

In the first week of May 2026 — within roughly a day of each other — both Anthropic and OpenAI stood up enterprise deployment arms. Not as experiments. As multi-billion-dollar bets.

Anthropic launched a joint venture valued around $1.5 billion with Blackstone, Hellman & Friedman, and Goldman Sachs. OpenAI raised roughly $4 billion for a deployment company of its own, backed by TPG, Brookfield, Advent, and Bain. Both copied the model Palantir invented a decade ago: forward-deployed engineers, embedded inside the customer’s business.

Here’s how Anthropic described the actual work:

“An engagement might begin with the company’s engineering team sitting down with clinicians and IT staff to build tools that fit into the workflows that staff already use.”

That is not selling a model. That is capturing context and engineering correctness — in person, one customer at a time.

Think about that for a second. If the companies building the most capable models on earth just concluded that the models are not the bottleneck, and put billions behind the people who deploy them, the lesson for the rest of us isn’t subtle.

Which raises the only question that matters next: who, exactly, is that someone?

Take Harvey: its legal engineers are former lawyers. It turns out the best person to capture the context and check the correctness is someone who’s already done the work.

That’s the next post.

— JG
