Outcome Metrics 1.0
6 new metrics defined. 1 reframed. A framework for the era of work-done software.
The Case for New Metrics
For most of software history, the product and the work were separate things. You bought software to help you do work. Whether it helped — and how much — depended on how well you implemented it and how thoroughly your team adopted it. Software gave you capability. What you did with that capability was up to you.
A new class of software companies has inverted this. They do not sell access to software. They sell work completed by it. Per resolved ticket, per originated mortgage, per qualified lead, per executed contract review. The software is infrastructure. The work delivered is the product. And every major metric we built to run software businesses — ARR, CAC payback, NRR, health scores — was designed for the old model. Applied without modification, each tells an incomplete story.
Three things are structurally different about building and running this kind of business. These are not predictions about where the market is going. They are descriptions of companies already operating this way — and finding that none of their reporting infrastructure was built for it.
1. The focus shifts from adoption to deployment. Outcome-based software is self-adopting. The system either completes the work or it doesn’t. The question is no longer whether users will engage with the product. It is whether the AI can be deployed reliably enough to deliver work in the customer’s specific environment — and how fast.
2. Value is quantifiable, not interpretive. A resolved ticket is a fact. An originated loan is a fact. The value a software vendor delivers is measurable in the same units as the value a customer receives. Time-to-value stops being a narrative construct and becomes a hard operational number that can be tracked, compared, and improved.
3. Expansion is structurally uncapped. Seat-based expansion was bounded by headcount. Outcome-based expansion is bounded by the volume of work a customer has to complete — and that ceiling doesn’t exist. As the cost of completing each unit falls, customers process more work and deploy AI into workflows that were previously uneconomic to touch. The addressable surface grows with the customer’s business, independent of how many people they hire or how many seats they buy.
The category is moving from art of the possible to operational reality. The metrics need to move with it. That’s what this guide is for.
Unpacking the Assumptions
These three assumptions are not abstract. Each one has a structural mechanism behind it — and each one changes how the business needs to be measured. Before introducing the new framework, it is worth understanding exactly what changed and why.
Assumption 1: From Adoption to Deployment
In a seat-based world, every seat has to be justified at renewal. The only way to justify it is if someone is using it. That made adoption the gating function for value — and it created an entire post-sale motion designed to drive human behavior. Health scores tracked logins. QBRs told the story of engagement. The CSM existed to close the gap between what the software could do and what the customer had done with it. Whether users were actually getting value was almost always a matter of interpretation — and interpretation was the CSM’s job.
Outcome-based software is self-adopting. The system either completes the work or it doesn’t. There is no user to train, no behavior to change, no adoption curve to manage. But deployment is heavier — for two reasons.
First, AI-native software is non-deterministic. A mortgage rate calculator returns the same answer every time. An AI underwriting model fed the same application twice may reach different conclusions — because it operates in a domain where correct is probabilistic rather than rule-governed. You cannot test it once and trust it indefinitely. Non-deterministic systems have to be made production-safe in each customer’s specific environment.
Second, the degrees of freedom are wider. In SaaS, one company’s instance looked a lot like another’s. In an outcome-based deployment, every customer’s data, goals, guardrails, and edge cases are different. Someone has to connect the model to the customer’s environment, configure it, evaluate its outputs against real production inputs, and resolve the problems that only emerge once the system is running.
This is the origin of the Forward Deployed Engineer — and the reason the FDE job posting chart went near-vertical between 2023 and 2025. The role is not a rebranded CSM. It is a role built for software that does enough of the work that the old CS motion stops making sense.
[Figure 1 — Forward Deployed Engineer job postings per month, 2020–2025.]
Two archetypes are forming. The product-building FDE, rooted in Palantir's model, treats customer proximity as a forcing function for platform evolution — field solutions get abstracted back into the core product. The outcome-driving FDE embeds in the customer's operating environment and measures success by whether the AI is producing reliable work in production. This is the more common archetype in applied AI today. Most companies draw from both. The metrics in this paper are built around the outcome-driving motion. The product-building FDE operates under a different cost structure — one worth measuring separately, but outside the scope of this framework.
This emphasis on deployment is not a phase that immature products grow out of. Highly tailored deployments are a structural reality of outcome-based businesses — and they are here to stay.
Assumption 2: Value Is Quantifiable
Every software business model can be understood through a single lens: the distance between when a customer pays and when they formally commit to paying again. Everything between sale and renewal — QBRs, health scores, executive business reviews — is activity designed to close that gap.
The history of software business models is the history of that gap narrowing.
On-premise software stretched it to years. Enterprise implementations averaged 12 to 18 months before the system was live, and whether the investment had paid off was often unclear until the next major buying decision. Seat-based SaaS compressed the gap to more frequent, often annualized cycles — and shifted risk toward the vendor. Switching costs dropped. Software that wasn’t delivering value could be replaced, not shelved. That exposure is what created Customer Success. Consumption-based software compressed it further. Each unit of usage was its own small act of recommitment.
Outcome-based software eliminates the gap entirely. Value and the software effectively become one and the same. A resolved ticket is not evidence of value — it is the value. An originated mortgage is not a proxy for ROI — it is the ROI. The work delivered and the value confirmed are the same event.
[Figure 2 — The Value Line. Distance between initial sale and value confirmation, by business model.]
The entire SaaS post-sale infrastructure — health scores, QBRs, adoption programs, success plans — was built to manage an information asymmetry. The vendor knew more about whether the product was working than the customer could easily verify. Outcome-based software removes that asymmetry. The customer can see the output. They do not need a CSM to tell them whether the ticket was resolved.
And that requires rethinking the metrics that flow from it.
Assumption 3: Expansion Is Structurally Uncapped
In SaaS, every customer had a terminal value. It was bounded in two directions: the number of people using the software, and the finite catalog of products and modules the vendor built to sell. Expansion meant more seats or more products. Both had a ceiling.
In a work-done model, the upper bound on value is the volume of work the customer has to complete. That is a fundamentally different constraint — because it grows with the customer’s business, independent of how many people they hire or how many seats they buy.
Uber is the clearest illustration of this. Uber did not replace taxis. It captured a latent demand for transportation that taxis could never have served at scale. Before Uber, if there were no taxis available, people walked, or waited, or didn’t go. Uber made those trips economical — and in doing so, grew the total market for transportation, not just its share of it.
Outcome-based businesses do the same thing to work. Before AI, if a workflow was too expensive or too labor-intensive to run at scale, it simply didn’t happen. A customer who resolves 5,000 support tickets per month with AI may never have staffed the team to handle 5,000 human resolutions — not because the demand wasn’t there, but because the economics didn’t justify it. The demand was latent. The AI makes it economical.
This is why the expansion surface in outcome-based businesses is structurally different from SaaS. The denominator is not the customer’s headcount or their willingness to pay for additional modules — it is the total volume of work they have to complete. And as the cost of completing each unit falls, that volume expands: customers process more work and deploy AI into workflows that were previously uneconomic to touch.
If SaaS is an annuity, outcome-based businesses are toll booths collecting revenue on every outcome that passes through. Every unit of work completed is its own small act of recommitment. The revenue grows when the customer’s business grows, when they trust the system with more processes, and when they run work through it that they never ran before.
These are not the same business.
The Value Creation Journey
The three assumptions above are not independent — they compound across a single customer relationship in a predictable sequence. Understanding that sequence is the prerequisite for understanding why the metrics are defined the way they are.
Every outcome-based customer relationship moves through four distinct stages. They are not arbitrary divisions. They map directly to how capital flows, how risk resolves, and how the economics of the relationship change over time.
[Figure 3 — The Four Stages of an Outcome-Based Customer Relationship. The curve drops linearly from contract signature to First Outcome as deployment costs accumulate, then climbs through Outcome Velocity as outcome revenue builds. It crosses zero at Deployment Payback and compounds into Margin Inflection territory.]
Stage 1 — First Outcome
First Outcome is the moment the AI produces its first measurable unit of work in production. A ticket resolved. A lead qualified. A document reviewed. This is proof of concept — the system works, it is producing something, and the customer can see it. It is also the psychological inflection point in the customer relationship. Every day between contract signature and first delivered work is foregone revenue against a fixed deployment investment, and customers who see value early expand early.
A note on the dominant selling motion in applied AI. The proof-of-concept-heavy approach that characterizes most early-stage AI deals is not a go-to-market decision or a sign of weak positioning. It is the structural reality of selling a non-deterministic system. A customer cannot be asked to commit at scale to a system whose outputs are probabilistic until they have seen it perform reliably in their own environment. The proof of concept is not a sales technique — it is the mechanism by which a vendor demonstrates that First Outcome is achievable in this customer’s specific context. Founders who feel defensive about POC-heavy pipeline should reframe: they are not stuck in proof-of-concept mode because their sales motion is immature. They are there because the product requires it.
Stage 2 — Outcome Velocity
Outcome Velocity is the repeatable phase. The system is now producing outcomes consistently: the customer is pushing more work through the same workflow, trust in the agentic process is building, and edge cases are being resolved in production. The goal of this stage is momentum. If the first outcome was proof, outcome velocity is signal.
This is the account-level equivalent of product-market fit: are customers choosing to run more work through the system, or are they waiting to see more before they commit? A deployment that stalls here — where a customer saw the system work once but never expanded volume — almost always traces back to the economics not being compelling enough to make expansion an obvious decision.
Stage 3 — Deployment Payback
Deployment Payback is the moment cumulative outcome revenue crosses the combined cost of acquiring the customer and deploying the system for them. Until this point, the vendor has spent more than they have recovered. Everything between contract signature and Deployment Payback is investment — in sales, in integration, in configuration, in getting a non-deterministic system to reliable production output in a new customer environment. Everything after it is return.
This is the milestone that will matter most as the category matures. In a capital-rich environment — as exists for many well-backed applied AI companies — it is easy to overlook. Deployment costs get absorbed, payback periods go unmeasured, and growth masks the underlying unit economics. But as these businesses scale, Deployment Payback will become the first evidence that a go-to-market motion is financially justified. The speed of the path from First Outcome to Deployment Payback is a direct signal of business model health — a steep, fast trajectory means outcome volume is ramping quickly relative to the cost it took to stand the system up. The founders who measure it now will be the ones who aren’t surprised later.
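As a rough illustration of how the Deployment Payback milestone could be located in practice, the sketch below walks a monthly outcome-revenue series and returns the first month in which the cumulative total covers acquisition plus deployment cost. The function name, the monthly granularity, and the sample figures are all hypothetical assumptions, not part of the framework itself.

```python
from itertools import accumulate
from typing import Optional, Sequence


def deployment_payback_month(
    monthly_outcome_revenue: Sequence[float],
    acquisition_cost: float,
    deployment_cost: float,
) -> Optional[int]:
    """Return the 1-based month in which cumulative outcome revenue first
    covers acquisition plus deployment cost, or None if it never does."""
    threshold = acquisition_cost + deployment_cost
    running_totals = accumulate(monthly_outcome_revenue)
    for month, cumulative in enumerate(running_totals, start=1):
        if cumulative >= threshold:
            return month
    return None


# Hypothetical account: no revenue in month 1 (still deploying), then a ramp.
revenue = [0, 5_000, 15_000, 30_000, 40_000, 40_000]
payback = deployment_payback_month(
    revenue, acquisition_cost=30_000, deployment_cost=50_000
)
print(payback)
```

An account that returns None over the observed window has not yet reached payback, which is itself a useful signal to surface rather than average away.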
Stage 4 — Margin Inflection
Margin Inflection is everything beyond Deployment Payback. At this point, the integration is done, the system is calibrated, and the customer’s value from the AI is proven. Each additional unit of work completed from here flows directly to margin. There is no new customer acquisition cost. There is no new deployment cost. The compounding engine is running. This is the phase that makes outcome-based businesses structurally different from SaaS: there is no seat ceiling, no license cap, no negotiated renewal to survive. The only constraint on revenue growth is the volume of work the customer has to complete.
The New Cost Structure
These four stages play out against a cost structure that looks fundamentally different from SaaS.
Deployment cost is no longer separable from the path to revenue. In product-led and mid-market SaaS, deployment was largely the customer’s responsibility — light onboarding, self-serve setup, a few hours of training. In enterprise SaaS, deployment was substantial but billed separately, classified as non-recurring revenue, and structurally kept off the ARR line. Investors applied lower multiples to services revenue than to subscriptions, so vendors had every incentive to minimize it as a reported category. Either way, deployment cost was separated from the metric that mattered. It was friction to minimize, not investment to optimize.
In outcome-based models, that separation disappears entirely. Deployment is not peripheral — it is the precondition for revenue, because vendors only start earning when the system is producing work.
The table below captures how this and other structural differences flow through every major operating dimension:
The Metrics
The seven metrics below apply to businesses that price on the work their AI completes. That work spans a wide range — from a discrete action taken on a customer’s behalf, like a support deflection, to a complex outcome produced by a sequence of actions, like an originated loan. Some businesses price on the action. Others price on the outcome. Many will price on both as their product matures. The metrics apply across that spectrum.
Metric 1: Cost of Deployment (COD)
Cost of Deployment is the total spend from contract signature to the first measurable unit of AI work delivered. It is the price of turning a probabilistic system into a reliable one.
COD = FDE_days × FDE_daily_rate
FDE time includes:
Integration engineering
Configuration and workflow setup
Model tuning and evaluation
COD matters because it must be recovered before a customer becomes profitable. In SaaS, the cost of acquiring a customer was the primary threshold to clear. In outcome-based businesses, there are two: the cost of winning the customer, and the cost of deploying the system for them. Both must be recovered from outcome revenue. Until they are, the vendor is underwater.
We recommend tracking COD at the customer level, not as a blended average. A digitally native customer with clean APIs might carry a COD in the low five figures. A regulated enterprise navigating legacy systems and multi-step compliance workflows might carry a COD of $100,000 or more. Blending those two obscures the economics of both.
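To make the customer-level point concrete, here is a minimal sketch that applies the COD formula per customer and then shows the blended average alongside. The customer names, day counts, and the $1,500 daily rate are illustrative assumptions chosen to land near the ranges described above.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    customer: str
    fde_days: float        # FDE days from contract signature to First Outcome
    fde_daily_rate: float  # fully loaded daily cost of an FDE (assumed figure)

    @property
    def cod(self) -> float:
        # COD = FDE_days x FDE_daily_rate
        return self.fde_days * self.fde_daily_rate


# Hypothetical deployments
deployments = [
    Deployment("digital_native_co", fde_days=10, fde_daily_rate=1_500),
    Deployment("regulated_enterprise_co", fde_days=80, fde_daily_rate=1_500),
]

cod_by_customer = {d.customer: d.cod for d in deployments}
blended_cod = sum(cod_by_customer.values()) / len(cod_by_customer)

for customer, cod in cod_by_customer.items():
    print(f"{customer}: ${cod:,.0f}")
print(f"blended average: ${blended_cod:,.0f}")
```

In this made-up case the blended figure describes neither customer, which is the reason to keep the per-customer view as the primary one.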
Metric 2: Time to First Outcome (TTFO)
Time to First Outcome measures the number of days from contract signature to the delivery of the first measurable AI work unit in production. Every day of delay is foregone revenue against a fixed deployment cost.
TTFO = days from contract signature to first AI work unit delivered
TTFO has one meaningful advantage over time to value as SaaS understood it: it is binary. In SaaS, value was subjective — when had the customer truly gotten value from the software? The answer depended on who you asked, what they prioritized, and how the story was framed at renewal time. TTFO has none of that ambiguity. The ticket was resolved or it wasn’t. The loan was originated or it wasn’t. This binary quality elevates time-to-value from a soft narrative construct into a hard operational number that can be tracked, compared across deployments, and improved systematically.
We recommend treating TTFO as the single most important operational metric in the first 90 days of any customer relationship. Customers who reach First Outcome quickly gain confidence fast and expand early. Customers who wait months tend to second-guess the investment before they have seen it pay off. In Genera’s work supporting AI-native deployments, the best-performing companies are consistently reaching First Outcome in under 14 days.
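Because TTFO is a date difference, instrumenting it can be as simple as the sketch below. The function name and the sample dates are hypothetical; the only inputs are the two events named in the definition.

```python
from datetime import date


def time_to_first_outcome(contract_signed: date, first_outcome: date) -> int:
    """Days from contract signature to the first AI work unit in production."""
    if first_outcome < contract_signed:
        raise ValueError("first outcome cannot precede contract signature")
    return (first_outcome - contract_signed).days


# Hypothetical account
ttfo = time_to_first_outcome(date(2025, 3, 3), date(2025, 3, 14))
print(ttfo)
```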
Metric 3: Outcome Cost Efficiency (OCE)
Outcome Cost Efficiency is a new metric — one that has no direct analog in the SaaS era. It measures how many dollars of human labor value a customer receives for every dollar they spend on AI-completed work, and it is a primary predictor of both retention and expansion.
OCE = human_labor_cost_per_unit ÷ outcome_price_per_unit
A customer paying $8 per AI-resolved ticket, replacing work that costs $25 per human-resolved ticket, has an OCE of roughly 3.1x ($25 ÷ $8 ≈ 3.1). They receive about $3.10 of labor value for every $1.00 they pay. When this ratio is high, customers have a structural economic incentive to expand. When it is low, they do not, regardless of how satisfied they are with the product.
We recommend using fully-loaded labor cost, not base salary. A $55K support agent costs closer to $90K when you account for benefits, management overhead, real estate, training, and turnover. Using base salary understates OCE and misreads the expansion signal.
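The effect of the labor-cost basis on OCE can be sketched as follows. The $55K and $90K figures and the $8 outcome price come from the text; the assumption that one agent resolves 3,600 tickets per year is hypothetical and is used only to convert annual cost into a per-ticket cost.

```python
def labor_cost_per_unit(annual_cost: float, units_per_year: float) -> float:
    """Convert an annual labor cost into a cost per unit of work."""
    return annual_cost / units_per_year


def outcome_cost_efficiency(
    human_labor_cost_per_unit: float, outcome_price_per_unit: float
) -> float:
    """OCE = human labor cost per unit / outcome price per unit."""
    if outcome_price_per_unit <= 0:
        raise ValueError("outcome price must be positive")
    return human_labor_cost_per_unit / outcome_price_per_unit


TICKETS_PER_AGENT_PER_YEAR = 3_600  # hypothetical throughput assumption
OUTCOME_PRICE = 8.00                # price per AI-resolved ticket

oce_base = outcome_cost_efficiency(
    labor_cost_per_unit(55_000, TICKETS_PER_AGENT_PER_YEAR), OUTCOME_PRICE
)
oce_loaded = outcome_cost_efficiency(
    labor_cost_per_unit(90_000, TICKETS_PER_AGENT_PER_YEAR), OUTCOME_PRICE
)
print(f"base salary OCE: {oce_base:.2f}x, fully loaded OCE: {oce_loaded:.2f}x")
```

Under these assumptions the same account reads as a weak expansion candidate on base salary and a strong one on fully loaded cost.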
OCE may also prove to be the leading indicator of churn — ahead of satisfaction scores, ahead of NPS. A customer whose OCE has compressed below 1.5x may simply not be saving enough money to justify the relationship, regardless of how they feel about the product. A customer with an OCE above 3x who encounters friction is likely to stay anyway — because leaving means replacing cheap AI-completed work with expensive human labor. If that hypothesis holds, it inverts the entire logic of how post-sale teams prioritize accounts.
Two open questions this framework does not yet resolve. First, how OCE behaves in labor-constrained markets, where the relevant comparison is not cost but availability — the alternative to the AI is not a cheaper human, but no one at all. Second, whether competitive pricing pressure will compress OCE over time as vendors compete for the same workflows.
Metric 4: Gross Margin Per Outcome (GMO)
Gross Margin Per Outcome measures the profit on each individual unit of work completed. It is the unit-level margin metric for outcome-based businesses.
GMO has one structural property worth understanding: inference and token costs — a significant component of the cost of goods — have been falling rapidly, declining roughly 10x per year over the past two years. Falling compute costs are not new. Moore’s law has been compressing hardware costs for decades, and SaaS companies benefited from cheaper hosting and storage throughout their history. What is different is where compute sits in the cost structure. In SaaS, compute was a small fraction of COGS — people costs dominated. Falling hardware costs were a background benefit that improved company margins gradually over years. In outcome-based businesses, inference is a direct input to every unit of work delivered. When a significant cost component is falling at this rate, the margin impact is immediate and visible at the unit level — not buried in a blended P&L.
Whether this deflation continues at its current pace, slows, or reverses for frontier-class models is an open question. What is observable today is that the direction of the cost curve favors GMO expansion for businesses running on commodity inference. Businesses whose outcomes require the most capable models may face a different trajectory entirely.
GMO = outcome_price − cost_of_goods_per_outcome
Cost of goods:
Token and inference cost
Human oversight and escalation cost
Infrastructure cost
At current pricing, an AI support agent at $8 per resolved ticket with $1.40 in total cost of goods produces a GMO of $6.60 per ticket, a gross margin of 82.5%.
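The arithmetic behind that example can be sketched as below. Only the $8 price and the $1.40 total cost of goods come from the text; the split of the $1.40 across the three cost components is a hypothetical assumption for illustration.

```python
from typing import Dict, Tuple


def gross_margin_per_outcome(
    outcome_price: float, cost_of_goods: Dict[str, float]
) -> Tuple[float, float]:
    """Return (GMO in dollars, GMO as a fraction of outcome price)."""
    if outcome_price <= 0:
        raise ValueError("outcome price must be positive")
    total_cogs = sum(cost_of_goods.values())
    gmo = outcome_price - total_cogs
    return gmo, gmo / outcome_price


# Hypothetical breakdown of the $1.40 cost of goods per resolved ticket
cogs = {
    "token_and_inference": 0.40,
    "human_oversight_and_escalation": 0.80,
    "infrastructure": 0.20,
}

gmo, margin = gross_margin_per_outcome(8.00, cogs)
print(f"GMO ${gmo:.2f} per ticket, margin {margin:.1%}")
```

Keeping the components separate makes it easier to see which line moves when inference prices fall or escalation rates change.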
We recommend measuring GMO at the unit level, not as a blended company margin. Blended margins incorporate human oversight ratios, escalation costs, and managed service layers at varying stages of deployment maturity — they are a lagging indicator of fleet-wide deployment health. GMO measures the margin on a single completed work unit, which is what governs deployment economics.
Metric 5: Two-Sided Payback Period (2PP)
In SaaS, the payback period measured one thing: how long it took the vendor to recover the cost of acquiring a customer. In outcome-based businesses, the customer’s payback period is equally critical — because it is the primary driver of expansion velocity, and expansion is the growth engine.
Vendor Payback Months = (acquisition_cost + deployment_cost) ÷ monthly_outcome_gross_profit
Customer Payback Months = customer_deployment_investment ÷ net_monthly_labor_savings
When the vendor absorbs the full deployment cost, the customer’s payback approaches zero — positive ROI from the first unit of work delivered. This is a powerful expansion accelerant but concentrates all of the deployment risk on the vendor side.
We recommend tracking both sides explicitly and treating the slower of the two as the binding constraint on expansion velocity. A vendor recovering costs quickly while the customer sees weak ROI produces a stalled account. A customer seeing enormous value while the vendor bleeds through a long payback produces growth that can’t sustain itself. The flywheel only fires when both sides are short.
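A minimal sketch of the two payback calculations, with the slower side treated as the binding constraint, might look like this. The function names and all dollar figures are hypothetical; the second example assumes the vendor absorbs the full deployment cost, so the customer's investment is zero.

```python
import math


def vendor_payback_months(
    acquisition_cost: float,
    deployment_cost: float,
    monthly_outcome_gross_profit: float,
) -> float:
    """Months for the vendor to recover acquisition plus deployment cost."""
    if monthly_outcome_gross_profit <= 0:
        return math.inf  # never pays back at the current run rate
    return (acquisition_cost + deployment_cost) / monthly_outcome_gross_profit


def customer_payback_months(
    customer_deployment_investment: float, net_monthly_labor_savings: float
) -> float:
    """Months for the customer to recover their own deployment investment."""
    if net_monthly_labor_savings <= 0:
        return math.inf
    return customer_deployment_investment / net_monthly_labor_savings


def binding_payback_months(vendor_months: float, customer_months: float) -> float:
    """The slower of the two sides constrains expansion velocity."""
    return max(vendor_months, customer_months)


# Hypothetical account where the vendor absorbs all deployment cost
vendor = vendor_payback_months(30_000, 50_000, 20_000)
customer = customer_payback_months(0, 17_000)
print(vendor, customer, binding_payback_months(vendor, customer))
```

Returning infinity for a non-positive monthly figure is a design choice in this sketch so that a stalled side always surfaces as the binding constraint.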
[Figure 5 — The Two-Sided Payback Matrix. The growth flywheel only fires when both vendor and customer payback periods are short.]
Metric 6: Process Penetration Rate (PPR)
Process Penetration Rate measures the percentage of a customer’s addressable AI workflows that are currently deployed. It is the metric that tracks progress from first outcome toward the theoretical ceiling of what an account can become.
PPR = active_deployed_processes ÷ (active_deployed_processes + addressable_undeployed_processes)
Every account has a terminal value — the point at which all addressable work is running through the system and process expansion is exhausted. Where that ceiling sits is ambiguous. The world does not know yet, in the same way that even the best investors underestimated the size of the market Uber would create. But getting a handle on what that terminal point could be — and ensuring the organization is working toward it — is precisely where the post-sale motion earns its keep.
We recommend defining the denominator against the customer’s current human-staffed workflows — the processes they are already running with people. This is a conservative baseline and an honest one. The true addressable surface is likely larger: as the cost of completing each unit of work falls, customers deploy AI into workflows that were previously uneconomic to touch. Current labor volume is a floor on demand, not a ceiling. PPR against current labor gives operators a measurable, defensible denominator today. PPR against total addressable workflow — including latent, unlocked demand — is the number that will matter as the category matures.
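As a small illustration, PPR against the conservative denominator can be computed as in the sketch below. The process counts are hypothetical, and the addressable set is assumed to be the customer's current human-staffed workflows, as recommended above.

```python
def process_penetration_rate(
    active_deployed_processes: int, addressable_undeployed_processes: int
) -> float:
    """PPR = deployed / (deployed + addressable but not yet deployed)."""
    total = active_deployed_processes + addressable_undeployed_processes
    if total == 0:
        raise ValueError("no addressable processes defined for this account")
    return active_deployed_processes / total


# Hypothetical account: 3 workflows live, 9 human-staffed workflows not yet deployed
ppr = process_penetration_rate(3, 9)
print(f"{ppr:.0%}")
```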
Metric 7: Net Revenue Retention — Reframed
Every operator and investor reading this paper already tracks Net Revenue Retention. In outcome-based businesses, the formula doesn’t change. But three things about how NRR behaves will mislead you if you read it the way you’ve always read it.
NRR = (beginning_outcome_revenue + expansion − contraction − churn) ÷ beginning_outcome_revenue
First, the shape is wrong. SaaS NRR compounds relatively linearly from month one. Outcome-based NRR J-curves. It looks weak in the first 60–90 days because of the deployment ramp — the system is being made production-safe, edge cases are being resolved, and outcome volume hasn’t reached steady state. Then it accelerates sharply as process economics compound. A founder who sees 80% NRR in month three and panics is misreading a healthy deployment. An investor who sees 200% NRR in month twelve and assumes it will sustain at that rate may be looking at a one-time process expansion that won’t repeat. The early dip is not a warning sign. It is the expected shape. The question is not whether the curve dips — it will — but how steep the acceleration is once it turns.
Second, the ceiling is wrong. The best SaaS businesses achieved NRR of 130–140%. Outcome-based businesses are already reporting NRR above 200% — a figure that would be structurally impossible in most seat-based SaaS, where expansion was bounded by headcount and a finite product catalog. Every NRR benchmark the industry has built was calibrated to a model with a ceiling. That ceiling no longer exists. Comparing your NRR to SaaS benchmarks tells you nothing.
Third, the exposure is different. Outcome volume is coupled to the health of the customer’s underlying business in a way seat-based revenue never was. A customer processing fewer transactions in a downturn will produce lower outcome revenue — for reasons entirely outside the vendor’s control. SaaS NRR was insulated from this. Outcome-based NRR is not. The same mechanism that creates that downside sensitivity also produces the uncapped upside. The risk and the opportunity are the same feature.
We recommend decomposing NRR into its two expansion components rather than reporting a single blended number. Volume growth within deployed processes and new process deployments have very different economics. A rising NRR driven entirely by volume growth within existing processes has fundamentally different operational implications than one driven by new process expansion — and blending them obscures whether the growth is automatic or requires active forward deployed investment.
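One way to report that decomposition is sketched below: the standard NRR formula is kept intact, and expansion is split into its two sources so each contribution is visible. The field names and the revenue figures are hypothetical assumptions.

```python
from typing import Dict


def nrr_decomposition(
    beginning_outcome_revenue: float,
    volume_expansion: float,       # growth within already-deployed processes
    new_process_expansion: float,  # revenue from newly deployed processes
    contraction: float,
    churn: float,
) -> Dict[str, float]:
    """Return NRR plus the share contributed by each expansion source."""
    if beginning_outcome_revenue <= 0:
        raise ValueError("beginning outcome revenue must be positive")
    expansion = volume_expansion + new_process_expansion
    nrr = (
        beginning_outcome_revenue + expansion - contraction - churn
    ) / beginning_outcome_revenue
    return {
        "nrr": nrr,
        "volume_contribution": volume_expansion / beginning_outcome_revenue,
        "new_process_contribution": new_process_expansion / beginning_outcome_revenue,
    }


# Hypothetical cohort
result = nrr_decomposition(
    beginning_outcome_revenue=100_000,
    volume_expansion=40_000,
    new_process_expansion=70_000,
    contraction=5_000,
    churn=5_000,
)
print(result)
```

In this made-up cohort most of the expansion comes from new process deployments, which under the framework implies continued forward deployed investment rather than automatic growth.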
How to Use These Metrics
The outcome-based dashboard has seven core metrics. Each is a diagnostic for a specific failure mode. Together they give a complete picture of deployment health, unit economics, and expansion trajectory.
For operators standing up this infrastructure for the first time: start with COD and TTFO. They govern the deployment phase, they're the easiest to instrument — COD is an accounting exercise, TTFO is a date — and nothing else you measure will mean anything until you know both. Add OCE once you have outcomes in production. Layer in GMO, 2PP, and PPR as volume matures. Track NRR from day one, but read it differently.
On the Limits of This Framework
Every framework is wrong in some ways. This one treats all outcomes as equivalent — a resolved support ticket and an originated mortgage carry very different economics. The thinking is anchored primarily in support, customer experience, and early sales automation — other verticals will develop their own patterns. And it does not yet capture the value of the proprietary data and workflow-specific fine-tuning that outcome-based businesses accumulate over time.
This is Version 1.0. The goal is to give operators, founders, and investors a shared language. The rest will be built in practice, by the people building at the frontier.
Acknowledgements
David Skok’s SaaS Metrics 2.0 at Matrix Partners is the direct inspiration and intellectual foundation for this work. The conversation between Bret Taylor (Sierra) and John Collison (Stripe) on the Cheeky Pint podcast provided critical grounding for the outcome-based pricing framework. The Palantir FDSE role description (“A Day in the Life of a Palantir Forward Deployed Software Engineer,” 2020) provided the clearest articulation of the product-building FDE archetype. The forward deployed job posting data is sourced from Bloomberry, “What I learned analyzing 1K forward deployed engineer jobs,” 2025.
The thinking here has been shaped by ongoing conversations with founders navigating these dynamics in production — including the teams at Loamy, 1mind, and especially Genera. Michael Edelstein, Cassie Young, Michael Boyd and Junan Pang provided early feedback that strengthened the framework materially. Many Customer Success leaders across the SuccessVP LP base were generous with their time, their pushback, and their operational insights — thank you! All errors are the authors’ own.










This is compelling. Thank you. I would consider adding V2C (Value to Customer). This tracks how much value you are delivering to your customers over time. Needs a formal value model to calculate, but in fact all metrics require underlying models, that is the nature of measurement.
I’m hearing more about customer success being challenged with driving customer adoption of AI tokens that were purchased during the sales process. Low adoption could lead to non-renewal, so adoption of AI tokens becomes a marker or predictor for retention, just as usage was for the last several years in seat-based models.