What does a custom AI agent actually cost? An honest breakdown
A line-by-line breakdown of the real cost of designing, building, and running a custom AI agent for B2B — engineering, model spend, observability, and the operational tax that's missing from every vendor pitch.
When a B2B leader asks us "what does a custom AI agent cost?", the answer they want is a number. The honest answer is that it depends on several cost lines, and the headline engineering cost is only one of them. Here's the line-by-line breakdown we use to give clients a real number, not a marketing number, for what an agent will cost to build and to run for the first 12 months.
These figures are typical mid-range estimates as of early 2026 for a custom agent doing real B2B work — for example a triage agent handling inbound support, an outbound research agent, or an internal-ops assistant. Your actual numbers will vary based on scope, complexity, and existing infrastructure.
Build cost — the part everyone quotes
Engineering effort to design, prototype, harden, integrate, and ship a single production agent end-to-end. Typical range:
- Discovery and design (1-2 weeks): £15,000 - £30,000
- Prototype to production (4-8 weeks of senior engineering): £60,000 - £160,000
- Integration with existing systems (CRM, support tooling, internal apps): £20,000 - £50,000
- Security review and threat modelling: £10,000 - £20,000
- Eval suite and observability stack: £15,000 - £30,000
All-in build for a real production agent: typically £120,000 - £290,000. The wide range reflects how much of the integration surface and security work you already have versus need to build from scratch.
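As a quick check on the arithmetic, the all-in range is simply the sum of the low ends and the sum of the high ends of the five line items above. A minimal Python sketch, where the dictionary keys are our own shorthand labels rather than standard terms:

```python
# Build-cost line items from the list above, as (low, high) ranges in GBP.
BUILD_ITEMS = {
    "discovery_and_design": (15_000, 30_000),
    "prototype_to_production": (60_000, 160_000),
    "systems_integration": (20_000, 50_000),
    "security_review": (10_000, 20_000),
    "evals_and_observability": (15_000, 30_000),
}


def sum_ranges(items):
    """Add up the low ends and the high ends of a set of (low, high) ranges."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


build_low, build_high = sum_ranges(BUILD_ITEMS)
print(f"£{build_low:,} - £{build_high:,}")  # £120,000 - £290,000
```

Swapping in your own ranges per line item is usually the fastest way to see which item drives the spread.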
Model and inference spend — the recurring line
What it costs to run the model itself, in tokens or compute. This depends entirely on volume and which model you're calling.
- Light agent (10k inferences / month, frontier model): £200 - £600 / month
- Medium agent (100k inferences / month, mid-tier model): £1,500 - £4,000 / month
- Heavy agent (1M+ inferences / month, optimised mix): £8,000 - £25,000 / month
- Self-hosted equivalent (rented GPUs at sustained utilisation): £6,000 - £20,000 / month at the heavy tier
Two things skew this. First, agents that loop (plan, act, observe, replan) consume 5-20× the tokens of a single-shot model call, because each step sends its own prompt and context. Second, retrieval-augmented systems add input tokens for every chunk you put into context, so usage scales with both the number and the size of the chunks. Both are easy to underestimate.
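To make the two multipliers concrete, here is a rough estimator. It is a sketch under stated assumptions: `CallProfile`, the token counts, and the per-million-token prices are all illustrative values we made up, and it assumes every loop step resends a context of the same size. In practice context often grows as history accumulates, so real usage can be higher.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CallProfile:
    prompt_tokens: int         # instructions + user input sent on each step
    output_tokens: int         # tokens generated on each step
    steps: int = 1             # plan/act/observe iterations per request
    chunks: int = 0            # retrieved chunks added to context per step
    tokens_per_chunk: int = 0  # average size of one retrieved chunk


def tokens_per_request(p: CallProfile) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for one end-to-end request."""
    input_per_step = p.prompt_tokens + p.chunks * p.tokens_per_chunk
    return input_per_step * p.steps, p.output_tokens * p.steps


def monthly_cost(p: CallProfile, requests: int,
                 price_in: float, price_out: float) -> float:
    """Monthly spend, with prices given per million tokens (hypothetical)."""
    tokens_in, tokens_out = tokens_per_request(p)
    return requests * (tokens_in * price_in + tokens_out * price_out) / 1_000_000


profiles = {
    "single-shot": CallProfile(1_000, 250),
    "looped": CallProfile(1_000, 250, steps=6),
    "looped + retrieval": CallProfile(1_000, 250, steps=6,
                                      chunks=4, tokens_per_chunk=500),
}

base = sum(tokens_per_request(profiles["single-shot"]))
for name, profile in profiles.items():
    total = sum(tokens_per_request(profile))
    print(name, total, f"{total / base:.1f}x")
# single-shot 1250 1.0x
# looped 7500 6.0x
# looped + retrieval 19500 15.6x
```

With these made-up inputs, six loop steps plus four 500-token chunks per step lands at roughly 15× a single-shot call, inside the 5-20× band quoted above. Passing your provider's actual prices to `monthly_cost` turns the token counts into a monthly figure.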
Infrastructure and tooling — the line nobody mentions
The plumbing the agent needs to actually run in production:
- Vector database (managed): £150 - £2,000 / month depending on scale
- Observability platform (LangSmith, Langfuse, or built-in): £200 - £1,500 / month
- Hosting (the agent's runtime, separate from the model): £100 - £1,000 / month
- Embedding model spend (if not using the same provider): £50 - £500 / month
- Eval running infrastructure: £100 - £400 / month
Typical infrastructure total: £600 - £5,400 / month, with the lower end applying even to a modest agent. This grows roughly linearly with usage.
Operational cost — the largest line nobody costs in
An agent in production needs an owner. Someone to triage incidents, review eval failures, update prompts, audit outputs, and respond to the inevitable "it told a customer something weird" Slack ping. Estimating this honestly:
- 0.25 FTE of senior engineering ongoing (~£40,000 / year)
- Plus shared on-call rotation cost — roughly £10,000 / year amortised across systems
- Plus periodic external review (security audits, prompt updates, model upgrades): £15,000 - £30,000 / year
The 0.25 FTE figure surprises people. It shouldn't. Production AI is not fire-and-forget infrastructure — model behaviour drifts, prompts need tuning as the world changes, and adversarial inputs evolve. Budget for the role, even if you split it across multiple agents.
Total 12-month TCO — typical mid-tier B2B agent
Putting it together for a representative case — a mid-volume internal-facing operations agent at 100,000 inferences per month:
- Build: £180,000 (one-off)
- Model and inference: £30,000 / year
- Infrastructure and tooling: £18,000 / year
- Operational FTE allocation: £50,000 / year
- External review and updates: £20,000 / year
- Year-one total: ~£298,000
- Steady-state year-two total: ~£118,000
If your business case can't justify those numbers against the value the agent generates, the answer isn't to find a cheaper vendor; it's to pick a different use case. The TCO floor for production-grade B2B AI is real and isn't going down soon.
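The year-one and steady-state totals above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the variable names are ours, and the figures are the representative ones from the list, not a quote.

```python
# Representative figures from the 12-month TCO list above, in GBP.
BUILD_ONE_OFF = 180_000
ANNUAL_RUN_COSTS = {
    "model_and_inference": 30_000,
    "infrastructure_and_tooling": 18_000,
    "operational_fte_allocation": 50_000,
    "external_review_and_updates": 20_000,
}


def annual_run_cost(costs):
    """Recurring cost per year once the agent is live."""
    return sum(costs.values())


def cumulative_tco(build, costs, years):
    """One-off build cost plus recurring costs over the given number of years."""
    return build + annual_run_cost(costs) * years


steady_state = annual_run_cost(ANNUAL_RUN_COSTS)
year_one = cumulative_tco(BUILD_ONE_OFF, ANNUAL_RUN_COSTS, years=1)
print(f"Year one: £{year_one:,}")          # Year one: £298,000
print(f"Steady state: £{steady_state:,}")  # Steady state: £118,000
```

Calling `cumulative_tco` with `years=3` gives the three-year figure to set against the value the agent is expected to generate.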
Where teams overpay
- Choosing a frontier model when a mid-tier model would do — typically a 5-10× cost difference for marginal capability gains in routine workflows.
- Letting agents loop without bounds — every reasoning loop multiplies your token bill. Cap the loops.
- Ingesting more context than the model needs to answer the question — retrieval should be tight, not lavish.
- Building bespoke observability when an off-the-shelf product would cover 80% of the need.
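On capping loops: one way to do it is a hard limit on both the number of steps and the tokens spent per request. The sketch below assumes a hypothetical `step` callable that wraps whatever model or tool call your agent makes and reports the tokens it used. `run_bounded`, `LoopResult`, and the default caps are our own illustrative choices, not part of any framework.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A step takes the current state and returns (new_state, tokens_used, done).
Step = Callable[[str], Tuple[str, int, bool]]


@dataclass
class LoopResult:
    output: str
    steps: int
    tokens: int
    completed: bool  # False means a cap stopped the loop


def run_bounded(step: Step, task: str,
                max_steps: int = 6, max_tokens: int = 20_000) -> LoopResult:
    """Run the agent loop until it finishes or hits a step/token cap."""
    state, spent, taken = task, 0, 0
    for taken in range(1, max_steps + 1):
        state, used, done = step(state)
        spent += used
        if done:
            return LoopResult(state, taken, spent, True)
        if spent >= max_tokens:
            break  # token budget exhausted; stop before another call
    return LoopResult(state, taken, spent, False)


# Stand-in step that never finishes, to show the cap taking effect.
def never_finishes(state: str) -> Tuple[str, int, bool]:
    return state + ".", 1_000, False


result = run_bounded(never_finishes, "triage ticket", max_steps=4)
print(result.steps, result.tokens, result.completed)  # 4 4000 False
```

Returning a result with `completed=False` instead of raising lets the caller decide whether to escalate to a human or return a partial answer, and the cost per request stays bounded either way.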
Where teams underspend, and regret it
- The eval suite. Always.
- Security review before launch.
- The integration with the actual workflow tool. Without it, the agent ends up as a Slack bot or standalone dashboard that loses traction within months.
- The operational owner role. Without one, the agent slowly degrades and nobody notices until a customer complains.
We give clients fully itemised TCO estimates as part of our discovery phase, based on your actual workflow, volume, and existing infrastructure rather than on round-number fantasies. Get in touch if you'd like an honest costing for an agent you're scoping.
