Pricing · 3 June 2026 · 9 min read

What does a custom AI agent actually cost? An honest breakdown

A line-by-line breakdown of the real cost of designing, building, and running a custom AI agent for B2B — engineering, model spend, observability, and the operational tax that's missing from every vendor pitch.

When a B2B leader asks us "what does a custom AI agent cost?", the answer they want is a number. The honest answer is: it depends on six things, and the headline engineering cost is rarely the largest of them. Here's the line-by-line breakdown we use to give clients a real number — not a marketing number — for what an agent will cost to build and to run for the first 12 months.

These figures are typical mid-range estimates as of early 2026 for a custom agent doing real B2B work — for example a triage agent handling inbound support, an outbound research agent, or an internal-ops assistant. Your actual numbers will vary based on scope, complexity, and existing infrastructure.

Build cost — the part everyone quotes

Engineering effort to design, prototype, harden, integrate, and ship a single production agent end-to-end. Typical range:

Discovery and design (1-2 weeks): £15,000 - £30,000
Prototype to production (4-8 weeks of senior engineering): £60,000 - £160,000
Integration with existing systems (CRM, support tooling, internal apps): £20,000 - £50,000
Security review and threat modelling: £10,000 - £20,000
Eval suite and observability stack: £15,000 - £30,000

All-in build for a real production agent: typically £120,000 - £290,000. The wide range reflects how much of the integration surface and security work you already have versus need to build from scratch.
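The all-in range is the sum of the low ends and the sum of the high ends of the five line items above. A minimal sketch of that arithmetic follows; the figures are copied from the list, and the dictionary keys are illustrative labels only.

```python
# Build line items in GBP as (low, high), copied from the list above.
# The key names are illustrative labels, not terms from any tool or contract.
build_items = {
    "discovery_and_design": (15_000, 30_000),
    "prototype_to_production": (60_000, 160_000),
    "integration": (20_000, 50_000),
    "security_review": (10_000, 20_000),
    "evals_and_observability": (15_000, 30_000),
}

# Sum the low ends and the high ends separately to get the all-in range.
build_low = sum(low for low, _ in build_items.values())
build_high = sum(high for _, high in build_items.values())

print(f"All-in build: £{build_low:,} - £{build_high:,}")
# prints "All-in build: £120,000 - £290,000"
```

Swapping in your own low and high estimates per line gives a first-pass range for your own scope.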

Model and inference spend — the recurring line

What it costs to run the model itself, in tokens or compute. This depends entirely on volume and which model you're calling.

Light agent (10k inferences / month, frontier model): £200 - £600 / month
Medium agent (100k inferences / month, mid-tier model): £1,500 - £4,000 / month
Heavy agent (1M+ inferences / month, optimised mix): £8,000 - £25,000 / month
Self-hosted equivalent (rented GPUs at sustained utilisation): £6,000 - £20,000 / month at the heavy tier

Two things skew this. First, agents that loop (plan, act, observe, replan) consume 5-20× the tokens of a single-shot model call, because every step sends its own prompt and typically re-sends the context accumulated so far. Second, retrieval-augmented systems add input tokens for every chunk you stuff into context, so usage grows with the number of chunks retrieved. Both are easy to underestimate.
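To see where a multiplier of that size can come from, here is a rough sketch under simple assumptions: every loop step re-sends the base prompt plus everything earlier steps appended, and every retrieved chunk is added to the input in full. The token counts are hypothetical round numbers, not measurements, and real frameworks that summarise or truncate history will behave differently.

```python
def loop_input_tokens(base_prompt: int, added_per_step: int, steps: int) -> int:
    """Input tokens across a loop where each step re-sends the prompt plus history so far."""
    total = 0
    for step in range(steps):
        # Step 0 sends only the base prompt; later steps also carry earlier observations.
        total += base_prompt + step * added_per_step
    return total


def rag_input_tokens(question: int, chunks: int, tokens_per_chunk: int) -> int:
    """Input tokens for one retrieval-augmented call with every chunk included in full."""
    return question + chunks * tokens_per_chunk


# Hypothetical sizes: a 1,500-token prompt, 500 tokens of new history per step.
single_shot = loop_input_tokens(base_prompt=1_500, added_per_step=500, steps=1)
six_step_loop = loop_input_tokens(base_prompt=1_500, added_per_step=500, steps=6)
print(six_step_loop / single_shot)  # 11.0

# Hypothetical sizes: a 200-token question with eight 400-token chunks.
print(rag_input_tokens(question=200, chunks=8, tokens_per_chunk=400))  # 3400
```

Under these assumptions a six-step loop costs eleven times the input tokens of a single call, which sits inside the 5-20× range quoted above.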

Infrastructure and tooling — the line nobody mentions

The plumbing the agent needs to actually run in production:

Vector database (managed): £150 - £2,000 / month depending on scale
Observability platform (LangSmith, Langfuse, or built-in): £200 - £1,500 / month
Hosting (the agent's runtime, separate from the model): £100 - £1,000 / month
Embedding model spend (if not using the same provider): £50 - £500 / month
Eval running infrastructure: £100 - £400 / month

Typical infrastructure bill: £600 - £5,400 / month (the sum of the items above), even for a modest agent. This grows roughly linearly with usage.

Operational cost — the largest line nobody costs in

An agent in production needs an owner. Someone to triage incidents, review eval failures, update prompts, audit outputs, and respond to the inevitable "it told a customer something weird" Slack ping. Estimating this honestly:

0.25 FTE of senior engineering ongoing (~£40,000 / year)
Plus shared on-call rotation cost — roughly £10,000 / year amortised across systems
Plus periodic external review (security audits, prompt updates, model upgrades): £15,000 - £30,000 / year

The 0.25 FTE figure surprises people. It shouldn't. Production AI is not fire-and-forget infrastructure — model behaviour drifts, prompts need tuning as the world changes, and adversarial inputs evolve. Budget for the role, even if you split it across multiple agents.

Total 12-month TCO — typical mid-tier B2B agent

Putting it together for a representative case — a mid-volume internal-facing operations agent at 100,000 inferences per month:

Build: £180,000 (one-off)
Model and inference: £30,000 / year
Infrastructure and tooling: £18,000 / year
Operational FTE allocation: £50,000 / year
External review and updates: £20,000 / year
Year-one total: ~£298,000
Steady-state year-two total: ~£118,000
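The two totals are straightforward sums of the lines above. The sketch below reproduces them and also derives a steady-state cost per inference, a figure that is not quoted in the breakdown itself but follows from the stated 100,000 inferences per month. Variable names are illustrative.

```python
# Figures in GBP, copied from the representative case above.
build_one_off = 180_000
annual_running = {
    "model_and_inference": 30_000,
    "infrastructure_and_tooling": 18_000,
    "operational_fte": 50_000,
    "external_review": 20_000,
}

# Year two is running costs only; year one adds the one-off build.
year_two_total = sum(annual_running.values())
year_one_total = build_one_off + year_two_total

# Derived figure (an illustration, not a number from the breakdown):
# steady-state running cost spread over 100,000 inferences per month.
inferences_per_year = 100_000 * 12
cost_per_inference = year_two_total / inferences_per_year

print(f"Year one: £{year_one_total:,}")                      # Year one: £298,000
print(f"Year two: £{year_two_total:,}")                      # Year two: £118,000
print(f"Steady-state per inference: £{cost_per_inference:.3f}")  # £0.098
```

A per-inference figure like this is a convenient way to compare the agent against the cost of handling the same task manually.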

If your business case can't justify those numbers against the value the agent generates, the answer isn't to find a cheaper vendor; it's to pick a different use case. The TCO floor for production-grade B2B AI is real, and it isn't going down soon.

Where teams overpay

Choosing a frontier model when a mid-tier model would do — typically a 5-10× cost difference for marginal capability gains in routine workflows.
Letting agents loop without bounds — every reasoning loop multiplies your token bill. Cap the loops.
Ingesting more context than the model needs to answer the question — retrieval should be tight, not lavish.
Building bespoke observability when an off-the-shelf product would cover 80% of the need.
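"Cap the loops" can be as simple as a hard step limit plus a token budget around the agent's main loop. The sketch below is one way to do that, not a reference implementation: `step_fn` is a hypothetical stand-in for a single plan/act/observe cycle, and the default limits are arbitrary placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical signature for one plan/act/observe cycle:
# takes the step index, returns (answer or None, tokens spent on this step).
StepFn = Callable[[int], Tuple[Optional[str], int]]


@dataclass
class LoopResult:
    answer: Optional[str]
    steps_used: int
    tokens_used: int
    stopped_because: str  # "answered", "token_budget", or "max_steps"


def run_bounded(step_fn: StepFn, max_steps: int = 8, token_budget: int = 50_000) -> LoopResult:
    """Run the loop until it answers, exhausts the token budget, or hits max_steps."""
    tokens_used = 0
    for step in range(max_steps):
        answer, spent = step_fn(step)
        tokens_used += spent
        if answer is not None:
            return LoopResult(answer, step + 1, tokens_used, "answered")
        # Checked after spending, so the budget can be overshot by at most one step.
        if tokens_used >= token_budget:
            return LoopResult(None, step + 1, tokens_used, "token_budget")
    return LoopResult(None, max_steps, tokens_used, "max_steps")


# A fake step that never finishes and spends 4,000 tokens each time.
result = run_bounded(lambda step: (None, 4_000), token_budget=10_000)
print(result.stopped_because, result.steps_used, result.tokens_used)
# prints "token_budget 3 12000"
```

Returning the stop reason alongside the result makes it easy to log how often the cap is hit, which is a useful signal that a prompt or tool needs attention.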

Where teams underspend, and regret it

The eval suite. Always.
Security review before launch.
The integration with the actual workflow tool. Slack-bots and standalone dashboards lose traction within months without it.
The operational owner role. Without one, the agent slowly degrades and nobody notices until a customer complains.

We give clients fully itemised TCO estimates as part of our discovery phase, based on your actual workflow, volume, and existing infrastructure rather than on round-number fantasies. Get in touch if you'd like an honest costing for an agent you're scoping.
