Spec-Driven Development

Turn AI Speed Into Enterprise Delivery

It all runs on one approved artifact - the spec: the requirements and acceptance criteria that every AI agent builds from and every pull request is checked against.

AI made writing code fast. It didn't make enterprise delivery fast - it relocated the constraint. Here's where the bottleneck went, why your tools can't fix it, and what an AI-native operating layer does instead.

22 Jun 2026 · QCerris Team

The expectation gap

The throughput promise didn't show up

Every board is pressing for AI ROI, and most leadership teams have already bought the tools. The promise was a step-change in delivery velocity - a 5× to 10× acceleration for every engineer, and a similarly faster time-to-product.

The reality inside most enterprises looks different: more tokens, more experiments, more AI-generated code - and no step-change in what actually ships. Executives see activity. They do not yet see a repeatable delivery engine. AI activity went up; enterprise delivery did not. The problem was never whether AI matters - it's how to convert AI into enterprise throughput.

Where it went

Writing code was never the bottleneck

This is the uncomfortable part. Faster typing was never the real constraint on enterprise delivery. So when AI compressed the time it takes to write code, the bottleneck didn't disappear - it moved, in two directions at once.

Downstream, to code review. When one engineer can ship five times the code in a day, every reviewer becomes the chokepoint - asked to vouch for code no human fully authored, at a volume no review process was designed to absorb.

Upstream, to requirements. AI is a faithful amplifier. Feed it a vague ticket and it will confidently build the wrong thing - fast, polished, and wrong. An unclear requirement used to be caught by a developer who stopped to ask. Agents don't stop to ask.

The bottleneck moved - it didn't disappear

Before AI: Requirements (Define) → Coding (Build · code) → Code review (Verify)

The crowd doesn't shrink - it moves to the ends: requirements & review.

AI shrank the code. The work around it exploded - into defining and verifying.

Why tools alone won't fix it

Licenses are not a strategy

The instinct is to fix this by buying more tools. But coding assistants - Claude Code, Cursor, Copilot, Codex - optimize the individual coding session, not the enterprise delivery system. Powerful as they are, there are four things they will never do:

Claude Code · Cursor

Won't define what to build

They execute intent. They don't author or approve it.

Copilot · Codex

Won't coordinate approvals

No cross-functional sign-off across product, QA, architecture.

AI IDE plugins

Won't enforce architecture

No guarantee the work respects your ADRs or standards.

PR assistants

Won't validate vs. intent

They check the diff in isolation, not against what was asked.

Giving every engineer an AI assistant is not the same as redesigning the SDLC for AI.

Consistency at scale

Enterprise-level SDLC needs AI guardrails, even more so than individual developers and small teams

A big company doesn't have one developer - it has tens, hundreds or even thousands, each with their own habits, and the AI era has made a consistent SDLC process even more important. Everyone reaches for different tools, prompts in their own style, and documents to their own standard. The result is too much material, too many half-aligned artifacts, uneven leadership from team to team, and acceptance criteria that shift from person to person. Quality quietly becomes a function of who happened to pick up the ticket.

At enterprise scale, that inconsistency is the real risk. Ten engineers each "doing it their way" with an AI agent doesn't add up to throughput - it adds up to drift, rework, and a codebase no single person fully understands.

This is exactly where a large organization needs one operating layer the whole company runs on: one place where intent is defined, one standard for what "done" means, one review bar, one audit trail. Not to box engineers in - but so that product, engineering, QA and leadership all work from the same definition of the work. One tool, one standard, everyone on the same page.

The missing layer

The AI-native SDLC has a missing operating layer

Your stack already has the tools. What it doesn't have is the connective tissue between them. You already run Jira, Linear, GitHub, GitLab, Cursor, Claude, Codex, Copilot, PR review and engineering analytics. What's missing is the operating layer between business intent and autonomous implementation - approved intent, context delivery to agents, intent-aware review, an audit trail, and executive metrics.

What's missing isn't a tool - it's the layer between them

Business intent · system of record

Jira · Linear · ServiceNow

↑ ↓

The missing operating layer

CodeMerlin

Approved intent · context delivery to agents · intent-aware review · audit trail · executive metrics

↑ ↓

Autonomous implementation · your SCM

Cursor · Claude · Codex · Copilot · GitHub · GitLab

Orchestration, governance and measurement - a control tower across the AI-native SDLC.

The control tower

CodeMerlin is that layer

CodeMerlin is the enterprise control tower for AI software delivery. It attaches to the stack you already run - Jira, Linear, GitHub, GitLab, Cursor, Claude, Codex, Slack, Teams - and adds the AI-native SDLC layer on top: specs and tech plans, task and test plans, MCP context delivery to agents, intent-aware PR review, and an executive control tower. It doesn't replace your stack. It adds the missing operating layer above it.

Every other tool checks how code is written. CodeMerlin checks if it does what was asked.

The core of CodeMerlin

The spec is the source of truth

Not the prompt, not the chat history - the spec. It's the one approved artifact every downstream step runs on.

Start with the terminology, because it matters. In CodeMerlin, a ticket becomes one spec. That spec holds the requirements, use cases and acceptance criteria for the work - and any open questions a human needs to settle before code is written. Not many specs per ticket: one spec, with requirements inside it.

And it isn't a static document the AI hands back. CodeMerlin reads the ticket and your context, then surfaces what's ambiguous as open questions, right at the top:

SPEC · Enterprise SSO · In review

Open questions (4)

Must answer: Which identity providers must we support at launch?

Options: Okta + Azure AD · Okta only · Any SAML 2.0 IdP

Must answer: Is SCIM user provisioning in scope for v1?

Optional: Default session timeout?

Optional: Custom branding on the login screen?

Requirements & acceptance criteria

Users in SSO-enabled domains authenticate via the configured IdP; local passwords are disabled.
Failed authentications are logged and surfaced to SecOps within the audit stream.
Admins can enforce SSO per workspace without a redeploy.

Reviewed by Dan K. · 2 edits · approved ✓

One ticket → one spec. Open questions on top, requirements below, every change traceable to a person.

The layout does three things on purpose. Must-answer questions block approval; lower-priority ones don't - so teams resolve what actually matters and keep moving. Each question links to the exact part of the spec it affects, so no one scrolls a massive document hunting for context. And answering can be as simple as picking an option.
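To make the gating rule concrete, here is a minimal sketch of a spec whose must-answer questions block approval while optional ones do not. It is an illustration only, not CodeMerlin's data model: the `OpenQuestion` and `Spec` classes, their fields, and the `can_approve` method are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OpenQuestion:
    text: str
    must_answer: bool               # True = blocks approval until answered
    answer: Optional[str] = None    # None means the question is still open


@dataclass
class Spec:
    """Hypothetical spec shape: one ticket, one spec, questions on top."""
    title: str
    requirements: list[str]
    open_questions: list[OpenQuestion] = field(default_factory=list)

    def blocking_questions(self) -> list[OpenQuestion]:
        # Only unanswered must-answer questions block; optional ones never do.
        return [q for q in self.open_questions
                if q.must_answer and q.answer is None]

    def can_approve(self) -> bool:
        return not self.blocking_questions()


spec = Spec(
    title="Enterprise SSO",
    requirements=["Admins can enforce SSO per workspace without a redeploy."],
    open_questions=[
        OpenQuestion("Which identity providers must we support at launch?", True),
        OpenQuestion("Is SCIM user provisioning in scope for v1?", True),
        OpenQuestion("Default session timeout?", False),
    ],
)

print(spec.can_approve())  # False: two must-answer questions are open

# Answering can be as simple as picking an option.
spec.open_questions[0].answer = "Okta + Azure AD"
spec.open_questions[1].answer = "Out of scope for v1"

print(spec.can_approve())  # True: the optional question is still open
```

The design choice worth noting is that priority lives on the question, not on the spec, so a team can leave low-priority questions open and still move forward.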

Then a person approves it - accept everything, or edit any requirement - and every action is recorded: who reviewed, who changed what, and when. That audit trail is what turns "the AI wrote some code" into governed delivery you can stand behind.
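The "who reviewed, who changed what, and when" record can be pictured as an append-only event log. The sketch below is an assumption about shape, not CodeMerlin's implementation: `AuditEvent`, `AuditTrail`, and the action names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded event cannot be altered later
class AuditEvent:
    actor: str      # who
    action: str     # what, e.g. "edited" or "approved" (hypothetical labels)
    target: str     # which spec or requirement was touched
    at: datetime    # when, stored in UTC


class AuditTrail:
    """Append-only log: events can be recorded and read, never removed."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, actor: str, action: str, target: str) -> AuditEvent:
        event = AuditEvent(actor, action, target, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, target: Optional[str] = None) -> tuple[AuditEvent, ...]:
        # Return an immutable copy so callers cannot rewrite history.
        return tuple(e for e in self._events
                     if target is None or e.target == target)


trail = AuditTrail()
trail.record("Dan K.", "edited", "Enterprise SSO / requirement 1")
trail.record("Dan K.", "edited", "Enterprise SSO / requirement 3")
trail.record("Dan K.", "approved", "Enterprise SSO")

for event in trail.history():
    print(event.actor, event.action, event.target)
```

This mirrors the example above: two edits and one approval, each tied to a name and a timestamp.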

One ticket, one approved spec - reviewed by a person, traceable to a name.

Once it's approved, that single spec is the source of truth everything downstream runs on: the tech plan, tasks and test plan are generated from it, the agents build against it through MCP, and every pull request is reviewed back against it.

How it works

From a vague ticket to verified delivery

CodeMerlin is a spec-driven development platform. It covers the full lifecycle - from a potentially vague ticket to a verified delivery - with a human gate at every decision that matters, and traceability from requirement to shipped code.

The golden thread, end to end

AI step

Ticket ready

A Jira or Linear item - even a vague one - enters CodeMerlin.

AI step

Spec generated

A structured Spec is authored from the ticket - grounded in your code, docs, ADRs and prior specs, with assumptions, citations and open questions.

Human gate

Product approves the Spec

Intent is reviewed and locked once. Human-in-the-loop on what to build.

AI step

Plans authored

Technical Plan, Task Breakdown and Test Plan - each tied to the approved Spec with traceability links and machine-verifiable Done conditions.

Human gate

Tech & QA approve

Execution context is signed off, with implementation-readiness checks before anything is built.

AI step

Agents receive context

Approved context is delivered to Cursor, Claude Code and Codex via the MCP server. Agents don't guess - they work from an authoritative source.

AI step

Review by intent

Each PR is validated against the Spec, ADRs and Test Plan - not just style and syntax. Drift is caught in the IDE, not only at PR time.

AI step

Leadership sees outcomes

Intent delivery, requirement gaps, rework and governance - measured across teams and repositories.

Jira and Linear stay your system of record. GitHub and GitLab stay your SCM. Cursor and Claude stay your coding environment.
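The steps above can be sketched as an ordered pipeline in which AI steps proceed on their own and human gates refuse to advance without a named approver. The stage names come from the list above; the `Stage` type, `advance` function, and `GateNotApproved` exception are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    human_gate: bool  # True = a person must sign off before moving on


PIPELINE = (
    Stage("Ticket ready", False),
    Stage("Spec generated", False),
    Stage("Product approves the Spec", True),
    Stage("Plans authored", False),
    Stage("Tech & QA approve", True),
    Stage("Agents receive context", False),
    Stage("Review by intent", False),
    Stage("Leadership sees outcomes", False),
)


class GateNotApproved(Exception):
    """Raised when a human gate is reached without an approver."""


def advance(position: int, approved_by: Optional[str] = None) -> int:
    """Complete the stage at `position` and return the next stage's index.

    Human gates require `approved_by`; AI steps ignore it.
    The final stage has no successor, so its index is returned unchanged.
    """
    stage = PIPELINE[position]
    if stage.human_gate and not approved_by:
        raise GateNotApproved(f"'{stage.name}' needs a named approver")
    return min(position + 1, len(PIPELINE) - 1)


position = advance(0)         # Ticket ready -> Spec generated
position = advance(position)  # Spec generated -> Product approves the Spec
try:
    advance(position)         # no approver: the gate holds
except GateNotApproved as err:
    print(err)
position = advance(position, approved_by="Dan K.")
print(PIPELINE[position].name)  # Plans authored
```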

INTENT IN

Approved spec

Intent is captured and locked before agents touch code.

→

CONTEXT OUT

Authoritative source

Approved context flows to the agents, not a guess.

→

OUTCOME BACK

Verified delivery

Every PR is checked against the original intent.

↺ intent in → approved context out → verified outcome back

Spec-to-code traceability: a clean line from what the business asked for to what actually ships.
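One way to picture checking a pull request against intent rather than against the diff alone: compare the acceptance criteria in the approved spec with the criteria each changed file claims to implement. This is a simplified sketch under stated assumptions. The criterion IDs (`AC-1` and so on), the file paths, and the `trace_gaps` function are hypothetical, and a real review would need more than ID matching.

```python
def trace_gaps(
    spec_criteria: dict[str, str],
    pr_links: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Report where a PR and its spec fail to line up.

    spec_criteria: criterion id -> criterion text (from the approved spec)
    pr_links:      changed file -> criterion ids that file claims to implement
    """
    referenced = {cid for ids in pr_links.values() for cid in ids}
    known = set(spec_criteria)
    return {
        # Asked for, but nothing in the PR claims to deliver it.
        "uncovered_criteria": sorted(known - referenced),
        # Referenced by the PR, but not part of the approved spec.
        "unknown_references": sorted(referenced - known),
        # Changed files that trace back to no criterion at all.
        "untraced_files": sorted(f for f, ids in pr_links.items() if not ids),
    }


spec_criteria = {
    "AC-1": "Users in SSO-enabled domains authenticate via the configured IdP.",
    "AC-2": "Failed authentications are logged and surfaced to SecOps.",
    "AC-3": "Admins can enforce SSO per workspace without a redeploy.",
}
pr_links = {
    "auth/sso.py": ["AC-1"],
    "audit/stream.py": ["AC-2"],
    "README.md": [],
}

report = trace_gaps(spec_criteria, pr_links)
print(report["uncovered_criteria"])  # ['AC-3']
print(report["untraced_files"])      # ['README.md']
```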

Solo vs. enterprise

Solo AI development is real. Enterprise AI development is different.

For a solo builder, coordination cost is near zero - one head holds the product intent, the architecture decisions, and the history. There are no approval chains and no legacy entanglement. That's why solo AI demos look magical.

In an enterprise, coordination is the cost. No single head holds it all. There are legacy systems and integration debt, compliance, security and audit obligations, product, QA and architecture reviews, and organizational memory spread across teams. The solo AI engineer model breaks the moment the work requires that organizational memory.

The real shape of the work

Enterprise software is a system, not just code

In an enterprise, software is not just code - code is the smallest layer. Above it sit acceptance criteria, technical and test planning, architecture and ADRs, security and compliance, and release and operational accountability. AI can accelerate every one of those layers - but only with the right context and the right control points.

The bottleneck moved from writing code to defining, coordinating, and verifying the work.

Why it unlocks velocity

AI velocity comes from removing coordination drag

The win isn't faster typing. It's removing the loops that kill enterprise throughput - the requirement gaps, the rework cycles, the architecture surprises that only surface in the postmortem.

Fewer requirement gaps

The Spec is approved before agents touch code.

Fewer rework loops

Intent-aware review catches drift early, not after merge.

Earlier architecture alignment

ADRs feed the Tech Plan - not the postmortem.

Better test coverage

The Test Plan is authored alongside the Spec.

Faster onboarding

New engineers ramp on context, not tribal knowledge.

More parallel execution

Approved context is explicit and reusable across teams.

The spec is what unlocks parallelism. The governance loop is what makes it safe.

What organizations get

A measurable operating model for AI delivery

CodeMerlin gives every level of the organization something it can act on - not just dashboards to observe, but levers to manage AI software delivery.

CEO / Board

A credible AI plan, measurable productivity, reduced execution risk.

CTO / VP Eng

Governed agent usage, architecture compliance, less drift.

CPO / Product

Clearer intent, less translation loss, fewer requirement misses.

COO / PMO

A repeatable process, delivery visibility, lower rework.

CIO / CISO

Auditability, data boundaries, model routing and controls.

PE Operating Partner

A scalable portfolio playbook for AI-driven throughput.

Far more than another code-review tool

Delivery quality: 96% · PR traceability: 100% · Architecture coverage: 94%

Five throughput metrics: requirement gaps caught pre-code · review cycles per merged PR · post-merge rework rate · cycle time from ticket to production · intent delivery, spec ↔ shipped.
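Two of these metrics are simple enough to sketch as ratios over merged pull requests. The definitions below are assumptions made for illustration, since the text names the metrics without defining them; the `PullRequest` fields and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    merged: bool
    review_cycles: int                  # rounds of review before merge
    reworked_after_merge: bool = False  # needed a follow-up fix after merging


def review_cycles_per_merged_pr(prs: list[PullRequest]) -> float:
    """Assumed definition: mean review rounds across merged PRs only."""
    merged = [p for p in prs if p.merged]
    if not merged:
        return 0.0
    return sum(p.review_cycles for p in merged) / len(merged)


def post_merge_rework_rate(prs: list[PullRequest]) -> float:
    """Assumed definition: share of merged PRs that needed rework afterwards."""
    merged = [p for p in prs if p.merged]
    if not merged:
        return 0.0
    return sum(p.reworked_after_merge for p in merged) / len(merged)


prs = [
    PullRequest(merged=True, review_cycles=2),
    PullRequest(merged=True, review_cycles=4, reworked_after_merge=True),
    PullRequest(merged=False, review_cycles=1),  # excluded: never merged
]

print(review_cycles_per_merged_pr(prs))  # 3.0
print(post_merge_rework_rate(prs))       # 0.5
```

Both ratios use merged PRs as the denominator, so abandoned work does not flatter or distort the numbers.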

The strategic choice

Enterprise AI velocity needs an operating layer

There are two ways to respond to the AI moment. One scales individual leverage and hopes it adds up. The other redesigns the system the work flows through.

Buy licenses only

Scattered productivity

Individual leverage. No portfolio playbook. No measurable AI plan.

Redesign the SDLC for AI

Durable throughput

A governed operating layer. Approved intent. Verified delivery. Measured velocity.

Move faster with AI. Keep control with CodeMerlin.

See how CodeMerlin works →