Why it matters | Riley Betts

This page documents three failure modes in AI governance frameworks as currently deployed, and five positions where Riley Betts's architecture diverges from mainstream consensus — with the reasoning behind each.

Why this matters now

The cost of getting this wrong is not theoretical.

Failure Mode 01

The 0.1% problem

An AI model that follows compliance rules 99.9% of the time violates them 0.1% of the time. In a high-volume regulated workflow—thousands of transactions per day—that is not a rounding error. That is a compliance programme failure. A regulatory breach. Potentially a criminal matter. Probability is not a defence.

Failure Mode 02

Prompt injection is not a solved problem

An AI agent operating in a live enterprise environment receives input from many sources: user messages, API responses, document contents, other agents. Any of these can contain adversarial instructions. Guardrails and defensive prompting are a perimeter, and perimeters get breached. The only robust defence is architectural separation. The agent literally cannot take the harmful action, regardless of what it is told.

Failure Mode 03

A log is not a control

Most "AI governance" in production is observational: log the inputs, log the outputs, have someone review them. This is not governance. It is forensics. By the time you have detected the problem, the non-compliant state transition has already occurred. Real governance sits upstream of execution. It is the validator that decides whether the action is permitted, not the team that reviews it afterwards.

Consensus vs. our position

Where we disagree with the mainstream.

Consensus

Better prompts and fine-tuned models will make AI agents safe enough for regulated use.

Riley Betts

Probabilistic systems cannot make deterministic guarantees. No amount of prompt engineering changes this. You can reduce the probability of failure. You cannot eliminate it. In regulated systems, that is not acceptable. The answer is architectural separation, not better persuasion.

Consensus

Human-in-the-loop is the responsible way to deploy agentic AI in regulated environments.

Riley Betts

Most HITL implementations are reassurance theatre. A human approving an AI-proposed action they cannot validate in the time given, against data they cannot fully process, is not a control. It is ceremony. If the action cannot be validated at runtime, the architecture is wrong. Fix the architecture.

Consensus

AI code generation reduces the need for rigorous software engineering discipline.

Riley Betts

AI code generation demands more rigorous engineering, not less. Slop in, slop out. The constraint moves upstream: from writing correct code to specifying correct behaviour. Organisations that invest in formal specification now will generate demonstrably correct systems at 5–10× the speed of those still relying on iterative debugging.

Consensus

AI governance is primarily a data and model management problem.

Riley Betts

Agentic AI governance is an execution problem. Existing frameworks were designed for models that produce outputs. Agents take actions. The risk is not in the model's response, it is in what the agent does next. Governance must sit upstream of execution, as an architectural constraint, not downstream as observation and review.

Consensus

Explainable AI means you can trace how the model reached its conclusion.

Riley Betts

Explainability of the model is not the same as auditability of the action. When a regulator asks whether a specific compliance check was performed on a specific action at a specific time, the answer must be a cryptographically verifiable, hash-chained record generated in real time—not a narrative reconstructed from logs. Those are not equivalent.

What this architecture means for your role.

AI-assisted delivery is changing org design, and not only tooling. If you are asking “If the machine writes more of the product, what is left for me?”, that question is reasonable.

Spec++ does not make software engineers redundant. It moves the most valuable work upstream: from typing implementation to owning the behaviour the organisation is prepared to guarantee.

Instead of reverse-engineering requirements and debugging edge cases in scattered services, you capture rules of the system precisely enough that generated code has almost nowhere to go wrong. You move from owning files and services to owning:

which states are allowed
which transitions are legal
which guarantees the firm can make to regulators and customers

That is a more senior, more accountable position, not a smaller one. AI turns specifications into working systems at speed; your leverage is the clarity and rigour of the spec, not typing faster.

Specification stewardship—between “everyone prompts in chat” and “senior engineers own everything” is a concrete, learnable craft via specification-driven development (SDD) with Spec++:

Background	What you bring	What SDD + Spec++ adds
Product	Outcomes, acceptance language, edge cases	Own NL clauses; review clause-level PR diffs before code lands
Junior developers	Vertical slices, tests, detail	Maintain specs, run pipeline in CI, triage lift and check failures
QA	Traceability, regression instinct	“Does every obligation still hold?”—trace clauses to source, gate on spec pipeline green

Spec++ Pipeline targets the gap teams feel when they shrink traditional intake but still need fewer people who can keep intent coherent as generation accelerates.

Your clarity becomes the control plane. Learn to operate the spec pipeline, not compete with the agent on typing.

Learning to operate Spec++ and the pipeline is a portfolio skill: visible in the repo (specs/ commits, green Actions runs), legible to engineering leads, aligned with audit demands that chat logs alone cannot satisfy.

Pair on a Spec++ SDD build → SDD adoption training path → Read the Spec++ explainer →