Why AI governance evidence must exist at decision time, not reconstructed after

Log archaeology is not evidence. When a regulator asks why an action was allowed, the record needs to have existed at the moment of decision—not assembled from scattered systems the week before the examination. The difference is architectural, not procedural.

The examination scenario

A regulator from the FCA's supervisory division sends a letter. They want to understand why your AI-assisted credit decisioning system approved a £2.3 million commercial line at 14:23:07 on March 3rd. The approval touched a customer who, three weeks later, defaulted. You have thirty days to respond.

Your team begins. Someone pulls the CRM record — it shows a credit officer marked the account "reviewed" on March 3rd, but the timestamp is 14:51, not 14:23. Someone else exports tickets from the AI vendor's support portal. Another person recovers email threads between the credit risk team and the platform provider from February. The AI system's own logs show the approval event but not the rule state that permitted it, because the logging configuration at the time captured outputs, not the policy evaluation that produced them.

By day twenty-eight you have a narrative. It is internally consistent. It is almost certainly accurate. But it is a reconstruction — assembled after the fact from systems that were not designed to speak to each other, written by people who are interpreting what they find. This is not an AI audit trail. It is archaeology.

Regulators know the difference. Under SR 11-7, the Federal Reserve's model risk management guidance, model documentation must be maintained on an ongoing basis and must support the ability to "replicate results." The EU AI Act, applicable from August 2026 to high-risk AI systems, requires that logs be kept "for at least six months" and that they capture "the operation of the AI system to an extent appropriate to the purpose of that system." Neither standard is satisfied by a narrative produced under examination pressure.

Why reconstruction is structurally unreliable

The gap is not a people problem. Your team is not negligent. The gap is that the systems involved changed between March 3rd and the examination date. The AI vendor shipped two platform updates. The credit policy rules were revised in April. One of the analysts who ran the February review has left the firm. The logging infrastructure was migrated to a new provider in May, and the old provider retains compressed archives for only ninety days before rotation.

Each of these facts introduces a break in the chain of evidence. When you say "the system approved this because rule R-14 evaluated to true," you cannot show that R-14 existed in its current form on March 3rd. You cannot show that the rule engine running on March 3rd is the same one you are demonstrating today. You have the conclusion without the proof.

Even if every single reconstructed fact is correct, the reconstruction is still a narrative. A narrative is an account of what probably happened. A proof is a demonstration that a specific condition held at a specific moment. In a regulatory examination — particularly under MiFID II transaction reporting obligations, HIPAA audit control requirements, or the EU AI Act's Article 12 logging obligations — examiners increasingly distinguish between these two things. Organisations that have relied on narrative reconstruction are learning this distinction at the worst possible time.

The log is not a control

Most AI governance in production today is observational. A monitoring layer watches the AI system's outputs, flags anomalies, and sends alerts to a human review queue. The assumption is that the monitoring layer will catch non-compliant behaviour quickly enough that it can be remediated before harm accumulates.

This assumption has a structural problem: by the time the monitor detects the violation, the non-compliant state transition has already occurred. The loan was approved. The trade was placed. The claim was denied. The patient record was accessed. Observational governance does not prevent harm — it documents it, with a delay. In a system processing thousands of decisions per hour, a detection latency of even minutes can mean hundreds of non-compliant actions that now need to be unwound, disclosed, or defended.
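The exposure arithmetic can be sketched in a few lines. The figures below (3,000 decisions per hour, a five-minute detection latency) are illustrative assumptions, not measurements, and the result is an upper bound that assumes every decision in the window was affected by the same fault.

```python
def actions_in_detection_window(decisions_per_hour: float,
                                detection_latency_minutes: float) -> float:
    """Decisions executed between a violation starting and the monitor flagging it."""
    return decisions_per_hour * detection_latency_minutes / 60.0


# Hypothetical figures for illustration only.
exposed = actions_in_detection_window(decisions_per_hour=3000,
                                      detection_latency_minutes=5)
print(exposed)  # 250.0
```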

This is the "log is not a control" problem. A log records what happened. It does not enforce what should happen. Organisations that conflate their audit logging capability with their compliance control are mistaking the smoke detector for the sprinkler system. The EU AI Act and the FCA's AI and Machine Learning Discussion Paper both distinguish between monitoring (observational) and governance (enforceable). Examination-ready AI evidence requires the latter, not just the former.

What evidence at decision time looks like

A decision record is not a log entry. It is a structured artefact created at the moment of enforcement — before the action runs — that captures everything needed to demonstrate compliance without reconstruction.

A minimal decision record contains: the action requested; the rule or policy that was evaluated; the system state at the time of evaluation (the data the rule operated on); the outcome of the evaluation; and, where a human was in the loop, a reference to their approval. It is created atomically with the decision. It cannot be created retroactively, because it depends on system state that may not persist.
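As a rough sketch of the fields listed above, a decision record might be modelled as an immutable structure. Every name here (DecisionRecord, make_record, fingerprint, the field names) is hypothetical, as are the rule expression and the debt-to-income figure; only the rule name R-14, the action identifier and the loan amount come from the scenario. The rule fingerprint is one possible way to pin the rule version rather than just the rule name.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


def fingerprint(obj: Any) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# frozen=True blocks attribute reassignment; nested dicts remain mutable,
# so a production design would need deeper immutability than this sketch.
@dataclass(frozen=True)
class DecisionRecord:
    action_id: str
    action_requested: dict        # what the agent asked to do
    rule_id: str
    rule_version: str
    rule_fingerprint: str         # hash of the rule definition as evaluated
    state_snapshot: dict          # the data the rule operated on
    outcome: str                  # "permit" or "deny"
    approver_ref: Optional[str]   # reference to a human approval, if any
    decided_at: str               # ISO 8601 UTC timestamp


def make_record(action_id, action_requested, rule, state, outcome,
                approver_ref=None, now=None) -> DecisionRecord:
    """Build the record from the live rule and state at evaluation time."""
    now = now or datetime.now(timezone.utc)
    return DecisionRecord(
        action_id=action_id,
        action_requested=action_requested,
        rule_id=rule["id"],
        rule_version=rule["version"],
        rule_fingerprint=fingerprint(rule),
        state_snapshot=dict(state),   # copy, so later changes to state do not leak in
        outcome=outcome,
        approver_ref=approver_ref,
        decided_at=now.isoformat(),
    )


# Hypothetical rule definition and state, for illustration only.
rule = {"id": "R-14", "version": "7", "expr": "debt_to_income < 0.40"}
record = make_record(
    action_id="A-2024-03-03T14:23:07",
    action_requested={"type": "approve_credit_line", "amount_gbp": 2_300_000},
    rule=rule,
    state={"debt_to_income": 0.31},
    outcome="permit",
    now=datetime(2024, 3, 3, 14, 23, 7, tzinfo=timezone.utc),
)
print(record.rule_id, record.rule_version, record.decided_at)
```

Because the fingerprint is computed from the rule definition that was actually evaluated, a later revision to the rule produces a different fingerprint, which is what lets the record show the rule "in its form at the time" rather than in its current form.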

When these records are stored in a hash-chained audit log, each record contains a cryptographic fingerprint of the record before it. If someone attempts to modify a past decision, whether to change the rule that applied or to alter the system state that was recorded, the fingerprint of the modified record no longer matches the fingerprint held by the record that follows it. The chain breaks at the point of modification. Any verifier who holds, or can independently obtain, a later fingerprint in the chain can detect the tampering without needing to trust the organisation that produced the log.
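A minimal sketch of such a chain, assuming SHA-256 and JSON-serialisable records, is shown here. The function names, the all-zero genesis value and the record contents are illustrative assumptions rather than any particular product's format.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's canonical payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def first_broken_index(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash:
            return i
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return i
        prev_hash = entry["hash"]
    return None


# Hypothetical records for illustration only.
chain: list = []
append(chain, {"action_id": "A-1", "rule_id": "R-14", "rule_version": "7", "outcome": "permit"})
append(chain, {"action_id": "A-2", "rule_id": "R-14", "rule_version": "7", "outcome": "deny"})
append(chain, {"action_id": "A-3", "rule_id": "R-14", "rule_version": "7", "outcome": "permit"})
assert first_broken_index(chain) is None

# A retroactive edit to the second record changes its recomputed hash.
chain[1]["payload"]["rule_version"] = "8"
print(first_broken_index(chain))  # 1
```

If the editor also recomputes the stored hash of the altered entry, the break simply moves to the next entry, whose prev_hash no longer matches. The sketch checks internal consistency only; it does not publish or anchor the latest hash anywhere outside the producer's control.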

Applied to the March 3rd scenario: the examiner requests the decision record for action A-2024-03-03T14:23:07. You produce a record created at 14:23:07, containing the exact rule state, the exact policy version, and a fingerprint chain that links it unbroken to the records immediately before and after. You do not need to reconstruct anything. The evidence existed at the moment it was needed.

The difference between a log and a proof

There is a further distinction that matters for regulated AI: the difference between recording that a rule was satisfied and proving that it was satisfied.

A log records what the system reported. If the rule engine had a bug on March 3rd and recorded "rule satisfied" when the rule was in fact not satisfied, the log faithfully preserves the incorrect output. The log is only as trustworthy as the system that produced it.

A formal proof is different. A zero-knowledge proof, for example, is a cryptographic construction that allows a system to demonstrate that a specific condition held — that a borrower's debt-to-income ratio was below the threshold, that a trading instruction was within the mandated position limit — without revealing the underlying data. The proof is mathematically verifiable by any party. It is not a claim. It is a demonstration that survives independent inspection.

In plain terms: the system produces a short piece of mathematics that says "I can prove the rule held, and you can verify that proof yourself, without me showing you the client's financial details." The examiner can verify the proof without the organisation surrendering confidential data. This matters in healthcare and financial services, where the examination right and the data protection obligation exist simultaneously and can be in tension.

ZK proofs are not hypothetical. They are deployed in financial infrastructure today, including in zero-knowledge rollups for transaction settlement and in privacy-preserving compliance checks for AML screening. The architecture is available. The question is whether AI governance systems are being built to use it, or whether they are still relying on narrative reconstruction.

What to ask your AI vendor

If your organisation is deploying AI in a regulated workflow — credit decisioning, claims processing, clinical decision support, trade surveillance — the governance question is not "do you have an audit log?" Almost every vendor will answer yes to that question. The audit log is table stakes, and as noted above, it is not a control.

The question that distinguishes a regulator-ready AI governance system from an observational one is more specific:

"Can you produce a verifiable decision record for any action your agent took, at the moment it was taken, without reconstruction?"

The follow-up questions matter too. Does the record capture the rule version, not just the rule name? Does it capture the system state — the data the rule operated on — at the moment of evaluation, or only the output? Is the record hash-chained so that tampering is detectable? Can you produce a formal proof that the rule held, or only a log entry asserting that it did? How long are records retained, and in what form?

A vendor who cannot answer these questions clearly is running observational governance. That is a product choice that becomes your organisation's examination problem. The FCA, the PRA, FINRA, and the EU AI Act supervisory authorities are not asking whether you monitored your AI system. They are asking whether you governed it.

The architectural implication

Evidence at decision time is not a feature that can be added to an observational system after the fact. It requires that the governance layer sit upstream of the action, not downstream of it. The enforcement point — the place where a rule is evaluated and a decision record is written — must be part of the transaction, not a side-channel observer watching the transaction occur.
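One way to picture an upstream enforcement point is a wrapper that evaluates the rule and writes the record before the action is allowed to run. This is a sketch under stated assumptions: the decorator name governed, the ActionBlocked exception, the in-memory list used as a record store, and the rule threshold are all hypothetical. It only orders the steps within one call; a real system would need the record write and the action to share a transaction.

```python
from functools import wraps
from typing import Callable


class ActionBlocked(Exception):
    """Raised when the rule evaluates to false; the action never runs."""


def governed(rule_id: str, rule_version: str,
             predicate: Callable[[dict], bool], record_sink: list):
    """Wrap an action so evaluation and record-writing happen before execution."""
    def decorator(action: Callable[[dict], object]):
        @wraps(action)
        def wrapper(state: dict):
            permitted = bool(predicate(state))
            # The record is written first. If this write raises, the action
            # below is never reached.
            record_sink.append({
                "action": action.__name__,
                "rule_id": rule_id,
                "rule_version": rule_version,
                "state": dict(state),
                "outcome": "permit" if permitted else "deny",
            })
            if not permitted:
                raise ActionBlocked(f"{rule_id} v{rule_version} denied {action.__name__}")
            return action(state)
        return wrapper
    return decorator


# Hypothetical rule, store and action for illustration only.
records: list = []
executed: list = []


@governed("R-14", "7", lambda s: s["debt_to_income"] < 0.40, records)
def approve_credit_line(state: dict) -> str:
    executed.append(state["customer"])
    return "approved"


result = approve_credit_line({"customer": "C-1", "debt_to_income": 0.31})
try:
    approve_credit_line({"customer": "C-2", "debt_to_income": 0.55})
except ActionBlocked as exc:
    print("blocked:", exc)

print(executed)                         # ['C-1']
print([r["outcome"] for r in records])  # ['permit', 'deny']
```

The denied request still produces a record, but the action body never executes, which is the difference between sitting in the transaction path and watching it from the side.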

This is an architectural choice, not a configuration option. Organisations that make it early have a governance artefact for every decision their AI system has ever taken. Organisations that defer it are accumulating a reconstruction liability — a growing body of decisions for which, if questioned, the best they can produce is a narrative assembled under pressure.

The examination of the March 3rd decision is not a hypothetical. It is the trajectory of AI governance enforcement as regulatory maturity catches up with deployment velocity. The organisations that are ready for it are not the ones with the most sophisticated monitoring dashboards. They are the ones whose AI systems were built to produce proof.