Pre-Mortem: The EU AI Act’s Accountability Gap

Posted on July 14, 2026 by Andrew Scott

On 2 August 2026, the EU AI Act gives the EU AI Office the power to fine the developers of general-purpose AI models up to three per cent of global annual turnover, demand documentation, and commission independent access to source code. Five weeks before that date, on 29 June, a move of the high-risk AI compliance deadline from August 2026 to December 2027 became binding law. The two facts share a date. They do not share a plan.

This is the ninth piece in the Pre-Mortem series. Five questions, applied to the public record, before a programme has had the chance to succeed or fail.

The Bet

The EU is betting that extending the deadline for high-risk AI compliance by 16 months, agreed in May 2026 and enacted on 29 June, produces better enforcement outcomes than a met deadline inside a half-prepared enforcement architecture. The logic holds. As of mid-July 2026, only nine of 27 member states have shown advanced public implementation of enforcement infrastructure. Germany has designated the Bundesnetzagentur as its market surveillance authority and adopted draft transposition legislation. Spain built the AESIA, a dedicated supervisory agency, from scratch. Ireland deployed fifteen coordinated authorities under a central National AI Office. Those are genuine structural commitments. Eighteen member states have not reached that point. The extension gives them time. Whether they use it is the bet.

The Assumption

The entire framework rests on this: that national competent authorities, operating under 27 different legal frameworks, will converge on consistent enforcement before December 2027. The AI Act is a directly applicable regulation. Its enforcement infrastructure is not. The regulation sets the rules uniformly across the bloc. The authorities responsible for applying them have been built at very different speeds, under very different political conditions. That divergence is the risk the extension is buying time to close. There is no public commitment that the time is sufficient.

The Sequence

The AI Act entered into force in August 2024. Member states were required to designate their national competent authorities by August 2025. At least twelve missed that deadline. Nine months later, in May 2026, the Council and Parliament agreed to simplify the rules as part of the Digital Omnibus package. On 29 June, the high-risk AI deadline moved. What remains in force on 2 August is a narrower set: general-purpose AI model obligations and transparency requirements for new deployments. The high-risk AI rules, the Act’s original centre of gravity, are no longer in that set. Governance was adjusted to fit the readiness gap. That is not the order in which enforcement architecture is supposed to be built.

The Pager

Lucilla Sioli, Director of the EU AI Office, carries accountability for general-purpose AI enforcement from 2 August. For high-risk AI systems, including credit-scoring models, recruitment tools, and systems used in border control, healthcare, and law enforcement, accountability rests with national competent authorities. In 17 of 27 member states, no public designation exists. The Act names the category. Seventeen member states have yet to name the person.

The Proof

The measure that would settle this in 2028 is year-one enforcement consistency: the share of member states that have conducted at least one formal high-risk AI enforcement action, under the same evidentiary standard, in the first twelve months after the December 2027 deadline. No EU institution has publicly committed to publishing that figure. The AI Office’s annual progress reporting is the closest mechanism on the public record. It tracks activity. No published mechanism commits to measuring whether enforcement actions are consistent across member states.
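
To make that measure concrete, here is a minimal sketch in Python of how such a figure could be computed. Nothing in it comes from the AI Office or from any published methodology: the record fields, the label for the evidentiary standard, and the idea of counting months from the deadline are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnforcementAction:
    """One formal high-risk AI enforcement action (hypothetical record shape)."""
    member_state: str            # country code, e.g. "DE"
    evidentiary_standard: str    # assumed label for the standard applied
    months_after_deadline: int   # whole months since the December 2027 deadline


def year_one_consistency(actions, member_states, standard):
    """Share of member states with at least one action, taken under the
    given evidentiary standard, in the first twelve months after the deadline."""
    if not member_states:
        raise ValueError("member_states must not be empty")
    qualifying = {
        a.member_state
        for a in actions
        if a.member_state in member_states
        and a.evidentiary_standard == standard
        and 0 <= a.months_after_deadline < 12
    }
    return len(qualifying) / len(member_states)
```

With invented data for four states, where only one acts under the shared standard inside the window, `year_one_consistency(actions, {"DE", "ES", "IE", "FR"}, "common")` would return 0.25. The design choice worth noticing is that an action counts only if it meets the same standard; a tally of activity alone would not distinguish consistent enforcement from divergent enforcement.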

Verdict

If the Commission designates a public accountability owner in each member state before December 2026 and commits to publishing year-one enforcement data by name, the 16-month extension holds up as a governance decision made under realistic conditions. Without that, a framework that took two years to reach enforcement hands itself an extension with nobody carrying it.

Pre-Mortem: Eight Companies, No Published Accountability Standard

Posted on July 7, 2026 by Andrew Scott

The Pre-Mortem is a weekly series on this blog. Each piece applies five questions to a major technology commitment before the outcome is known.

Between February and May 2026, the United States Department of War signed agreements with eight of the world’s leading artificial intelligence companies (OpenAI, Google, Microsoft, SpaceX, Oracle, Amazon Web Services, NVIDIA, and Reflection) to deploy their advanced AI models inside its classified networks. Impact Level 6 (IL6) covers data classified at the Secret level. Impact Level 7 (IL7) covers compartmented intelligence and the most sensitive operational systems, where the United States military runs its actual warfighting decision support. This is the first time that large language models have operated within IL7 environments. What has not been published is who carries accountability when one of them gets something wrong.

The Bet

The Department of War’s stated aim is to establish the United States military as an AI-first fighting force, achieving what its AI Acceleration Strategy calls decision superiority across all domains of warfare. The eight agreements are the mechanism. The AI systems will summarise surveillance feeds, synthesise intelligence data, and suggest tactical options to human operators. The Department of War’s five AI ethics principles, responsible, equitable, traceable, reliable, and governable, are on the record. The bet is that those principles are sufficient architecture for what happens inside a classified environment.

The Assumption

The whole bet turns on this: that “humans remain accountable for AI outcomes” as a stated principle is equivalent to a published accountability framework.

That distinction is where the gap lies. The Department of War’s Responsible AI Strategy and Implementation Pathway establishes process. It does not name the specific individual, command role, or governance layer accountable when an AI-assisted intelligence summary inside an IL7 environment shapes a decision that turns out to be wrong. Principle and framework are not the same thing, and in a classified environment that distinction cannot be tested publicly.

The Sequence

In July 2025, Anthropic’s Claude became the first frontier AI model approved for use on classified networks. The Pentagon subsequently sought to renegotiate those terms, demanding Anthropic permit its models to be used for all lawful purposes without limitation. Anthropic declined, citing concerns about mass domestic surveillance and autonomous weapons. On 27 February 2026, President Trump ordered all federal agencies to stop using Anthropic. The following day, OpenAI signed its classified deal with commitments that included prohibitions on domestic mass surveillance and human responsibility for the use of force, positions that aligned with the guardrails Anthropic had sought to retain. By May 2026, the remaining seven of the eight, Google, Microsoft, SpaceX, Oracle, Amazon Web Services, NVIDIA, and Reflection, had signed equivalent agreements.

The sequence reveals something structural. The accountability architecture for classified military AI was settled by commercial negotiation and political designation, not by a published governance framework.

The Pager

Legal scholars on autonomous weapons identify the same accountability fracture that applies in the decision-support context here. When an AI-assisted output causes harm in a classified environment, accountability distributes: software developers could not have anticipated all operational contexts, commanding officers disclaim responsibility for machine-generated outputs, vendors invoke contractual limitation of liability. The human-in-the-loop design means a person reviews AI suggestions before acting. It does not mean accountability for acting on a wrong AI output has been named anywhere in the command chain.

No published document names the specific individual role, command layer, or governance body accountable for a wrong AI-assisted output inside an IL7 environment. No congressional oversight mechanism covers classified operational AI use. No published error reporting standard exists. By the nature of classified operations, none can.

The Proof

Eight companies, the highest classification levels, large language models operating on top-secret data for the first time: the scale of the commitment is confirmed. The outcome data will not follow. Classified operational AI performance is not publicly reviewed, by design. This is the only deployment in this series where the proof question cannot be answered from the outside, not because the data is not collected, but because it cannot be published.

The accountability question is not whether humans are in the loop. They are, by stated commitment. The question is whether a framework naming who specifically carries accountability when an AI-assisted decision goes wrong, inside a system that cannot publish what went wrong, exists in any enforceable form.

The Verdict

If the Department of War’s five principles are operationalised into a named, enforceable command accountability chain for AI-assisted decisions at every classification level, if the commercial guardrails in all eight agreements are independently verifiable by a body with appropriate clearance, and if a congressional oversight mechanism specific to classified AI operational failure is established, then this is what responsible military AI deployment at scale should look like.

Without all three, eight of the most powerful AI systems on earth are running inside the most classified networks in the world. The decisions they shape will not be publicly reviewed. The wrong ones will not be counted.

The accountability is a principle. The framework has not been built yet.

Pre-Mortem: Apple Intelligence at Work

Posted on June 30, 2026 by Andrew Scott

The Pre-Mortem is a weekly series on this blog. Each piece applies five questions to a major technology commitment before the outcome is known.

On 9 June 2026, Apple used its annual developer conference to announce that Siri had become something different. Not a smarter assistant. An agentic AI layer that could take actions across applications, services, and workplace workflows on behalf of its users, across a hardware ecosystem of more than 2.5 billion active devices. The world’s most valuable company had turned its operating system into an AI agent. The question the keynote did not answer was straightforward: when it gets something wrong at work, who is responsible?

The Bet

Apple is betting that privacy and accountability are the same problem. Its Private Cloud Compute architecture is genuinely novel: stateless, ephemeral, cryptographically auditable, with production builds published within 90 days for independent inspection. At WWDC 2026, Craig Federighi stated: “data is only used to execute your request, and outside experts can continue to verify this promise at any time.” The claim is that if Apple cannot read your data, no one can. What this architecture was not designed to answer is what happens when Apple Intelligence takes a workplace action on your behalf and gets it wrong. That is a different question. Apple has framed the privacy answer as if it covers both.

The Assumption

Everything turns on one distinction: that an architecture designed to prove Apple cannot access your data also constitutes a framework for enterprise accountability when AI actions produce incorrect outcomes.

It does not. Privacy means Apple is not the party reading your data. Accountability means someone is responsible for what the AI produces from it. Those are different obligations. No document currently published by Apple closes the gap between them. The existing AppleCare for Enterprise terms explicitly disclaim liability for lost profits, damage, corruption, or loss of data, or interruption of business. There is no AI-specific carve-out, no enterprise service level agreement for Apple Intelligence outputs, and no accuracy standard committed to publicly.

The Sequence

Three weeks before WWDC 2026, Apple settled a $250 million class action over Siri AI features it had promoted during the iPhone 16 launch but did not deliver. The settlement included no admission of wrongdoing. In April 2026, Apple’s CEO Tim Cook announced his departure from the role, with John Ternus, the head of hardware engineering, confirmed as his successor from 1 September 2026. Ternus had no publicly stated role in shaping Apple Intelligence. At WWDC 2026, enterprise MDM controls for Apple Intelligence were available in beta only, with general availability expected in autumn 2026. The agentic deployment was announced. The governance controls that enterprises need to deploy it responsibly were not yet generally available.

The Pager

Craig Federighi, Senior Vice President of Software Engineering, is the named face of Apple Intelligence. Amar Subramanya, Vice President of AI, is the operational lead, reporting to Federighi since the retirement of John Giannandrea earlier this year. Neither has made any public commitment regarding enterprise accountability for AI outputs. From September 2026, John Ternus will carry the CEO accountability for a deployment he did not architect, operating under governance terms that were written before agentic AI was part of the product. No named individual or governance body is publicly committed to answering for what Apple Intelligence does in enterprise workflows when it goes wrong.

The Proof

Apple has published no enterprise outcome measure for Apple Intelligence. No accuracy benchmark, no error rate commitment, no service level agreement for business customers. The company’s transparency commitments for Private Cloud Compute are real: production code published within 90 days, a cryptographically auditable log, a virtual research environment for security testing. These are privacy verification mechanisms, not performance standards. A survey of approximately 100 enterprise IT administrators published in May 2026 found that the primary concern was data exfiltration to unmanaged providers, and that eight per cent of organisations had already moved to prohibit AI features entirely. No one at Apple has publicly committed to a measure that would settle the question of how Apple Intelligence performs at work.

The Verdict

Apple has done more than most technology companies to make its cloud AI architecture independently verifiable. Private Cloud Compute is a credible attempt to resolve the privacy half of the enterprise AI problem. The accountability half remains open. If Apple publishes enterprise terms that define who carries responsibility for agentic errors in business workflows, and if John Ternus names a specific accountable owner for enterprise AI governance before the full iOS 27 rollout, the MDM controls announced at WWDC 2026 become the foundation of something credible. Without both, the hundreds of millions of Apple Intelligence-enabled devices deployed into enterprise settings are operating on a privacy promise. That is not the same thing as an accountability framework.

Pre-Mortem: KPMG’s AI-Powered Audit

Posted on June 17, 2026 by Andrew Scott

The audit opinion is the most consequential document most public companies produce. Not the annual report. Not the investor deck. The audit opinion, because it carries a named partner’s signature, and because that signature means something in law. On 9 June 2026, KPMG and Microsoft announced the deployment of Microsoft Agent 365 and Copilot across 276,000 KPMG professionals in 138 countries, including inside KPMG Clara, the firm’s global smart audit platform. Scott Flynn, KPMG’s Global Head of Audit, called it “a pivotal milestone in our AI-powered, human-assured audit transformation.” The word “assured” is doing a great deal of work in that sentence.

A pre-mortem asks the same five questions, every time, applied before failure is possible rather than after. This is the fifth in the series. The first looked at vendor accountability in regulated finance. The second at clinical safety in healthcare. The third at execution accountability in defence procurement. The fourth at clinical AI infrastructure. This one looks at professional services, the sector that has built its entire business model on the premise that human expertise is the product.

The Bet

KPMG is betting that efficiency and accountability can coexist at this scale. That 276,000 professionals deploying AI agents, with a governance layer running underneath, will not dilute the professional accountability the audit opinion rests on. It is a reasonable bet. It is also an untested one. The commercial logic is clear: 276,000 professionals, 138 countries, and an AI-powered workflow running through KPMG Clara create the kind of structural productivity gain that redefines the firm’s cost base, and potentially its fee model. Analysis of recent audit fee movements suggests clients are already pressing the case that AI efficiency should flow through to lower fees. The deeper bet, the one sitting beneath the headline deployment, is that “AI-powered, human-assured” constitutes a defensible operating model before any regulatory body has defined what “human-assured” actually requires in practice.

The Assumption

The single assumption carrying all the weight: that governing agents is the same thing as being accountable for them. Microsoft Agent 365 provides what its own documentation describes as a control plane, a centralised registry of agents with lifecycle rules, identity controls, and audit logging. That is a meaningful capability. It answers the question: how many agents do you have, and what can they touch? It does not, on its own, answer the question a claims lawyer or a regulator will eventually ask: who is accountable when the agent was visible, governed, and still wrong? KPMG’s Trusted AI framework lists ten ethical pillars, including one labelled Accountability, which calls for human oversight and responsibility to be embedded across the AI lifecycle. That is a principle-level commitment. None of the publicly available documentation specifies what happens to the partner’s signature when an AI-assisted conclusion is signed off and later found to be materially incorrect.

The Sequence

KPMG has deployed agents at scale before any authoritative regulatory framework specifies what AI-assisted audit evidence must look like, or how human review of AI-generated conclusions must be documented to meet existing standards. The IAASB approved a project proposal in March 2026 to revise ISA 500, Audit Evidence, to address technology use in audit, but the project is still in early research and information gathering, with no exposure draft issued and no effective date. The PCAOB has stated publicly that it is considering developing risk management guidance for audit firms using AI. Considering, not publishing. The capability is deployed. The standard that surrounds it is still being drafted.

The Pager

Lisa Heneghan, KPMG’s Global Chief Digital Officer, was specific about what this deployment requires: “strong foundations in governance, visibility and accountability.” That framing is responsible, and Agent 365 provides the visibility that most enterprises currently lack. The harder question is structural and specific. The audit opinion is signed by a named partner. Professional indemnity is priced around that signature. Suppose an agent embedded in KPMG Clara surfaces a conclusion, the partner reviews it and signs the opinion, and the work later proves to contain a material error. Historically, the liability has sat with the partner and the firm. What KPMG, Microsoft, and the client have not yet published is a clear allocation of responsibility for the agent’s contribution to that error. Is it a tool failure, an oversight failure, or something existing frameworks do not yet classify? The governance layer provides the audit trail. It does not specify who reads it, or what reading it is worth, when a claim is filed.

The Proof

The announcement commits 276,000 professionals and earns KPMG the designation of Microsoft “Frontier Firm.” Neither is a performance measure. No published metric connects this deployment to audit accuracy improvement, reduction in deficiencies, or quality outcomes. What the deployment actually demonstrates is that KPMG can deploy Agent 365 at scale and maintain visibility over its agent estate. That is a meaningful operational achievement. It is not the same as demonstrating that AI-assisted audit conclusions are more reliable than human-only ones, which is what regulators, courts, and insurers will eventually need to see. KPMG Clara’s existing framing covers adoption and workflow integration. No published figure connects it to audit opinion accuracy or deficiency rates. The proof that matters most is still outstanding.

Verdict

If KPMG publishes a clear framework specifying how AI-assisted audit evidence is reviewed, validated, and documented, paired with a liability position that survives regulatory scrutiny, this becomes the reference model for professional services AI at scale. The governance commitment is genuine. The scale of deployment is unmatched in the sector. Scott Flynn’s “AI-powered, human-assured” is the right aspiration. The question is whether “human-assured” describes a documented, auditable review process that a regulator will accept and an insurer will cover, or whether it is a positioning statement waiting for a definition. At 276,000 professionals across 138 countries, the audit opinion at the centre of this deployment is too consequential to leave that question open. The answer should come before the first material claim, not after.

Already Building: Epic Agent Factory and the Governance Gap

Posted on June 14, 2026 by Andrew Scott

The pre-mortem on Epic Agent Factory asked who would answer when a health-system-built agent made a clinically significant error. It published on 9 June. I have since learned of a Becker’s Hospital Review report from 30 March confirming that one of America’s largest health systems had already been building those agents for weeks before the question was published.

The report confirms the pre-mortem’s central argument. What neither my research nor the original piece surfaced was how quickly the sequence had already begun.

The Deployment That Was Already In Motion

Advocate Health had already tapped Epic’s Agent Factory, becoming one of the first health systems to build and deploy agents through the platform. Andy Crowder, Advocate Health’s SVP and Chief Digital and AI Officer, described the direction in a LinkedIn post on 26 March: “By combining Epic’s Agent Factory Platform capabilities with Advocate Health’s scale, clinical insight, and commitment to innovation, we’re translating AI from promise into practice.” He pointed to a three-day Epic immersion at The Pearl innovation district in Charlotte, focused on speeding up pharmacy verification for complex medications and cutting infusion chart preparation time for pharmacists and nurses. Four working prototypes emerged, scheduled to go live in July 2026.

Crowder added: “Together, we’re advancing responsible, practical AI that fits naturally into clinical workflows, reduces friction, and gives clinicians back time to focus on what matters most.” It is a considered statement, and the commitment is genuine. But it is not a governance document. And Advocate Health is not unusual here. They are representative. They moved first because the platform enabled it, the commercial pressure to reduce administrative burden was real, and nothing in the regulatory landscape said stop.

This is the sequence the pre-mortem described. Capability arrived. Deployment followed. The governance architecture to surround it had not been ratified.

The Workflows That Come Next

Pharmacy verification and infusion chart preparation are not, in themselves, clinical decision-making. They reduce documentation burden and carry genuine operational value. But they are the entry point, not the ceiling.

Epic’s own Penny agent already handles prior authorisation for thousands of health systems. Agent Factory is the platform through which health systems build their own versions of exactly those capabilities. Prior authorisation sits at the intersection of clinical judgment and payer approval. An AI-generated argument that misrepresents a contraindication, omits a relevant diagnosis, or positions a clinical case in a way that leads a payer to deny appropriate care causes harm that is downstream and deniable. The agent did not make the clinical decision. But the agent shaped the argument that influenced it.

The pre-mortem’s central question, who owns the error, was always pointed at this trajectory. The agent is built by the health system, on Epic’s platform, using Curiosity’s foundation models, in a regulatory environment where no one has yet specified how liability is allocated between vendor and deployer. Advocate Health’s prototypes are the first step of a sequence that leads directly to that question.

Colorado Tried to Build the Rails

While health systems were building, legislators in Colorado were attempting to create the governance scaffolding that the platform lacks at a federal level. Three separate AI-related healthcare laws had been passed by June 2026, each addressing a different dimension of the problem, and each confirming the same underlying gap.

Colorado’s original AI Act, SB 24-205, was scrapped before it ever took effect. A legal challenge from X.AI in April 2026, supported by federal intervention from the DOJ, led to enforcement being suspended and the legislature repealing the law entirely. Its replacement, SB 26-189, was signed on 14 May. It is a narrower law, retaining consumer notice requirements and the right to meaningful human review following adverse outcomes, but dropping the duty-of-care standard and mandatory impact assessments that had made the original controversial. It takes effect on 1 January 2027.

HB 26-1139, signed on 2 June, constrains how payers use AI in coverage determinations. It requires that AI-driven decisions be based on the patient’s individual medical and clinical history rather than group data, and that any denial or delay of coverage based on medical necessity receive review by a licensed clinician. It too takes effect on 1 January 2027.

Together, SB 26-189 and HB 26-1139 create obligations on both sides of the prior authorisation workflow. Neither specifies who bears the cost when an agent-generated output leads to the wrong clinical outcome. Three laws confirming the gap exists is not the same as closing it.

The Sequence Is Not a Prediction. It Is a Pattern.

On 1 June 2026, eight days before the pre-mortem was published, the Joint Commission launched its first voluntary AI certification programme for healthcare organisations. Built on the initial guidance published with the Coalition for Health AI in September 2025, the certification covers governance, data management, risk and bias reduction, and monitoring. It is a meaningful step forward. But the certification recognises organisations, not individual tools. It does not validate or certify individual AI products. It contains no discussion of liability allocation. It is a framework for responsible intent, not a mechanism for accountability when something goes wrong.

Epic has not published a liability framework specifying what a health system owns when a self-built Agent Factory agent produces a clinical error. No Epic contract language or public terms of service document does so. No federal regulatory body has published guidance specifically addressing liability allocation for agentic AI operating within EHR environments. The FDA has authorised more than 1,400 AI-enabled devices and issued no specific enforcement guidance for agentic AI in EHR environments.

The pre-mortem’s conclusion was that if Epic published a clear liability framework and paired it with a safety review mechanism, Agent Factory could become the defining infrastructure layer of hospital AI over the next decade. That conclusion stands. What the evidence now confirms is that the clock is not running from some future launch date.

It was already running.

Pre-Mortem: Epic Agent Factory

Posted on June 9, 2026 by Andrew Scott

Update, 14 June 2026: One of America’s largest health systems was already building Agent Factory agents in late March, weeks before this piece was published. The follow-up piece, Already Building: Epic Agent Factory and the Governance Gap, confirms the central argument.

Epic unveiled Agent Factory at HIMSS 2026 (March 2026), positioning it as a no-code, drag-and-drop visual builder that lets health systems design, deploy, and monitor their own autonomous AI agents inside the Epic environment. Alongside it came Curiosity, a family of generative medical foundation models trained on deidentified records from 300 million patients across 310 health systems, backed by a research preprint on arXiv first published in August 2025. Together, the announcements represent Epic’s move from AI vendor to AI infrastructure provider, handing health systems the tools to build clinical automation at their own pace and on their own terms.

A pre-mortem is a discipline borrowed from project risk management. Before a programme succeeds or fails, you ask: if this does not go as planned, what was the mechanism? This series applies that lens to major AI-in-industry announcements, not to predict failure but to surface the questions that deserve answers before deployment, not after.

The Bet

Epic is betting that health systems want to own their AI destiny. Phil Lindemann, VP of Data and Research, framed Agent Factory as enabling customers to implement AI solutions without needing to call a vendor or write a line of code. That is a significant commercial and philosophical shift. Epic’s existing suite, Art, Penny, and Emmie, has posted credible numbers: 42 per cent reduction in prior authorisation submission time at Summit Health, 58 per cent sustained reduction in billing-related service messages at Rush University, 69 per cent early lung cancer detection at The Christ Hospital against a 46 per cent national average. The bet is that health systems, given those results as proof of concept, will want to build the next generation themselves.

The Assumption

The assumption underneath Agent Factory is that health system capability is ready to meet platform capability. Canvas Medical CEO Adam Farren noted in HIMSS 2026 commentary that most hospitals are not yet positioned to take advantage of the platform. Agent Factory is in an early phase, with first availability in 2026 and continued rollout in 2027. Epic’s own roadmap, and the organisational readiness required for clinical agent deployment, put realistic momentum at leading health systems two to three years out. The platform may well be sound. The question is whether the organisations it serves have the clinical informatics depth, the governance infrastructure, and the project bandwidth to build and validate autonomous agents safely, particularly in clinical rather than administrative workflows.

The Sequence

Epic shipped the capability before any ratified standard governs what happens when a health-system-built agent makes a clinically significant error. The Joint Commission and Coalition for Health AI published voluntary joint guidance in September 2025, covering governance structures and vendor management. The FDA has authorised over 1,400 AI-enabled devices but has published no specific enforcement guidance for agentic AI in EHR environments. No federal regulatory framework yet specifies how liability for agent-generated clinical errors should be allocated between vendor and deploying health system. The capability is real and available. The governance architecture to surround it is not yet ratified.

The Pager

When an Agent Factory-built agent makes a clinically significant error, who owns it? Epic’s public framing places health systems “in the driver’s seat.” That is a positioning statement, not a governance document. No published contract language, terms of service excerpt, or named executive statement specifies who bears liability for agent-generated errors. No Epic accountability framework for self-built agents has been published. KPMG’s Q4 AI Pulse Survey (2025) found that 75 per cent of large-enterprise leaders name security, compliance, and auditability as their top requirements for agent deployment. At present, the answer to the pager question is that nobody has publicly claimed the call.

The Proof

Curiosity carries published research behind it: a preprint on arXiv first submitted in August 2025, covering 118 million patients and 151 billion tokens via the CoMET architecture. That is a meaningful evidential bar. Agent Factory has no equivalent published validation. Epic’s self-reported statistic that more than 85 per cent of customers are actively using Epic AI is plausible given market penetration of 43.7 per cent of US hospitals by count and 56.9 per cent by beds, but it refers to the existing suite, not to Agent Factory specifically. No performance benchmarks, error rate thresholds, or clinical outcome commitments for health-system-built agents on Agent Factory appear in any public source.

Verdict

If Epic publishes a clear liability framework that specifies what health systems own when they deploy self-built agents, and pairs that with a safety review mechanism before clinical agents go live, Agent Factory could become the defining infrastructure layer of hospital AI over the next decade. The foundation is genuinely strong: real outcome data from deployed agents, a clinically substantiated foundation model, and a market position that no competitor can easily replicate. The Curiosity publication demonstrates that Epic is capable of meeting an external evidential standard. The question is whether it applies that same rigour to the governance scaffolding around Agent Factory before health systems start building in earnest, rather than after the first serious incident forces the issue.

Pre-Mortem: The Pentagon’s Autonomous Drones Reset

Posted on June 2, 2026 by Andrew Scott

The Pentagon’s Replicator programme promised thousands of cheap autonomous drones in two years and delivered hundreds. The response has not been to wind it down. It has been to dissolve it, rebuild it as a new command inside Special Operations Command, and ask Congress for roughly 240 times the money. A programme that under-delivered on a lean, fast model is being re-attempted on a vast one, and the case for why the second structure succeeds where the first did not has not yet been made in public.

A pre-mortem asks the same five questions, every time, applied to a current programme before failure is possible rather than after. This is the third in the series. The first looked at vendor accountability in regulated finance. The second looked at clinical safety accountability in regulated healthcare. This one looks at execution accountability in defence procurement, the hardest delivery environment of them all. Different sector, similar structural shape: commitment moving faster than the architecture meant to hold it to account.

The Bet

The bet is that scale fixes what speed could not. Replicator was announced in August 2023 with a target of multiple thousands of all-domain attritable autonomous systems inside roughly two years, run by the Defense Innovation Unit on about a billion dollars across two fiscal years. It was deliberately lean, built to route around the traditional acquisition machine. By the deadline it had fielded hundreds. The reset, the Defense Autonomous Warfare Group, carries a 2027 budget request of about $54 billion, against roughly $226 million the year before. The technical bet is sound on its face: mass autonomy is where warfare is going, and the United States cannot afford to be slow to it. The harder bet, the one sitting under the headline number, is that money and a command structure fix what was an execution problem. Those are different things, and the launch treats them as one.

The Assumption

One belief is doing all the work: that Replicator’s shortfall was a problem of resourcing and structure, solvable with more of both. The documented failures point elsewhere. Systems were selected that proved unreliable, too expensive, or too slow to manufacture at the quantities needed. Some existed only as a concept when they were chosen. And the programme could not procure software able to orchestrate and command large, mixed swarms of different drones, which is the actual technical heart of autonomy at scale. None of those is a budget problem. A bigger budget buys more of the same systems and more of the same integration gap. If the diagnosis is wrong, the cure scales the disease.

The Sequence

Commitment came before the architecture, again. Replicator launched in August 2023. A second line of effort, focused on countering small drones, was added by a Secretary of Defense memo in September 2024. The original thousands-by-2025 deadline arrived with hundreds delivered. The programme was then consolidated into a joint interagency task force, dissolved, and rebuilt as the new autonomous-warfare group inside Special Operations Command, with the first acquisition under the new structure landing in January 2026, two counter-drone systems. Only in April 2026 did the Secretary tell the House Armed Services Committee that a sub-unified command for autonomous warfare was coming. The command meant to own this is still being stood up around a commitment already made. The funding tells the same story. Of that $54 billion, only about $1 billion is appropriated base money. The other $53 billion is a request, parked in a flexible five-year reconciliation pot that Congress has not yet passed. The headline number signals overwhelming commitment. In hard terms it is roughly a billion dollars in hand and fifty-three billion in hope. The intention is real. The money, for now, is one dollar in every fifty-four.
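The funding arithmetic can be checked directly. What follows is a minimal sketch in Python that uses only the rounded figures quoted in this piece (about $54 billion requested, about $1 billion appropriated, roughly $226 million the year before). The variable and function names are illustrative, and the results are only as precise as those rounded inputs.

```python
# Rounded figures quoted in this piece, in US dollars.
REQUEST_2027 = 54e9        # total 2027 budget request, approximate
APPROPRIATED_BASE = 1e9    # appropriated base money, approximate
PRIOR_YEAR = 226e6         # funding the year before, approximate


def funding_summary(request, appropriated, prior_year):
    """Return the ratios that sit behind the headline number."""
    return {
        "multiple_of_prior_year": request / prior_year,
        "share_in_hand": appropriated / request,
        "unappropriated": request - appropriated,
    }


summary = funding_summary(REQUEST_2027, APPROPRIATED_BASE, PRIOR_YEAR)
print(f"Request vs prior year: {summary['multiple_of_prior_year']:.0f}x")
print(f"Share appropriated: {summary['share_in_hand']:.1%}")
print(f"Still a request: ${summary['unappropriated'] / 1e9:.0f}bn")
```

On those inputs the multiple comes out just under 240 and the appropriated share just under 2 per cent, which is the one dollar in fifty-four.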

The Pager

Start with the credit, because it is real. The new group has a named director, Lt. Gen. Francis L. Donovan (USMC), with a clear command line and an appointment made by the Secretary himself. That is more named, senior accountability than most large defence programmes ever put on the public record, and it counts for something. The harder question is operational and specific. Standing policy requires appropriate levels of human judgement over the use of force. At swarm scale, with attritable systems acting at machine speed, who is the named individual accountable when one of them engages wrongly? The command line is clear. The accountability for the autonomous decision itself, at the scale this programme is built to reach, has not been framed in public. A command answers for a programme. It is a harder thing to say who answers for a single autonomous engagement when there are thousands of them in the air.

The Proof

The committed measures are input measures. Dollars requested, units contracted, the first systems bought. There is no public outcome measure for capability actually delivered, no cost per effective intercept, no fielded-and-working-at-scale figure with a date attached. This matters because the proof problem already bit once. Leadership called Replicator on track in 2024 and said it had made enormous strides in 2025, while the independent accounting found hundreds, not thousands. When the people who own the programme also own the definition of progress, optimism outruns delivery. Second-attempt scepticism is earned, not unfair. In eighteen months, the question of whether this worked will be answered by whoever holds the platform to define what delivered at scale means, and right now that platform is a budget request.

Verdict

This is a serious programme with serious people behind it. The strategic logic is correct: mass autonomy matters and slowness is its own risk. The accountability has a name and a rank, which is rare. The first systems have been bought and are heading to the field. None of that is in doubt.

What is unproven is whether a command and a budget can fix a problem that was about manufacturing maturity, software orchestration, and realistic system selection. A reorganisation addresses none of those by itself.

The action is concrete. Publish the outcome measure, not the input: a fielded-and-working-at-scale metric with a date, committed before the reconciliation money is spent, not after. Name the human accountable for autonomous engagement decisions at scale, not only the command that owns the programme. And diagnose the first shortfall in public before scaling, so the much larger second bet rests on a corrected understanding rather than a hope.

If the department publishes a delivered-at-scale outcome measure tied to a named owner, and solves the swarm-orchestration software problem it could not solve the first time, this becomes the programme that proves autonomous capability can be fielded at speed. Without both, it becomes the most expensive way yet found to relearn that money and reorganisation do not fix an execution problem.

Pre-Mortem: NHS Frontline Productivity Programme

Posted on May 26, 2026 by Andrew Scott

On 1 April 2026, NHS England formally launched the Frontline Productivity Programme. It succeeds the £2 billion Frontline Digitisation Programme and is anchored to the NHS 10-Year Health Plan. The headline target is a 2% year-on-year productivity gain over three years. The lead use case is Ambient Voice Technology (AVT), AI-powered ambient scribing for clinicians, with £200 million committed in year one. The Department of Health and Social Care (DHSC) and NHS England have appointed Rob Thompson as joint Chief Digital, Data and Technology Officer.

A pre-mortem asks the same five questions, every time, applied to a current programme. This is the second in the series. The first looked at vendor accountability in regulated finance. This one looks at clinical safety accountability in regulated healthcare. Different sector, similar structural shape.

The Bet

The NHS is betting that AVT can deliver enough of the 2% year-on-year productivity gain to justify scaling deployment to tens of thousands of clinicians faster than the clinical safety framework for AI-enabled ambient scribing can be ratified. The technical bet rides on multi-site evidence led by Great Ormond Street Hospital (GOSH) across nine London NHS sites and 17,000 patient encounters: a 23.5% increase in patient interaction time, an 8.2% reduction in appointment length, and a 13.4% increase in A&E patients per shift. The strategic bet is that 19 self-certified suppliers competing for trust contracts will produce price discipline without producing safety variance. Reasoned bets, made under genuine pressure, backed by measurable evidence. But they are bets, and the framing reads as inevitability.

The Assumption

One belief is doing all the work: that clinicians using AVT will verify AI-generated notes against the patient context every time, at scale, rather than develop the same review-as-rubber-stamp pattern automation has produced in every regulated environment it has reached. The mechanism that produces the productivity gain is the same mechanism that erodes clinical attention to the note. If review thins because AVT proves “good enough” most of the time, the productivity number stays positive while clinical safety quietly degrades. Patient Safety Learning argued earlier this year that Copilot has arrived in the NHS without the operational guidance clinicians need to use it safely.

The Sequence

Capability shipped before the operational governance for AI-enabled ambient scribing was ratified. South West London is rolling out AVT to 20,000 clinicians across four trusts. University Hospitals of Leicester and Northamptonshire have deployed to over 10,000. Hertfordshire Community NHS Trust has moved past pilot to full rollout. NHS England published a 19-supplier self-certified AVT registry in January. Underneath, the clinical safety standards DCB0129 and DCB0160 are under active review, and the Explainability-Enabled Clinical Safety Framework for AI is still being developed. Commitment came first. The assurance framework is catching up.

The Pager

The accountability layer on this programme is more developed than most national digital programmes ever achieve. Rob Thompson holds a joint DHSC/NHSE Chief Digital, Data and Technology Officer post: senior, named, public, accountable. Chief Clinical Information Officers (CCIOs) at every deploying trust carry statutory DCB0160 deployment accountability. That deserves credit. The harder question is operational. When an AVT-generated note contains a clinically significant error that affects patient care, who is the named individual who carries the pager that night? The trust CCIO? The supplier on the registry? The clinician who signed off the note? The accountability is statutory; the operational reporting line for AI-specific clinical safety failure has not yet been publicly framed for AVT.

The Proof

Three outcome measures sit in the public record: the 2% year-on-year productivity gain, the GOSH-led multi-site evaluation, and the Oxford University Hospitals pilot in which 90% of clinicians reported reduced documentation time. All three measure clinician time and patient throughput. None measure clinical safety. A 2025 national cross-sectional study in the Journal of Medical Internet Research (JMIR), covering 178 NHS organisations and 14,747 digital health technology deployments, found that only 17.3% were fully assured against both DCB0129 and DCB0160. At a typical NHS trust, only 24.5% of deployed technologies held both assurances. The standards exist. Compliance with them is patchy. There is no committed measure for AVT-attributable adverse event rate by supplier, the rate at which clinicians materially amend AI-generated notes versus accept them, or DCB0160 compliance inside the AVT registry specifically. In 18 months, “did this work?” will be answered by whoever owns the platform to define what safe enough means.
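One of the missing measures, the rate at which clinicians materially amend AI-generated notes versus accept them, is simple to define once each sign-off is logged. What follows is a minimal sketch in Python, assuming a hypothetical review log. The NoteReview record, its fields, the supplier labels, and the sample rows are all invented for illustration and do not come from any NHS system or dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteReview:
    """Hypothetical record of one clinician sign-off on an AVT note."""
    supplier: str
    materially_amended: bool  # True if the clinician changed clinical content


def amendment_rate_by_supplier(reviews):
    """Share of signed-off notes that were materially amended, per supplier."""
    totals = defaultdict(int)
    amended = defaultdict(int)
    for review in reviews:
        totals[review.supplier] += 1
        amended[review.supplier] += review.materially_amended
    return {supplier: amended[supplier] / totals[supplier] for supplier in totals}


# Invented sample rows, for illustration only.
sample = [
    NoteReview("supplier_a", True),
    NoteReview("supplier_a", False),
    NoteReview("supplier_a", False),
    NoteReview("supplier_a", False),
    NoteReview("supplier_b", False),
    NoteReview("supplier_b", False),
]
rates = amendment_rate_by_supplier(sample)
print(rates)
```

A rate that drifts toward zero is ambiguous on its own: it may mean the notes have improved, or that review has thinned. That is why it would need to sit alongside an adverse event measure rather than replace one.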

Verdict

The Frontline Productivity Programme is more carefully constructed than most NHS technology programmes of the past two decades. Named senior accountability, real pilot evidence, multiple trusts in genuine production deployment, a clear use case the workforce wants. None of that is in dispute.

What is in dispute is whether the underlying clinical safety assurance layer holds at scale. DCB0129 and DCB0160 exist. At a typical trust, only about a quarter of deployed technologies are assured against both. The deployments are racing toward 20,000-clinician scale while the AI-specific framework is still being written.

The action is concrete. Name the human at each deploying trust who carries the pager when an AVT-generated note causes patient harm. Demand per-supplier clinical safety performance reports from each of the 19 registry vendors, not self-certifications. Publish a clinical safety outcome measure alongside the productivity target before the year is out: adverse event rate change attributable to AVT, broken out by trust and by supplier.

If NHS England publishes a clinical safety outcome measure tied to a named owner in six months, and the AVT registry shifts from self-certification to audited compliance, the Frontline Productivity Programme becomes a model for AI deployment in regulated public services. Without both, the productivity number stays positive while the question of whether it was worth the clinical safety risk remains structurally unanswerable.

Pre-Mortem: Anthropic’s Wall Street Agentic AI Suite

Posted on May 17, 2026 by Andrew Scott

Thirteen of the world’s largest financial institutions just deployed ten autonomous AI agents into the most regulated workflows in finance. None of them has publicly named who is accountable when the agents are wrong. Not the banks. Not the vendor. Not the regulators. The launch on 5 May reads like a milestone. Read closer and it reads like a stress test of every governance assumption the financial services industry operates on.

A post-mortem tells you why something failed once it already has. A pre-mortem asks the same questions before failure is possible. Same five questions, every time, applied to a current programme, announcement, or initiative. This is the first in the series, and the subject is not chosen by accident. The Anthropic Wall Street launch is the clearest example I have seen this year of capability racing ahead of the architecture meant to hold it to account. If you are a CIO, a CRO, or a transformation lead in a regulated industry, the lessons here apply to you whether you are deploying Claude or not.

The Bet

Anthropic and the deploying banks are betting that ten autonomous agents can land in the most regulated workflows in finance (underwriting, KYC, credit memos, statement audits) faster than the regulatory architecture can constrain them. The technical bet rides on Claude Opus 4.7’s 64.37% on the Vals AI Finance Agent benchmark and AIG’s quoted 88% accuracy on insurance claims out of the box. The strategic bet is that being first at this footprint, including JPMorgan Chase, Goldman Sachs, Citi, AIG, BNY, Carlyle, Mizuho, and Visa, outweighs whatever comes back from regulators in the next twelve months. Reasoned bets, made by an extraordinarily capable vendor and the most sophisticated buyers in the world. But they are bets, not certainties, and the launch reads as certainty. The CIO of any one of those banks is taking on operational, regulatory, and reputational risk for which the vendor has accepted no published share. That is the bet they should be examining most carefully.

The Assumption

One belief is doing all the work: that bank operating models can absorb ten simultaneously deployed agents without the human-in-the-loop quietly thinning where the agents prove reliable. Anthropic’s own commitment depends on it, from the primary announcement: “Users stay firmly in the loop, reviewing, iterating on, and approving Claude’s work before it goes to a client, gets filed, or is acted on.” The history of automation in regulated environments tells a different story. Algorithmic trading kill switches went untriggered because the system was performing. Automated underwriting reviews became rubber stamps once approval rates looked normal. Every automation failure in regulated finance follows the same arc: human oversight erodes invisibly as the system proves itself, and the erosion is only visible after the failure. JPMorgan CIO Lori Beer said it directly at the launch: “The technology can do so much. It’s the actual organization’s ability to digest and absorb it.” That ability is the load-bearing assumption. If it holds, the launch is a milestone. If it does not, the launch is a slow-moving incident.

The Sequence

Capability shipped. Ten named agents, Microsoft 365 generally available, Moody’s embedded, more than a dozen banks in production. What was committed before the operational governance for vendor-supplied agentic decisioning was published: all of it. Three weeks earlier, the Fed and the OCC revised Model Risk Management guidance and explicitly excluded agentic AI as “novel and rapidly evolving.” A Request for Information is planned, with no committed timeline. The EU AI Act’s high-risk financial-sector requirements take effect 2 August, twelve weeks after launch. The FCA and PRA decided against creating a dedicated AI Senior Management Function and instead mapped accountability onto existing SMFs that were never designed with autonomous agents in mind. Three jurisdictions. Three different gaps. One vendor launch landing in all of them at once. This is not a regulator being slow. This is a regulator explicitly stating that the rules do not yet apply, while the systems the rules are meant to govern are already in production.

The Pager

The banks have named regulatory accountability at the firm level. SMF24 (Chief Operations), SMF4 (Chief Risk Officer), SMF16 (Compliance Oversight) at FCA and PRA-regulated firms hold statutory responsibility for technology, risk, and compliance. Model risk owners at US firm level cover the same ground. Real, senior, public. That deserves credit. However, none of them have been publicly named for the deployment of these specific agents. Inheriting accountability through a job description is not the same as being named as the accountable owner of a programme. The first is the regulatory default. The second is what serious AI governance actually requires. Anthropic has no published vendor accountability commitment for autonomous regulated decisioning. The asymmetry is the entire story. When a Claude-built agent denies a loan that should have been approved, or approves a KYC file that should have been escalated, the pager rings at the bank, with consequences for the bank, while the vendor’s exposure is contractual and capped. The clearest demonstration came six days before the launch itself. On 29 April, Goldman Sachs removed Claude access for its Hong Kong bankers over contractual, regulatory, and geopolitical factors. The bank pulled the product. The vendor did not pull itself out. Whoever absorbs the cost when regulatory fit fails absorbs it alone. Until vendor accountability is publicly framed, every bank deploying these agents is underwriting risk the vendor will not.

The Proof

Two outcome measures have been published. 64.37% on Vals AI. 88% on AIG insurance claims out of the box. Both are useful. Neither measures regulated-decision accuracy at scale. There is no committed measure for customer-detriment rate, near-miss frequency, incident reporting cadence to regulators, or the rate at which human reviewers actually amend agent outputs versus rubber-stamp them. The banks deploying these agents do not yet have public outcome commitments either, and that absence is its own answer. Former CFO Alyona Mysko captured what is at stake: “In finance, 99% correct is still wrong.” In eighteen months, the question “did this work?” will be answered by whoever owns the platform to define what work means. Right now, that platform is the vendor’s marketing. The banks need to claim that platform back, in their own outcome language, before the metric is set by a third party with no skin in their game.

Verdict

The launch is genuinely significant. More than a dozen named banks in production, industry-leading benchmark performance, audit logs in the Claude Console, the deepest Microsoft and Moody’s integrations any AI vendor has shipped. None of that is in dispute.

What is in dispute is whether the deploying banks have done the work to fill the accountability gap that the vendor has not closed and the regulators have not yet defined. The lesson generalises beyond Anthropic and beyond banking. Any CIO buying agentic AI in a regulated industry (healthcare, insurance, energy, the public sector) is operating in the same gap, and most have not yet noticed.

The action is concrete. Name the human in your organisation who carries the pager when the agent is wrong. Demand a vendor accountability schedule before you sign, not after. Define your own regulated-decision outcome measure and publish it, so the standard your performance is judged against is one you helped set.

If Anthropic publishes a vendor accountability commitment in the next six months, and a major bank commits to a public regulated-decision outcome measure tied to a named owner, this becomes a case study other industries will learn from for years. Without both, it becomes the most expensive procurement lesson the industry buys this decade.