Pre-Mortem: The Pentagon’s Autonomous Drones Reset

 

The Pentagon’s Replicator programme promised thousands of cheap autonomous drones in two years and delivered hundreds. The response has not been to wind it down. It has been to dissolve it, rebuild it as a new command inside Special Operations Command, and ask Congress for roughly 240 times the money. A programme that under-delivered on a lean, fast model is being re-attempted on a vast one, and the case for why the second structure succeeds where the first did not has not yet been made in public.

A pre-mortem asks the same five questions, every time, applied to a current programme before failure is possible rather than after. This is the third in the series. The first looked at vendor accountability in regulated finance. The second looked at clinical safety accountability in regulated healthcare. This one looks at execution accountability in defence procurement, the hardest delivery environment of them all. Different sector, similar structural shape: commitment moving faster than the architecture meant to hold it to account.

 

The Bet

The bet is that scale fixes what speed could not. Replicator was announced in August 2023 with a target of multiple thousands of all-domain attritable autonomous systems inside roughly two years, run by the Defense Innovation Unit on about a billion dollars across two fiscal years. It was deliberately lean, built to route around the traditional acquisition machine. By the deadline it had fielded hundreds. The reset, the Defense Autonomous Warfare Group, carries a 2027 budget request of about $54 billion, against roughly $226 million the year before. The technical bet is sound on its face: mass autonomy is where warfare is going, and the United States cannot afford to be slow to it. The harder bet, the one sitting under the headline number, is that money and a command structure fix what was an execution problem. Those are different things, and the launch treats them as one.

 

The Assumption

One belief is doing all the work: that Replicator’s shortfall was a problem of resourcing and structure, solvable with more of both. The documented failures point elsewhere. Systems were selected that proved unreliable, too expensive, or too slow to manufacture at the quantities needed. Some existed only as a concept when they were chosen. And the programme could not procure software able to orchestrate and command large, mixed swarms of different drones, which is the actual technical heart of autonomy at scale. None of those is a budget problem. A bigger budget buys more of the same systems and more of the same integration gap. If the diagnosis is wrong, the cure scales the disease.

 

The Sequence

Commitment came before the architecture, again. Replicator launched in August 2023. A second line of effort, focused on countering small drones, was added by a Secretary of Defense memo in September 2024. The original thousands-by-2025 deadline arrived with hundreds delivered. The programme was then consolidated into a joint interagency task force, dissolved, and rebuilt as the new autonomous-warfare group inside Special Operations Command. The first acquisition under the new structure, two counter-drone systems, landed in January 2026. Only in April 2026 did the Secretary tell the House Armed Services Committee that a sub-unified command for autonomous warfare was coming. The command meant to own this is still being stood up around a commitment already made. The funding tells the same story. Of that $54 billion, only about $1 billion is appropriated base money. The other $53 billion is a request, parked in a flexible five-year reconciliation pot that Congress has not yet passed. The headline number signals overwhelming commitment. In hard terms it is roughly a billion dollars in hand and fifty-three billion in hope. The intention is real. The money, for now, is one dollar in every fifty-four.
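The funding arithmetic is simple enough to check by hand, but a short sketch makes the proportions explicit. It uses only the rounded figures quoted in this piece; the variable and function names are illustrative, and the outputs are approximations rather than budget-document values.

```python
# Rounded figures as quoted in the article, in US dollars.
REQUEST_2027 = 54e9        # "about $54 billion" requested for 2027
PRIOR_YEAR = 226e6         # "roughly $226 million the year before"
APPROPRIATED_BASE = 1e9    # "about $1 billion" of appropriated base money


def funding_summary(request, prior_year, appropriated):
    """Return (year-on-year multiple, amount still pending, share in hand)."""
    multiple = request / prior_year
    pending = request - appropriated
    share_in_hand = appropriated / request
    return multiple, pending, share_in_hand


multiple, pending, share = funding_summary(
    REQUEST_2027, PRIOR_YEAR, APPROPRIATED_BASE
)
print(f"Request is about {multiple:.0f}x the prior year")
print(f"Still dependent on reconciliation: ${pending / 1e9:.0f} billion")
print(f"Share of the headline figure in hand: {share:.1%}")
```

On these rounded inputs the multiple comes out just under 240, and the share in hand just under two per cent, which is the "one dollar in every fifty-four" above.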

 

The Pager

Start with the credit, because it is real. The new group has a named director, Lt. Gen. Francis L. Donovan (USMC), with a clear command line and an appointment made by the Secretary himself. That is more named, senior accountability than most large defence programmes ever put on the public record, and it counts for something. The harder question is operational and specific. Standing policy requires appropriate levels of human judgement over the use of force. At swarm scale, with attritable systems acting at machine speed, who is the named individual accountable when one of them engages wrongly? The command line is clear. The accountability for the autonomous decision itself, at the scale this programme is built to reach, has not been framed in public. A command answers for a programme. It is a harder thing to say who answers for a single autonomous engagement when there are thousands of them in the air.

 

The Proof

The committed measures are input measures. Dollars requested, units contracted, the first systems bought. There is no public outcome measure for capability actually delivered, no cost per effective intercept, no fielded-and-working-at-scale figure with a date attached. This matters because the proof problem already bit once. Leadership called Replicator on track in 2024 and said it had made enormous strides in 2025, while the independent accounting found hundreds, not thousands. When the people who own the programme also own the definition of progress, optimism outruns delivery. Second-attempt scepticism is earned, not unfair. In eighteen months, the question of whether this worked will be answered by whoever holds the platform to define what delivered at scale means, and right now that platform is a budget request.

 

Verdict

This is a serious programme with serious people behind it. The strategic logic is correct: mass autonomy matters, and slowness is its own risk. The accountability has a name and a rank, which is rare. The first systems have been bought and are heading to the field. None of that is in doubt.

What is unproven is whether a command and a budget can fix a problem that was about manufacturing maturity, software orchestration, and realistic system selection. A reorganisation addresses none of those by itself.

The action is concrete. Publish the outcome measure, not the input: a fielded-and-working-at-scale metric with a date, committed before the reconciliation money is spent, not after. Name the human accountable for autonomous engagement decisions at scale, not only the command that owns the programme. And diagnose the first shortfall in public before scaling, so the much larger second bet rests on a corrected understanding rather than a hope.

If the department publishes a delivered-at-scale outcome measure tied to a named owner, and solves the swarm-orchestration software problem it could not solve the first time, this becomes the programme that proves autonomous capability can be fielded at speed. Without both, it becomes the most expensive way yet found to relearn that money and reorganisation do not fix an execution problem.

Pre-Mortem: NHS Frontline Productivity Programme

 

On 1 April 2026, NHS England formally launched the Frontline Productivity Programme. It succeeds the £2 billion Frontline Digitisation Programme and is anchored to the NHS 10-Year Health Plan. The headline target is a 2% year-on-year productivity gain over three years. The lead use case is Ambient Voice Technology (AVT), AI-powered ambient scribing for clinicians, with £200 million committed in year one. The Department of Health and Social Care (DHSC) and NHS England have appointed Rob Thompson as joint Chief Digital, Data and Technology Officer.

A pre-mortem asks the same five questions, every time, applied to a current programme. This is the second in the series. The first looked at vendor accountability in regulated finance. This one looks at clinical safety accountability in regulated healthcare. Different sector, similar structural shape.

 

The Bet

The NHS is betting that AVT can deliver enough of the 2% year-on-year productivity gain to justify scaling deployment to tens of thousands of clinicians faster than the clinical safety framework for AI-enabled ambient scribing can be ratified. The technical bet rides on multi-site evidence led by Great Ormond Street Hospital (GOSH) across nine London NHS sites and 17,000 patient encounters: a 23.5% increase in patient interaction time, an 8.2% reduction in appointment length, and a 13.4% increase in A&E patients per shift. The strategic bet is that 19 self-certified suppliers competing for trust contracts will produce price discipline without producing safety variance. Reasoned bets, made under genuine pressure, backed by measurable evidence. But they are bets, and the framing reads as inevitability.

 

The Assumption

One belief is doing all the work: that clinicians using AVT will verify AI-generated notes against the patient context every time, at scale, rather than develop the same review-as-rubber-stamp pattern automation has produced in every regulated environment it has reached. The mechanism that produces the productivity gain is the same mechanism that erodes clinical attention to the note. If review thins because AVT proves “good enough” most of the time, the productivity number stays positive while clinical safety quietly degrades. Patient Safety Learning argued earlier this year that Copilot has arrived in the NHS without the operational guidance clinicians need to use it safely.

 

The Sequence

Capability shipped before the operational governance for AI-enabled ambient scribing was ratified. South West London is rolling out AVT to 20,000 clinicians across four trusts. University Hospitals of Leicester and Northamptonshire have deployed to over 10,000. Hertfordshire Community NHS Trust has moved past pilot to full rollout. NHS England published a 19-supplier self-certified AVT registry in January. Underneath, the clinical safety standards DCB0129 and DCB0160 are under active review, and the Explainability-Enabled Clinical Safety Framework for AI is still being developed. Commitment came first. The assurance framework is catching up.

 

The Pager

The accountability layer on this programme is more developed than most national digital programmes ever achieve. Rob Thompson holds a joint DHSC/NHSE Chief Digital, Data and Technology Officer post: senior, named, public, accountable. Chief Clinical Information Officers (CCIOs) at every deploying trust carry statutory DCB0160 deployment accountability. That deserves credit. The harder question is operational. When an AVT-generated note contains a clinically significant error that affects patient care, who is the named individual who carries the pager that night? The trust CCIO? The supplier on the registry? The clinician who signed off the note? The accountability is statutory; the operational reporting line for AI-specific clinical safety failure has not yet been publicly framed for AVT.

 

The Proof

Three outcome measures sit in the public record: the 2% year-on-year productivity gain, the GOSH-led multi-site evaluation, and the Oxford University Hospitals pilot in which 90% of clinicians reported reduced documentation time. All three measure clinician time and patient throughput. None measure clinical safety. A 2025 national cross-sectional study in the Journal of Medical Internet Research (JMIR), covering 178 NHS organisations and 14,747 digital health technology deployments, found that only 17.3% were fully assured against both DCB0129 and DCB0160. At a typical NHS trust, only 24.5% of deployed technologies held both assurances. The standards exist. Compliance with them is patchy. There is no committed measure for AVT-attributable adverse event rate by supplier, the rate at which clinicians materially amend AI-generated notes versus accept them, or DCB0160 compliance inside the AVT registry specifically. In 18 months, “did this work?” will be answered by whoever owns the platform to define what safe enough means.
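One of the missing measures, the rate at which clinicians materially amend AI-generated notes rather than accept them, is easy to define once a review log exists. The sketch below shows one possible definition. Everything in it is hypothetical: the supplier names, the log format, and the sample records are placeholders for illustration, not data from the AVT registry or any trust.

```python
from collections import defaultdict

# Hypothetical review log: (supplier, note_was_materially_amended).
# Placeholder records only; not real registry or trust data.
REVIEW_LOG = [
    ("supplier_a", True),
    ("supplier_a", False),
    ("supplier_a", False),
    ("supplier_a", False),
    ("supplier_b", True),
    ("supplier_b", True),
    ("supplier_b", False),
    ("supplier_b", False),
]


def amendment_rate_by_supplier(log):
    """Share of AI-generated notes that clinicians materially amended, per supplier."""
    totals = defaultdict(int)
    amended = defaultdict(int)
    for supplier, was_amended in log:
        totals[supplier] += 1
        if was_amended:
            amended[supplier] += 1
    return {supplier: amended[supplier] / totals[supplier] for supplier in totals}


rates = amendment_rate_by_supplier(REVIEW_LOG)
for supplier, rate in sorted(rates.items()):
    print(f"{supplier}: {rate:.0%} of notes amended")
```

A falling amendment rate is ambiguous by itself: it could mean the notes are getting better, or that review is thinning. That is why it would need to sit alongside an adverse event measure rather than replace one.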

 

Verdict

The Frontline Productivity Programme is more carefully constructed than most NHS technology programmes of the past two decades. Named senior accountability, real pilot evidence, multiple trusts in genuine production deployment, a clear use case the workforce wants. None of that is in dispute.

What is in dispute is whether the underlying clinical safety assurance layer holds at scale. DCB0129 and DCB0160 exist. At a typical trust, only about a quarter of deployed technologies are fully assured against both. The deployments are racing toward 20,000-clinician scale while the AI-specific framework is still being written.

The action is concrete. Name the human at each deploying trust who carries the pager when an AVT-generated note causes patient harm. Demand per-supplier clinical safety performance reports from each of the 19 registry vendors, not self-certifications. Publish a clinical safety outcome measure alongside the productivity target before the year is out: adverse event rate change attributable to AVT, broken out by trust and by supplier.

If NHS England publishes a clinical safety outcome measure tied to a named owner in six months, and the AVT registry shifts from self-certification to audited compliance, the Frontline Productivity Programme becomes a model for AI deployment in regulated public services. Without both, the productivity number stays positive while the question of whether it was worth the clinical safety risk remains structurally unanswerable.

Pre-Mortem: Anthropic’s Wall Street Agentic AI Suite

 

Thirteen of the world’s largest financial institutions just deployed ten autonomous AI agents into the most regulated workflows in finance. None of them has publicly named who is accountable when the agents are wrong. Not the banks. Not the vendor. Not the regulators. The launch on 5 May reads like a milestone. Read closer and it reads like a stress test of every governance assumption the financial services industry operates on.

A post-mortem tells you why something failed once it already has. A pre-mortem asks the same questions before failure is possible. Same five questions, every time, applied to a current programme, announcement, or initiative. This is the first in the series, and the subject is not chosen by accident. The Anthropic Wall Street launch is the clearest example I have seen this year of capability racing ahead of the architecture meant to hold it to account. If you are a CIO, a CRO, or a transformation lead in a regulated industry, the lessons here apply to you whether you are deploying Claude or not.

 

The Bet

Anthropic and the deploying banks are betting that ten autonomous agents can land in the most regulated workflows in finance (underwriting, KYC, credit memos, statement audits) faster than the regulatory architecture can constrain them. The technical bet rides on Claude Opus 4.7’s 64.37% on the Vals AI Finance Agent benchmark and AIG’s quoted 88% accuracy on insurance claims out of the box. The strategic bet is that being first at this footprint, including JPMorgan Chase, Goldman Sachs, Citi, AIG, BNY, Carlyle, Mizuho, and Visa, outweighs whatever comes back from regulators in the next twelve months. Reasoned bets, made by an extraordinarily capable vendor and the most sophisticated buyers in the world. But they are bets, not certainties, and the launch reads as certainty. The CIO of any one of those banks is taking on operational, regulatory, and reputational risk for which the vendor has accepted no published share. That is the bet they should be examining most carefully.

 

The Assumption

One belief is doing all the work: that bank operating models can absorb ten simultaneously deployed agents without the human-in-the-loop quietly thinning where the agents prove reliable. Anthropic’s own commitment depends on it, from the primary announcement: “Users stay firmly in the loop, reviewing, iterating on, and approving Claude’s work before it goes to a client, gets filed, or is acted on.” The history of automation in regulated environments tells a different story. Algorithmic trading kill switches were not triggered because the system was performing. Automated underwriting reviews became rubber stamps once approval rates looked normal. Every automation failure in regulated finance follows the same arc: human oversight erodes invisibly as the system proves itself, and the erosion is only visible after the failure. JPMorgan CIO Lori Beer said it directly at the launch: “The technology can do so much. It’s the actual organization’s ability to digest and absorb it.” That ability is the load-bearing assumption. If it holds, the launch is a milestone. If it does not, the launch is a slow-moving incident.

 

The Sequence

Capability shipped. Ten named agents, Microsoft 365 generally available, Moody’s embedded, more than a dozen banks in production. What was committed before the operational governance for vendor-supplied agentic decisioning was published: all of it. Three weeks earlier, the Fed and the OCC revised Model Risk Management guidance and explicitly excluded agentic AI as “novel and rapidly evolving.” A Request for Information is planned, with no committed timeline. The EU AI Act’s high-risk financial-sector requirements take effect 2 August, twelve weeks after launch. The FCA and PRA decided against creating a dedicated AI Senior Management Function and instead mapped accountability onto existing SMFs that were never designed with autonomous agents in mind. Three jurisdictions. Three different gaps. One vendor launch landing in all of them at once. This is not a regulator being slow. This is a regulator explicitly stating that the rules do not yet apply, while the systems the rules are meant to govern are already in production.

 

The Pager

The banks have named regulatory accountability at the firm level. SMF24 (Chief Operations), SMF4 (Chief Risk Officer), SMF16 (Compliance Oversight) at FCA and PRA-regulated firms hold statutory responsibility for technology, risk, and compliance. Model risk owners at US firms cover the same ground. Real, senior, public. That deserves credit. However, none of them has been publicly named for the deployment of these specific agents. Inheriting accountability through a job description is not the same as being named as the accountable owner of a programme. The first is the regulatory default. The second is what serious AI governance actually requires. Anthropic has no published vendor accountability commitment for autonomous regulated decisioning. The asymmetry is the entire story. When a Claude-built agent denies a loan that should have been approved, or approves a KYC file that should have been escalated, the pager rings at the bank, with consequences for the bank, while the vendor’s exposure is contractual and capped. The clearest demonstration came six days before the launch itself. On 29 April, Goldman Sachs removed Claude access for its Hong Kong bankers over contractual, regulatory, and geopolitical factors. The bank pulled the product. The vendor did not pull itself out. Whoever absorbs the cost when regulatory fit fails absorbs it alone. Until vendor accountability is publicly framed, every bank deploying these agents is underwriting risk the vendor will not.

 

The Proof

Two outcome measures have been published. 64.37% on Vals AI. 88% on AIG insurance claims out of the box. Both are useful. Neither measures regulated-decision accuracy at scale. There is no committed measure for customer-detriment rate, near-miss frequency, incident reporting cadence to regulators, or the rate at which human reviewers actually amend agent outputs versus rubber-stamp them. The banks deploying these agents do not yet have public outcome commitments either, and that absence is its own answer. Former CFO Alyona Mysko captured what is at stake: “In finance, 99% correct is still wrong.” In eighteen months, the question “did this work?” will be answered by whoever owns the platform to define what “work” means. Right now, that platform is the vendor’s marketing. The banks need to claim that platform back, in their own outcome language, before the metric is set by a third party with no skin in their game.

 

Verdict

The launch is genuinely significant. More than a dozen named banks in production, industry-leading benchmark performance, audit logs in the Claude Console, the deepest Microsoft and Moody’s integrations any AI vendor has shipped. None of that is in dispute.

What is in dispute is whether the deploying banks have done the work to fill the accountability gap that the vendor has not closed and the regulators have not yet defined. The lesson generalises beyond Anthropic and beyond banking. Any CIO buying agentic AI in a regulated industry (healthcare, insurance, energy, the public sector) is operating in the same gap, and most have not yet noticed.

The action is concrete. Name the human in your organisation who carries the pager when the agent is wrong. Demand a vendor accountability schedule before you sign, not after. Define your own regulated-decision outcome measure and publish it, so the standard your performance is judged against is one you helped set.

If Anthropic publishes a vendor accountability commitment in the next six months, and a major bank commits to a public regulated-decision outcome measure tied to a named owner, this becomes a case study other industries will learn from for years. Without both, it becomes the most expensive procurement lesson the industry buys this decade.