Governance Is Not a Committee. It Is a Decision Architecture

A technology programme was delivered on time. The steering committee signed it off. The system went live on schedule and within budget. Twelve months later, usage across the organisation sat at eleven percent. The project had been a success by every measure the governance structure tracked. It had failed by the only measure that mattered.

Nobody was accountable for the eleven percent. The named owner had moved to a different role. The steering committee was dissolved at go-live. The vendor had fulfilled its contract. The organisation had built something that worked perfectly and was used by almost nobody, and no single person in the building could explain why.

That is not a delivery failure. It is a governance failure. And it is far more common than any organisation publicly admits.

 

What Governance Actually Is

Governance is one of those words that everyone uses and nobody defines. In most organisations, it has come to mean a structure: a committee, a framework document, an approval process, a risk register. Something you have rather than something you do. You have a governance framework. The governance is in place. The committee meets quarterly.

This version of governance is useless.

Governance is not a structure. It is a decision architecture. It is the infrastructure that determines how decisions are made, who makes them, what they are accountable for, and how fast the organisation can act when circumstances change.

Every organisation has a governance architecture, whether it has designed one or not. The informal version is still a governance architecture: decisions made by whoever is most senior in the room, accountability absorbed by whoever is most junior when something goes wrong, escalation triggered whenever someone is uncomfortable. It is simply a poor one. The difference between organisations that move well and organisations that stall is rarely capability. It is usually the quality of the decision infrastructure underneath the capability.

 

Governance Theatre

The most dangerous governance is the kind that looks correct from the outside.

Most large organisations have built governance that performs the appearance of oversight without the function. The risk register is meticulously maintained and never acted upon. The steering committee meets monthly and has never once paused a programme. The policy takes six weeks to approve and is read by nobody once signed. The assurance review always concludes that the project is on track.

This is more harmful than no governance, for one reason: it generates confidence without protection. The board believes the oversight is in place. The programme team believes the risks are managed. The organisation proceeds as if the architecture exists, while operating without it. When the failure arrives, it arrives at scale, having been invisible to every structure designed to catch it.

The question is not whether your organisation has governance. The question is whether your governance is real.

 

What Good Governance Looks Like

Good governance has five characteristics that distinguish it from the committee-and-checkpoint version most organisations have built.

The first is named ownership. Every material decision, every significant deployment, every consequential process has a single individual accountable for the outcome. Not a committee. Not a function. A person. The committee can advise. The function can review. One name sits against each thing that matters, and that person knows it and accepts it.

The second is authority that matches accountability. The most common governance failure is asking someone to be accountable for an outcome they cannot influence. If the named owner cannot pause a deployment, redirect a budget, or override a recommendation, their accountability is nominal. If you cannot identify what the accountable person can stop, you have not given them accountability. You have given them exposure.

The third is pre-agreed frameworks. Good governance does not require full escalation for every decision. It requires that boundaries are agreed in advance, so decisions within those boundaries can be made quickly, and decisions outside them trigger a defined path. The approval gate model creates queues. The framework model reserves escalation for the decisions that genuinely need it. Speed and governance are not a trade-off. They are a design choice.

The fourth is transparency of reasoning. Material decisions need a record. Not for audit purposes, but because the organisations that navigate change well are the ones where future leaders can understand not just what was decided, but why, what alternatives were considered, and what conditions would prompt a different outcome. This is not bureaucracy. It is institutional memory, and its absence is one of the most expensive losses any organisation experiences.

The fifth is a culture that supports use. The best governance architecture fails if the organisation punishes the people who use it correctly. The programme manager who escalates a risk that delays a milestone. The engineer who flags a model limitation that complicates a launch. The analyst who says the data is not fit for purpose. If those people are sidelined or ignored, the framework is decorative. Governance is architecture and behaviour. Building the architecture without addressing the behaviour is only half the work.

 

Governance Debt

There is a cost to governance failure that does not appear on any balance sheet until it is too late to address cheaply.

Every decision made without proper governance accumulates what might be called governance debt. The decision is made, the programme moves forward, the system is deployed. The cost is not visible immediately. It appears two years later, when the person who made the original choice has moved on, when nobody can explain why the architecture was designed the way it was, when the organisation needs to change a system it no longer fully understands and cannot safely modify.

Like financial debt, governance debt compounds. Small omissions early in a programme create disproportionately large costs at the point of change. The organisations that experience the most expensive transformations are rarely those that started with the hardest problems. They are those that accumulated governance debt in the early stages and discovered the interest charge when conditions changed.

 

The Speed Paradox

The dominant assumption about governance is that it slows things down. The evidence says otherwise.

Financial services is among the most heavily governed sectors in the world. It is also, on the available data, among the fastest at moving AI from experimentation to production. Databricks’ analysis of enterprise AI adoption found that financial services improved its experimental-to-production ratio from 29:1 to 10:1 in under eighteen months, the sharpest improvement of any sector measured. The governance culture that financial services built under regulatory compulsion became, in practice, a deployment accelerant.

The reason is straightforward. When governance is architecture rather than checkpoint, when boundaries are pre-agreed and ownership is named, decisions within the framework do not require escalation. The work that in a poorly governed organisation requires a committee review happens at team level, within agreed parameters, without delay. The governance does not add a stage to the process. It is the process.

The organisations that move slowly under governance are the ones with checkpoints. The ones that move fast under governance are the ones with architecture.

 

Why AI Makes This Urgent

AI does not create governance problems. It amplifies the ones that already exist.

Every organisation deploying AI is making decisions at scale and at speed in ways that are not always visible to the people accountable for outcomes. When a model influences hiring, lending, clinical treatment, or procurement, the decision architecture governing that model matters as much as the architecture governing any senior leader. In some respects more.

Three risks are specific to AI. The first is accountability diffusion. When a decision is made by a model, who is accountable is rarely defined in practice. The model carries no accountability. The vendor carries it within narrow contractual limits. The organisation must deliberately assign it or it defaults to nobody, which is where most organisations currently sit.

The second is scale of error. A human decision-maker with a blind spot makes that error incrementally. A model with the same blind spot can make it thousands of times before the pattern is identified. The governance that catches a human error at ten instances must catch a model error at ten thousand. Most governance frameworks were not designed for that volume.

The third is the deployment and use gap. AI systems are deployed for a defined purpose in a defined context. They are then used in contexts their designers did not anticipate, by people not trained on their limitations, for decisions the governance framework never considered. Governance must follow the system into use, not stop at the deployment gate.

One additional risk is specific to the current moment. In most organisations, AI governance covers the official deployments. It has no visibility of, and no authority over, the AI already in use through personal accounts, consumer tools, and unapproved models. The governance gap that will produce the first visible failures is not in the formal AI programme. It is in the tools already running beneath the governance architecture’s line of sight.

For boards, this is a specific accountability question. Most are receiving AI updates without the frameworks to evaluate them. The question is not whether the organisation has an AI strategy. It is whether the board can answer four things: who is accountable for each material AI deployment, what authority they hold, what the escalation path looks like when something goes wrong, and whether the governance covers the AI that is actually in use rather than only the AI that was formally approved.

 

Three Questions That Will Tell You More Than Any Framework Audit

Name the person accountable for your most significant AI deployment. Not the team. Not the function. One person. If you cannot name them in under ten seconds, you do not have governance. You have the appearance of it.

When did your governance last stop something? Not delay it, not document a risk against it. Stop it. If the answer is never, your governance is not functioning as risk infrastructure. It is functioning as a record-keeping exercise.

If the three people who made your most significant programme decisions in the last two years left tomorrow, what would the organisation know about why those decisions were made? If the answer is not much, you are accumulating governance debt at a rate your future leaders will pay.

Governance is not a committee. It is not a document. It is the infrastructure through which an organisation makes consequential decisions, learns from them, and remains able to change course when it needs to.

Most organisations have not built that infrastructure. AI has not created that problem. It has simply made the cost of not solving it impossible to ignore.

Pre-Mortem: KPMG’s AI-Powered Audit

The audit opinion is the most consequential document in a public company’s annual reporting. Not the annual report. Not the investor deck. The audit opinion, because it carries a named partner’s signature, and because that signature means something in law. On 9 June 2026, KPMG and Microsoft announced the deployment of Microsoft Agent 365 and Copilot across 276,000 KPMG professionals in 138 countries, including inside KPMG Clara, the firm’s global smart audit platform. Scott Flynn, KPMG’s Global Head of Audit, called it “a pivotal milestone in our AI-powered, human assured audit transformation.” The word “assured” is doing a great deal of work in that sentence.

A pre-mortem asks the same five questions every time, before failure occurs rather than after. This is the fifth in the series. The first looked at vendor accountability in regulated finance. The second at clinical safety in healthcare. The third at execution accountability in defence procurement. The fourth at clinical AI infrastructure. This one looks at professional services, the sector that has built its entire business model on the premise that human expertise is the product.

 

The Bet

KPMG is betting that efficiency and accountability can coexist at this scale. That 276,000 professionals deploying AI agents, with a governance layer running underneath, will not dilute the professional accountability the audit opinion rests on. It is a reasonable bet. It is also an untested one. The commercial logic is clear: 276,000 professionals, 138 countries, and an AI-powered workflow running through KPMG Clara together create the kind of structural productivity gain that redefines the firm’s cost base, and potentially its fee model. Analysis of recent audit fee movements suggests clients are already pressing the case that AI efficiency should flow through to lower fees. The deeper bet, the one sitting beneath the headline deployment, is that “AI-powered, human-assured” constitutes a defensible operating model before any regulatory body has defined what “human-assured” actually requires in practice.

 

The Assumption

The single assumption carrying all the weight: that governing agents is the same thing as being accountable for them. Microsoft Agent 365 provides what its own documentation describes as a control plane, a centralised registry of agents with lifecycle rules, identity controls, and audit logging. That is a meaningful capability. It answers the question: how many agents do you have, and what can they touch? It does not, on its own, answer the question a claims lawyer or a regulator will eventually ask: who is accountable when the agent was visible, governed, and still wrong? KPMG’s Trusted AI framework lists ten ethical pillars, including one labelled Accountability, which calls for human oversight and responsibility to be embedded across the AI lifecycle. That is a principle-level commitment. None of the publicly available documentation specifies what happens to the partner’s signature when an AI-assisted conclusion is signed off and later found to be materially incorrect.

 

The Sequence

KPMG has deployed agents at scale before any authoritative regulatory framework specifies what AI-assisted audit evidence must look like, or how human review of AI-generated conclusions must be documented to meet existing standards. The IAASB approved a project proposal in March 2026 to revise ISA 500, Audit Evidence, to address technology use in audit, but the project is still in early research and information gathering, with no exposure draft issued and no effective date. The PCAOB has stated publicly that it is considering developing risk management guidance for audit firms using AI. Considering, not publishing. The capability is deployed. The standard that surrounds it is still being drafted.

 

The Pager

Lisa Heneghan, KPMG’s Global Chief Digital Officer, was specific about what this deployment requires: “strong foundations in governance, visibility and accountability.” That framing is responsible, and Agent 365 provides the visibility that most enterprises currently lack. The harder question is structural and specific. The audit opinion is signed by a named partner. Professional indemnity is priced around that signature. Suppose an agent embedded in KPMG Clara surfaces a conclusion, the partner reviews it and signs the opinion, and the work is later found to contain a material error. Historically, that liability has sat with the partner and the firm. What KPMG, Microsoft, and the client have not yet published is a clear allocation of responsibility for the agent’s contribution to that error. Is it a tool failure, an oversight failure, or something existing frameworks do not yet classify? The governance layer provides the audit trail. It does not specify who reads it, or what reading it is worth, when a claim is filed.

 

The Proof

The announcement commits 276,000 professionals and earns KPMG the designation of Microsoft “Frontier Firm.” Neither is a performance measure. No published metric connects this deployment to improved audit accuracy, fewer deficiencies, or better quality outcomes. What the deployment actually demonstrates is that KPMG can deploy Agent 365 at scale and maintain visibility over its agent estate. That is a meaningful operational achievement. It is not the same as demonstrating that AI-assisted audit conclusions are more reliable than human-only ones, which is what regulators, courts, and insurers will eventually need to see. KPMG Clara’s existing framing likewise covers adoption and workflow integration, not audit quality. The proof that matters most is still outstanding.

 

Verdict

If KPMG publishes a clear framework specifying how AI-assisted audit evidence is reviewed, validated, and documented, paired with a liability position that survives regulatory scrutiny, this becomes the reference model for professional services AI at scale. The governance commitment is genuine. The scale of deployment is unmatched in the sector. Scott Flynn’s “AI-powered, human-assured” is the right aspiration. The question is whether “human-assured” describes a documented, auditable review process that a regulator will accept and an insurer will cover, or whether it is a positioning statement waiting for a definition. At 276,000 professionals across 138 countries, the audit opinion at the centre of this deployment is too consequential to leave that question open. The answer should come before the first material claim, not after.

Why Programmes Fail in the First 30 Days, Before Anyone Admits It

By the time a programme is declared in trouble, the failure is usually months old.

The governance review that triggers the intervention, the escalation that finally reaches the executive team, the moment someone says out loud what everyone has privately known for weeks. None of that is when the failure started. It is when the failure became undeniable.

The real decisions that determined the outcome were made in the first thirty days. In rooms that were not minuted. In conversations that were not followed up. In the silence where challenge should have been.

I have stepped into enough programmes to know this pattern. And the reality is that by the time you are called in to fix something, you are not dealing with a delivery problem. You are dealing with the compounded consequences of a foundation that was never properly laid.

Bain & Company’s 2024 survey of more than 400 executives found that 88% of business transformations fail to achieve their original ambitions. In my experience, most of those failures were not caused by what happened in month six. They were caused by what was decided, or not decided, in month one.

 

The Thirty-Day Window Nobody Takes Seriously Enough

Every programme has a formation period. A window, roughly the first month, where the critical decisions that will shape everything downstream are being made, often informally, often without the weight they deserve.

This is when scope is being interpreted, not just defined. When the people who will actually do the work are forming their first impressions of the leadership, the culture, and whether honesty will be safe here. When the relationships between workstreams are either being built deliberately or left to chance. When assumptions are being made that nobody has written down because everyone assumes everyone else shares them.

Most organisations treat this period as setup. As administration. As the unglamorous precursor to the real work.

It is the real work. Everything that follows is either built on what was established here or fighting against what was not.

The thirty-day window is where programmes are won or lost. We just do not find out until much later.

 

The Scope That Nobody Challenged

Here is where it starts, almost every time.

The scope arrives with the programme. It comes from somewhere: a business case, a procurement process, a senior stakeholder’s vision, a consultancy’s recommendation. It has been approved. It has a budget attached to it. It has a go-live date.

And in the first thirty days, the people now responsible for delivering it read it, sense the problems, and say nothing.

Not because they are incompetent. Because the environment has not established that challenge is welcome. Because the approval process gives scope a kind of authority that makes questioning it feel like insubordination. Because there is pressure, spoken or unspoken, to project confidence rather than raise doubt.

So the assumptions embedded in the scope go unexamined. The dependencies that are not owned by anyone get noted and moved past. The timeline that was built on optimism rather than evidence gets accepted as a constraint rather than interrogated as a risk.

PMI’s 2025 Project Success research found that projects with a clear vision of success at the outset achieve a Net Project Success Score of +41, while those without that clarity score -18. A 59-point swing, determined before the plan is even baselined.

And the programme sets off carrying weight it was never designed to carry. The team knows it. The experienced ones, anyway. But the conversation that would surface it has not happened. So the weight gets managed quietly, worked around, absorbed, until it no longer can be.

That point arrives later, visibly, dramatically, in a way that looks sudden.

It was not sudden. It was decided in week two when nobody pushed back on the plan.

 

The Relationships That Were Never Built

Programmes are delivered by people who depend on each other across workstreams, across organisations, across cultural and institutional boundaries that no project plan captures.

Those dependencies only work if the relationships underneath them work. And relationships, real ones, the kind where someone will tell you the truth at 6pm on a Thursday when the news is bad, are not built in kick-off presentations and introductory calls.

They are built in the unglamorous, unscheduled moments of the first thirty days. The informal conversations. The one-on-ones that were not on the plan. The deliberate investment in understanding who the key people are, what they actually care about, what they are worried about, and what they need from you to show up fully.

Most programmes do not make this investment. Leaders are too focused on getting the governance structures right, the plans baselined, the first steering pack prepared. The relational architecture gets left to develop on its own.

It does not develop on its own. It either gets built or it does not. And when it does not, you find out in month four when a critical dependency stalls because two workstream leads have never actually talked, when a key stakeholder disengages because nobody made them feel like a genuine part of the programme, when the supplier relationship that looked functional on paper turns out to have no real trust underneath it.

The fix at that point takes weeks. The investment in week one would have taken an afternoon.

 

The Conversations Nobody Documented

This one is quieter. Harder to see. But just as damaging.

In the first thirty days of any programme, hundreds of micro-decisions get made in conversations that never make it into the formal record. Someone interprets a requirement and moves on. Two people informally agree on a boundary between workstreams that later becomes a gap nobody owns. A risk gets raised in a corridor and managed privately rather than surfaced. An assumption gets made about what the business actually wants that nobody validates because everyone is too busy moving.

These conversations create the real operating model of the programme. Not the governance framework. Not the RACI. The informal, undocumented, human architecture of how this programme will actually function.

When that architecture is sound, when the right conversations happened and the right things got clarified, programmes have a resilience that is hard to explain on paper. They absorb setbacks. They surface problems early. They self-correct.

When it is not sound, the gaps compound. Every week, the distance between the documented reality and the lived reality grows. The risk register reflects what people were willing to write down, not what is actually keeping them up at night. The plan reflects what was agreed in the room, not what the people closest to the work know is actually achievable.

And somewhere around month three or four, the gap becomes too large to manage quietly.

 

The Culture That Set Before Anyone Noticed

The most underestimated consequence of the first thirty days is cultural.

Within a month, every person on a programme has formed a working theory of how this environment operates. Is honesty safe here? Does leadership want the truth or does it want reassurance? What happens to the person who raises a problem? Do they get support or do they get blame? Is this a place where people cover for each other or compete with each other?

These conclusions get drawn from small evidence. The way the programme director responded to the first piece of bad news. Whether the first difficult conversation was handled with directness or avoided. Whether the team lead who flagged a risk was thanked for it or made to feel like they were creating problems.

People are extraordinarily good at reading these signals. They adapt fast. And once the culture has set, once the team has learned what is rewarded and what is penalised, changing it is one of the hardest things in programme leadership.

Research by Milliken, Morrison and Hewlin, published in the Journal of Management Studies, found that 85% of the employees they interviewed had at some point felt unable to raise an important issue or concern with their boss, even when they believed it mattered. That figure will not surprise anyone who has led a programme in distress. The information existed. The team knew. Nobody said it.

I have been in programmes where the psychological safety was so low by month two that meaningful escalation had effectively stopped. Not because the problems had stopped. Because the team had learned that surfacing problems did not help them. That information would travel upward selectively, defensively, shaped to protect the messenger rather than inform the leader.

That culture was established in the first thirty days. Nobody designed it. Nobody intended it. But every small signal, every early interaction, every moment where tone was set rather than thought about, built it brick by brick.

And it was almost impossible to dismantle in month five.

 

What the First Thirty Days Actually Requires

It requires a leader who understands that the work of the first month is not administrative. It is foundational.

It requires the courage to challenge scope before the plan is baselined, even when the pressure is to move quickly. Because the conversation you avoid in week one becomes the crisis you manage in month six.

It requires a deliberate investment in relationships that will not show any return for weeks. The conversations that feel like a luxury when the governance structure needs building and the steering pack is due. They are not a luxury. They are the infrastructure.

It requires the explicit establishment of culture, not through a values statement or a team charter, but through behaviour. Through how you respond to the first piece of bad news. Through whether you ask for honesty or perform as though you want it while rewarding those who tell you what you want to hear.

It requires the discipline to document the undocumentable. To make explicit the assumptions, interpretations, and informal agreements that will otherwise compound silently until they cannot be managed.

And it requires humility. The humility to know that what you do not understand about this organisation, this culture, and these people in the first thirty days will cost you more than anything on the risk register.

 

The Post-Mortem Nobody Gets Right

Most post-mortems on failed programmes look at the wrong timeline.

They analyse month seven, when the slippage became undeniable. Month five, when the critical path was already broken. Month four, when the relationships between key workstreams had deteriorated beyond functional.

The real analysis belongs in month one. In the decisions that were made without enough information. The challenges that were not raised. The relationships that were not prioritised. The culture that was allowed to form without intent.

By the time a programme looks like it is failing, it has been failing for a long time.

The window where it could have been different closed thirty days in.

Most organisations do not realise that. So they keep investing in better governance frameworks, more sophisticated reporting tools, and more rigorous steering processes, applied at the stage of the programme where the outcome is already largely determined.

The intervention that would actually change the failure rate happens at the beginning. In the unglamorous, under-valued, insufficiently serious first thirty days.

That is where programmes are won.

That is where most of them are lost.

Already Building: Epic Agent Factory and the Governance Gap

The pre-mortem on Epic Agent Factory asked who would answer when a health-system-built agent made a clinically significant error. It published on 9 June. I have since learned of a Becker’s Hospital Review report from 30 March confirming that one of America’s largest health systems had already been building those agents for weeks before the question was published.

The report confirms the pre-mortem’s central argument. What neither the research behind the pre-mortem nor the article itself surfaced was how early the sequence had already begun.

 

The Deployment That Was Already In Motion

Advocate Health had already tapped Epic’s Agent Factory, becoming one of the first health systems to build and deploy agents through the platform. Andy Crowder, Advocate Health’s SVP and Chief Digital and AI Officer, described the direction in a LinkedIn post on 26 March: “By combining Epic’s Agent Factory Platform capabilities with Advocate Health’s scale, clinical insight, and commitment to innovation, we’re translating AI from promise into practice.” He pointed to a three-day Epic immersion at The Pearl innovation district in Charlotte, focused on speeding up pharmacy verification for complex medications and cutting infusion chart preparation time for pharmacists and nurses. Four working prototypes emerged, scheduled to go live in July 2026.

Crowder added: “Together, we’re advancing responsible, practical AI that fits naturally into clinical workflows, reduces friction, and gives clinicians back time to focus on what matters most.” It is a considered statement, and the commitment is genuine. But it is not a governance document. And Advocate Health is not unusual here. They are representative. They moved first because the platform enabled it, the commercial pressure to reduce administrative burden was real, and nothing in the regulatory landscape said stop.

This is the sequence the pre-mortem described. Capability arrived. Deployment followed. The governance architecture to surround it had not been ratified.

 

The Workflows That Come Next

Pharmacy verification and infusion chart preparation are not, in themselves, clinical decision-making. They reduce documentation burden and carry genuine operational value. But they are the entry point, not the ceiling.

Epic’s own Penny agent already handles prior authorisation for thousands of health systems. Agent Factory is the platform through which health systems build their own versions of exactly those capabilities. Prior authorisation sits at the intersection of clinical judgment and payer approval. An AI-generated argument that misrepresents a contraindication, omits a relevant diagnosis, or positions a clinical case in a way that leads a payer to deny appropriate care causes harm that is downstream and deniable. The agent did not make the clinical decision. But the agent shaped the argument that influenced it.

The pre-mortem’s central question, who owns the error, was always pointed at this trajectory. The agent is built by the health system, on Epic’s platform, using Curiosity’s foundation models, in a regulatory environment where no one has yet specified how liability is allocated between vendor and deployer. Advocate Health’s prototypes are the first step of a sequence that leads directly to that question.

 

Colorado Tried to Build the Rails

While health systems were building, legislators in Colorado were attempting to create the governance scaffolding that is missing at the federal level. Three separate AI laws with consequences for healthcare had been passed by June 2026, each addressing a different dimension of the problem, and each confirming the same underlying gap.

Colorado’s original AI Act, SB 24-205, was scrapped before it ever took effect. A legal challenge from X.AI in April 2026, supported by federal intervention from the DOJ, led to enforcement being suspended and the legislature repealing the law entirely. Its replacement, SB 26-189, was signed on 14 May. It is a narrower law, retaining consumer notice requirements and the right to meaningful human review following adverse outcomes, but dropping the duty-of-care standard and mandatory impact assessments that had made the original controversial. It takes effect on 1 January 2027.

HB 26-1139, signed on 2 June, constrains how payers use AI in coverage determinations. It requires that AI-driven decisions be based on the patient’s individual medical and clinical history rather than group data, and that any denial or delay of coverage based on medical necessity receive review by a licensed clinician. It too takes effect on 1 January 2027.

Together, SB 26-189 and HB 26-1139 create obligations on both sides of the prior authorisation workflow. Neither specifies who bears the cost when an agent-generated output leads to the wrong clinical outcome. Three laws confirming the gap exists is not the same as closing it.

 

The Sequence Is Not a Prediction. It Is a Pattern.

On 1 June 2026, eight days before the pre-mortem was published, the Joint Commission launched its first voluntary AI certification programme for healthcare organisations. Built on the initial guidance published with the Coalition for Health AI in September 2025, the certification covers governance, data management, risk and bias reduction, and monitoring. It is a meaningful step forward. But the certification recognises organisations; it does not validate or certify individual AI products. It contains no discussion of liability allocation. It is a framework for responsible intent, not a mechanism for accountability when something goes wrong.

Epic has not published a liability framework specifying what a health system owns when a self-built Agent Factory agent produces a clinical error. No Epic contract language or public terms of service document does so. No federal regulatory body has published guidance specifically addressing liability allocation for agentic AI operating within EHR environments. The FDA, despite authorising more than 1,400 AI-enabled devices, has issued no specific enforcement guidance for agentic AI in that setting.

The pre-mortem’s conclusion was that if Epic published a clear liability framework and paired it with a safety review mechanism, Agent Factory could become the defining infrastructure layer of hospital AI over the next decade. That conclusion stands. What the evidence now confirms is that the clock is not running from some future launch date.

It was already running.

Pre-Mortem: Epic Agent Factory

Update, 14 June 2026: One of America’s largest health systems was already building Agent Factory agents in late March, weeks before this piece was published. This new piece confirms the central argument.


 

Epic unveiled Agent Factory at HIMSS 2026 (March 2026), positioning it as a no-code, drag-and-drop visual builder that lets health systems design, deploy, and monitor their own autonomous AI agents inside the Epic environment. Alongside it came Curiosity, a family of generative medical foundation models trained on deidentified records from 300 million patients across 310 health systems, backed by a research preprint on arXiv first published in August 2025. Together, the announcements represent Epic’s move from AI vendor to AI infrastructure provider, handing health systems the tools to build clinical automation at their own pace and on their own terms.

A pre-mortem is a discipline borrowed from project risk management. Before a programme succeeds or fails, you ask: if this does not go as planned, what was the mechanism? This series applies that lens to major AI-in-industry announcements, not to predict failure but to surface the questions that deserve answers before deployment, not after.

 

The Bet

Epic is betting that health systems want to own their AI destiny. Phil Lindemann, VP of Data and Research, framed Agent Factory as enabling customers to implement AI solutions without needing to call a vendor or write a line of code. That is a significant commercial and philosophical shift. Epic’s existing suite, Art, Penny, and Emmie, has posted credible numbers: 42 per cent reduction in prior authorisation submission time at Summit Health, 58 per cent sustained reduction in billing-related service messages at Rush University, 69 per cent early lung cancer detection at The Christ Hospital against a 46 per cent national average. The bet is that health systems, given those results as proof of concept, will want to build the next generation themselves.

 

The Assumption

The assumption underneath Agent Factory is that health system capability is ready to meet platform capability. Canvas Medical CEO Adam Farren noted in HIMSS 2026 commentary that most hospitals are not yet positioned to take advantage of the platform. Agent Factory is in early phase, with first availability in 2026 and continued rollout in 2027. Epic’s own roadmap, and the organisational readiness required for clinical agent deployment, put realistic momentum at leading health systems two to three years out. The platform may well be sound. The question is whether the organisations it serves have the clinical informatics depth, the governance infrastructure, and the project bandwidth to build and validate autonomous agents safely, particularly in clinical rather than administrative workflows.

 

The Sequence

Epic shipped the capability before any ratified standard governs what happens when a health-system-built agent makes a clinically significant error. The Joint Commission and Coalition for Health AI published voluntary joint guidance in September 2025, covering governance structures and vendor management. The FDA has authorised over 1,400 AI-enabled devices but has published no specific enforcement guidance for agentic AI in EHR environments. No federal regulatory framework yet specifies how liability for agent-generated clinical errors should be allocated between vendor and deploying health system. The capability is real and available. The governance architecture to surround it is not yet ratified.

 

The Pager

When an Agent Factory-built agent makes a clinically significant error, who owns it? Epic’s public framing places health systems “in the driver’s seat.” That is a positioning statement, not a governance document. No published contract language, terms of service excerpt, or named executive statement specifies who bears liability for agent-generated errors. No Epic accountability framework for self-built agents has been published. KPMG’s Q4 AI Pulse Survey (2025) found that 75 per cent of large-enterprise leaders name security, compliance, and auditability as their top requirements for agent deployment. At present, the answer to the pager question is that nobody has publicly claimed the call.

 

The Proof

Curiosity carries published research behind it: a preprint on arXiv first submitted in August 2025, covering 118 million patients and 151 billion tokens via the CoMET architecture. That is a meaningful evidential bar. Agent Factory has no equivalent published validation. Epic’s self-reported statistic that more than 85 per cent of customers are actively using Epic AI is plausible given market penetration of 43.7 per cent of US hospitals by count and 56.9 per cent by beds, but it refers to the existing suite, not to Agent Factory specifically. No performance benchmarks, error rate thresholds, or clinical outcome commitments for health-system-built agents on Agent Factory appear in any public source.

 

Verdict

If Epic publishes a clear liability framework that specifies what health systems own when they deploy self-built agents, and pairs that with a safety review mechanism before clinical agents go live, Agent Factory could become the defining infrastructure layer of hospital AI over the next decade. The foundation is genuinely strong: real outcome data from deployed agents, a clinically substantiated foundation model, and a market position that no competitor can easily replicate. The Curiosity publication demonstrates that Epic is capable of meeting an external evidential standard. The question is whether it applies that same rigour to the governance scaffolding around Agent Factory before health systems start building in earnest, rather than after the first serious incident forces the issue.

You Didn’t Transform. You Digitised

Most organisations that have spent the last five years claiming digital transformation have not transformed anything. They have taken broken processes, outdated thinking, and dysfunctional ways of working, and moved them online. That is not transformation. That is digitisation with a better slide deck. And the reason it keeps happening is not technology. It is not budget. It is not even capability. It is the fact that real transformation is genuinely uncomfortable, and most leaders are not willing to do what it actually requires.

 

The Lie We Have Been Telling Ourselves

Somewhere along the way, the industry decided that transformation meant deploying new platforms. Move to the cloud. Implement the ERP. Launch the patient portal. Go live by Q3. And when the system went live, someone in the boardroom called it a success.

McKinsey’s research, tracking digital transformation outcomes across more than 1,500 executives globally, found that fewer than 30% of digital transformation programmes achieve their stated goals. When the definition of success is tightened to organisations that both improved performance and sustained those improvements over time, the figure drops to 16%. The transformation was declared. The programme was closed. The leadership team moved on. And the results did not follow.

The same pattern is now playing out in artificial intelligence investment. Organisations are deploying AI tools at pace, adding automation to existing workflows, and calling the outcome transformation. The underlying question, whether the organisation has genuinely changed how it thinks, decides, and operates, goes unasked. The tools change. The organisation does not.

What actually happened on the ground: the same approval bottlenecks that existed in the paper process existed in the digital one. The same data quality problems that plagued the spreadsheet now plagued the database. The same people who did not trust each other before the system launched still did not trust each other after it. The technology arrived. The transformation did not. Because transformation was never on the project plan.

 

What You Actually Did

MIT researchers studying digital capability across more than 400 global organisations identified four categories of digital maturity. At the top: Digital Masters. High investment in technology, high investment in leadership and operating model transformation. Consistent outperformers.

At the bottom of the performance curve: what the researchers called “digital fashionistas.” High technology investment. Low operating model and leadership change. They look like digital leaders. They have the tools, the platforms, the dashboards, and the announcements. They consistently underperform the organisations that did both. The research, published in Leading Digital (Westerman, Bonnet and McAfee, Harvard Business Review Press, 2014), found that what separates genuine digital leaders from organisations that merely digitise is not technology investment. It is the depth of change to operating model and leadership capability that sits alongside it.

The fashionista is not a reckless organisation. It is a capable one that solved the easier half of the problem. Technology procurement has clear timelines, visible outputs, and measurable spend. You can point to it in a board presentation. Changing how an organisation makes decisions, how it tolerates uncertainty, how it deploys talent, and how it responds to what customers actually do rather than what the strategy assumed they would do: that work is slower, harder, and less photogenic. So the easy half gets done. The hard half gets deferred. And the deferral becomes permanent.

Digitisation has real value. I am not dismissing it. But it does not change what is possible. It does not challenge why a process exists in the first place. It does not ask whether the workflow serving the organisation in 2010 should still be serving it today.

I have walked into healthcare systems where clinicians were still duplicating data entry across three platforms because no one had the political will to consolidate them. I have seen government programmes where the digital portal replicated a form-filling exercise that should have been eliminated entirely. I have watched organisations spend eight figures on enterprise systems and then rebuild their old spreadsheet workarounds alongside them, because the system did not fit how people actually worked, and no one was willing to change how people actually worked. New technology. Old thinking. Zero transformation.

 

Why Real Transformation Is Harder Than Anyone Admits

Consulting firm BCG surveyed 825 senior executives on their digital transformation experience. Approximately 70% reported falling short of the value they expected. The consistent pattern in that data, and in the broader body of research on transformation failure, is not a technology shortfall. The technology largely worked. What did not work was the organisational and cultural infrastructure around it. Organisations deployed new capability into old structures. New tools into old decision-making patterns. New data into organisations that did not know how to act on it.

Genuine transformation requires something that technology cannot deliver and no vendor will sell. It requires leaders to look at the way their organisation functions and be honest about what is not working, not just inefficient, but fundamentally wrong. Wrong structures. Wrong incentives. Wrong assumptions baked into processes that have never been questioned because they have been there too long for anyone to remember why.

That conversation is threatening. It implicates decisions made by people still in the room. It requires dismantling things that gave people power, status, or comfort. It means telling parts of the organisation that the way they have worked for a decade is the problem, not the solution. Most leaders are not willing to have that conversation. So instead, they commission a technology programme and call it transformation. It feels like action. It produces visible outputs. And it avoids the harder truth entirely. The technology becomes the distraction from the real work.

 

The Questions That Would Actually Change Something

Real transformation starts before any platform is selected, any vendor is appointed, or any project plan is written.

It starts with questions most organisations never ask.

Why does this process exist? Not how does it work, but why does it exist? What problem was it designed to solve, and is that still the problem we have?

Who benefits from keeping this the way it is? Because in every organisation, there are people whose influence depends on information asymmetry, manual steps, or processes that only they understand. Digital transformation threatens that. And those people will, consciously or not, find ways to make sure it does not fully land.

What behaviour needs to change, not just what system needs to be replaced? Because if that question cannot be answered before go-live, the transformation will fail after it.

What are we willing to stop doing? Every genuine transformation requires eliminating something. A process, a role, a way of making decisions. If nothing has been stopped, nothing has been transformed.

 

The Leader’s Role Nobody Talks About

This is where most transformation discourse goes quiet.

Because the answer to why transformation fails is almost always leadership. Not IT leadership. Not programme leadership. Senior organisational leadership.

The leaders who delegated transformation to a project team and checked in quarterly. The ones who approved the technology investment but never showed up to the change management conversation. The ones who said they needed to transform in the all-hands and then protected every structural thing that made transformation impossible.

Transformation cannot be delegated. Implementation can. But the decisions that actually change an organisation (who has authority, how work flows, what gets measured, what behaviour gets rewarded) sit at the top. When leadership avoids them, the project team delivers what it can. They go live. They hit their milestones. And the organisation looks digitised, not transformed.

 

What Transformation Actually Looks Like

I have seen it done well. Not often, but I have seen it.

It looks like a leader standing in front of their organisation and naming the real problem, not the technology gap, but the cultural or structural one underneath it. It looks like decisions being made that upset people, because those people were benefiting from the dysfunction. It looks like processes being eliminated, not just automated. It looks like the technology arriving last, after the hard thinking has already been done, as an enabler of a new way of working, not a substitute for designing one.

It is slower than digitisation. It is harder to measure. It produces fewer milestone celebrations. But two years later, the organisation actually works differently. Not just faster. Differently.

 

The Uncomfortable Question

If you are sitting with a transformation programme right now, in progress, recently completed, or about to start, ask one question.

What have we changed about how this organisation thinks, decides, and operates? Not what have we deployed. What have we changed?

If the honest answer is “not much,” you have not transformed. You have digitised.

And until someone is willing to say that out loud, the investment in transformation programmes will keep delivering digitisation results, and the question of why the return never arrived will keep going unanswered.

The technology was never the problem. It was always the thinking.

Pre-Mortem: The Pentagon’s Autonomous Drones Reset

 

The Pentagon’s Replicator programme promised thousands of cheap autonomous drones in two years and delivered hundreds. The response has not been to wind it down. It has been to dissolve it, rebuild it as a new command inside Special Operations Command, and ask Congress for roughly 240 times the money. A programme that under-delivered on a lean, fast model is being re-attempted on a vast one, and the case for why the second structure succeeds where the first did not has not yet been made in public.

A pre-mortem asks the same five questions, every time, applied to a current programme before failure is possible rather than after. This is the third in the series. The first looked at vendor accountability in regulated finance. The second looked at clinical safety accountability in regulated healthcare. This one looks at execution accountability in defence procurement, the hardest delivery environment of them all. Different sector, similar structural shape: commitment moving faster than the architecture meant to hold it to account.

 

The Bet

The bet is that scale fixes what speed could not. Replicator was announced in August 2023 with a target of multiple thousands of all-domain attritable autonomous systems inside roughly two years, run by the Defense Innovation Unit on about a billion dollars across two fiscal years. It was deliberately lean, built to route around the traditional acquisition machine. By the deadline it had fielded hundreds. The reset, the Defense Autonomous Warfare Group, carries a 2027 budget request of about $54 billion, against roughly $226 million the year before. The technical bet is sound on its face: mass autonomy is where warfare is going, and the United States cannot afford to be slow to it. The harder bet, the one sitting under the headline number, is that money and a command structure fix what was an execution problem. Those are different things, and the launch treats them as one.

 

The Assumption

One belief is doing all the work: that Replicator’s shortfall was a problem of resourcing and structure, solvable with more of both. The documented failures point elsewhere. Systems were selected that proved unreliable, too expensive, or too slow to manufacture at the quantities needed. Some existed only as a concept when they were chosen. And the programme could not procure software able to orchestrate and command large, mixed swarms of different drones, which is the actual technical heart of autonomy at scale. None of those is a budget problem. A bigger budget buys more of the same systems and more of the same integration gap. If the diagnosis is wrong, the cure scales the disease.

 

The Sequence

Commitment came before the architecture, again. Replicator launched in August 2023. A second line of effort, focused on countering small drones, was added by a Secretary of Defense memo in September 2024. The original thousands-by-2025 deadline arrived with hundreds delivered. The programme was then consolidated into a joint interagency task force, dissolved, and rebuilt as the new autonomous-warfare group inside Special Operations Command. The first acquisition under the new structure, two counter-drone systems, landed in January 2026. Only in April 2026 did the Secretary tell the House Armed Services Committee that a sub-unified command for autonomous warfare was coming. The command meant to own this is still being stood up around a commitment already made.

The funding tells the same story. Of that $54 billion, only about $1 billion is appropriated base money. The other $53 billion is a request, parked in a flexible five-year reconciliation pot that Congress has not yet passed. The headline number signals overwhelming commitment. In hard terms it is roughly a billion dollars in hand and fifty-three billion in hope. The intention is real. The money, for now, is one dollar in every fifty-four.

 

The Pager

Start with the credit, because it is real. The new group has a named director, Lt. Gen. Francis L. Donovan (USMC), with a clear command line and an appointment made by the Secretary himself. That is more named, senior accountability than most large defence programmes ever put on the public record, and it counts for something. The harder question is operational and specific. Standing policy (DoD Directive 3000.09) requires appropriate levels of human judgement over the use of force. At swarm scale, with attritable systems acting at machine speed, who is the named individual accountable when one of them engages wrongly? The command line is clear. The accountability for the autonomous decision itself, at the scale this programme is built to reach, has not been framed in public. A command answers for a programme. It is a harder thing to say who answers for a single autonomous engagement when there are thousands of them in the air.

 

The Proof

The committed measures are input measures: dollars requested, units contracted, the first systems bought. There is no public outcome measure for capability actually delivered, no cost per effective intercept, no fielded-and-working-at-scale figure with a date attached. This matters because the proof problem has already bitten once. Leadership called Replicator on track in 2024 and said it had made enormous strides in 2025, while the independent accounting found hundreds, not thousands. When the people who own the programme also own the definition of progress, optimism outruns delivery. Second-attempt scepticism is earned, not unfair. In eighteen months, the question of whether this worked will be answered by whoever holds the platform to define what "delivered at scale" means, and right now that platform is a budget request.

 

Verdict

This is a serious programme with serious people behind it. The strategic logic is correct: mass autonomy matters, and slowness is its own risk. The accountability has a name and a rank, which is rare. The first systems have been bought and are heading to the field. None of that is in doubt.

What is unproven is whether a command and a budget can fix a problem that was about manufacturing maturity, software orchestration, and realistic system selection. A reorganisation addresses none of those by itself.

The action is concrete. Publish the outcome measure, not the input: a fielded-and-working-at-scale metric with a date, committed before the reconciliation money is spent, not after. Name the human accountable for autonomous engagement decisions at scale, not only the command that owns the programme. And diagnose the first shortfall in public before scaling, so the much larger second bet rests on a corrected understanding rather than a hope.

If the department publishes a delivered-at-scale outcome measure tied to a named owner, and solves the swarm-orchestration software problem it could not solve the first time, this becomes the programme that proves autonomous capability can be fielded at speed. Without both, it becomes the most expensive way yet found to relearn that money and reorganisation do not fix an execution problem.

The Most Dangerous Status Report Is the One Everyone Is Comfortable With

 

The governance is running. The reports are flowing. The steering committee met on time, every question got a confident answer, and the pack looked clean.

And the programme is in more trouble than anyone in that room is prepared to say.

This is not an unusual situation. It is not a sign of dysfunction or dishonesty. It is, in my experience, the most common information environment in large-scale programme delivery. The data supports that reading. Research by Milliken, Morrison and Hewlin, published in the Journal of Management Studies, found that 85% of the employees they interviewed had withheld important information from their manager because they feared the consequences of speaking up. The fear is not of formal punishment. It is relational: the fear of being seen negatively, of damaging a relationship, of being labelled someone who creates problems rather than solves them.

This is not a minority behaviour. It is the default.

The question is not whether a filter exists on your programme. The question is how thick it has become, and whether you would know.

 

Nobody Decides to Build the Filter

The pattern I have watched play out more times than I can count begins with a capable, experienced leader who genuinely means what they say. They have told the team, in kick-offs and town halls and one-to-ones, that they want to hear the bad news early. They are not performing openness. They believe it.

But then someone raises a concern in a steering committee and the leader’s body language shifts before the words are out. A risk gets flagged and the first question is why it was not caught earlier rather than what needs to happen now. A project manager delivers a difficult update and spends the following week under a level of scrutiny that has nothing to do with fixing the problem.

Nobody announces a new policy. Nobody says: do not bring me bad news.

But the room notices. Every single time.

And slowly, without anyone deciding to do it, the filter gets built. The team learns which concerns land well and which ones create friction. They learn how to frame things to reduce the emotional temperature in the room. They learn the difference between the truth and the version of the truth that keeps the meeting moving and their professional standing intact.

The updates keep arriving. The reports keep flowing. The governance keeps running.

But the signal has been stripped out. What remains is noise dressed up as information.

 

You Do Not Build This Through Negligence

This is the part that most leadership development will not tell you directly. You do not build a closed information environment through negligence. You build it through a series of entirely human, entirely understandable responses to difficult moments.

A flash of impatience when a problem arrived at the wrong time. A habit of moving to solutions before the problem is fully understood. A preference, however subtle, for the reassuring narrative over the complicated one.

These are not character flaws. They are instincts under pressure.

But at leadership level they are not private. The CIPD’s 2024 evidence review on psychological safety identifies leader and manager behaviour as the most critical driver of whether people feel safe to speak up, and specifically notes that what matters is not what leaders say about wanting honesty, but what they demonstrate through their actions when honesty arrives. The research is unambiguous: psychological safety is fragile. A single punitive response to good-faith feedback can damage trust that took months to build.

Every reaction is observed, interpreted, and factored into how safe it feels to tell you the truth next time. The leader who says they want honesty but visibly struggles to receive it is not running an open culture. They are running an organisation that has learned to give them what they can handle rather than what they need.

The most dangerous status report is not the one with red items on it.

It is the one everyone is comfortable with.

 

Four Practical Moves

The filter is not permanent. It is a learned behaviour, and learned behaviours can be unlearned. But reversing it requires something more specific than an open-door policy.

Stop asking questions that invite the managed answer. “How are things going?” will get you the curated version every time. Try instead: “What is the one thing you would not put in a status report but think I should know?” “If this programme were going to fail, what would the early sign look like?” “What are we not talking about that we probably should be?” Those questions signal that you are interested in the reality, not the performance of it.

Go to where the real work is happening. Not to inspect. To listen. The people closest to delivery carry an understanding of programme health that rarely makes it into formal reporting. A single honest conversation with a delivery lead or a technical team that has been carrying a quiet problem for weeks will tell you more than three months of steering committee updates.

Create a visible moment where surfacing difficulty is rewarded rather than merely tolerated. When someone raises something uncomfortable and your public response is genuine appreciation followed by a real conversation about what to do next, the entire room recalibrates what is safe to say. One moment like that shifts the culture more than any open-door policy ever will. The inverse is equally true: one moment where the messenger suffers sets the filter back months.

Learn to read silence as data. The steering committee where every question gets a confident answer. The risk log that has not changed in three weeks. The team that delivers polished updates but never raises anything unexpected. These things can mean a programme is running well. They can also mean the filter is fully operational and the real conversation is happening somewhere else entirely. If nobody is telling you anything that surprises you, that is not necessarily a sign that everything is on track. It may be a sign that you have stopped being the kind of leader people bring hard news to.

 

The One Question That Cuts Through

There is a question I now use when a programme looks clean but feels wrong.

I find someone close to the real work. Someone who has been there long enough to know where the bodies are buried. And I ask them one thing.

What does everyone here know that nobody is saying out loud?

The answer to that question is almost always where the programme actually is. The gap between that answer and what appears in the formal reporting is almost always where the real leadership work needs to happen.

 

Comfortable Information Is Borrowed Time

PMI’s research on complex programme delivery is consistent on this point: early warning signals are frequently present and frequently ignored, causing problems to compound in severity before they are addressed. The pattern is not exceptional. It is systematic.

Every week a real problem stays hidden is a week where the options for addressing it narrow. Manageable risks become serious ones. Recoverable situations become critical ones. And the longer the filter operates, the more the team’s trust erodes, because people who know the truth and watch it go unacknowledged eventually stop believing that leadership is operating in good faith.

When the programme finally tells you the truth, and it always does eventually, the question is rarely how to get back on track.

It is whether getting back on track is still possible.

 

The Leaders Who Get This Right

The leaders who consistently deliver in high-stakes environments are not always the most experienced or the most technically skilled.

But they share something that is harder to develop than either of those things. They have learned to want the truth more than they want to be comfortable. They have built the self-awareness to notice when they are receiving a managed version of reality, and the discipline to go looking for the unmanaged one. They have created environments where people bring problems early because they have learned, through consistent experience, that doing so leads somewhere useful.

That is not a natural state for most leaders. It requires sustained effort, genuine self-awareness, and a willingness to sit with difficult information and resist every instinct to make it someone else’s problem.

But the alternative is a programme that looks healthy until it does not. A team that has learned to give you what you can handle. A steering committee that runs on time and misses everything that matters.

When your team tells you how things are going, are they telling you what is happening?

Or are they telling you what they have learned you can live with?

The gap between those two answers is where most programmes are won or lost.

Why Western Delivery Frameworks Stall in the Middle East (and What to Do Instead)

 

I remember sitting across from a senior government official in the Gulf, about six weeks into a major transformation programme. On paper, everything was moving. The governance framework was in place. The workstream leads had been assigned. The project plan had been reviewed and signed off. The first steering committee had gone smoothly.

And yet nothing was actually happening.

Not because of incompetence. Not because of a lack of resources. Not because the methodology was flawed. The team I was working with was experienced, capable, and well-intentioned. But they had arrived with a delivery model built for a different context, and they were applying it with the confidence of people who had never had reason to question it.

That pattern is playing out at an extraordinary scale right now. Saudi Arabia’s ICT market surpassed $48 billion in 2024, the largest technology market in the Middle East. McKinsey’s 2025 State of AI in GCC Countries report, drawing on surveys of senior GCC executives, found that 84% of GCC organisations have adopted AI in at least one business function, but only 31% have successfully scaled or fully deployed it across the organisation. That is a 53-percentage-point gap between starting and delivering, across some of the best-funded, most ambitious transformation programmes in the world.

The question is not why organisations in the region are investing. The question is why so much of that investment stalls between intention and outcome.

After more than a decade delivering in the Middle East, I think I know the answer. And it is not the one most people reach for.

 

The Assumption That Travels Badly

Western delivery frameworks, whether PRINCE2, PMI, SAFe, or the various proprietary methodologies that large consulting firms carry from engagement to engagement, are not neutral tools. They are cultural artefacts. They were built in specific organisational contexts, shaped by particular assumptions about how decisions get made, how accountability flows, how disagreement is handled, and what progress looks like.

Those assumptions are rarely stated explicitly. They do not need to be, in the environments where these frameworks were designed. Everyone in the room already shares them. But the moment you move those frameworks into a fundamentally different cultural context, the unstated assumptions become the problem.

Hofstede’s cultural dimensions research (now maintained by The Culture Factor Group) offers a useful lens here. Arab countries consistently score high on two dimensions that bear directly on programme delivery. The first is power distance: the degree to which authority is respected rather than openly challenged, and the degree to which the most senior voice shapes the room rather than the most technically accurate one. The second is uncertainty avoidance: a preference for predictability, a resistance to ambiguity, and a reluctance to surface risk that might destabilise a process already formally endorsed. These are not character flaws or cultural limitations. They are consistent patterns that predict specific delivery behaviours that imported frameworks are simply not designed to manage.

The typical Western delivery model assumes a relatively flat decision-making structure where the person with the most relevant expertise speaks most loudly. It assumes that challenge and disagreement in a meeting are signs of healthy engagement rather than disrespect. It assumes that formal sign-off is the meaningful moment of commitment and that what is agreed in the room will be actioned after it. It assumes that timelines create accountability and that accountability creates action.

In the Middle East, several of those assumptions do not hold. And the teams that arrive without understanding this do not fail because they lack capability. They fail because they are solving the wrong problem.

 

What Actually Drives Delivery in This Region

Execution in this region is not primarily operational. It is relational.

This is not a cultural curiosity or a soft consideration to be acknowledged in a pre-departure briefing and then set aside. It is a delivery requirement. Understanding it, genuinely understanding it rather than paying lip service to it, is the difference between a programme that moves and one that generates activity without progress.

Decisions in many Middle Eastern organisations, particularly in the government and quasi-government entities that dominate the regional landscape, do not flow through the formal governance structure in the way a Western framework assumes. The steering committee may ratify decisions, but the real alignment happens elsewhere: in relationships that have been built over time, in conversations that take place outside the formal meeting structure, in the space between hierarchy and trust that no project plan captures.

Hierarchy here is not an obstacle to navigate around. It is the delivery infrastructure. Understanding who the real decision-makers are, what they care about, how they receive information, and what kind of relationship needs to exist before they will move is not supplementary to the delivery approach. It is the delivery approach.

Equally, the pace at which genuine commitment forms is different. A Western programme manager reads a signed-off plan as a committed baseline. In many regional contexts, that same sign-off is closer to the beginning of a conversation than the end of one. Real commitment, the kind that produces action, is built through repeated engagement, demonstrated respect, and a track record of following through. It cannot be manufactured by a governance process, however well designed.

 

The Meeting That Agrees to Everything and Changes Nothing

One of the most consistent patterns I encounter when stepping into stalled programmes in the region is what I have come to think of as performative alignment. The meetings happen. The presentations are well received. The heads nod. The action items are recorded. And then the follow-through does not come, not because anyone has decided not to cooperate, but because the alignment that appeared to exist in the room was not the deep kind that produces action.

In high-context cultures, which include many in the Middle East, direct disagreement in a formal meeting carries a social cost that most Western delivery professionals underestimate. Saying no to a proposal in front of a room of peers and seniors is not simply a professional difference of opinion. It can feel like a breach of the respect and harmony that the meeting is partly there to maintain.

The result is that concerns, reservations, and genuine blockers often do not surface in formal governance forums. They emerge later, in quieter conversations, or they do not emerge at all. The experienced regional delivery leader learns to read what is not being said in a meeting as carefully as what is. The silence after a proposal is not always agreement. The smooth meeting is not always a sign of progress.

Teams that do not understand this dynamic spend months wondering why a programme that looked aligned keeps stalling. The answer is usually that the alignment they measured was the visible kind, and what drives delivery here is the invisible kind.

 

Where International Teams Get It Wrong

The failure mode I see most consistently is not incompetence or arrogance, although both exist. It is the application of a known model to an unknown context, with too much confidence and too little curiosity.

The international team arrives. They bring the framework, the templates, the governance structure, the reporting cadence. They run the kickoff. They establish the workstreams. They hold the first set of meetings. Everything looks like it is moving. The client counterparts are polite, engaged, and apparently aligned.

Then the programme slows. Decisions that should take days take weeks. Approvals that seemed close keep getting deferred. Stakeholders who appeared committed become harder to reach. The team escalates. They add more governance. More reporting. More pressure. The programme slows further.

What they are experiencing is the consequence of building delivery infrastructure without first building relational foundations. They have a governance model without trust underneath it, and governance without trust is just paperwork.

The other common failure is treating local counterparts as recipients of the methodology rather than as partners in understanding the context. The best people in any regional organisation carry an understanding of how things actually work, who the real influencers are, where the genuine blockers sit, and what has been tried before and why it did not land. Engaging that knowledge seriously, rather than as a courtesy, would transform the delivery approach. Most international teams access about ten percent of it.

 

What to Do Instead

The answer is not to abandon rigour or to conclude that structured delivery does not work in the region. It does. But it works differently, and the sequence matters enormously.

The first investment, before the governance framework, before the project plan, before the first steering committee, is in relationships. Not networking in the transactional sense. Genuine relationship-building, rooted in curiosity and respect, that creates the conditions under which honest conversation becomes possible. In a region where trust precedes transaction rather than following it, the time spent on this is not a delay to delivery. It is the foundation of it.

The second shift is in how decisions are understood and pursued. Rather than designing a governance structure and expecting decisions to flow through it, the experienced regional programme leader maps the real decision-making landscape. Who are the individuals whose genuine endorsement will move things? What do they need to see, hear, or feel before that endorsement becomes real? What informal conversations need to happen before the formal ones? Answering those questions honestly, and building a relationship strategy around them, is more valuable than any governance framework.

The third adjustment is in how alignment is tested. Rather than reading smooth meetings and nodding heads as confirmation of commitment, effective regional delivery leaders build in deliberate mechanisms for surfacing the real picture. Private conversations with key counterparts after formal sessions. Trusted intermediaries who can carry honest feedback in both directions. An explicit understanding that the formal meeting is often where positions are displayed rather than where they are formed.

Fourth, the pace of delivery needs to be calibrated to the pace at which genuine alignment forms, not the pace that the project plan demands. This is uncomfortable for Western programme managers trained to treat a timeline as a commitment. But the cost of false pace, the appearance of movement without the substance of it, is far higher than the cost of taking the time to build the real thing.

Finally, and perhaps most importantly, local knowledge must be treated as a strategic asset rather than a logistical courtesy. The people who understand the organisation, the culture, the history of what has been attempted before, and the unwritten rules that govern how things actually get done are the most valuable resource on the programme. Structuring the delivery approach around that knowledge, rather than around the imported framework, is the shift that most changes outcomes.

 

The Deeper Lesson

Delivering in the Middle East has taught me something that has made me a better programme leader everywhere, not just in the region.

Context is not a complicating factor. It is the medium through which all delivery happens. Every organisation has its own version of the unwritten rules, the informal power structures, the historical sensitivities, and the cultural patterns that shape how work actually gets done. The Middle East makes these visible in ways that Western environments sometimes obscure, because the gap between the imported model and the local reality is large enough that you cannot ignore it.

But the principle is universal. The best delivery professionals I have worked with anywhere in the world are the ones who arrive with curiosity before they arrive with answers. Who treat understanding the environment as the first act of delivery, not a preliminary to it. Who know that a framework is a starting point, not a solution.

The frameworks that travel well are not the ones with the most sophisticated methodology. They are the ones held lightly enough to be adapted, by people self-aware enough to know when adaptation is what the moment requires.

 

A Final Thought

If you are leading a programme in the Middle East and it has the shape of movement without the substance of it, the instinct will be to add more structure, more governance, more reporting, more pressure. That instinct is usually wrong.

The question to ask instead is simpler and harder. Do the people who need to move this programme forward trust you enough to tell you what is actually in the way? Have you built the relationships that make honest conversation possible? Do you understand not just the governance landscape but the human one?

If the answer to any of those questions is uncertain, that is where the work is. Not in the framework. Not in the plan.

 

Pre-Mortem: Anthropic’s Wall Street Agentic AI Suite

 

Thirteen of the world’s largest financial institutions just deployed ten autonomous AI agents into the most regulated workflows in finance. None of them has publicly named who is accountable when the agents are wrong. Not the banks. Not the vendor. Not the regulators. The launch on 5 May reads like a milestone. Read closer and it reads like a stress test of every governance assumption the financial services industry operates on.

A post-mortem tells you why something failed after it has already failed. A pre-mortem asks the same questions while failure is still avoidable. The same five questions, every time, applied to a current programme, announcement, or initiative. This is the first in the series, and the subject is not chosen by accident. The Anthropic Wall Street launch is the clearest example I have seen this year of capability racing ahead of the architecture meant to hold it to account. If you are a CIO, a CRO, or a transformation lead in a regulated industry, the lessons here apply to you whether you are deploying Claude or not.

 

The Bet

Anthropic and the deploying banks are betting that ten autonomous agents can land in the most regulated workflows in finance (underwriting, KYC, credit memos, statement audits) faster than the regulatory architecture can constrain them. The technical bet rides on Claude Opus 4.7’s 64.37% on the Vals AI Finance Agent benchmark and AIG’s quoted 88% accuracy on insurance claims out of the box. The strategic bet is that being first at this scale, with a footprint that includes JPMorgan Chase, Goldman Sachs, Citi, AIG, BNY, Carlyle, Mizuho, and Visa, outweighs whatever comes back from regulators in the next twelve months. These are reasoned bets, made by an extraordinarily capable vendor and the most sophisticated buyers in the world. But they are bets, not certainties, and the launch reads as certainty. The CIO of any one of those institutions is taking on operational, regulatory, and reputational risk for which the vendor has accepted no published share. That is the bet they should be examining most carefully.

 

The Assumption

One belief is doing all the work: that bank operating models can absorb ten simultaneously deployed agents without the human-in-the-loop quietly thinning wherever the agents prove reliable. Anthropic’s own commitment, as stated in the primary announcement, depends on it: “Users stay firmly in the loop, reviewing, iterating on, and approving Claude’s work before it goes to a client, gets filed, or is acted on.” The history of automation in regulated environments tells a different story. Algorithmic trading kill switches were not triggered because the system was performing. Automated underwriting reviews became rubber stamps once approval rates looked normal. Every automation failure in regulated finance follows the same arc: human oversight erodes invisibly as the system proves itself, and the erosion becomes visible only after the failure. JPMorgan CIO Lori Beer said it directly at the launch: “The technology can do so much. It’s the actual organization’s ability to digest and absorb it.” That ability is the load-bearing assumption. If it holds, the launch is a milestone. If it does not, the launch is a slow-moving incident.

 

The Sequence

Capability shipped. Ten named agents, Microsoft 365 generally available, Moody’s embedded, more than a dozen banks in production. How much of that was committed before any operational governance for vendor-supplied agentic decisioning had been published? All of it. Three weeks earlier, the Fed and the OCC revised Model Risk Management guidance and explicitly excluded agentic AI as “novel and rapidly evolving.” A Request for Information is planned, with no committed timeline. The EU AI Act’s high-risk financial-sector requirements take effect on 2 August, under thirteen weeks after launch. The FCA and PRA decided against creating a dedicated AI Senior Management Function and instead mapped accountability onto existing SMFs that were never designed with autonomous agents in mind. Three jurisdictions. Three different gaps. One vendor launch landing in all of them at once. This is not a regulator being slow. This is a regulator explicitly stating that the rules do not yet apply, while the systems the rules are meant to govern are already in production.

 

The Pager

The banks have named regulatory accountability at the firm level. At FCA- and PRA-regulated firms, the holders of SMF24 (Chief Operations), SMF4 (Chief Risk Officer), and SMF16 (Compliance Oversight) hold statutory responsibility for technology, risk, and compliance. At US firms, model risk owners cover the same ground. Real, senior, public. That deserves credit. However, none of them has been publicly named for the deployment of these specific agents. Inheriting accountability through a job description is not the same as being named as the accountable owner of a programme. The first is the regulatory default. The second is what serious AI governance actually requires. Anthropic has no published vendor accountability commitment for autonomous regulated decisioning. The asymmetry is the entire story. When a Claude-built agent denies a loan that should have been approved, or approves a KYC file that should have been escalated, the pager rings at the bank, with consequences for the bank, while the vendor’s exposure is contractual and capped. The clearest demonstration came six days before the launch itself. On 29 April, Goldman Sachs removed Claude access for its Hong Kong bankers over contractual, regulatory, and geopolitical factors. The bank pulled the product. The vendor did not pull itself out. Whoever absorbs the cost when regulatory fit fails absorbs it alone. Until vendor accountability is publicly framed, every bank deploying these agents is underwriting risk the vendor will not.

 

The Proof

Two outcome measures have been published: 64.37% on Vals AI, and 88% on AIG insurance claims out of the box. Both are useful. Neither measures regulated-decision accuracy at scale. There is no committed measure for customer-detriment rate, near-miss frequency, incident reporting cadence to regulators, or the rate at which human reviewers actually amend agent outputs rather than rubber-stamp them. The banks deploying these agents do not yet have public outcome commitments either, and that absence is its own answer. Former CFO Alyona Mysko captured what is at stake: “In finance, 99% correct is still wrong.” In eighteen months, the question “did this work?” will be answered by whoever owns the platform that defines what success means. Right now, that platform is the vendor’s marketing. The banks need to claim that platform back, in their own outcome language, before the metric is set by a third party with no skin in their game.

 

Verdict

The launch is genuinely significant. More than a dozen named banks in production, industry-leading benchmark performance, audit logs in the Claude Console, the deepest Microsoft and Moody’s integrations any AI vendor has shipped. None of that is in dispute.

What is in dispute is whether the deploying banks have done the work to fill the accountability gap that the vendor has not closed and the regulators have not yet defined. The lesson generalises beyond Anthropic and beyond banking. Any CIO buying agentic AI in a regulated industry (healthcare, insurance, energy, the public sector) is operating in the same gap, and most have not yet noticed.

The action is concrete. Name the human in your organisation who carries the pager when the agent is wrong. Demand a vendor accountability schedule before you sign, not after. Define your own regulated-decision outcome measure and publish it, so the standard your performance is judged against is one you helped set.

If Anthropic publishes a vendor accountability commitment in the next six months, and a major bank commits to a public regulated-decision outcome measure tied to a named owner, this becomes a case study other industries will learn from for years. Without both, it becomes the most expensive procurement lesson the industry buys this decade.