Your AI Risk Register Does Not Reflect Your Actual Risk

 

On 22 June 2026, the intelligence agencies of the United States, United Kingdom, Australia, Canada, and New Zealand spoke in a single voice about enterprise AI risk, and what they said demands attention.

The Five Eyes cybersecurity agencies issued a joint statement warning that frontier AI models are improving at a pace that will allow them to bypass prevailing enterprise cybersecurity defences within months. Not within years. Not in the next planning cycle. Within months. The statement’s own language: “The timeline is not years, it is months.”

 

This Is Not an Abstract Warning

Joint statements from the Five Eyes agencies carry a different category of authority than vendor advisories or consultancy threat reports. These are national intelligence services with access to classified threat intelligence, speaking to government and enterprise leaders simultaneously. When they frame a risk as both imminent and enterprise-specific, take it at face value.

What sets this advisory apart from every AI security conversation most enterprises have been having is one thing: specificity. The Five Eyes statement does not describe abstract AI risks. It specifically names the enterprise AI tools deployed at scale in the last 18 months: copilots, AI assistants, browser-connected agents, and systems with access to operational and customer data. The primary attack mechanism, developed across Five Eyes guidance published earlier this year, is prompt injection: an adversary embeds hidden instructions in content the AI system processes, causing it to act outside its intended scope.

That specificity matters. It means the tools that most large enterprises have already deployed are the attack surface being described.

 

The Threat Moved Faster Than Your Review

Most organisations that have rolled out AI copilots, enterprise agents, or browser-integrated assistants have conducted security reviews of those deployments. The Five Eyes advisory is not questioning whether those reviews happened. It is saying that the threat has moved faster than the defences, and that a review conducted six months ago may no longer accurately reflect the risk profile today. The gap is not in intent. It is in elapsed time against a threat that has not stood still.

The advisory is explicit that this is not solely a security-team problem. The statement directs its recommendations at leadership, framing AI-driven cyber risk as a governance and board-level accountability question. The statement’s own title: “The AI shift in cyber risk: why leaders must act now.” That framing has direct implications for how risk registers are built and how AI deployment decisions are reported to boards.

 

Three Things Worth Doing Before Your Next Board Meeting

The advisory points to three things transformation leaders should act on before their next board meeting.

The first is a current security review. Every AI deployment connected to operational data, whether customer records, financial systems, or internal communications, needs a review that specifically addresses prompt injection risk. Not the review conducted at go-live. A current one, calibrated to the threat capability the Five Eyes describe as arriving within months.

The second is an updated risk register. Most enterprise risk frameworks assessed AI security risk at the point of initial deployment. The Five Eyes advisory says the threat environment has changed materially in the months since, and the assessment needs to reflect current threat capability rather than historical assumptions. An outdated risk assessment is not a minor administrative gap at this point. It is a governance exposure.

The third is using the advisory to reframe the conversation at board level. Six cybersecurity agencies from five countries issued this statement with an explicit focus on business leadership. That gives transformation leaders the instrument they need to move boards that have been treating AI security as an implementation detail. The Five Eyes advisory makes it a governance question. Use it as one.

The AI deployment decisions taken in the last 18 months created an attack surface. Most enterprise risk registers have not yet priced what that surface is worth to an adversary with AI-powered attack tools that are months from bypassing prevailing defences. That gap needs to close, and it closes with a current assessment, not one accurate at the time of go-live.

The EU AI Deadline Your Compliance Team Probably Missed

The EU AI Act enforcement date most organisations have been tracking is not 2 August 2026. They have been watching the high-risk provisions, the conformity assessments, the prohibited applications. Those timelines stretch into 2027 and beyond, and enterprise compliance teams have planned accordingly.

Article 50 has a different clock. It takes effect in 31 days, it applies to a far wider population of organisations than most realise, and for most of its obligations there is no grace period.

 

Not the Regulation You Were Watching

For the past two years, enterprise AI governance conversations have centred on the Act’s high-risk classifications. Which systems require conformity assessments? Which use cases are prohibited outright? The questions were legitimate, and the extended timelines attached to those provisions created a reasonable sense of runway.

That runway does not apply to Article 50.

Article 50 covers transparency obligations, and it lands on 2 August 2026. It requires any organisation deploying customer-facing AI systems to disclose to users that they are interacting with an AI. It requires providers of generative content tools to implement machine-readable marking on AI-generated outputs. Operators running emotion recognition or biometric categorisation systems must notify the individuals affected. And for any new system entering the EU market on or after 2 August, compliance is required from day one.

One aspect of the regulation that most compliance programmes have not fully processed: Article 50 is not jurisdictional. Article 50 follows the user, not the provider. That is how the Act defines its own scope. A company headquartered in Dubai, Singapore, or New York that deploys AI-generated content visible to EU users is in scope. Where the output lands determines the obligation. The practical consequence is that Article 50 applies to any organisation with a customer base that includes EU residents, regardless of where that organisation is incorporated or where its AI systems are built and operated.

The organisations that will be caught short are not the ones building prohibited systems. They are the ones that assumed the regulation was still in the planning stage, or that it would only apply to organisations based in Europe.

 

The GDPR Comparison That Matters

GDPR was announced in 2016 and took effect in 2018. Two years of awareness campaigns, legal seminars, board-level briefings, and vendor remediation work. The compliance industry built an entire ecosystem around it. Privacy officers were hired. Data mapping exercises ran for months. By the time enforcement began, organisations at least understood what was expected of them, even if some were still catching up.

GDPR also reached beyond EU borders from the start. Any organisation processing the personal data of EU residents was in scope, regardless of where it was based. Article 50 operates on the same principle: it reaches wherever EU residents are on the receiving end of AI-generated content or AI-driven interactions.

Article 50 does not have that context. Most enterprise compliance functions have been tracking the Act’s overall timeline without separating out which provisions take effect when. The transparency obligations were not deferred. They were always scheduled for August 2026. But because the high-risk provisions dominated the conversation, the transparency rules arrived quietly, and they arrive soon.

Thirty-one days is not a planning horizon. It is an implementation sprint, or it is already a compliance gap.

 

What Article 50 Actually Requires

The obligations are more specific than the general framing of “AI transparency” suggests, and that specificity matters for scoping the work.

The most broadly applicable obligation is disclosure. If a user is interacting with a chatbot, a virtual assistant, or any automated system capable of conversation or personalised response generation, they must be told. The requirement is not a buried terms-and-conditions clause. It is a functional disclosure at the point of interaction. This applies from 2 August, to all systems, with no transitional provisions.

Generative content carries a second obligation. Organisations using generative AI to produce content distributed in EU-market contexts must ensure outputs carry machine-readable markers indicating AI generation. This applies to text, images, audio, and video. The AI Omnibus agreement provisionally agreed in May 2026 and expected to be formally adopted before 2 August extends this specific requirement to 2 December 2026 for systems already on the market before 2 August. For any new system entering the market from that date, the obligation is immediate. The extension is not a signal to deprioritise: December 2026 is not far away, and the technical implementation is not trivial.

Emotion recognition and biometric categorisation carry a third obligation, active from 2 August with no transitional period. Individuals must be informed when these systems are operating on them.

None of these obligations are complex in isolation. The difficulty is that most organisations have not mapped which of their current systems fall within scope, and that mapping exercise takes longer than 31 days when it is starting from scratch.

 

What to Do in the Next 31 Days

Non-compliance carries fines of up to €15 million or 3% of global annual turnover, whichever is higher. This is not a planning conversation. It is a board conversation.

Article 50 requires operational change: disclosure mechanisms built into interfaces, technical markers implemented in content pipelines, notification processes embedded in operational workflows. A policy document does not close this gap.

The practical starting point is a scoping exercise, and it needs to happen this week, not at the end of July. Three questions define the scope: Which customer-facing systems use AI in any form of interaction or response generation? Which content production workflows use generative AI to produce material distributed in EU-market contexts? Are any systems using emotion recognition or biometric categorisation?

If the answer to any of those questions is yes and the disclosure or notification mechanism is not already live, that is an Article 50 compliance gap.

Once the scope is clear, triage by exposure. Not every system carries the same risk. Externally facing consumer products in regulated sectors carry a different risk profile than internal productivity tools. Sequence the remediation by audience, jurisdiction, and volume of interaction.

Confirming the mechanisms actually work is where most programmes get caught. A disclosure notice that technically exists but is not surfaced at the point of interaction does not satisfy the requirement. The same applies to machine-readable markers that are added to some content outputs but not systematically applied across all generative workflows. Implementation is not the same as compliance.

 

31 Days Is Not a Problem. 32 Days Is.

There is still time to close this gap for organisations that act now. August 2026 is not GDPR day one, when regulators were finding their feet. It is an enforcement event in a regulatory framework that has had two years of published timelines. Regulators will not be looking the other way.

The organisations that treated the high-risk provisions as the whole story now have 31 days to correct that assumption. Wherever they are based.

Pre-Mortem: Apple Intelligence at Work

The Pre-Mortem is a weekly series on this blog. Each piece applies five questions to a major technology commitment before the outcome is known.

On 9 June 2026, Apple used its annual developer conference to announce that Siri had become something different. Not a smarter assistant. An agentic AI layer that could take actions across applications, services, and workplace workflows on behalf of its users, across a hardware ecosystem of more than 2.5 billion active devices. The world’s most valuable company had turned its operating system into an AI agent. The question the keynote did not answer was straightforward: when it gets something wrong at work, who is responsible?


The Bet

Apple is betting that privacy and accountability are the same problem. Its Private Cloud Compute architecture is genuinely novel: stateless, ephemeral, cryptographically auditable, with production builds published within 90 days for independent inspection. At WWDC 2026, Craig Federighi stated: “data is only used to execute your request, and outside experts can continue to verify this promise at any time.” The claim is that if Apple cannot read your data, no one can. What this architecture was not designed to answer is what happens when Apple Intelligence takes a workplace action on your behalf and gets it wrong. That is a different question. Apple has framed the privacy answer as if it covers both.


The Assumption

Everything turns on one distinction: that an architecture designed to prove Apple cannot access your data also constitutes a framework for enterprise accountability when AI actions produce incorrect outcomes.

It does not. Privacy means Apple is not the party reading your data. Accountability means someone is responsible for what the AI produces from it. Those are different obligations. No document currently published by Apple closes the gap between them. The existing AppleCare for Enterprise terms explicitly disclaim liability for lost profits, damage, corruption, or loss of data, or interruption of business. There is no AI-specific carve-out, no enterprise service level agreement for Apple Intelligence outputs, and no accuracy standard committed to publicly.


The Sequence

Three weeks before WWDC 2026, Apple settled a $250 million class action over Siri AI features it had promoted during the iPhone 16 launch but did not deliver. The settlement included no admission of wrongdoing. In April 2026, Apple’s CEO Tim Cook announced his departure from the role, with John Ternus, the head of hardware engineering, confirmed as his successor from September 1, 2026. Ternus had no publicly stated role in shaping Apple Intelligence. At WWDC 2026, enterprise MDM controls for Apple Intelligence were available in beta only, with general availability expected in autumn 2026. The agentic deployment was announced. The governance controls that enterprises need to deploy it responsibly were not yet generally available.


The Pager

Craig Federighi, Senior Vice President of Software Engineering, is the named face of Apple Intelligence. Amar Subramanya, Vice President of AI, is the operational lead, reporting to Federighi since the retirement of John Giannandrea earlier this year. Neither has made any public commitment regarding enterprise accountability for AI outputs. By September 2026, John Ternus will carry the CEO accountability for a deployment he did not architect, operating under governance terms that were written before agentic AI was part of the product. No named individual or governance body is publicly committed to what Apple Intelligence does in enterprise workflows when it goes wrong.

The Proof

Apple has published no enterprise outcome measure for Apple Intelligence. No accuracy benchmark, no error rate commitment, no service level agreement for business customers. The company’s transparency commitments for Private Cloud Compute are real: production code published within 90 days, a cryptographically auditable log, a virtual research environment for security testing. These are privacy verification mechanisms, not performance standards. A survey of approximately 100 enterprise IT administrators published in May 2026 found that the primary concern was data exfiltration to unmanaged providers, and that eight per cent of organisations had already moved to prohibit AI features entirely. No one at Apple has publicly committed to a measure that would settle that question.

The Verdict

Apple has done more than most technology companies to make its cloud AI architecture independently verifiable. Private Cloud Compute is a credible attempt to resolve the privacy half of the enterprise AI problem. The accountability half remains open. If Apple publishes enterprise terms that define who carries responsibility for agentic errors in business workflows, and if John Ternus names a specific accountable owner for enterprise AI governance before the full iOS 27 rollout, the MDM controls announced at WWDC 2026 become the foundation of something credible. Without both, the hundreds of millions of Apple Intelligence-enabled devices deployed into enterprise settings are operating on a privacy promise. That is not the same thing as an accountability framework.

Your AI Initiative Isn’t Failing Because of the Technology

The technology works. That is almost never the problem.

Across most large organisations right now, AI pilots are running. Proof-of-concepts are producing results that make it into board presentations. Vendor demos are impressive. The innovation team is energised. And then, somewhere between the pilot environment and actual production, the whole thing quietly stops.

According to Deloitte’s 2026 State of AI report, drawn from more than 3,200 business leaders, only 25% of organisations have moved 40% or more of their AI experiments into live production. That number deserves to sit with you for a moment. Three in four organisations are running AI experiments that have not become operational capability. The technology is not the constraint. Something else is.


You Have Seen This Before

If you have been in transformation long enough, this pattern is not new. It is the same pattern from every large ERP programme that never fully went live. Every data platform that became a reporting tool rather than a decision-making engine. Every digital transformation that delivered a new front end while leaving the back-office processes unchanged.

The technology becomes the story because it is visible, measurable, and exciting to talk about. The execution conditions that determine whether the technology actually delivers are harder to photograph and harder to put in a slide: ownership, integration, adoption. So they get managed as a substream, treated as implementation detail, and quietly become the reason the initiative stalls.

This is not an AI problem. It is an execution problem that has found a new context.


Ownership Is Not a Committee

The single most common structural failure in AI deployments is diffuse accountability. Someone owns the technology. Someone owns the data. Someone owns the security review. Someone owns the business case. Nobody owns the outcome.

Committees do not drive production deployments. They review them, adjust them, query them, and occasionally approve them. The organisations that close the gap from pilot to production consistently have a single named individual who is accountable for whether the capability lands in the hands of users, works as intended, and is actually being used. Not a steering group. Not a centre of excellence. One person with the authority and the obligation to make it happen.

This is not a preference for a particular organisational design. It is what the evidence shows, consistently, across every transformation context where the accountability question has been seriously investigated. Singular ownership is not sufficient on its own. But its absence is almost always present when a deployment fails.


The Metric You Are Probably Not Tracking

Most AI initiatives are measured on model accuracy, inference speed, and technical performance. These are valid measures of whether the technology works. They are not measures of whether the initiative is delivering value.

The question that actually determines success is adoption. Is the tool being used? By how many people? How often? Has it changed the decision they were making, or is it an additional step they complete before making the same decision they always made?

Deloitte’s 2026 data found that despite AI tools being available to approximately 60% of the workforce in organisations surveyed, fewer than 60% of those workers actually use them regularly. Access is not adoption. Availability is not value. If you do not have an adoption metric from day one, not a plan to measure adoption eventually but an actual metric that someone is accountable for, you are measuring the wrong thing and you will find out too late.


Scope Is Your Production Variable

There is a reason pilots succeed and production deployments struggle. A pilot can be run by a small team, in a controlled environment, with curated data, limited integrations, and a sponsor who is personally invested in making it work. Production is fundamentally different. It requires integration with existing systems that were not designed for this. It requires security and compliance review. It requires monitoring, maintenance, and the ability to handle the variability of real-world use at scale.

The organisations that consistently move from pilot to production do one thing differently: they scope production more narrowly than they scoped the pilot. Not because they are being unambitious, but because a narrow, fully integrated, fully adopted capability that actually works is worth ten pilots that demonstrated potential and then stalled in the transition.

Start smaller in production than you think you need to. Prove the integration. Prove the adoption. Then expand. The ambition for scale is valid. The timing of it is where most programmes get it wrong.


The Pattern Closes the Same Way Every Time

The 54% of organisations that Deloitte found expecting to move the majority of their AI experiments to production within three to six months are not describing a plan. They are describing an aspiration. The organisations that will actually close that gap are the ones that address the execution conditions, not the technology stack.

Singular accountability. Adoption as the primary metric. Scope narrowed deliberately in production. None of these are technology decisions. They are leadership decisions, and they can be made before the next pilot is commissioned.

The technology is ready. The question is whether the organisation is.

 

The Dashboard Won’t Save Your Project. Your People Will

We have built an entire industry around the wrong obsession.

Walk into any project or programme environment today and tell me what you see. Dashboards. RAG statuses. KPI scorecards. Burndown charts. Milestone trackers. Automated reports that nobody reads in full but everyone references in meetings as though they tell the complete story.

We have convinced ourselves that if we can measure it, visualise it, and put it on a screen, we are in control.

We are not in control. We are comfortable. And those are not the same thing.

Because the thing that actually determines whether your project succeeds or fails, the thing that has always determined it, is not sitting in any dashboard. It is sitting at a desk, joining a call, navigating a problem at 4pm on a Friday when the system throws an error nobody anticipated and the go-live is Monday morning.

It is your people.

And most leaders have quietly forgotten that.


How We Got Here

The shift happened gradually, and it happened with good intentions.

Technology gave us visibility we never had before. We could track progress in real time, surface risks earlier, and report upward with confidence. That was genuinely valuable. Nobody is arguing for less information.

But somewhere along the way, the tool became the answer. The dashboard became the proxy for understanding. The metric became the substitute for the conversation. And the leader who once walked the floor, read the room, and sensed the real mood of a programme started trusting the green status on a screen instead.

The result is a generation of project environments where the reporting is polished and the delivery is fragile. Where everything looks healthy until it suddenly is not. Where nobody saw it coming, except the people closest to the work, who saw it coming for weeks and had nowhere safe to say so.

That is not a data problem. That is a leadership problem.

 


What the Software Cannot Tell You

Your project management software does not know that your lead developer has been quietly updating her CV for three weeks because she feels invisible on this programme.

Your dashboard does not know that the business analyst who owns the most critical workstream is running on empty and has been covering for a colleague who disengaged two months ago.

Your RAG status does not know that the reason everything is green is because the project manager is too afraid to report amber. Because the last time someone reported amber, the steering committee treated it as a personal failure rather than useful information.

Your metrics do not know that the vendor’s implementation team has internally deprioritised your programme because a larger client demanded more of their attention, and your account manager has been managing that fact rather than disclosing it.

None of this shows up in the data. All of it will show up in the outcome.

Professor Bent Flyvbjerg’s research on major project delivery, one of the most comprehensive analyses of project outcomes conducted, found that 91.5% of major projects experience cost overruns, schedule delays, or both. The primary driver is not technical failure. It is optimism bias: the structural human tendency to underestimate problems, which reporting cultures then amplify. A team that does not feel safe surfacing bad news will report optimistically. And the gap between what is reported and what is real compounds week by week until it cannot be managed.

This is the gap that leaders who have outsourced their judgement to software cannot see. The human information. The signals that travel through relationships, not reporting lines. The early warnings that only surface when people feel safe enough, and trusted enough, to tell you the truth.


People Deliver. Not Platforms

Let me be direct about something that gets lost in every technology conversation.

The software does not write the requirements. A person does. The platform does not manage the stakeholder who keeps changing scope. A person does. The dashboard does not have the difficult conversation with the supplier who is underperforming. A person does. The metric does not hold the team together at the point when the pressure peaks and the temptation to cut corners becomes real.

A person does.

Every single meaningful act in the delivery of a project or programme is a human act. The technology supports it, documents it, and reports on it. But it does not do it.

This sounds obvious. And yet the way most organisations invest their leadership attention, their development budget, and their improvement energy tells a completely different story. They upgrade the tools before they develop the people. They add another dashboard before they ask whether their team leaders have the skills to have honest conversations. They buy new software to solve problems that are fundamentally about trust, capability, and culture.

And they wonder why the new system does not fix the delivery problem.


The People Who Confirm Success

Here is the other half of the equation that rarely gets enough attention.

It is not just the people who deliver the project that matter. It is the people who decide whether it worked.

The clinician who was supposed to use the new system and quietly reverted to the old one because nobody involved her in the design. The frontline manager who was presented with a new process in a one-hour training session and had nowhere to raise the fact that it does not reflect how the work actually happens. The customer who was told the transformation would make their experience better and is still waiting.

These people are the real success criteria. Not the go-live date. Not the project closure report. Not the benefits case that was written eighteen months before anyone understood what was actually being built.

Transformation succeeds when the people it was designed for adopt it, use it, and tell you it made a difference. And they will only do that if they were treated as participants in the process, not recipients of its output.


What Recalibration Actually Looks Like

Leaders who get this right do not look fundamentally different from the outside. They attend the same meetings. They review the same reports. But they do something that most of their peers have quietly stopped doing.

They go to where the work is.

Not to check on it. Not to apply pressure. To understand it. To ask the questions that the dashboard cannot answer. How are you actually finding this? What is slowing you down that is not on the risk register? What do you know that I should know?

Google’s Project Aristotle, an internal study of more than 180 Google teams, found that psychological safety was the single strongest predictor of team effectiveness, above individual talent, structure, and every other measurable factor. Amy Edmondson’s research at Harvard Business School reinforces this from a delivery perspective: teams where people feel safe to raise problems surface them earlier, when they are still recoverable. When people do not feel safe, the information gets filtered. And filtered information is what produces the green dashboard above the failing project.

They treat their team’s energy as a delivery asset, because it is. They notice when someone has gone quiet. They notice when the language in status reports starts becoming defensive rather than informative. They notice when the optimism of the first month has been replaced by the grinding compliance of a team that no longer believes the work matters.

And they act on what they notice. Not with a new metric. With a conversation.

They invest in the human layer of delivery the way that most organisations invest in the technical layer. Deliberately. Consistently. Not as a soft add-on to the real work, but as the foundation of it.


The Investment Gap

The question is not whether your tools are good enough.

For most organisations, the tools are fine. In many cases, the tools are excellent. The dashboards are sophisticated. The reporting is comprehensive. The project management frameworks are mature.

And yet the delivery outcomes have not improved at the rate the technology investment suggested they should. PMI’s research, tracking project performance across thousands of organisations globally, found that communication failure contributes to one in three project failures. The gap between organisations that invest seriously in the human and communication layer of delivery and those that do not is measurable, consistent, and significantly larger than most leaders assume.

The gap is not in the software. It is in the leadership attention.

What would change if you spent the same energy on understanding your people that you currently spend on reviewing your reports? What would surface if your team genuinely believed that telling you the truth was safer than protecting the status? What decisions would you make differently if you had the human information as clearly as you have the data?

Those are not rhetorical questions. They are the questions that separate the programmes that deliver from the ones that drift.


The Skill No Platform Replaces

Every programme failure I have ever been close to had warning signs that the data did not capture. The signs were there in the people. In the energy levels. In the conversations that stopped happening. In the problems that got managed rather than solved.

And in almost every case, the leaders were looking at a screen when they should have been reading a room.

The software is not the problem. The hardware is not the problem. The metrics and the dashboards are not the problem.

The problem is that we have allowed them to replace the most important leadership skill there is.

The ability to understand people. To create the conditions where they do their best work. To recognise when they are struggling before it shows up in a project status. To build the kind of trust that means the real information travels fast enough to matter.

No platform does that. No tool does that.

Only you do that.

And the projects that remember it are the ones worth talking about.

Pre-Mortem: Twenty Million Members, No Published Error Rate

The Pre-Mortem is a weekly series on this blog. Each piece applies five questions to a major technology commitment before the outcome is known.

By the end of this year, twenty million Americans will use an AI companion to check whether their treatment has been approved, understand their benefits, and find out where they stand on a coverage dispute. UnitedHealth Group, the largest health insurer in the United States, calls it Avery. What the company has not published is what happens when Avery gets it wrong, and who carries it.


The Bet

UnitedHealth Group is investing more than $1.5 billion in AI in 2026. Avery is one part of a portfolio of over one thousand AI applications now operating across its insurance, pharmacy, and healthcare delivery businesses. The company expects a two-to-one return, much of it within the next eighteen months.

The scope goes further than the navigation functions Avery handles publicly. UnitedHealth has stated its intention to embed AI across claims decisions, clinical documentation, billing code selection, and fraud detection. The bet is that AI can absorb these regulated, high-stakes workflows faster than the accountability architecture around them can be clarified.


The Assumption

The whole bet turns on this, that an AI companion helping members find their benefits is categorically different from an AI algorithm making coverage decisions.

That distinction matters to UnitedHealth and to the regulatory debate around it. It is also the exact point where the accountability gap lives. Avery’s scope includes claim approval status and benefit explanations. In the sequence of a denied treatment, those interactions are not neutral, they are the moments where a member either understands their rights or does not. The line between navigation and decision sits precisely where the product is deployed.


 

The Sequence

UnitedHealth has been here before. Between 2019 and 2022, its subsidiary naviHealth deployed an AI tool called nH Predict to manage post-acute care decisions for Medicare Advantage members. A Senate investigation found that UnitedHealth’s denial rate for post-acute care claims more than doubled after nH Predict was deployed. A federal class action, Lokken v. UnitedHealth Group, alleges that the algorithm overrode treating physicians’ recommendations and carried a 90 per cent error rate on appeal, nine of every ten denied claims reversed when challenged.

That lawsuit is still advancing. In March 2026, a federal court ordered UnitedHealth to disclose its AI denial algorithm documentation, including internal AI Review Board materials, documents related to government investigations, and business records reaching back to 2017. Avery launched the same month to 6.5 million members, with a target of 20.5 million by year-end.

The sequence matters. The error rate history of the predecessor tool is documented and in litigation. The commitment not to repeat it with Avery has not been published in measurable form.


The Pager

UnitedHealth states that Avery is governed by a responsible use policy with review and approval from its AI Review Board. That board governs model development. No published framework names which specific individual, body, or governance layer is accountable when an Avery interaction contributes to a coverage outcome that causes patient harm.

The regulatory picture does not close that gap. At least twenty-five states have issued guidance under the National Association of Insurance Commissioners (NAIC) model bulletin. Alabama, Indiana, Washington, and others have enacted specific laws requiring human sign-off on AI-assisted denials, most taking effect in 2026. But the Employee Retirement Income Security Act (ERISA) preempts state action against self-insured employer plans, which cover the majority of employer-sponsored insurance. Federal oversight through the Centers for Medicare and Medicaid Services (CMS) and the Department of Health and Human Services (HHS) covers Medicare Advantage but carries no published standard for AI liability in individual claim decisions. The accountability is distributed. No name is on it.


The Proof

The $1.5 billion figure is confirmed. No committed outcome measure has been published for Avery’s error rate, its impact on denial rates, appeal success rates under AI-assisted decisions, or any patient safety incident reporting cadence.

Per CMS disclosures filed March 2026, the first year the agency required public reporting, UnitedHealth’s prior authorisation denial rate was 16.3 per cent in 2025, 4.8 percentage points above the industry average of 11.5 per cent. The company announced in May 2026 that it will eliminate prior authorisation for 30 per cent of services by year-end. Whether that changes the AI-in-the-loop accountability question for the remaining 70 per cent has not been addressed.


The Verdict

If the governance architecture catches up, if AI Review Board accountability is mapped to individual outcomes, if state AI denial laws close the ERISA gap, and if a committed outcome framework for Avery is published and audited, then this is exactly what responsible AI deployment in healthcare should look like, a major operator taking the accountability question seriously under public and regulatory scrutiny.

Without all three, twenty million people are interacting with an AI system whose error rate is undisclosed, whose predecessor carried a 90 per cent reversal rate on appeal, and where no named human is accountable for what it tells them about their care.

The bet is bold. The architecture to carry the loss has not been built yet.

What Regulated Industries Know About Speed That Everyone Else Is Learning the Hard Way

 

There is a common assumption in business that regulation slows you down. That the organisations operating fastest are the ones least constrained by oversight. That compliance is a tax on progress.

The organisations now paying the heaviest price for AI governance failures are the ones that operated for years on exactly that assumption.

IBM’s 2025 Cost of a Data Breach Report found that 63% of organisations experiencing a material breach either had no AI governance policy or were still developing one. Shadow AI alone added an average of $670,000 to individual breach costs. The Stanford HAI AI Index recorded 233 documented harmful AI incidents in 2024, a 56% year-on-year increase. These are not primarily failures in regulated sectors. They are failures concentrated in organisations that never had to build governance infrastructure because, until recently, they never had to.

Financial services, healthcare, and government have something that fast-moving technology companies are now being forced to acquire under duress: the institutional knowledge of how to move at pace while the governance is on.


The Misconception About Constraint

Leaders who have spent most of their careers in lightly regulated environments tend to read compliance as friction. Something that adds time to a decision, introduces review cycles, and requires additional sign-off. In that framing, less compliance means faster execution.

What this framing misses is the distinction between compliance as architecture and compliance as checkpoint. A checkpoint is friction. It exists at the end of a process, adds a review stage, and slows the pipeline. Architecture is different. When governance is built into how a system is designed and how decisions are made, it does not add a stage to the process. It is the process.

The organisations in financial services and healthcare that move fastest on AI deployment are not the ones that find clever ways around their regulatory obligations. They are the ones that have built governance into their operating model, their system design, their approval authorities, and their risk frameworks so thoroughly that compliance is not a separate consideration. It is already done by the time a decision reaches an approval point.


Thirty Years of Governance Muscle

This is not an accident. Regulated industries have had decades of pressure to solve exactly this problem. A bank that cannot move fast cannot compete. A hospital that cannot adopt new clinical technology falls behind in patient outcomes and staff capability. A government department that does not modernise its systems loses efficiency and public confidence.

The answer these sectors arrived at, not by choice but by necessity, is embedded governance. Named senior owners for material deployments. Cross-functional oversight bodies with actual authority to pause or redirect, not just to advise. Pre-approved frameworks that allow decisions to be made quickly within defined boundaries, rather than requiring full escalation every time.

The results are measurable. Healthcare AI adoption in outpatient and ambulatory care doubled in two years, from 4.6% of firms in 2023 to 8.7% in 2025, within one of the most tightly regulated environments in the world, according to research published in PMC drawing on US Census Bureau Business Trends and Outlook Survey data. That pace of change did not happen despite the regulation. It happened because enough organisations in that sector had built the infrastructure to move quickly and safely at the same time. Overall healthcare AI adoption still lags sectors such as information services and professional services, where adoption exceeds 20%. The doubling reflects a strong rate of growth, not yet sector leadership in absolute terms.


What the Unregulated Sector Is Now Facing

The regulatory picture for AI is more complex than it appeared eighteen months ago, and understanding that complexity matters.

The EU AI Act has been materially reshaped. Prohibitions on unacceptable AI practices came into force in February 2025. Obligations for general-purpose AI models followed in August 2025. But an AI Omnibus legislative package, agreed in May 2026, delayed the Act’s most commercially significant provisions, those covering employment, biometrics, critical infrastructure, and education, until December 2027 at the earliest. The timeline has extended. The direction has not changed.

In the United States, the trajectory is different. The current federal administration has moved toward a consolidated national framework, explicitly designed to preempt the patchwork of state-level regulation that was developing. Colorado’s original AI Act, among the most comprehensive state-level frameworks, was replaced in May 2026 by a narrower successor focused on disclosure obligations rather than risk management requirements. The patchwork has changed shape. Any organisation planning its governance around a specific jurisdiction’s requirements may be planning around a moving target.

AuditBoard’s 2025 research found that only one in four organisations has a fully implemented AI governance programme. Among organisations with only partial AI governance guidelines, just 25% feel confident in their AI posture. Among those with mature, embedded governance frameworks, that figure rises to 48%, according to research from the Cloud Security Alliance and Google Cloud. Governance maturity is the strongest predictor of AI readiness, above deployment volume, tool selection, or the pace of regulatory change in any given jurisdiction.

The leaders with an advantage right now are not necessarily the ones tracking the latest regulatory guidance. They are the ones who understand that IBM’s breach cost data is accumulating well ahead of any enforcement regime. The external pressure may have shifted its timeline. The operational risk has not.


Governance as Competitive Advantage

The organisations that will move fastest through the current period of regulatory evolution are not the ones trying to stay ahead of each new requirement as it emerges. They are the ones building governance architecture now that will not need to be retrofitted later, whatever form external pressure eventually takes.

That means a named owner for every material AI deployment, not a committee, a person. It means oversight that has genuine authority to pause a deployment, not just to note concerns. It means pre-approved tooling and decision boundaries that allow teams to move without full escalation while still operating within defined risk tolerances.

This is not new governance theory. It is the operating model that financial services and healthcare organisations were forced to develop, iteration by iteration, under regulatory pressure. The knowledge exists. The question is whether leadership teams outside those sectors are willing to learn from it before the external pressure forces the same hard lessons.

The evidence that governance accelerates rather than inhibits deployment is not theoretical. Databricks’ State of AI Enterprise Adoption report found that financial services leads across industries in moving AI from experimental to production, reducing its ratio of experiments per production deployment from 29:1 to 10:1, the sharpest improvement of any sector measured. That is not a coincidence of timing. It is the measurable output of thirty years of building the infrastructure that makes fast deployment safe.

Speed and compliance are not opposites. In the organisations that have figured this out, they are not even in tension. Governance is the infrastructure that makes speed sustainable.

The industries that built that infrastructure under duress are now, inadvertently, the ones best positioned to show everyone else how it works.

The mechanics of building that architecture, including the five characteristics that separate real governance from the committee-and-checkpoint version most organisations have built, are covered in the companion piece Governance Is Not a Committee. It Is a Decision Architecture.

Healthcare’s Algorithm Is Working. That Is the Problem

Somewhere in American hospital records, there is a pattern that should not exist.

Diagnoses of acute posthemorrhagic anaemia, a serious blood-loss condition that requires transfusion, have risen sharply at facilities that adopted AI billing tools. Blood transfusions have not. A condition is being recorded. The standard treatment for that condition is not being given. According to a Blue Cross Blue Shield Association analysis, the discrepancy is not a rounding error. It is a signature.

This is not a story about a medical error. No patient was misdiagnosed. No physician made a wrong call. What happened is more systemic and more troubling. An AI system trained to identify billable conditions found one. It coded it. The hospital billed for it. Nobody questioned whether the diagnosis reflected care that was actually delivered.

This is what AI looks like when there is no governance around it.


What the Bill Says About the Chart

The Blue Cross Blue Shield analysis examined what happened to hospital billing after AI coding tools arrived at scale. The numbers are not ambiguous. Inpatient spending attributable to AI coding practices reached an estimated $663 million. Outpatient spending tied to the same pattern reached $1.67 billion. One facility’s case complexity rating, the metric that determines how much a hospital can charge, rose 6.7 per cent in the year after adopting an AI billing tool. The average rise at comparable facilities in the same state was 0.9 per cent.

The practice is called upcoding: coding a patient as sicker, or their treatment as more complex, than the clinical record supports. It has existed in healthcare administration for decades. What AI has done is industrialise it. According to a federal data brief from the Office of the National Coordinator for Health Information Technology, 71 per cent of US hospitals were using predictive AI by 2024. AI use for billing specifically rose 25 percentage points in a single year, from 36 per cent of hospitals in 2023 to 61 per cent in 2024. The speed of that adoption has outrun every oversight mechanism that existed to check it.

The tool is not complicated. What was built around it is the problem. AI coding tools scan patient records and flag conditions that could legitimately be billed. In the right environment, with clinical oversight and audit processes, that is a useful capability. In the environment most hospitals actually built, which is one without meaningful governance, they become a revenue maximisation engine. The algorithm does what it was trained to do. Nobody verifies whether the conditions it codes for were actually treated. The bills go out.


The Insurer’s Algorithm Has a Different Objective

At the same time hospitals are using AI to add conditions to bills, health insurers are using AI to remove approvals from treatment requests.

Prior authorisation, the process by which insurers must approve procedures before they happen, has become a primary deployment zone for AI-driven decision-making. The American Medical Association surveyed physicians and found that 61 per cent reported health plan use of AI is increasing prior authorisation denials. A US Senate Permanent Subcommittee on Investigations report found that denial rates at UnitedHealthcare, CVS, and Humana’s Medicare Advantage plans rose as each insurer increased AI deployment in its review process.

The governance picture on the insurer side is no better than on the hospital side. A January 2026 study in Health Affairs by researchers at Stanford Health Care, drawing on a survey of 93 large health insurers, found that more than one-quarter of insurers do not document the accuracy of their AI models or test them for bias, around 40 per cent have no accountability practices in place for AI tools used in prior authorisation and claims decisions, and fewer than one-quarter even tell providers when AI was involved in a determination.

The result is a healthcare system in which AI is simultaneously inflating what hospitals charge and compressing what insurers approve. Patients sit between the two. The treatment they need may be denied before it is given and billed for a complication they were never treated for.

Arizona, Maryland, Nebraska, and Texas all passed legislation in 2025 requiring human oversight before AI can be used to deny a prior authorisation request, prohibiting it as the sole basis for medical necessity determinations. From 2026, the Centers for Medicare and Medicaid Services (CMS) will require payers to provide a specific reason for every AI-assisted denial and to publish aggregate approval data. That regulatory response confirms the scale of what is happening. Legislators do not write laws against things that are not happening.


Nobody Has Had to Answer for This

The question that neither the hospital nor the insurer has been required to answer is a straightforward one: who is responsible for what the algorithm decides?

A 2025 survey of 182 US hospital leaders by Black Book Research found that only 22 per cent are confident they could produce a complete AI audit trail within 30 days if asked. Only 29 per cent have implemented and enforced policies covering AI model inventory and accountability sign-offs. Forty-one per cent identified limited vendor documentation, the model cards and drift reports that explain how a system behaves over time, as their top barrier to audit readiness. The median share of IT and quality budgets allocated to AI governance is 4.2 per cent.

These are not numbers that describe an industry taking AI risk seriously. They describe an industry that deployed the technology and deferred the governance question for later.

The procurement happened fast. The governance never followed. Across billing departments and claims operations, AI has been handed consequential authority over patient finances and care access by organisations that did not build the structures that authority demands. The tools were procured. The governance was not.


The Wrong Diagnosis

Every time this gets written about as an AI problem, the real fix gets deferred.

If the algorithm is the villain, the solution is a better algorithm. A more accurate one. A less biased one. Another procurement cycle, another vendor, another pilot. That framing lets every decision-maker who signed the purchase order, approved the deployment, and chose not to build the oversight infrastructure step back from the frame. The machine did it. The machine was wrong.

In healthcare, the machine is doing exactly what it was built to do. It finds billable codes and it finds reasons to deny claims. It operates at the scale and speed that human reviewers cannot match. And it does all of this inside organisations that did not build the governance structures, the audit processes, the accountability frameworks, or the appeals mechanisms that consequential decisions at that scale require.

The United States is where this data exists. It is not where the problem stops.

That is not an AI failure. It is an organisational one. And unlike a broken algorithm, it cannot be fixed with a software update.

 

Governance Is Not a Committee. It Is a Decision Architecture

A technology programme was delivered on time. The steering committee signed it off. The system went live on schedule and within budget. Twelve months later, usage across the organisation sat at eleven percent. The project had been a success by every measure the governance structure tracked. It had failed by the only measure that mattered.

Nobody was accountable for the eleven percent. The named owner had moved to a different role. The steering committee was dissolved at go-live. The vendor had fulfilled its contract. The organisation had built something that worked perfectly and was used by almost nobody, and no single person in the building could explain why.

That is not a delivery failure. It is a governance failure. And it is far more common than any organisation publicly admits.

 

What Governance Actually Is

Governance is one of those words that everyone uses and nobody defines. In most organisations, it has come to mean a structure: a committee, a framework document, an approval process, a risk register. Something you have rather than something you do. You have a governance framework. The governance is in place. The committee meets quarterly.

This version of governance is useless.

Governance is not a structure. It is a decision architecture. It is the infrastructure that determines how decisions are made, who makes them, what they are accountable for, and how fast the organisation can act when circumstances change.

Every organisation has a governance architecture, whether it has designed one or not. The informal version is still a governance architecture: decisions made by whoever is most senior in the room, accountability absorbed by whoever is most junior when something goes wrong, escalation triggered whenever someone is uncomfortable. It is simply a poor one. The difference between organisations that move well and organisations that stall is rarely capability. It is usually the quality of the decision infrastructure underneath the capability.

 

Governance Theatre

The most dangerous governance is the kind that looks correct from the outside.

Most large organisations have built governance that performs the appearance of oversight without the function. The risk register is meticulously maintained and never acted upon. The steering committee meets monthly and has not once paused a programme. The policy required six weeks of approval and is read by nobody after signing. The assurance review always concludes the project is on track.

This is more harmful than no governance, for one reason: it generates confidence without protection. The board believes the oversight is in place. The programme team believes the risks are managed. The organisation proceeds as if the architecture exists, while operating without it. When the failure arrives, it arrives at scale, having been invisible to every structure designed to catch it.

The question is not whether your organisation has governance. The question is whether your governance is real.

 

What Good Governance Looks Like

Good governance has five characteristics that distinguish it from the committee-and-checkpoint version most organisations have built.

The first is named ownership. Every material decision, every significant deployment, every consequential process has a single individual accountable for the outcome. Not a committee. Not a function. A person. The committee can advise. The function can review. One name sits against each thing that matters, and that person knows it and accepts it.

The second is authority that matches accountability. The most common governance failure is asking someone to be accountable for an outcome they cannot influence. If the named owner cannot pause a deployment, redirect a budget, or override a recommendation, their accountability is nominal. If you cannot identify what the accountable person can stop, you have not given them accountability. You have given them exposure.

The third is pre-agreed frameworks. Good governance does not require full escalation for every decision. It requires that boundaries are agreed in advance, so decisions within those boundaries can be made quickly, and decisions outside them trigger a defined path. The approval gate model creates queues. The framework model reserves escalation for the decisions that genuinely need it. Speed and governance are not a trade-off. They are a design choice.

The fourth is transparency of reasoning. Material decisions need a record. Not for audit purposes, but because the organisations that navigate change well are the ones where future leaders can understand not just what was decided, but why, what alternatives were considered, and what conditions would prompt a different outcome. This is not bureaucracy. It is institutional memory, and its absence is one of the most expensive losses any organisation experiences.

The fifth is a culture that supports use. The best governance architecture fails if the organisation punishes the people who use it correctly. The programme manager who escalates a risk that delays a milestone. The engineer who flags a model limitation that complicates a launch. The analyst who says the data is not fit for purpose. If those people are sidelined or not listened to, the framework is decorative. Governance is architecture and behaviour. Building the architecture without addressing the behaviour is half the work.

 

Governance Debt

There is a cost to governance failure that does not appear on any balance sheet until it is too late to address cheaply.

Every decision made without proper governance accumulates what might be called governance debt. The decision is made, the programme moves forward, the system is deployed. The cost is not visible immediately. It appears two years later, when the person who made the original choice has moved on, when nobody can explain why the architecture was designed the way it was, when the organisation needs to change a system it no longer fully understands and cannot safely modify.

Like financial debt, governance debt compounds. Small omissions early in a programme create disproportionately large costs at the point of change. The organisations that experience the most expensive transformations are rarely those that started with the hardest problems. They are those that accumulated governance debt in the early stages and discovered the interest charge when conditions changed.

 

The Speed Paradox

The dominant assumption about governance is that it slows things down. The evidence says otherwise.

Financial services is among the most heavily governed sectors in the world. It is also, by measurable data, among the fastest at moving AI from experimentation to production. Databricks’ analysis of enterprise AI adoption found that financial services improved its experimental-to-production ratio from 29:1 to 10:1 in under eighteen months, the sharpest improvement of any sector measured. The governance culture that financial services built under regulatory compulsion became, in practice, a deployment accelerant.

The reason is straightforward. When governance is architecture rather than checkpoint, when boundaries are pre-agreed and ownership is named, decisions within the framework do not require escalation. The work that in a poorly governed organisation requires a committee review happens at team level, within agreed parameters, without delay. The governance does not add a stage to the process. It is the process.

The organisations that move slowly under governance are the ones with checkpoints. The ones that move fast under governance are the ones with architecture.

 

Why AI Makes This Urgent

AI does not create governance problems. It amplifies the ones that already exist.

Every organisation deploying AI is making decisions at scale and at speed in ways that are not always visible to the people accountable for outcomes. When a model influences hiring, lending, clinical treatment, or procurement, the decision architecture governing that model matters as much as the architecture governing any senior leader. In some respects more.

Three risks are specific to AI. The first is accountability diffusion. When a decision is made by a model, who is accountable is rarely defined in practice. The model carries no accountability. The vendor carries it within narrow contractual limits. The organisation must deliberately assign it or it defaults to nobody, which is where most organisations currently sit.

The second is scale of error. A human decision-maker with a blind spot makes that error incrementally. A model with the same blind spot can make it thousands of times before the pattern is identified. The governance that catches a human error at ten instances must catch a model error at ten thousand. Most governance frameworks were not designed for that volume.

The third is the deployment and use gap. AI systems are deployed for a defined purpose in a defined context. They are then used in contexts their designers did not anticipate, by people not trained on their limitations, for decisions the governance framework never considered. Governance must follow the system into use, not stop at the deployment gate.

One additional risk is specific to the current moment. In most organisations, AI governance covers the official deployments. It has no visibility of, and no authority over, the AI already in use through personal accounts, consumer tools, and unapproved models. The governance gap that will produce the first visible failures is not in the formal AI programme. It is in the tools already running beneath the governance architecture’s line of sight.

For boards, this is a specific accountability question. Most are receiving AI updates without the frameworks to evaluate them. The question is not whether the organisation has an AI strategy. It is whether the board can answer four things: who is accountable for each material AI deployment, what authority they hold, what the escalation path looks like when something goes wrong, and whether the governance covers the AI that is actually in use rather than only the AI that was formally approved.

 

Three Questions That Will Tell You More Than Any Framework Audit

Name the person accountable for your most significant AI deployment. Not the team. Not the function. One person. If you cannot name them in under ten seconds, you do not have governance. You have the appearance of it.

When did your governance last stop something? Not delay it, not document a risk against it. Stop it. If the answer is never, your governance is not functioning as risk infrastructure. It is functioning as a record-keeping exercise.

If the three people who made your most significant programme decisions in the last two years left tomorrow, what would the organisation know about why those decisions were made? If the answer is not much, you are accumulating governance debt at a rate your future leaders will pay.

Governance is not a committee. It is not a document. It is the infrastructure through which an organisation makes consequential decisions, learns from them, and remains able to change course when it needs to.

Most organisations have not built that infrastructure. AI has not created that problem. It has simply made the cost of not solving it impossible to ignore.

Pre-Mortem: KPMG’s AI-Powered Audit

The audit opinion is the most consequential document most public companies produce. Not the annual report. Not the investor deck. The audit opinion, because it carries a named partner’s signature, and because that signature means something in law. On 9 June 2026, KPMG and Microsoft announced the deployment of Microsoft Agent 365 and Copilot across 276,000 KPMG professionals in 138 countries, including inside KPMG Clara, the firm’s global smart audit platform. Scott Flynn, KPMG’s Global Head of Audit, called it “a pivotal milestone in our AI-powered, human assured audit transformation.” The word “assured” is doing a great deal of work in that sentence.

A pre-mortem asks the same five questions, every time, applied before failure is possible rather than after. This is the fifth in the series. The first looked at vendor accountability in regulated finance. The second at clinical safety in healthcare. The third at execution accountability in defence procurement. The fourth at clinical AI infrastructure. This one looks at professional services, the sector that has built its entire business model on the premise that human expertise is the product.

 

The Bet

KPMG is betting that efficiency and accountability can coexist at this scale. That 276,000 professionals deploying AI agents, with a governance layer running underneath, will not dilute the professional accountability the audit opinion rests on. It is a reasonable bet. It is also an untested one. The commercial logic is clear: 276,000 professionals, 138 countries, and an AI-powered workflow running through KPMG Clara creates the kind of structural productivity gain that redefines the firm’s cost base, and potentially its fee model. Analysis of recent audit fee movements suggests clients are already pressing the case that AI efficiency should flow through to lower fees. The deeper bet, the one sitting beneath the headline deployment, is that “AI-powered, human-assured” constitutes a defensible operating model before any regulatory body has defined what “human-assured” actually requires in practice.

 

The Assumption

The single assumption carrying all the weight: that governing agents is the same thing as being accountable for them. Microsoft Agent 365 provides what its own documentation describes as a control plane, a centralised registry of agents with lifecycle rules, identity controls, and audit logging. That is a meaningful capability. It answers the question: how many agents do you have, and what can they touch? It does not, on its own, answer the question a claims lawyer or a regulator will eventually ask: who is accountable when the agent was visible, governed, and still wrong? KPMG’s Trusted AI framework lists ten ethical pillars, including one labelled Accountability, which calls for human oversight and responsibility to be embedded across the AI lifecycle. That is a principle-level commitment. None of the publicly available documentation specifies what happens to the partner’s signature when an AI-assisted conclusion is signed off and later found to be materially incorrect.

 

The Sequence

KPMG has deployed agents at scale before any authoritative regulatory framework specifies what AI-assisted audit evidence must look like, or how human review of AI-generated conclusions must be documented to meet existing standards. The IAASB approved a project proposal in March 2026 to revise ISA 500, Audit Evidence, to address technology use in audit, but the project is still in early research and information gathering, with no exposure draft issued and no effective date. The PCAOB has stated publicly that it is considering developing risk management guidance for audit firms using AI. Considering, not publishing. The capability is deployed. The standard that surrounds it is still being drafted.

 

The Pager

Lisa Heneghan, KPMG’s Global Chief Digital Officer, was specific about what this deployment requires: “strong foundations in governance, visibility and accountability.” That framing is responsible, and Agent 365 provides the visibility that most enterprises currently lack. The harder question is structural and specific. The audit opinion is signed by a named partner. Professional indemnity is priced around that signature. When an agent embedded in KPMG Clara surfaces a conclusion, the partner reviews it, signs the opinion, and the work later contains a material error, the liability has historically sat with the partner and the firm. What KPMG, Microsoft, and the client have not yet published is a clear allocation of responsibility for the agent’s contribution to that error. Is it a tool failure, an oversight failure, or something existing frameworks do not yet classify? The governance layer provides the audit trail. It does not specify who reads it, or what reading it is worth, when a claim is filed.

 

The Proof

The announcement commits 276,000 professionals and earns KPMG the designation of Microsoft “Frontier Firm.” Neither is a performance measure. No published metric connects this deployment to audit accuracy improvement, reduction in deficiencies, or quality outcomes. What the deployment actually demonstrates is that KPMG can deploy Agent 365 at scale and maintain visibility over its agent estate. That is a meaningful operational achievement. It is not the same as demonstrating that AI-assisted audit conclusions are more reliable than human-only ones, which is what regulators, courts, and insurers will eventually need to see. KPMG Clara’s existing framing covers adoption and workflow integration. No published figure connects it to audit opinion accuracy or deficiency rates. The proof that matters most is still outstanding.

 

Verdict

If KPMG publishes a clear framework specifying how AI-assisted audit evidence is reviewed, validated, and documented, paired with a liability position that survives regulatory scrutiny, this becomes the reference model for professional services AI at scale. The governance commitment is genuine. The scale of deployment is unmatched in the sector. Scott Flynn’s “AI-powered, human-assured” is the right aspiration. The question is whether “human-assured” describes a documented, auditable review process that a regulator will accept and an insurer will cover, or whether it is a positioning statement waiting for a definition. At 276,000 professionals across 138 countries, the audit opinion at the centre of this deployment is too consequential to leave that question open. The answer should come before the first material claim, not after.