Microsoft Copilot's accuracy Net Promoter Score fell from -3.5 in July 2025 to -24.1 by September 2025, recovering only partially to -19.8 by January 2026. Forty-four percent of lapsed Copilot users cite distrust of outputs as the primary reason for stopping. This is not a product problem. It is a governance problem — and most organisations have not designed for it.
The headline adoption figures for Microsoft Copilot are underwhelming: 15 million paid seats out of 450 million Microsoft 365 commercial subscribers, approximately 3.3% of the addressable base after two years on the market. But the more revealing number is not the adoption rate. It is the accuracy Net Promoter Score.
According to Recon Analytics tracking data, Copilot's accuracy NPS moved as follows:
- July 2025: -3.5
- September 2025: -24.1
- January 2026: -19.8 (partial recovery)
A Net Promoter Score of -24 means that significantly more users are actively dissatisfied with output accuracy than are satisfied. When Recon Analytics asked lapsed Copilot users why they had stopped using the tool, 44% cited distrust of answers as the primary reason.
This is a material finding for any organisation that has deployed Copilot or is considering doing so. It is also a governance problem that most organisations have not designed for.
What the trust collapse means in practice
An NPS of -24 on accuracy does not mean Copilot is producing wrong answers 24% of the time. NPS is a measure of relative satisfaction: it captures whether users feel confident enough in outputs to recommend the tool, not how often the tool is actually wrong. But the implication for organisational use is serious regardless.
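For readers less familiar with the metric, NPS is simply the percentage of promoters minus the percentage of detractors. The short sketch below works through one hypothetical response split that produces a score of -24; the numbers are illustrative only and are not Recon Analytics data.

```python
# NPS = % promoters (scores 9-10) minus % detractors (scores 0-6).
# The response split below is hypothetical, chosen only to show how a
# score of -24 can arise; it is not Recon Analytics data.

def nps(promoters: int, passives: int, detractors: int) -> float:
    """Return the Net Promoter Score for a batch of survey responses."""
    total = promoters + passives + detractors
    return 100 * (promoters - detractors) / total

# 1,000 hypothetical accuracy-survey responses:
print(nps(promoters=230, passives=300, detractors=470))  # -24.0
```

The same score is compatible with many different underlying error rates, which is why it measures user confidence rather than accuracy itself.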
When users distrust AI outputs, one of three things happens:
They stop using the tool. This is what the 44% lapsed-user finding describes — a cohort that invested time in adoption, encountered enough incorrect or unreliable outputs to damage confidence, and reverted to previous working patterns. The investment in licences and training does not produce returns.
They use the tool but verify everything manually. This creates a pattern where AI is nominally deployed but the productivity gain is largely consumed by the verification overhead. The AI produces a first draft; the human re-checks every claim before using it. Net benefit: modest at best, negative at worst if the presence of AI output creates false confidence that reduces the quality of human review.
They use the tool without verification — and the errors propagate. This is the governance failure mode. Documents drafted with AI assistance contain incorrect information. Summaries of meetings mischaracterise decisions. Research notes contain hallucinated sources. The errors are not caught because the organisation has no systematic process for reviewing AI-assisted outputs.
The third mode is the one that produces reputational and regulatory exposure. The first two are failures of return on investment. None of the three is what organisations purchasing Copilot licences intend.
Why this is a governance problem, not a product problem
It would be tempting to frame Copilot's NPS decline as a product failure — Microsoft's problem to fix. There is some validity to this: improvements in model accuracy, grounding in enterprise data, and citation quality are features that Microsoft controls. The January 2026 recovery from -24 to -20 suggests some improvement is happening.
But the governance frame is more useful for organisations that have already deployed Copilot, because it is the dimension they can act on now.
The organisations that are successfully generating returns from Copilot share a common characteristic: they have built systematic oversight into how AI outputs are used rather than trusting individual users to self-regulate their verification behaviour. This looks different in different contexts:
For document production. A defined review step before any Copilot-assisted document is finalised or sent externally. Not a vague "check the AI output" instruction — a specific, accountable review process with a named person responsible.
For meeting summaries and decisions. A clear protocol for which meeting outputs can be AI-generated and which require human-authored records. Board minutes, legal matters, and regulatory submissions are obvious categories where AI-generated content requires explicit human sign-off before use.
For research and analysis. Citation verification before any Copilot-generated research is used in client-facing work, proposals, or decision-making. AI hallucination in research notes is a professional risk in legal, financial services, and advisory contexts — not a theoretical one.
For customer-facing content. A review step before any AI-assisted content reaches clients or is published externally, with accountability for that review sitting with a named person rather than the AI system.
What Copilot governance documentation should include
Most organisations that have deployed Copilot have an acceptable use policy (often provided by Microsoft as a template) and some training materials. This is not governance. Governance includes:
- A classification of use cases by risk level. Not all Copilot use is equivalent. Generating a first-draft agenda for an internal meeting is low risk. Producing a client-facing analysis or a regulatory document is high risk. The oversight requirements should match the risk level (a minimal sketch of such a classification follows this list).
- Named accountability for AI output review. "Users are responsible for verifying AI outputs" is not an accountability structure; it is a disclaimer. Governance assigns responsibility to specific roles for specific categories of output.
- An incident reporting mechanism. When Copilot produces incorrect, harmful, or professionally problematic output, there should be a way to report it, learn from it, and update practice. Most organisations have no such mechanism.
- Regular review of AI tool performance. The Copilot NPS data is publicly available. Internally, signals such as outputs being caught as incorrect, drafts being heavily revised before use, and users avoiding the tool should trigger a governance review.
The EU AI Act dimension
For organisations in regulated sectors, there is a compliance layer on top of the governance argument. Under the EU AI Act, deployers of high-risk AI systems are required to ensure meaningful human oversight of AI outputs. If Copilot is being used in HR decision-making, credit assessment, or other Annex III categories, the oversight requirements are not optional.
Even for Copilot use cases that do not meet high-risk thresholds, the pattern of governance failure — unverified AI outputs propagating through business processes — is precisely what the Act is designed to prevent. Demonstrating to a regulator that you have systematic oversight mechanisms in place is materially easier if those mechanisms were designed before an incident than after one.
What good looks like
The organisations that are extracting genuine value from Copilot — and there are Irish organisations doing this effectively — share a consistent profile. They chose a small number of high-value use cases to target rather than deploying broadly and hoping. They built explicit oversight into how outputs are used. They measure outcomes rather than usage. And they have a process for identifying and acting on quality issues rather than discovering them through client or regulatory feedback.
This is not a high bar. It is basic AI governance applied to a specific tool. The challenge is that most Copilot deployments happened without it, because governance was treated as a matter of technology procurement rather than of operational risk.
If your organisation has deployed Copilot and is not confident about the oversight mechanisms in place, a governance review is a useful first step — not an expensive exercise, but a structured assessment of where the risk sits and what controls are warranted. At Acuity AI Advisory, we do this as part of broader AI governance work and as a standalone engagement for organisations that are specifically concerned about their Copilot deployment.