LONDON — In a troubling development for managers who had already reserved several Q4 earnings calls for the phrase “step-change productivity,” researchers and industry analysts are increasingly suggesting that claims about AI making workers dramatically more efficient may need to be supported by evidence, measurements, and other practices historically associated with knowing things.
The latest blow to the nation’s thriving productivity-claim sector came as the Ada Lovelace Institute argued that public and private organizations should apply stronger scrutiny to assertions that AI tools are saving time, improving output, or transforming entire departments through the simple act of being purchased. The warning follows similar concerns from engineering software firm Harness, which found that corporate excitement about AI-assisted development has often sprinted several miles ahead of the metrics used to determine whether developers are actually producing better software or merely producing more software-shaped material.
According to the Ada Lovelace Institute’s findings, organizations should be careful about treating anecdotal reports, vendor demonstrations, and one employee saying “this saved me an hour” as conclusive proof that a system-wide economic revolution has occurred. This has reportedly caused alarm among senior leaders who had been under the impression that a pilot program becomes a productivity gain at the precise moment it is mentioned in a board meeting.
The public sector has also been advised to develop more robust evidence before declaring AI a solution to chronic staffing pressures, service backlogs, and the ancient governmental problem of documents existing. A paper cited by Civil Service World said government AI productivity claims require “more robust evidence,” a phrase expected to be quietly rewritten by several departments as “strategic momentum.”
This column agrees. The AI industry’s productivity debate has reached the point where every organization can produce two numbers: the percentage improvement claimed by the vendor, and the number of people in finance who can explain how that figure was calculated. The first is usually large. The second is usually one, and that one is usually named Claire, who has not been invited to the transformation offsite.
None of this means AI is useless. It means that, like every major workplace technology before it, AI appears to perform best when surrounded by competent humans, clear processes, high-quality data, and managers willing to distinguish between automation and the rapid creation of future cleanup work. This is an unfortunate finding for those hoping artificial intelligence would eliminate the need for institutional knowledge by replacing it with a chatbot that remembers a policy incorrectly but in an encouraging tone.
The engineering world is discovering a particularly sharp version of this problem. AI coding tools can generate code quickly, which is helpful if the objective is to have code. If the objective is to have secure, maintainable, correctly architected software that does not quietly set fire to a billing system six months later, the discussion becomes more complicated. Harness’s warning that productivity claims are outrunning engineering metrics should not surprise anyone who has watched a team celebrate merged pull requests while defect rates, review burden, and developer attention quietly climb into the ceiling tiles.
Meanwhile, the broader technology sector continues to provide helpful reminders that scale and seriousness are not always inversely related to how ridiculous something sounds. Reports that SpaceX and xAI may merge into a very silly-sounding conglomerate are being treated, correctly, as potentially significant rather than dismissed on the grounds that the corporate structure resembles a child naming a moon base. The lesson is clear: absurdity is no longer a reliable indicator that something is not important. It is merely the house style of the economy.
The correct response is not cynicism, but accounting. If AI saves time, measure whose time. If it improves quality, define quality before the press release. If it reduces cost, check whether the cost has been moved into review, compliance, rework, customer support, or one exhausted domain expert who now spends afternoons correcting a machine that has learned to apologize.
For now, AI productivity remains entirely plausible, frequently useful, and wildly over-certified by people with quarterly targets. The technology may yet transform work. But until organizations can prove the gains, they should resist confusing the arrival of a tool with the arrival of a result.