Let’s begin with the headline you never want your users to read:
Deloitte Australia is partially refunding an AU$440,000 contract because the 237-page report it delivered to the Australian government contained fabricated references and false citations.
The report was commissioned by the Department of Employment and Workplace Relations (DEWR) to review its welfare compliance system, the Targeted Compliance Framework.
After Dr. Christopher Rudge, a University of Sydney academic, publicized the errors, Deloitte corrected the report, disclosed that generative AI (Azure OpenAI’s GPT-4o) had been used in parts of the analysis, and agreed to repay part of its fee.
Senator Deborah O’Neill criticized the outcome as a “human intelligence problem”: the failure lay in oversight, not in the tool itself.
The Deloitte scandal doesn’t reveal a new bug in AI. Rather, it surfaces old failures of process, governance, and human oversight in a new form. Here are lessons that map directly onto how you should think about your UX Trust Audit framework:
Modern language models are designed to produce coherent, polished text, and that fluency can mask errors. OpenAI describes “hallucinations” as confident but incorrect outputs, produced when the model guesses rather than signaling uncertainty.
AI systems don’t yet reliably flag that uncertainty, and the more readable a passage is, the less scrutiny humans tend to apply to it.
Hallucinations often take specific forms, e.g. fake legal citations, references to nonexistent academic works, or plausible-sounding but false sources.
Moreover, research shows hallucination “snowballing” behavior: once a model commits to an incorrect statement, subsequent justifications can compound the error.
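One way to design against this failure mode is to treat every AI-generated citation as unverified until a named human has checked it against the primary source. Here is a minimal sketch of such a publish gate; all type and field names are hypothetical, not an existing API:

```ts
// Hypothetical publish gate for AI-drafted content: every citation starts
// "unverified" and publishing is blocked until a named human confirms it.
type VerificationStatus = "unverified" | "human_verified" | "unresolvable";

interface Citation {
  id: string;
  claimedSource: string;        // e.g. "Smith & Lee (2021), J. Welfare Policy"
  status: VerificationStatus;
  verifiedBy?: string;          // a named individual, not a team alias
}

interface Draft {
  body: string;
  aiAssisted: boolean;
  citations: Citation[];
}

function canPublish(draft: Draft): { ok: boolean; blockers: string[] } {
  // Human-only drafts pass through; AI-assisted drafts face the gate.
  if (!draft.aiAssisted) return { ok: true, blockers: [] };
  const blockers = draft.citations
    .filter((c) => c.status !== "human_verified" || !c.verifiedBy)
    .map((c) => `Citation ${c.id} (${c.claimedSource}) lacks named human verification`);
  return { ok: blockers.length === 0, blockers };
}
```

The point is not the code itself but the default: fluent output ships only after a human has put their name next to each source.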
Deloitte disclosed its use of generative AI only after the errors had been publicized and the report revised.
Trust-oriented design requires transparency upfront, so users know when and where AI plays a role — not as an apology later.
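To make that upfront transparency concrete, here is one possible sketch, with invented field names and illustrative values, of a disclosure record that is created the moment AI is used and travels with the deliverable:

```ts
// Hypothetical disclosure record, created when AI is first used on a
// deliverable rather than after a problem surfaces.
interface AIDisclosure {
  model: string;          // e.g. "Azure OpenAI GPT-4o"
  usedFor: string[];      // scoped to specific sections or tasks
  disclosedAt: Date;      // timestamped at the point of use
}

function renderDisclosure(d: AIDisclosure): string {
  return `Parts of this document (${d.usedFor.join(", ")}) were drafted with ` +
    `${d.model}. Disclosed ${d.disclosedAt.toISOString().slice(0, 10)}.`;
}

// Illustrative values only: the disclosure exists from day one.
const disclosure: AIDisclosure = {
  model: "Azure OpenAI GPT-4o",
  usedFor: ["background summaries", "first-draft analysis"],
  disclosedAt: new Date("2025-07-01"),
};
console.log(renderDisclosure(disclosure));
```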
AI can assist, but it doesn’t “own” its outputs. In Deloitte’s case, the people overseeing the work failed to catch glaring mistakes. Senator O’Neill’s critique underlines that this is a human failure.
In high-stakes workflows, responsibility must be explicitly assigned — not diffused across teams.
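As a sketch of what explicit assignment can look like in practice (the structure is assumed for illustration, not prescribed), each AI-assisted section carries exactly one named accountable owner, and anything unowned or unsigned is surfaced before the work advances:

```ts
// Hypothetical review record: one named individual per section, never a
// team alias, so responsibility cannot silently diffuse.
interface SectionReview {
  sectionId: string;
  accountableOwner: string | null;  // e.g. "j.doe"; null means unassigned
  signedOff: boolean;
}

// Lists sections whose accountability is missing or unexercised.
function unaccountedSections(reviews: SectionReview[]): string[] {
  return reviews
    .filter((r) => r.accountableOwner === null || !r.signedOff)
    .map((r) => r.sectionId);
}
```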
This is a UX problem as much as a technical one. If users (or clients) discover errors, they lose confidence, even if those errors are “only citations.” The Deloitte case shows how a few small factual breaks can cascade into reputational damage.
If your UX Trust Audit is focused on mapping trust touchpoints, Deloitte’s misstep illustrates the key vulnerabilities to audit for: output that ships unverified, disclosure that arrives too late, and accountability that belongs to no one in particular.
These aren’t hypothetical; they mirror the gaps we regularly find in real-world AI UX systems.
If your AI feature delivers something confidently — but it’s wrong — who is the first human to catch it?
If the answer is “the user,” your UX is already operating in the danger zone.
Embed that question in your design process. Use it in reviews. Let it guide your audit priorities.
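If it helps to operationalize that, here is a small sketch (the checklist items are invented for illustration) that encodes the question as a blocking item in a design-review checklist:

```ts
// Hypothetical trust-audit checklist: the core question becomes a
// blocking review item that cannot be waved through.
interface ChecklistItem {
  question: string;
  answer?: string;
  blocking: boolean;
}

const trustAuditChecklist: ChecklistItem[] = [
  {
    question:
      "If this AI feature is confidently wrong, who is the first human to catch it?",
    blocking: true,
  },
  {
    question: "Is AI involvement disclosed before the user relies on the output?",
    blocking: true,
  },
];

// A review fails while any blocking question is unanswered or answered
// with "the user".
function reviewPasses(items: ChecklistItem[]): boolean {
  return items.every(
    (i) =>
      !i.blocking ||
      (i.answer !== undefined && i.answer.trim().toLowerCase() !== "the user")
  );
}
```

If the only honest answer you can record is “the user,” you have found your next audit priority.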