What Happens When AI Gets It Wrong? The Hidden Risks of Hallucinations and Prompt Bias in Professional Services


Sasha Orloff
12.16.25

I just got back from Digital CPA, a conference for the most forward-thinking accounting firms and technology solutions in the market. 2026 will be the year AI goes from interesting to implemented, because now there are reasoning capabilities, corporate budgets, and growing interest.

But accounting firms have been slower to adopt AI than some other industries. And this makes sense: an accountant's livelihood and reputation depend on the accuracy of financial statements. It's what they are trained in, and what they are paid to do. So how do we reconcile that with AI hallucinations?

On my flight home I was listening to an episode of the 20MinVC podcast about Harvey, a leading legal AI startup that recently raised $160M (according to the podcast, with $150M ARR, growing 300% YoY, 98% GDR, 168% NDR). And what else do you do in a middle seat on a cross-country flight home? Exactly: open up deep research and dive down the rabbit hole of legal AI, to see whether there are interesting parallels between Harvey, Ironclad, and newer YC companies such as Legora, Spellbook, Crimson, Blueshoe, and more. I tried to cite sources where possible, but understand that the market is changing fast, and the studies themselves could carry bias based on their funders, institutions, and writers. So take this as a starting point rather than an exhaustive study.

Artificial Intelligence is unquestionably transforming how professional services industries work. From automating mundane tasks to summarizing complex documents or scenarios, AI seems like a superpower at our fingertips. But this new capability is a double-edged sword: the very factors that make AI useful, such as its pattern recognition and generative language, also make it dangerously prone to hallucinations and bias. In high-stakes domains like law, these limitations aren't just technical bugs; they can lead to incorrect legal conclusions with real consequences.

This is a challenge for builders in the space: balancing opinionated versus flexible solutions in complicated industries where users may not be as experienced with hallucinations and prompting as, say, engineers using code generation or sales teams using AI for prospecting.

In this post, I want to explore two major challenges:

  1. AI hallucinations: where AI confidently generates false or fabricated information
  2. Prompt bias: where AI reflects or reinforces our assumptions, leading us to the wrong conclusion

We’ll also draw on research to illustrate why these problems matter and how to mitigate them.

AI Hallucinations: Confidently Wrong Answers

One of the most well-documented issues in generative AI is hallucination: the phenomenon where an AI system produces information that sounds plausible but is inaccurate, fabricated, or misleading.

Hallucinations in Legal AI Are Real (and Frequent)

In a study by Stanford’s Human-Centered Artificial Intelligence (HAI) initiative, researchers benchmarked several legal AI tools and found that they hallucinated in at least 1 out of every 6 queries — even when the queries weren’t ambiguous or opinion-based. In other words, roughly 17% (or more) of the time, the models produced incorrect legal information or cited incorrect or made-up sources.

The study specifically targeted AI tools designed for legal research from major providers. Despite being positioned as specialized tools to assist lawyers, the outputs still contained a significant rate of misinformation, from incorrect answers to incorrect or made-up citations.

This isn’t just an academic concern: there are reported instances in court proceedings where attorneys (or litigants) submitted AI-generated legal citations that didn’t exist, and judges have sanctioned or criticized them for it.

Now imagine the non-experts who rely on AI for legal, accounting, or tax advice. It can provide the illusion of certainty, which is dangerous for the average person who does not understand concepts like precedent or jurisdiction in law, or reconciliation in accounting.

Why Hallucinations Happen

AI models are trained to find patterns in massive text datasets and then generate plausible continuations; they don’t verify truth. This means they can combine facts incorrectly, invent sources, or misremember information from their training data.
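To make that mechanism concrete, here is a toy sketch (a deliberately tiny bigram model, not any real LLM or any specific vendor's system) of what "plausible continuation without verification" means: the model splices fragments of its training sentences into fluent statements it never actually saw.

```python
# Toy illustration only: a tiny bigram "language model" learns nothing except
# which word tends to follow which. It has no notion of truth, so it can splice
# fragments of different training sentences into fluent statements that appear
# nowhere in its data -- a miniature version of a hallucination.

CORPUS = [
    "the court upheld the contract in smith",
    "the court voided the merger in jones",
]

# Bigram table: word -> set of words observed to follow it.
bigrams: dict[str, set[str]] = {}
for sentence in CORPUS:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        bigrams.setdefault(a, set()).add(b)

def continuations(prefix: list[str], remaining: int):
    """Enumerate every 'plausible' sentence the bigram model can produce."""
    if remaining == 0:
        yield " ".join(prefix)
        return
    for nxt in sorted(bigrams.get(prefix[-1], ())):
        yield from continuations(prefix + [nxt], remaining - 1)

for sent in continuations(["the"], remaining=6):
    tag = "in corpus" if sent in CORPUS else "HALLUCINATED (never stated)"
    print(f"{tag:28} {sent}")
```

Real models are vastly more capable than this toy, but the failure mode is the same in kind: fluency is no guarantee of factuality.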

From a business perspective, this highlights a fundamental truth: AI is impressive at sounding confident but it doesn’t “know” what’s true. Without human verification, the information produced can be misleading or outright wrong.

Prompt Bias: How We Shape AI’s Answers

Another risk is less about AI errors and more about how those errors are encouraged by the way we ask questions.

What Is Prompt Bias?

Prompt bias happens when the assumptions embedded in how we frame a question end up nudging the AI toward a specific (and potentially biased) conclusion. It’s a bit like asking a leading interview question: the answer will reflect the question, not the reality.

This aligns with a broader understanding of bias in AI, where systems reflect not only patterns in the data but also the assumptions baked into that data and into the prompts themselves. According to Chapman University’s AI Hub, AI systems internalize implicit and explicit biases from both training data and human interaction, which can manifest as misleading or unfair outputs.

How Prompt Bias Skews Conclusions

For example, a lawyer might ask an AI, “Explain why our interpretation of statute X is correct.” This kind of framing invites the AI to support a narrative rather than objectively analyze the statute. Because the model doesn’t reason like a human but instead predicts the most statistically likely continuation of your input, it will often lean into the assumptions in the prompt, thereby reinforcing a potentially incorrect or one-sided interpretation.

This is the danger: once someone thinks they’ve got the answer, they may stop critically examining the assumptions that led there.

Bias Isn’t Just a “Data Problem.” It’s a Narrative One

Bias emerges from many places:

  • Training data: If the source materials reflect historical inequalities or perspectives, the AI learns them.
  • Prompt structure: Leading language encourages the model to “agree” with your assumptions.
  • Confirmation loop in users: People may iterate on queries until the model produces the conclusion they want, which is a human-in-the-loop bias.

All of these can lead to unsafe or incorrect legal reasoning if unchecked.

So What Should Professional Services Teams Do?

AI tools can be extraordinarily helpful for drafting, summarizing, and surfacing insights.  But they should never be the final authority on high-stakes decisions like legal analysis.

Here are some best practices:

✔ Always Validate with Experts (Lawyers, CPAs, etc.)

No AI output, whether accounting, tax or legal reasoning, should be accepted without expert review. Even tools designed for law can be confidently wrong.

✔ Ask Neutral, Fact-Focused Questions

Avoid framing questions in a way that assumes a conclusion. Instead of “Tell me how this supports our case,” use “What are the relevant precedents on this issue?”
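One way to operationalize this is to reframe questions before they ever reach the model. The sketch below is purely illustrative (the marker list, template, and function names are assumptions, not a standard): it flags phrasings that assume a conclusion and builds a neutral, fact-focused prompt that asks for authority on both sides.

```python
# A lightweight sketch of "neutral by default" prompt construction.
# Everything here (markers, template wording, function names) is illustrative.

LEADING_MARKERS = ("explain why", "prove that", "show that",
                   "confirm that", "tell me how this supports")

def looks_leading(question: str) -> bool:
    """Cheap guardrail: flag phrasings that assume the conclusion."""
    q = question.lower()
    return any(marker in q for marker in LEADING_MARKERS)

def neutral_research_prompt(issue: str) -> str:
    """Build a fact-focused prompt that asks for both sides, not a desired answer."""
    return (
        "Question: What are the relevant authorities and precedents on the following issue?\n"
        f"Issue: {issue}\n"
        "Instructions: summarize the strongest arguments on each side, cite the specific "
        "sources relied on, and state explicitly where the law is unsettled or unknown."
    )

if __name__ == "__main__":
    biased = "Explain why our interpretation of statute X is correct"
    print("leading?", looks_leading(biased))  # True -> reframe before sending to the model
    print(neutral_research_prompt("interpretation of statute X"))
```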

✔ Use Structured Verification

For citations and case law, cross-check with authoritative databases. Consider retrieval tools that tie back to original sources and include traceability.
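As a rough illustration of what “traceability” can look like in practice, here is a minimal sketch (the index, case citations, and helper function are hypothetical stand-ins for a real authoritative database): every citation in an AI draft must resolve against a known source, and anything that doesn’t resolve is flagged for expert review rather than accepted.

```python
# Minimal sketch of structured citation verification. The index, citations, and
# URLs below are invented placeholders standing in for a real legal database.

AUTHORITATIVE_INDEX = {
    "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)": {
        "url": "https://example.org/smith-v-jones",  # placeholder source record
    },
}

def verify_citations(ai_citations: list[str]) -> dict[str, dict]:
    """Return a traceability report: each citation verified or flagged for review."""
    report = {}
    for citation in ai_citations:
        record = AUTHORITATIVE_INDEX.get(citation)
        if record:
            report[citation] = {"status": "verified", "source": record["url"]}
        else:
            report[citation] = {"status": "unverified", "action": "flag for expert review"}
    return report

if __name__ == "__main__":
    draft_citations = [
        "Smith v. Jones, 123 F.3d 456 (9th Cir. 1997)",  # resolves in the index
        "Acme Corp. v. Doe, 999 U.S. 1 (2030)",          # does not resolve; treated as suspect
    ]
    for citation, result in verify_citations(draft_citations).items():
        print(citation, "->", result)
```

The point is not the specific code, but the workflow: nothing the model cites is trusted until it traces back to an original source a human can inspect.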

✔ Teach Teams About Bias

Understanding prompt bias and data bias enables teams to interact with AI more critically rather than passively.

The Broader Lesson: Fixing Professional AI Isn’t Just a Tech Problem

The question of how to build provably accurate AI for accountants highlights a critical point: we need AI systems that don’t just sound right but can be shown to be right within a defined context. That’s especially true in high-integrity domains like law and finance.

At Puzzle, we are encouraged by recent AI developments around reasoning models and orchestration layers, but we think the real magic comes from a user experience that is auditable, traceable, and controllable, not one-shot prompts and answers. The risks are too high.

As we move forward, combining technical rigor, human oversight, and careful prompt design will be essential to harnessing AI safely and responsibly. I am encouraged as more accounting firms catch the AI wave.

Note: models and model providers are changing fast. We are seeing massive funding rounds into legal startups like Harvey, which are posting massive revenue growth and net dollar retention, so they clearly seem to be onto something. This is less a judgment on startups and more an attempt to highlight some of the challenges of building companies in serious industries, and of balancing opinionated versus flexible systems.

Want to learn more? puzzle.io/for-accounting-firms

Let us help you solve your financial puzzles.

Thank you for being part of our Puzzle community. Stay tuned!
