Catching AI hallucinations: why AI makes things up

A colleague asks a chatbot for case law to support an objection letter. The answer contains three rulings, complete with case numbers and dates. Well written, exactly what was needed. One problem: two of the three rulings do not exist. This phenomenon is called hallucination: an AI system produces information that looks convincing but is made up. It is not a rare malfunction — it is a direct consequence of how the technology works, and therefore something to plan for.

Why does AI make things up?

A language model predicts the most plausible next word, over and over, based on patterns from its training data. It does not consult a database of facts, and it has no mechanism that separates truth from invention.

For a language model, a good answer is an answer that looks good. Ask it for case law, and the model knows perfectly well what case law should look like: the structure of a case number, the name of a court, a plausible date. Where real knowledge is missing, the pattern fills itself in, and the model generates something that fits the template seamlessly. This is not because it is “lying” (the model has no intent), but because producing plausible text is simply what it does, even when the factual basis is absent.

What makes this genuinely dangerous: a hallucination reads no differently to you than a correct answer. Same fluent tone, same confidence, same tidy structure. You cannot spot fabricated output by its form — only by its content, through verification.

When is the risk highest?

Hallucinations are not evenly distributed. The risk peaks in recognisable situations:

Specific, verifiable facts. Names, figures, dates, prices, phone numbers, article numbers of laws. The more precise the requested fact, the more easily it goes wrong.
Sources and quotes. Requests like “give me sources” or “quote the study” are notorious: the model effortlessly generates titles, authors and publications that do not exist, or attaches real authors to invented work.
Niche topics. Your local regulation, small supplier or internal procedure barely appeared in the training data. Thin data means more guesswork.
Recent events. The model only knows what was in its training data, which stops at a cutoff date. Without a live search feature, questions about current affairs return outdated or made-up information.
Long documents and long conversations. When summarising large documents, the model can “supplement” passages that are not there, and in long chats earlier context drops out of view.
Leading questions. Ask “why is X better than Y?” and you will get arguments — even when X is not better at all. The model rarely challenges your assumption unprompted.

Conversely, the risk is lowest for tasks where you supply the source material: rewriting, structuring, summarising text you provide. Then the model does not need to dig anything out of its own “knowledge”.

Rule of thumb: the more an answer relies on facts the model has to supply itself — and the more specific, recent or rare those facts are — the higher the chance of hallucinations. Plan your verification accordingly.

Verification workflows for the workplace

Verification does not have to take hours, as long as it is targeted. Here are three workflows, in order of increasing weight.

Workflow 1: the quick check (low risk, internal use)

For draft texts, brainstorms and internal emails with no decisions attached:

Read the output in full — read, not scan.
Mark every concrete claim: every figure, name, date, reference.
For each mark, ask yourself: do I know this is correct? If not, remove it or check it.
Doubtful claim and no time to check? Write around it or make it generic.
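
The marking step in this quick check can be partly supported by a script. The sketch below is only an illustration: the function name `mark_claims`, the patterns and the sample sentence are all made up for this example, and the patterns cover dates, references and figures only. Names are not detected, so the script supports reading the output in full; it does not replace it.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical patterns chosen for illustration. They will miss some claims
# and flag some harmless text. Order matters: a date is tried before a
# reference, and a reference before a bare figure.
CLAIM_RE = re.compile(
    rf"(?P<date>\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b)"
    r"|(?P<reference>\b(?:Article|Section|Case) [A-Za-z-]*\d[\w/-]*)"
    r"|(?P<figure>\b\d+(?:[.,]\d+)*%?)"
)


def mark_claims(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs that a human should verify."""
    return [(m.lastgroup, m.group()) for m in CLAIM_RE.finditer(text)]


# Made-up sample sentence, not a real ruling.
sample = ("The court ruled on 3 March 2021 in Case C-123/45 "
          "that 40% of claims fail under Article 4.")

for kind, span in mark_claims(sample):
    print(f"[{kind}] {span}")
```

Every flagged span still needs the question from the step above: do I know this is correct?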

Workflow 2: source verification (anything that leaves the building)

For texts going to customers, citizens, patients or publication:

Look up every source manually. Does the publication exist? Is the author right? Does the source actually say what the AI claims? A source you cannot find should be treated as non-existent.
Trace every quote back to the original. AI regularly paraphrases and presents it as a verbatim quote.
Trace figures to the original source, not to a website that also got the number from somewhere.
Legal and medical claims should always be verified against the official source (EUR-Lex, official government portals, clinical guideline databases) or with a knowledgeable colleague.
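
Once you have the original text in hand, checking whether a quote is verbatim can be done mechanically. The following is a minimal sketch under stated assumptions: the helper names `normalise` and `is_verbatim` are hypothetical, and the comparison ignores only case, whitespace and curly versus straight quotation marks. It does not tell you whether the source exists or whether a paraphrase is fair; that remains manual work.

```python
import re


def normalise(text: str) -> str:
    """Lowercase, straighten curly quotes and collapse whitespace."""
    replacements = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}
    for curly, straight in replacements.items():
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()


def is_verbatim(quote: str, source_text: str) -> bool:
    """True only if the quote appears word for word in the source text."""
    needle = normalise(quote)
    if not needle:
        return False  # an empty quote proves nothing
    return needle in normalise(source_text)


# Illustrative source text; line breaks in the original do not matter.
source = "Verification does not have to take hours,\nas long as it is targeted."

print(is_verbatim("as long as it is targeted", source))      # word for word
print(is_verbatim("verification never takes hours", source)) # paraphrase
```

A result of False means the passage is a paraphrase at best and should not be presented in quotation marks.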

Workflow 3: the four-eyes principle (decisions about people and money)

If AI output feeds into a decision — a quote, an assessment, advice to a client — self-checking is not enough:

Have a colleague with subject-matter knowledge review the output, with the explicit question: what here is wrong?
Document what was AI-generated and what was verified. That is not bureaucracy: for high-risk AI systems the AI Act expects human oversight, and beyond that you want to be able to show you worked carefully.
Agree within the team which types of output always follow this route.
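
A team that wants to agree on which output follows which route could write the choice down as a simple rule. The sketch below is one possible encoding, not a prescribed one: the names `Workflow` and `select_workflow` and the two yes/no questions are assumptions made for this example, and it assumes that the heaviest applicable workflow wins.

```python
from enum import IntEnum


class Workflow(IntEnum):
    QUICK_CHECK = 1          # low risk, internal use
    SOURCE_VERIFICATION = 2  # anything that leaves the building
    FOUR_EYES = 3            # decisions about people and money


def select_workflow(leaves_building: bool, feeds_decision: bool) -> Workflow:
    """Pick the heaviest workflow that applies (assumption for this sketch)."""
    if feeds_decision:
        return Workflow.FOUR_EYES
    if leaves_building:
        return Workflow.SOURCE_VERIFICATION
    return Workflow.QUICK_CHECK


# An internal brainstorm versus advice to a client.
print(select_workflow(leaves_building=False, feeds_decision=False).name)
print(select_workflow(leaves_building=True, feeds_decision=True).name)
```

Real cases will have grey areas, so a rule like this is a starting point for the team agreement rather than a substitute for it.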

Smart habits that prevent hallucinations

Besides checking afterwards, you can reduce the risk up front:

Supply source material. “Summarise this document” is safer than “what do you know about this topic?”.
Offer a way out. Add: “If you are not sure about something, say so explicitly.” Not a guarantee, but an improvement.
Ask for uncertainties. “Which claims in your answer should I verify?” often yields a usable checklist.
Use tools that cite sources where possible — and actually click those sources, because even a neat citation can be summarised incorrectly.
Know your own trap. The biggest risk is not the AI, but time pressure plus an answer that looks good. Precisely when the answer is exactly what you hoped for, checking matters most.
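
Several of these habits can be built into a reusable prompt template. The sketch below only assembles text; it does not call any chatbot or API. The function name `build_grounded_prompt`, the marker lines and the instruction to use only the supplied material are assumptions made for this example, and like the habits themselves the result is an improvement, not a guarantee.

```python
def build_grounded_prompt(task: str, source_material: str) -> str:
    """Combine a task with source material and the habits described above."""
    parts = [
        task.strip(),
        # Supply source material (the "use only" wording is an assumption).
        "Use only the source material between the markers below.",
        # Offer a way out.
        "If you are not sure about something, say so explicitly.",
        # Ask for uncertainties.
        "Finish with a list of the claims in your answer that I should verify.",
        "--- SOURCE START ---",
        source_material.strip(),
        "--- SOURCE END ---",
    ]
    return "\n\n".join(parts)


prompt = build_grounded_prompt(
    "Summarise this document in five bullet points.",
    "Paste the full text of the document here.",
)
print(prompt)
```

The output of such a prompt still goes through the verification workflow that matches its use.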

Responsibility stays with the human

Whoever puts an invented case number in a formal letter cannot point at the chatbot. You sign it; your organisation is responsible. That is also the spirit of the AI Act: AI may assist, but people must stay in control, and for that they need to know where the technology fails. This is exactly why recognising and catching hallucinations belongs to the core of AI literacy, which Article 4 of the AI Act requires organisations to ensure among their staff. Our page for employers covers how to organise this across a team.

Want to test how well you spot fabricated output? Take the free quiz. In our AI literacy course you practise these verification workflows with real-world examples.