Human oversight of AI: more than a rubber stamp

“A human always looks at it.” It’s the reassurance you hear everywhere AI is deployed. But there is a big difference between a human who genuinely reviews and a human who wearily clicks “approve”. The European AI Act makes that difference legally relevant for high-risk systems — and the principle behind it is useful for everyone who uses AI, far beyond the high-risk category.

What the AI Act says about human oversight

The AI Act classifies AI systems by risk. Systems in the high-risk category — think of AI used in recruitment and selection, access to education, credit scoring or certain government decisions — face strict requirements. One of them is human oversight: high-risk AI systems must be designed and used in such a way that humans can effectively oversee them.

Note the word “effectively”. The regulation does not mean that a human merely has to be somewhere nearby. The overseeing human must sufficiently understand the system, be able to interpret its output correctly, stay alert to the tendency to rely on the system automatically, and be able to intervene: disregard an outcome, reverse it, or stop the system. Oversight on paper doesn’t count; it has to mean something in practice.

Why does the legislator insist on this? Because the risks of AI are greatest in decisions about people: an applicant who gets rejected, a loan that gets refused. If nobody with real understanding can intervene, the system’s mistakes automatically become mistakes in people’s lives.

The real problem: automation bias

The AI Act explicitly names a psychological phenomenon familiar to anyone who has ever blindly followed their sat nav: automation bias — the tendency to trust what a system says, precisely because a system says it.

That mechanism gets stronger the more often the system is right. If the AI makes a fine proposal nine times out of ten, the reviewer gets used to approving and misses exactly that tenth time. That is how “human oversight” quietly turns into a rubber stamp: formally a human looks at it, but in practice that human no longer adds anything. The oversight still exists on the org chart, just not in reality.

Modern AI chatbots make this especially treacherous: they phrase things fluently and confidently, even when they are completely wrong. The tone of the output says nothing about its reliability.

Why this principle applies to every organisation

The legal obligation in the AI Act applies to high-risk systems. But the underlying principle — a human meaningfully in the loop — is simply sound policy for all everyday AI use: the quote a colleague has a chatbot draft, the AI summary of a meeting, the draft email to a customer, the first screening of incoming applications.

The reasoning is the same everywhere. AI output can contain errors, and responsibility for what your organisation sends, decides or publishes rests with people, not with the tool. “The AI said so” is never an excuse towards customers, employees or regulators. That is also the spirit of Article 4 of the AI Act, which requires organisations to take measures to ensure a sufficient level of AI literacy among everyone who uses AI systems on their behalf: people need to understand what they are working with.

Meaningful oversight in daily practice

How do you make sure human review actually means something, without grinding everything to a halt? A few principles that make the difference:

The reviewer must be able to judge. Have AI output checked by someone who could have done the work themselves. Someone who doesn’t know the subject can only check tone and formatting — which is exactly what AI is good at.
Make the check specific. “Give it a quick read” doesn’t work. What needs checking? Facts and figures? Names and amounts? Tone towards the customer? A short checklist per task beats a vague “have a look at it”.
Heavier decision, heavier check. Nobody needs to vet an internal brainstorm. Anything that goes out the door, costs money or concerns people deserves serious human review. Set that bar in advance in working agreements.
Make overriding normal. If nobody on your team has ever overturned an AI suggestion, that is not proof the AI is perfect — it is a signal that the oversight has fallen asleep. Celebrate it when someone intercepts a mistake.
Keep the human awake. Whoever ticks off a hundred AI suggestions a day sees nothing after a week. Rotate tasks, and consider spot checks instead of a hundred sham reviews.
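
For teams that route AI output through a workflow tool, the “heavier decision, heavier check” and spot-check ideas above could be sketched roughly as follows. This is only an illustration: the tier names, the sampling rates and the Task and needs_review names are assumptions made up for this example, not anything prescribed by the AI Act.

```python
import random
from dataclasses import dataclass

# Hypothetical stakes tiers with the share of outputs sent to a human reviewer.
# Both the tier names and the rates are illustrative assumptions.
REVIEW_RATE = {
    "internal": 0.0,  # e.g. an internal brainstorm: no mandatory review
    "routine": 0.1,   # low-stakes drafts: spot-check roughly one in ten
    "external": 1.0,  # goes out the door, costs money or concerns people
}


@dataclass
class Task:
    description: str
    stakes: str  # one of the keys in REVIEW_RATE


def needs_review(task: Task, rng: random.Random) -> bool:
    """Decide whether this AI output goes to a qualified human reviewer."""
    rate = REVIEW_RATE[task.stakes]
    # rng.random() lies in [0, 1), so a rate of 1.0 always triggers a review
    # and a rate of 0.0 never does.
    return rng.random() < rate
```

Calling needs_review(Task("customer quote", "external"), random.Random()) always returns True, while a "routine" task is picked for a spot check about one time in ten. The point of the sketch is that the bar is set in advance, per tier, rather than decided ad hoc by a tired reviewer.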

Three questions as a quick test

Could the reviewer actually spot the error? Do they have the knowledge and the time to notice a mistake in this output?
Is the reviewer allowed to intervene? Can they adjust or block the outcome without friction, and is doing so socially accepted?
Does it ever happen? Is anything ever rejected or amended in practice? If not, the oversight has probably become a stamp.

Three times “yes”? Then there is genuinely a human in the loop. A “no” tells you where to start.
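
If you keep even a simple log of reviews, the quick test can be written down as a small check. The sketch below is a hypothetical illustration: the ReviewLog fields and the oversight_check function are invented for this example, and the first two answers still have to come from an honest human judgement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReviewLog:
    reviewed: int  # AI outputs a human actually looked at
    changed: int   # outputs the reviewer amended or rejected


def oversight_check(can_spot_errors: bool, may_intervene: bool,
                    log: ReviewLog) -> list[str]:
    """Return the questions answered 'no'; an empty list means three times 'yes'."""
    gaps = []
    if not can_spot_errors:
        gaps.append("Could the reviewer actually spot the error?")
    if not may_intervene:
        gaps.append("Is the reviewer allowed to intervene?")
    # No reviews at all, or reviews that never change anything,
    # both suggest the oversight has become a stamp.
    if log.reviewed == 0 or log.changed == 0:
        gaps.append("Does it ever happen?")
    return gaps
```

An empty result means there is genuinely a human in the loop; any question that comes back tells you where to start.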

Oversight is not distrust of AI

One final misunderstanding to clear up: human oversight is not a brake on AI use, but the precondition for using AI with confidence. A team that knows there is an expert human check in place actually dares to use AI more often. Compare it to the four-eyes principle for payments: it doesn’t exist because accountants are untrustworthy, but because important actions deserve a second look. That is how simple, and how old, the principle behind this brand-new law really is.
