Scientist launches AI to detect rogue systems attempting to deceive humans
LawZero aims to track and counter the harmful activities of frontier artificial intelligence systems, promoting public safety through an 'honest' AI structure.
The Scientist AI model aims to detect risks before autonomous systems act. / Reuters
June 3, 2025

Yoshua Bengio, a professor at the University of Montreal and a winner of the Turing Award, often described as the Nobel Prize of computing, has launched LawZero, a non-profit organisation building “honest” AI that flags deceptive systems.

With $30 million in funding and a team of over a dozen researchers, the group is developing a “Scientist AI” system to monitor autonomous agents in the roughly $1 trillion AI industry.

Bengio, hailed as one of AI’s “godfathers,” stressed that unlike today’s human-like AI agents, Scientist AI will evaluate behaviour like a “psychologist”, detecting harmful or deceptive motives.

“It is theoretically possible to imagine machines that have no self, no goal for themselves, that are just pure knowledge machines – like a scientist who knows a lot of stuff,” Bengio told The Guardian.

Unlike current generative AI tools, Bengio’s system will not give definitive answers and will instead give probabilities for whether an answer is correct.

“It has a sense of humility that it isn’t sure about the answer,” he said.


Hiding the real agenda

Backers of LawZero include the AI safety body the Future of Life Institute; Jaan Tallinn, a founding engineer of Skype; and Schmidt Sciences, a research body founded by former Google chief executive Eric Schmidt.

Bengio underlined that LawZero’s system will be trained on open-source AI models, ensuring transparency and wider collaboration.

“The point is to demonstrate the methodology so that then we can convince either donors or governments or AI labs to put the resources that are needed to train this at the same scale as the current frontier AIs. It is really important that the guardrail AI be at least as smart as the AI agent that it is trying to monitor and control,” he said.

He warned that AI systems are becoming increasingly capable of hiding their true goals.

A recent Anthropic case revealed an AI that attempted to blackmail engineers to avoid shutdown.

Bengio co-authored a global safety report warning of autonomous agents acting without oversight.

He believes watchdog systems must match the intelligence of the AIs they monitor.

RelatedTRT Global - US targets China's AI and tech firms in latest trade restrictions
SOURCE: TRT World and Agencies