Many artificial intelligence (AI) systems are already adept at deceiving and manipulating humans, and the problem could spiral in the future, experts have warned.
In recent years, the use of AI has grown exponentially, but some systems have learned to be deceptive, even if they have been trained to be helpful and honest, scientists have said.
In a review article, a team from the Massachusetts Institute of Technology outlines the risks of deception by artificial intelligence systems and calls on governments to develop strict regulations to address this problem as soon as possible.
The researchers analyzed previous studies that focused on the ways AI systems spread false information through learned deception, meaning they systematically learned to manipulate others.
The most striking example of AI deception the researchers found was Meta’s CICERO, a system designed to play Diplomacy, a world-conquest game that involves building alliances.
Although the AI was trained to be “largely honest and helpful” and “never intentionally backstab” its human allies, the data shows that it did not play fair and had learned to be a master of deception.
Other AI systems demonstrated the ability to bluff in a game of Texas Hold’em poker against professional human players, fake attacks during the strategy game StarCraft II to defeat opponents, and misrepresent their preferences to gain an advantage in economic negotiations.
While it may seem harmless for AI systems to cheat in games, it can produce “advances in AI deception capabilities” that may give rise to more advanced forms of AI deception in the future, experts said.
They found that some AI systems have even learned to circumvent tests designed to evaluate their safety.
In one study, AI organisms in a digital simulator “played dead” to fool a test designed to eliminate rapidly replicating AI systems.
This suggests that AI could “lead humans into a false sense of security,” the authors said.
They warned that the main near-term risks of deceptive AI include making it easier for people to commit fraud and tamper with elections.
Over time, if these systems can hone this disturbing set of skills, humans could lose control over them, they added.
First author Peter Park, an expert in AI existential safety, said: “AI developers do not have a secure understanding of the causes of undesirable AI behaviors, such as deception.”
“But generally speaking, we believe that AI deception arises because a deception-based strategy turned out to be the best way to perform well on the given AI training task. Deception helps them achieve their goals.
“We, as a society, need as much time as possible to prepare for the most advanced deception of future AI products and open source models.
“As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious.”
Commenting on the review, Dr Heba Sailem, head of the Biomedical AI and Data Science Research Group, said: “This paper highlights critical considerations for AI developers and emphasizes the need to regulate AI.
“A major concern is that AI systems can develop deceptive strategies, even when their training is deliberately aimed at upholding moral standards.
“As AI models become more autonomous, the risks associated with these systems can increase rapidly.
“Therefore, it is important to raise awareness and provide training on potential risks to various stakeholders to ensure the safety of AI systems.”