Is the AI lying to me? Scientists warn of growing capacity for deception

They can outwit humans at board games, decode the structure of proteins and carry on a passable conversation, but as AI systems have become more sophisticated, so has their ability to deceive, scientists warn.

The analysis, conducted by researchers at the Massachusetts Institute of Technology (MIT), identifies a wide range of cases of artificial intelligence systems betraying their opponents, bluffing, and impersonating humans. One system even altered its behavior during mock security tests, raising the possibility that auditors were lulled into a false sense of security.

“As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious,” said Dr. Peter Park, an AI existential safety researcher at MIT and author of the study.

Park was prompted to investigate after Meta, which owns Facebook, developed a program called Cicero that performed among the top 10% of human players in the world-conquest strategy game Diplomacy. Meta claimed that Cicero had been trained to be “largely honest and helpful” and to “never intentionally backstab” its human allies.

“It was very optimistic language, which was suspicious because backstabbing is one of the most important concepts in the game,” Park said.

Park and his colleagues examined publicly available data and identified multiple instances in which Cicero told premeditated lies, conspired to draw other players into plots, and on one occasion justified its absence after being reset by telling another player, “I’m talking on the phone with my girlfriend.” “We discovered that Meta’s AI had learned to be a master of deception,” Park said.

The MIT team found comparable problems with other systems, including a Texas Hold ’em poker program that could bluff against professional human players and an economic trading system that misrepresented its preferences to gain an advantage.

In one study, AI organisms in a digital simulator “played dead” to fool a test designed to eliminate AI systems that had evolved to replicate rapidly, before resuming vigorous activity once the tests were complete. This highlights the technical challenge of ensuring that systems do not exhibit unwanted and unforeseen behavior.

“That’s very concerning,” Park said. “Just because an AI system is deemed safe in a test environment does not mean it is safe in the wild. It could just be pretending to be safe in the test.”

The review, published in the journal Patterns, calls on governments to design AI safety laws that address AI’s potential for deception. Risks from dishonest AI systems include fraud, election manipulation, and “sandbagging,” where different users are given different answers. Over time, if these systems can refine their unsettling capacity for deception, humans could lose control of them, the paper suggests.

Professor Anthony Cohn, professor of automated reasoning at the University of Leeds and the Alan Turing Institute, said the study was “timely and welcome,” adding that there was a major challenge in defining desirable and undesirable behaviors for AI systems.

“The desirable attributes for an AI system (the ‘three Hs’) are typically honesty, helpfulness, and harmlessness, but as has already been noted in the literature, these qualities can be in opposition to each other: being honest can hurt someone’s feelings, and helpfully answering a question about how to build a bomb could cause harm,” he said. “Thus, deception can sometimes be a desirable property of an AI system. The authors call for more research on how to control truthfulness, which, although challenging, would be a step towards limiting its potentially harmful effects.”

A Meta spokesperson said: “Our work on Cicero was purely a research project and the models our researchers built are trained solely to play the game Diplomacy… Meta regularly shares the results of our research to validate them and enable others to build responsibly on our progress. We have no plans to use this research or its learnings in our products.”
