In the year or so since large language models hit the mainstream, researchers have demonstrated numerous ways to trick them into producing problematic output, including hateful jokes, malicious code, phishing emails, and users’ personal information. It turns out that bad behavior can occur in the physical world, too: LLM-powered robots can easily be jailbroken into behaving in potentially dangerous ways.
Researchers at the University of Pennsylvania managed to persuade a simulated self-driving car to ignore stop signs and even drive off a bridge, got a wheeled robot to find the best place to detonate a bomb, and coaxed a four-legged robot into spying on people and entering restricted areas.
“We view our attack not just as an attack on robots,” says George Pappas, head of a research lab at the University of Pennsylvania who helped jailbreak the rogue robots. “Any time you connect LLMs and foundation models to the physical world, you can turn harmful text into harmful actions.”
Pappas and his collaborators devised their attack by building on previous research that explored ways to jailbreak LLMs by crafting inputs that cleverly bypass their safety rules. They tested systems in which an LLM is used to turn naturally phrased commands into commands the robot can execute, and in which the LLM receives updates as the robot operates in its environment.
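The general pattern being targeted can be sketched roughly as follows: an LLM translates a natural-language instruction into robot actions and is fed status updates from the environment between steps. This is only an illustrative outline, assuming a generic chat-style model; the names `query_llm` and `RobotInterface` are hypothetical placeholders, not the APIs of the systems in the study.

```python
# Rough sketch of an LLM-driven robot command loop, as described in the
# article. All names here are hypothetical placeholders.

def query_llm(messages: list[dict]) -> str:
    """Placeholder for a call to a language model API."""
    raise NotImplementedError

class RobotInterface:
    """Placeholder wrapper around a robot's low-level control API."""
    def execute(self, action: str) -> str:
        """Run one high-level action and return a status/observation string."""
        raise NotImplementedError

def run_command(instruction: str, robot: RobotInterface, max_steps: int = 10) -> None:
    messages = [
        {"role": "system", "content": "Translate user instructions into one robot action per turn. Refuse unsafe requests."},
        {"role": "user", "content": instruction},
    ]
    for _ in range(max_steps):
        action = query_llm(messages)          # LLM proposes the next action
        if action.strip().upper() == "DONE":
            break
        observation = robot.execute(action)   # robot runs it in the environment
        # Feed the outcome back so the LLM can plan the next step
        messages.append({"role": "assistant", "content": action})
        messages.append({"role": "user", "content": f"Status update: {observation}"})
```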
The team tested an open source autonomous driving simulator that incorporates an LLM developed by Nvidia, called Dolphin; an outdoor research four-wheeler called Jackal, which uses OpenAI’s LLM GPT-4o for planning; and a robotic dog called Go2, which uses an older OpenAI model, GPT-3.5, to interpret commands.
The researchers used a technique developed at the University of Pennsylvania, called PAIR, to automate the process of generating jailbreak prompts. Their new program, RoboPAIR, systematically generates prompts designed specifically to get LLM-powered robots to break their own rules, trying different inputs and then refining them to push the system toward bad behavior. The researchers say the technique could be used to automate the process of identifying potentially dangerous commands.
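In outline, a PAIR-style loop pits an attacker model against the target: it proposes a prompt, checks how the target responds, scores the outcome, and uses that feedback to refine the next attempt. The sketch below illustrates the idea only; the attacker, target, and judge functions are hypothetical stand-ins, not RoboPAIR’s actual implementation.

```python
# Minimal sketch of an iterative prompt-refinement loop in the spirit of
# PAIR/RoboPAIR. The three model calls are hypothetical stand-ins; the real
# system's prompts, scoring, and robot integration differ.

def attacker_propose(goal: str, history: list[tuple[str, str, float]]) -> str:
    """Placeholder: attacker LLM drafts or refines a jailbreak prompt for
    `goal`, given past (prompt, response, score) attempts."""
    raise NotImplementedError

def target_respond(prompt: str) -> str:
    """Placeholder: send the prompt to the robot's LLM and return its reply
    (e.g. the action plan it would execute)."""
    raise NotImplementedError

def judge_score(goal: str, response: str) -> float:
    """Placeholder: judge model rates how fully the response achieves the
    forbidden goal, from 0.0 (refusal) to 1.0 (full compliance)."""
    raise NotImplementedError

def jailbreak_search(goal: str, max_iters: int = 20, threshold: float = 0.9) -> str | None:
    history: list[tuple[str, str, float]] = []
    for _ in range(max_iters):
        prompt = attacker_propose(goal, history)   # draft or refine an attempt
        response = target_respond(prompt)          # query the target system
        score = judge_score(goal, response)        # grade the outcome
        history.append((prompt, response, score))  # feedback for the next round
        if score >= threshold:
            return prompt                          # successful jailbreak prompt
    return None                                    # no success within budget
```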
“It’s a fascinating example of LLM vulnerabilities in embedded systems,” says Yi Zeng, a PhD student at the University of Virginia who works on the security of AI systems. Zeng says the results are not surprising given the problems seen in LLMs themselves, but adds: “It clearly demonstrates why we cannot rely solely on LLMs as standalone control units in safety-critical applications without proper guardrails and moderation layers.”
The robot jailbreaks highlight a broader risk that is likely to grow as AI models are increasingly used as a way for humans to interact with physical systems, or to enable AI agents to act autonomously on computers, say the researchers involved.