The GAZEploit attack consists of two parts, says Zhan, one of the lead researchers. First, the researchers created a way to identify when someone using Vision Pro is typing by analyzing the 3D avatar they are sharing. To do this, they trained a recurrent neural network, a type of deep learning model, on recordings of 30 people’s avatars as they completed a variety of typing tasks.
When someone is typing with Vision Pro, their gaze fixates on the key they are likely to press, the researchers say, before moving quickly to the next key. “When we type, our gaze shows some regular patterns,” Zhan says.
Wang says these patterns are more common while typing than when browsing a website or watching a video while wearing the headset. “During tasks like eye-typing, the blinking frequency decreases because you’re more focused,” Wang says. In short: Looking at a QWERTY keyboard and moving between letters is quite a distinct behavior.
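To illustrate what a "regular pattern" in gaze data can look like, here is a minimal sketch of dispersion-based fixation detection, a common idea in eye-tracking work. It is not the researchers' method (they describe a recurrent neural network); the function names, the threshold values, and the assumption that gaze samples arrive as normalized 2D `(x, y)` points are all hypothetical and chosen only for illustration.

```python
def _dispersion(points):
    """Spread of a window of gaze points: x-range plus y-range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=0.05, min_samples=5):
    """Group consecutive gaze samples into fixations.

    samples: list of (x, y) gaze points (assumed normalized coordinates).
    Returns a list of (start_index, end_index, center_x, center_y),
    where end_index is exclusive.
    """
    fixations = []
    start = 0
    n = len(samples)
    while start + min_samples <= n:
        end = start + min_samples
        # If the smallest allowed window is already too spread out,
        # the gaze is moving: slide forward by one sample.
        if _dispersion(samples[start:end]) > max_dispersion:
            start += 1
            continue
        # Otherwise grow the window while the gaze stays tightly clustered.
        while end < n and _dispersion(samples[start:end + 1]) <= max_dispersion:
            end += 1
        window = samples[start:end]
        cx = sum(p[0] for p in window) / len(window)
        cy = sum(p[1] for p in window) / len(window)
        fixations.append((start, end, cx, cy))
        start = end
    return fixations
```

In a trace where the gaze rests on one spot, jumps, and rests on another, this returns two fixations, one per resting spot. A run of short, evenly spaced fixations like that is the kind of signal that sets typing apart from reading or watching video.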
The second part of the research, Zhan explains, uses geometric calculations to determine where someone has placed the keyboard and what size they have chosen. “The only requirement is that as long as we get enough gaze information to be able to accurately retrieve the keyboard, then all subsequent keystrokes can be detected.”
By combining these two elements, they were able to predict the keys someone was likely typing. In a series of lab tests, they had no knowledge of the victim’s typing habits, speed, or where the keyboard was placed. Yet the researchers could predict the correct letters typed, within a maximum of five guesses, with 92.1 percent accuracy for text messages, 77 percent of the time for passwords, 73 percent of the time for PINs, and 86.1 percent of the time for emails, URLs, and web pages. (On the first guess, the letters would be correct between 35 and 59 percent of the time, depending on the type of information they were trying to work out.) Duplicate letters and typos add extra challenges.
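The "up to five guesses" idea can be sketched in a few lines. Assuming the keyboard's position and key size have already been estimated, each gaze fixation can be compared against the key centers and the nearest keys ranked as candidates. This is only an illustration, not the researchers' code: the flat 2D keyboard plane, the row offsets, and the names `key_centers` and `rank_keys` are assumptions made for this example.

```python
import math

# Letter rows of a QWERTY keyboard.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Horizontal stagger of each row, in key widths (an assumed, typical layout).
ROW_OFFSETS = [0.0, 0.25, 0.75]


def key_centers(origin_x, origin_y, key_size):
    """Return {letter: (x, y)} for a keyboard whose top-left corner is at
    (origin_x, origin_y) and whose square keys have side length key_size."""
    centers = {}
    for r, row in enumerate(ROWS):
        for c, ch in enumerate(row):
            x = origin_x + (ROW_OFFSETS[r] + c + 0.5) * key_size
            y = origin_y + (r + 0.5) * key_size
            centers[ch] = (x, y)
    return centers


def rank_keys(point, centers, k=5):
    """Return the k letters whose centers are closest to a fixation point."""
    px, py = point
    ranked = sorted(
        centers,
        key=lambda ch: math.hypot(centers[ch][0] - px, centers[ch][1] - py),
    )
    return ranked[:k]


if __name__ == "__main__":
    centers = key_centers(origin_x=0.0, origin_y=0.0, key_size=1.0)
    # A fixation slightly off the center of "g"; prints the five nearest letters.
    print(rank_keys((4.8, 1.4), centers))
```

A guess counts as correct "within five attempts" if the letter actually typed appears anywhere in the returned list, which is why that figure is much higher than the first-guess figure.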
“It’s very important to know where someone is looking,” says Alexandra Papoutsaki, an associate professor of computer science at Pomona College, who has studied eye tracking for years and reviewed the GAZEploit research for WIRED.
Papoutsaki says the work stands out because it relies only on the video feed of someone’s Persona, making it a more “realistic” setting for an attack than one in which a hacker gets hold of someone’s headset and tries to access eye-tracking data. “The fact that now someone, just by streaming their Persona, can potentially expose what they’re doing is where the vulnerability becomes much more critical,” Papoutsaki says.
While the attack was created in a lab and hasn’t been used against anyone using Personas in the real world, the researchers say there are ways hackers could have abused the data leak. They say that, at least in theory, a criminal could share a file with a victim during a Zoom call, prompting them to log in to, say, a Google or Microsoft account. The attacker could then record the Persona while their target logs in and use the attack method to recover their password and access their account.
Quick fixes
The GAZEploit researchers reported their findings to Apple in April and subsequently sent the company their proof-of-concept code so the attack could be replicated. Apple fixed the flaw in a Vision Pro software update at the end of July, which stops a Persona from being shared while someone is using the virtual keyboard.
An Apple spokesperson confirmed that the company has fixed the vulnerability and said it was addressed in visionOS 1.3. The company’s software update notes do not mention the fix. The researchers say Apple has assigned CVE-2024-40865 to the vulnerability and recommend people download the latest software updates.