How Google’s experimental 3D telepresence booth works

In a new research paper, Google detailed the technology behind its impressive Project Starline demo from this year’s I/O conference. Project Starline is essentially a 3D video chat booth that aims to replace a one-on-one 2D video conference call with an experience that feels like you’re actually sitting in front of a real human being.

It sounds simple, but Google’s research paper shows just how many challenges there are in tricking your brain into thinking a real human is sitting just a few feet away from you. The image must be high resolution and free of distracting artifacts, of course, but it must also look correct from your relative position in the booth. Audio is another challenge, as the system has to make it sound like a person’s words are coming out of their actual mouth. And then there’s the small matter of eye contact.

Ultimately, though, the hope is that Project Starline can provide a sense of presence similar to that of virtual or augmented reality, without requiring users to wear bulky headsets or trackers.

The display unit and the various tracking hardware.
Image: Google

The paper spells out exactly how much hardware is needed to solve these problems. The system is built around a 65-inch 8K panel that runs at 60 Hz. Around it, Google’s engineers have arranged three capture pods that record both color images and depth data. The system also includes four additional tracking cameras, four microphones, two speakers, and infrared projectors. In total, color images are captured from four vantage points, along with three depth maps, for a total of seven video streams. Audio is captured at 44.1 kHz and encoded at 256 kbps.

All this hardware generates a lot of data to send, and Google says the transmission bandwidth ranges from 30 Mbps to 100 Mbps depending on “the texture detail in the user’s clothing and the magnitude of their gestures.” That’s considerably more than a standard Zoom call, but nothing that a typical office in a metropolitan area can’t handle. Project Starline is equipped with four high-end Nvidia graphics cards (two Quadro RTX 6000 cards and two Titan RTXs) to encode and decode all this data. End-to-end latency is said to average 105.8 milliseconds.
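To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses numbers quoted in this article; the function names are made up for illustration, and it assumes (the article does not say) that the bandwidth figure is per direction, that the 256 kbps audio stream is counted inside that total, and that "Mbps" means 10^6 bits per second.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# All helper names are hypothetical; nothing here comes from Google's paper.

MIN_MBPS = 30.0          # low end of the reported transmission bandwidth
MAX_MBPS = 100.0         # high end of the reported transmission bandwidth
AUDIO_KBPS = 256.0       # reported audio encoding bitrate
LATENCY_MS = 105.8       # reported average end-to-end latency
DISPLAY_HZ = 60.0        # reported panel refresh rate
MEETING_MINUTES = 35.0   # reported average meeting length ("just over 35 minutes")


def gigabytes_transferred(mbps: float, minutes: float) -> float:
    """Data sent in one direction, in decimal gigabytes (1 GB = 8000 megabits)."""
    megabits = mbps * minutes * 60
    return megabits / 8000


def audio_share(total_mbps: float, audio_kbps: float = AUDIO_KBPS) -> float:
    """Fraction of the total bitrate used by audio (assumes audio is included)."""
    return (audio_kbps / 1000) / total_mbps


def latency_in_frames(latency_ms: float = LATENCY_MS, hz: float = DISPLAY_HZ) -> float:
    """How many display refresh intervals fit into the end-to-end latency."""
    return latency_ms * hz / 1000


if __name__ == "__main__":
    for mbps in (MIN_MBPS, MAX_MBPS):
        gb = gigabytes_transferred(mbps, MEETING_MINUTES)
        share = audio_share(mbps) * 100
        print(f"{mbps:.0f} Mbps: {gb:.2f} GB per meeting, audio is {share:.2f}% of the stream")
    print(f"Latency spans about {latency_in_frames():.1f} frames at {DISPLAY_HZ:.0f} Hz")
```

Under those assumptions, an average-length meeting moves somewhere between roughly 7.9 GB and 26 GB in each direction, audio accounts for well under 1 percent of the stream, and the 105.8-millisecond latency works out to a little more than six refreshes of the 60 Hz panel.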

The system consists of a backlight unit and a display unit.
Image: Google

As Google tells it, employees who have used Starline at the three sites where it’s installed think it’s better than traditional video conferencing at creating a sense of presence and personal connection, as well as helping with attentiveness and gauging the other person’s reactions. The company says that over nine months, 117 participants held a total of 308 meetings in its telepresence booths, with an average meeting time of just over 35 minutes.

It all sounds promising, but there’s no indication yet when, or even if, the system will ever hit the market. There’s also very little information on how much Starline’s vast array of hardware will actually cost (although Table 4 in the research paper outlines the tracking and display hardware it uses, if you feel like doing some math). For now, Google says it is expanding Project Starline’s availability “in more Google offices in the United States.”