Despite impressive AI advances in recent years, robots remain stubbornly dumb and limited. Those in factories and warehouses often go through precisely choreographed routines without much ability to perceive their surroundings or adapt on the fly. The few industrial robots that can see and grasp objects can only do a limited number of things with minimal dexterity due to a lack of general physical intelligence.
Robots with more general capabilities could take on a much broader range of industrial tasks, perhaps after minimal demonstrations. Robots will also need more general capabilities to cope with the enormous variability and disorder of human homes.
General enthusiasm for AI progress has already translated into optimism about major new advances in robotics. Elon Musk’s car company Tesla is developing a humanoid robot called Optimus, and Musk recently suggested that it would be widely available for between $20,000 and $25,000 and capable of performing most tasks by 2040.
Previous efforts to teach robots to perform challenging tasks have focused on training a single machine on a single task because the learning seemed non-transferable. Some recent academic work has shown that with sufficient scale and fine-tuning, learning can be transferred between different tasks and robots. A 2023 Google project called Open X-Embodiment involved sharing robot learning among 22 different robots at 21 different research labs.
A key challenge with the strategy Physical Intelligence is pursuing is that robot data is not available at the same scale as the text used to train large language models. The company therefore has to generate its own data and devise techniques to improve learning from a more limited data set. To develop π0, the company combined so-called vision-language models, which are trained on both images and text, with diffusion modeling, a technique borrowed from AI image generation, to enable a more general type of learning.
For robots to be able to perform any task that a person asks of them, such learning will need to be significantly expanded. “There is still a long way to go, but we have something that can be considered a scaffold that illustrates what is to come,” says Levine.