Recent AI models are strikingly human-like in their ability to generate text, audio and video on request. Until now, however, these algorithms have largely remained confined to the digital world, rather than the three-dimensional physical world in which we live. In fact, whenever we try to apply these models to the real world, even the most sophisticated ones struggle to function properly. Just think, for example, how difficult it has been to develop safe and reliable autonomous vehicles. Although artificially intelligent, these models do not understand physics, and they often hallucinate, which leads them to make inexplicable mistakes.
This, however, is the year that AI will finally make the leap from the digital world to the real world we inhabit. Expanding AI beyond its digital limits requires reworking the way machines think, fusing the digital intelligence of AI with the mechanical dexterity of robotics. This is what I call “physical intelligence,” a new form of intelligent machine that can understand dynamic environments, cope with unpredictability, and make decisions in real time. Unlike the models used by standard AI, physical intelligence is rooted in physics: in an understanding of fundamental principles of the real world, such as cause and effect.
These features allow physical intelligence models to interact with and adapt to different environments. In my research group at MIT, we are developing physical intelligence models that we call liquid networks. In one experiment, for example, we trained two drones (one operated by a standard AI model and the other by a liquid network) to locate objects in a forest during the summer, using data captured by human pilots. Both drones performed equally well when tasked with doing exactly what they had been trained to do. But when asked to locate objects under different circumstances (during winter or in an urban environment), only the liquid network drone successfully completed its task. This experiment showed us that, unlike traditional AI systems that stop evolving after their initial training phase, liquid networks continue to learn and adapt from experience, just as humans do.
Physical intelligence is also capable of interpreting and physically executing complex commands derived from text or images, bridging the gap between digital instructions and real-world execution. For example, in my lab, we have developed a physically intelligent system that, in less than a minute, can iteratively design and then 3D print small robots based on prompts like “robot that can walk forward” or “robot that can grasp objects.”
Other laboratories are also making important progress. For example, robotics startup Covariant, founded by UC Berkeley researcher Pieter Abbeel, is developing chatbots, similar to ChatGPT, that can control robotic arms when requested. The company has already raised more than $222 million to develop and deploy sorting robots in warehouses around the world. Recently, a team from Carnegie Mellon University also showed that a robot with a single camera and imprecise actuation can perform dynamic, complex parkour movements, including jumping over obstacles twice its height and across gaps twice its length, using a single neural network trained through reinforcement learning.
If 2023 was the year of text-to-image and 2024 was the year of text-to-video, then 2025 will mark the era of physical intelligence, with a new generation of devices (not just robots, but everything from power grids to smart homes) that can interpret what we tell them and execute tasks in the real world.