DeepMind today announced a new milestone for its artificial intelligence agents trained to play the Blizzard Entertainment game StarCraft II. The Google-owned AI lab's more sophisticated software, still called AlphaStar, now plays the real-time strategy game at grandmaster level and can beat 99.8 percent of all human players in competition. The findings are published in a research paper in the scientific journal Nature.
Not only that, but DeepMind says it also leveled the playing field when testing the new and improved AlphaStar against human opponents who opted into online competitions this past summer. For one, it trained AlphaStar to use all three of the game's playable races, which adds to the complexity of the game at the upper levels of pro play. It also limited AlphaStar to viewing only the part of the map that a human would see, and restricted the number of mouse clicks it could register to 22 non-duplicated actions every five seconds of play, to bring it in line with standard human movement.
Still, the AI was able to reach grandmaster level, the highest possible online competitive ranking, making it the first system ever to do so in StarCraft II. DeepMind sees the advance as more proof that general-purpose reinforcement learning, the machine learning technique that underpins AlphaStar's training, may one day be used to train self-learning robots and self-driving cars, and to create more advanced image and object recognition systems.
“The history of progress in artificial intelligence has been marked by milestone achievements in games. Ever since computers cracked Go, chess and poker, StarCraft has emerged by consensus as the next grand challenge,” said David Silver, a DeepMind principal research scientist on the AlphaStar team, in a statement. “The game's complexity is much greater than chess, because players control hundreds of units; more complex than Go, because there are 10^26 possible choices for every move; and players have less information about their opponents than in poker.”
In January, DeepMind announced that its AlphaStar system had beaten top pro players 10 games in a row during a prerecorded session, but that it lost to pro player Grzegorz "MaNa" Komincz in a final match streamed live online. The company kept improving the system between January and June, when it said it would start accepting invitations to play the best human players from around the world. The ensuing matches took place in July and August, DeepMind says.
The results were striking: AlphaStar had become one of the most sophisticated StarCraft II players on the planet, but remarkably it was still not quite superhuman. Roughly 0.2 percent of players are capable of beating it, but it is largely considered only a matter of time before the system improves enough to crush any human opponent.
This research milestone closely parallels a similar one from the San Francisco-based AI research company OpenAI, which has been training AI agents using reinforcement learning to play the sophisticated five-on-five multiplayer game Dota 2. In April, the most advanced version of the OpenAI Five software beat the world champion Dota 2 team, after narrowly losing to two less capable e-sports teams the previous summer. The leap in OpenAI Five's capabilities mirrors that of AlphaStar, and both are strong examples of how this approach to AI can produce unprecedented levels of game-playing ability.
As with OpenAI's Dota 2 bots and other game-playing agents, the goal of this type of AI research is not to crush humans in various games just to prove it can be done. Instead, it is to prove that – with enough time, effort and resources – sophisticated AI software can best humans at virtually any competitive cognitive challenge, whether it is a board game or a modern video game. It is also meant to show the benefits of reinforcement learning, a special brand of machine learning that has seen tremendous success in recent years when combined with huge amounts of computing power and training methods such as virtual simulation.
Like OpenAI, DeepMind trains its AI agents against versions of themselves and at an accelerated pace, so that the agents can clock hundreds of years of playing time within a few months. That has put this type of software on par with some of the most talented human players of Go and, now, of much more sophisticated games such as StarCraft and Dota.
Yet the software is still restricted to the narrow discipline it is designed to tackle. The Go-playing agent cannot play Dota, and vice versa. (DeepMind did let a more general-purpose version of its Go-playing agent try its hand at chess, which it mastered in a matter of eight hours.) That's because the software is not programmed with easily replaceable rule sets or directions. Instead, DeepMind and other research institutions use reinforcement learning to let the agents figure out how to play on their own. That is why the software often develops novel and wildly unpredictable playing styles, some of which have since been adopted by top human players.
“AlphaStar is an intriguing and unorthodox player – one with the reflexes and speed of the best pros, but with strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that's unimaginably unusual; it really makes you question how much of StarCraft's diverse possibilities pro players have really explored,” said Diego "Kelazhur" Schwimer, a pro player for team Panda Global, in a statement. “Though some of AlphaStar's strategies may at first seem strange, I can't help but wonder if combining all the different play styles it demonstrated could actually be the best way to play the game.”
DeepMind hopes that advances in reinforcement learning made by its lab and by fellow AI researchers will become more widely applicable in the future. The most likely real-world application for such software is robotics, where the same techniques can train AI agents in virtual simulation to perform real-world tasks, such as operating robotic hands. Then, after simulating years of motor control, the AI could take control of a physical robotic arm, and maybe one day even control full-body robots. But DeepMind also sees increasingly sophisticated – and therefore safer – self-driving cars as another venue for its specific approach to machine learning.