OpenAI Five defeats a professional Dota 2 team, twice



OpenAI, an artificial intelligence research organization based in San Francisco and backed by technology luminaries Reid Hoffman and Peter Thiel, has investigated autonomous systems that can achieve superhuman performance in Pong and Montezuma's Revenge, as well as natural language systems capable of writing with impressive coherence. But it has also spent the better part of four years developing an AI capable of human-level play in Valve's Dota 2 battle arena game, and today it put the fruit of that work to the test against a team of professional players.

At a packed event in San Francisco, OpenAI Five (OpenAI's autonomous system) faced Europe's OG, an esports collective that became the first to win four Dota Major Championships in 2017, in a series of games with commentary from William "Blitz" Lee, Austin "Capitalist" Walsh, Owen "ODPixel" Davies, Kevin "Purge" Godec and Jorien "Sheever" van der Heijden. The stakes were a bit higher than in OpenAI's previous matches: in a best-of-three series at Valve's The International 2018 competition (which had a $25 million prize pool), two professional teams defeated OpenAI Five.

This time, the bots won the first two games of three in a draft mode, which allows each team to ban characters to prevent the other team from selecting them. In the second game, OpenAI Five emerged victorious after about 20 minutes, roughly half the length of the first game.

The rules were the same as at The International last summer: the bots did not have invulnerable couriers (NPCs that deliver items to heroes), which in earlier matches they had used to ferry a stream of healing potions to their player characters. OpenAI Five also played on the latest Dota 2 patch, with summons and illusions disabled. Nevertheless, it benefited from both a "smoother" training process and significantly more training; according to OpenAI co-founder and president Greg Brockman, it now has 45,000 years of Dota 2 gameplay experience.

Historically, OpenAI Five's Achilles' heel has been a lack of long-term planning: it often goes for short-term payoffs rather than long-term ones. Dota 2 games usually last between 30 and 45 minutes, and OpenAI says its AI agents have a "reward half-life" (the timeframe over which they anticipate future payoffs) of 14 minutes. Another disadvantage of the bot? It does not learn between games.
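A reward half-life maps naturally onto the discount factor used in reinforcement learning: it is the delay at which a future reward counts half as much as an immediate one. The sketch below shows that conversion under stated assumptions. The function names and the `steps_per_second` parameter (how many decisions the agent makes per second of game time) are illustrative and not taken from OpenAI's code.

```python
def discount_from_half_life(half_life_seconds: float, steps_per_second: float) -> float:
    """Per-step discount gamma such that a reward half_life_seconds away is weighted 0.5."""
    steps = half_life_seconds * steps_per_second
    if steps <= 0:
        raise ValueError("half-life must cover a positive number of steps")
    return 0.5 ** (1.0 / steps)


def discounted_weight(gamma: float, seconds_ahead: float, steps_per_second: float) -> float:
    """Weight the agent gives to a reward that arrives seconds_ahead in the future."""
    return gamma ** (seconds_ahead * steps_per_second)
```

With any half-life, a reward two half-lives away is weighted one quarter, three half-lives away one eighth, and so on, which is why a short half-life pushes an agent toward short-term payoffs.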

OpenAI Five preferred to defend its towers in today's games, although it occasionally sent a hero to attack proactively. It made some mistakes, such as directing one of its player characters, Death Prophet, to use her ultimate ability on an enemy hero, Riki, after which the latter went invisible and retreated. But it demonstrated a knack for "juggling," that is, killing creeps to keep them away from the main action (even though this took time away from gathering resources, ganking and achieving objectives). In addition, it had heroes walk away in situations where damage over time was likely to kill them, blink and go invisible to avoid being killed, and spend in-game money to restore heroes' health.

"OG played extremely oddly the whole time, and we saw that sometimes it worked, and sometimes it really did not," Mike Cook wrote on Twitter. "I do not really know how to interpret the new bots … They are clearly very different … But I also think that OG's draft and play are very different from what we have seen human teams do before."

At the end of today's match, OpenAI announced that it will release a platform for the public to play against OpenAI Five, a mode called Arena, available April 18-21.

How OpenAI approached Dota 2

Valve's Dota 2, a follow-up to Defense of the Ancients (DotA), a community-created mod for Blizzard's Warcraft III: Reign of Chaos, is what's known as a multiplayer online battle arena, or MOBA. Two teams of five players, each with a base to occupy and defend, attempt to destroy a structure (the Ancient) located in the opposing team's base. Player characters (heroes) each have a distinct set of abilities and collect experience points and items that unlock new attacks and defensive moves.

It's more complex than it looks. The average match contains 80,000 individual frames, during which each character can perform dozens of 170,000 possible actions. The heroes on the board make on average 10,000 moves per frame, contributing to the game's more than 20,000 total dimensions. And each of these heroes, of which there are more than 100, can pick up or buy hundreds of in-game items.

OpenAI Five is not yet able to handle the full game: it can play only 18 of the 115 different heroes, and it cannot use abilities such as summons and illusions. And in a somewhat controversial design decision, OpenAI's engineers chose not to have it read pixels from the game to retrieve information (as human players do). Instead, it uses Dota 2's bot API, which spares it from having to search the map for its team's location, check whether a spell is ready, or estimate an enemy's health or distance.

That said, it is able to draft a team entirely on its own, taking into account the opposing team's choices.

OpenAI has been chipping away at the Dota 2 problem for a while. It demonstrated its first MOBA-playing bot, which beat one of the world's best players, Danil "Dendi" Ishutin, in a one-on-one matchup, in August 2017. In June 2018, OpenAI Five, an improved system capable of playing five-on-five matches, managed to beat a team of OpenAI employees, a team of audience members, a Valve employee team, an amateur team and a semi-professional team.

Above: OpenAI Five's view of the Dota 2 battlefield.

Image credit: OpenAI

In early August, it won two games out of three against a team ranked in the 99.95th percentile. In the first game, OpenAI Five started and finished strong, preventing its human opponents from destroying any of its defensive towers. The second game was a little less one-sided (the humans took out one of OpenAI Five's towers), but the AI nevertheless came out victorious. Only in the third game did the human players eke out a win.

OpenAI Five consists of five single-layer, 1,024-unit long short-term memory (LSTM) networks, a type of recurrent neural network that can "remember" values over an arbitrary length of time, each assigned to a single hero. The networks are trained with a deep reinforcement learning model that encourages them to self-improve through rewards. In OpenAI Five's case, these rewards are kills, deaths, assists, last hits, net worth and other stats that track progress in Dota 2.
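One common way to turn per-hero stats like those above into a training signal is a weighted sum of how much each stat changed since the previous step. The sketch below illustrates that idea only; the stat names and every weight in `REWARD_WEIGHTS` are hypothetical values chosen for the example, not figures published by OpenAI.

```python
# Hypothetical weights for illustration only; not OpenAI's actual values.
REWARD_WEIGHTS = {
    "kills": 1.0,
    "deaths": -1.0,   # dying is penalized
    "assists": 0.5,
    "last_hits": 0.1,
    "net_worth": 0.001,
}


def shaped_reward(prev_stats: dict, curr_stats: dict, weights: dict = REWARD_WEIGHTS) -> float:
    """Weighted sum of the change in each tracked stat between two steps."""
    return sum(
        weight * (curr_stats.get(name, 0) - prev_stats.get(name, 0))
        for name, weight in weights.items()
    )
```

Rewarding the change in a stat rather than its absolute value means a hero is credited once for each kill or last hit instead of being paid again on every later step.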

OpenAI's training framework, Rapid, consists of two parts: a set of rollout workers, each running a copy of Dota 2 and an LSTM network, and optimizer nodes that perform synchronous gradient descent (an essential step in machine learning) across a fleet of graphics cards. As the rollout workers gather experience, they pass it to the optimizer nodes, and another set of workers compares the trained LSTM networks (agents) with reference agents.
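In synchronous gradient descent generally, each optimizer computes a gradient on its own batch of experience, the gradients are averaged, and every copy of the model applies the same update before the next step begins. The following is a minimal sketch of that averaging step in plain Python. It is not Rapid's code, and the function names are illustrative.

```python
def average_gradients(worker_grads: list) -> list:
    """Element-wise mean of equally shaped gradient vectors, one per worker."""
    if not worker_grads:
        raise ValueError("need at least one worker gradient")
    n = len(worker_grads)
    return [sum(column) / n for column in zip(*worker_grads)]


def sync_sgd_step(params: list, worker_grads: list, learning_rate: float) -> list:
    """Apply one shared update so that all model copies stay identical."""
    mean_grad = average_gradients(worker_grads)
    return [p - learning_rate * g for p, g in zip(params, mean_grad)]
```

Because every node waits for the averaged gradient, the model copies never drift apart, at the cost of the whole fleet moving at the pace of its slowest member.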

To improve, OpenAI Five plays 180 years' worth of games every day, 80% against itself and 20% against past versions of itself, on 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google's cloud platform. Months ago, when OpenAI kicked off training, the AI-controlled Dota 2 heroes "walked aimlessly on the map." But before long the AI mastered basics such as lane defense and farming, and soon after picked up advanced strategies such as rotating heroes around the map and stealing items from opponents.
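The 80/20 split between the current agent and its past versions can be pictured as a simple sampling rule applied before each training game. The sketch below is one way to express it. The function name is illustrative, and picking uniformly among past versions is an assumption, since the article does not say how a past opponent is chosen.

```python
import random


def pick_opponent(current, past_versions, rng, self_play_prob=0.8):
    """Return the current agent with probability self_play_prob, else a past version."""
    if not past_versions or rng.random() < self_play_prob:
        return current
    # Assumption: past versions are sampled uniformly.
    return rng.choice(past_versions)
```

Keeping older versions in the pool is a common guard against an agent overfitting to its own latest habits and forgetting how to beat strategies it used to face.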

"People thought that this sort of thing was impossible using today's deep learning," Brockman told VentureBeat in an interview last year. "But it turns out that these networks [are] able to play at the professional level in terms of some of the strategies they discover … and to do some long-term planning. What shocks me is that it is using algorithms that already exist, that we already have, that people have said are flawed in very specific ways."

Fully trained OpenAI Five agents are surprisingly sophisticated. Although they are unable to communicate with each other (a "team spirit" hyperparameter determines how much each agent prioritizes its individual rewards versus the team's reward), they have mastered projectile avoidance and experience point sharing, and even advanced tactics such as "creep blocking," in which a hero physically blocks the path of a hostile creep (a basic unit in Dota 2) to slow its progress.
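One plausible reading of a "team spirit" hyperparameter is a linear blend between each hero's own reward and the team's average reward. The sketch below assumes that interpretation, with 0 meaning fully selfish and 1 meaning fully shared. The function name and the exact formula are assumptions for illustration, not a description of OpenAI's implementation.

```python
def blend_rewards(individual_rewards: list, team_spirit: float) -> list:
    """Mix each agent's own reward with the team mean, weighted by team_spirit."""
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must lie between 0 and 1")
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_mean
        for reward in individual_rewards
    ]
```

Under this scheme the heroes never exchange messages; cooperation emerges only because each one is paid, in part, for what its teammates achieve.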

Dota 2 players are already studying OpenAI Five's play styles, some of which are surprisingly creative. (In one game, the bots adopted a mechanic that allowed their heroes to quickly recharge a weapon by staying away from enemies.) As for OpenAI, it is applying some of the insights gleaned to other areas: Last February, it released Hindsight Experience Replay (HER), an open source algorithm that effectively helps robots learn from their failures, and later that year it published research on a self-learning robotic system able to manipulate objects with humanlike dexterity.

Brockman said that although today's match is the final public demonstration, OpenAI "will continue to work" on OpenAI Five.

"The beauty of this technology is that it does not even know that it's [playing] Dota … It's about allowing people to connect the strange, exotic but still very tangible intelligences that are created … modern AI technology," he said. "Games really have been the benchmark [in AI research] … These complex strategy games are the milestone toward which … we have all been working, because they start to capture aspects of the real world."

