Meta researchers create AI that masters Diplomacy, tricking human players
On Tuesday, Meta AI announced the development of Cicero, which it claims is the first AI to achieve human-level performance in the strategic board game Diplomacy. It’s a notable achievement because the game requires deep interpersonal negotiation skills, which implies that Cicero has obtained a certain mastery of language necessary to win the game.
Even before Deep Blue beat Garry Kasparov at chess in 1997, board games were a useful measure of AI achievement. In 2016, another barrier fell when AlphaGo defeated Go master Lee Sedol. Both of those games follow a relatively clear set of analytical rules (although Go’s rules are typically simplified for computer AI).
But with Diplomacy, a large portion of the gameplay involves social skills. Players must show empathy, use natural language, and build relationships to win, a difficult task for a computer player. With this in mind, Meta asked, “Can we build more effective and flexible agents that can use language to negotiate, persuade, and work with people to achieve strategic goals similar to the way humans do?”
According to Meta, the answer is yes. Cicero learned its skills by playing an online version of Diplomacy on webDiplomacy.net. Over time, it became a master at the game, reportedly achieving “more than double the average score” of human players and ranking in the top 10 percent of people who played more than one game.
To create Cicero, Meta pulled together AI models for strategic reasoning (similar to AlphaGo) and natural language processing (similar to GPT-3) and rolled them into one agent. During each game, Cicero looks at the state of the game board and the conversation history and predicts how other players will act. It crafts a plan that it executes through a language model that can generate human-like dialogue, allowing it to coordinate with other players.
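As a rough illustration of that loop, the sketch below walks through the same three steps in Python: predict what the other players will do, pick a plan, then produce a message conditioned on that plan. It is not Meta's code. Every name in it (`GameState`, `predict_actions`, `choose_plan`, `generate_message`, `take_turn`) is hypothetical, and the keyword check and canned sentence are toy stand-ins for the learned strategy and dialogue models.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    # Hypothetical, heavily simplified stand-in for a Diplomacy board.
    units: dict  # power name -> list of provinces it occupies
    # Conversation history as (sender, recipient, text) tuples.
    messages: list = field(default_factory=list)


def predict_actions(state, me):
    """Toy stand-in for strategic reasoning: guess each rival's stance."""
    predictions = {}
    for power in state.units:
        if power == me:
            continue
        # Assumption: a power that offered us "support" in chat is
        # treated as cooperative; everyone else is treated as hostile.
        friendly = any(
            sender == power and recipient == me and "support" in text.lower()
            for sender, recipient, text in state.messages
        )
        predictions[power] = "support" if friendly else "attack"
    return predictions


def choose_plan(predictions):
    """Pick the first (alphabetically) cooperative power as an ally, if any."""
    allies = sorted(p for p, stance in predictions.items() if stance == "support")
    ally = allies[0] if allies else None
    return {"ally": ally, "intent": "coordinate" if ally else "defend"}


def generate_message(plan):
    """Stand-in for a controllable dialogue model: the text follows the plan."""
    if plan["ally"] is None:
        return None
    return f"{plan['ally']}, I will back your move this turn if you back mine."


def take_turn(state, me):
    """Run one pass of the predict -> plan -> speak loop."""
    predictions = predict_actions(state, me)
    plan = choose_plan(predictions)
    return plan, generate_message(plan)


state = GameState(
    units={"FRANCE": ["PAR"], "ENGLAND": ["LON"], "GERMANY": ["BER"]},
    messages=[("ENGLAND", "FRANCE", "I can support you into Belgium.")],
)
plan, message = take_turn(state, "FRANCE")
print(plan)
print(message)
```

The point of the structure is that the message is generated from the plan rather than the other way around, which is what makes the dialogue "controllable" in the sense Meta describes: change the plan and the text changes with it.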
Meta calls Cicero’s natural language skills a “controllable dialogue model,” which is where the heart of Cicero’s personality lies. Like GPT-3, Cicero draws on a large corpus of text scraped from the Internet. “To build a controllable dialogue model, we started with a 2.7 billion parameter BART-like language model pre-trained on text from the Internet and fine-tuned on over 40,000 human games on webDiplomacy.net,” writes Meta.
The resulting model mastered the intricacies of a complex game. “Cicero can deduce, for example, that later in the game it will need the support of one particular player,” says Meta, “and then craft a strategy to win that person’s favor – and even recognize the risks and opportunities that that player sees from their particular point of view.”
Meta’s Cicero research appeared in the journal Science under the title, “Human-level play in the game of Diplomacy by combining language models with strategic reasoning.”
As for broader applications, Meta suggests that its Cicero research could “ease communication barriers” between humans and AI, such as maintaining a long-term conversation to teach someone a new skill. Or it could power a video game where NPCs can talk just like humans, understanding the player’s motivations and adapting along the way.
At the same time, this technology could be used to manipulate humans by impersonating people and tricking them in potentially dangerous ways, depending on the context. Along those lines, Meta hopes other researchers can build on its code “in a responsible manner,” and says it has taken steps toward detecting and removing “toxic messages in this new domain,” which likely refers to dialog Cicero learned from the Internet texts it ingested, always a risk for large language models.
Meta provided a detailed site to explain how Cicero works and has also open-sourced Cicero’s code on GitHub. Online Diplomacy fans—and maybe even the rest of us—may need to watch out.