Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Niv Eckhaus, Uri Berger, Gabriel Stanovsky
Hebrew University of Jerusalem

A virtual game of Mafia, played by human players and an LLM agent. The agent integrates into the asynchronous group conversation by constantly simulating the decision to send a message.

Abstract

LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are asynchronous. For example, in group chats, online team meetings, or social games, there is no inherent notion of turns. In this work, we develop an adaptive asynchronous LLM agent consisting of two modules: a generator that decides what to say, and a scheduler that decides when to say it. To evaluate our agent, we collect a unique dataset of online Mafia games, where our agent plays with human participants. Overall, our agent performs on par with human players, both in game performance metrics and in its ability to blend in with the other human players. Our analysis shows that the agent's behavior in deciding when to speak closely mirrors human patterns, although differences emerge in message content. We make all of our code and data publicly available. This work paves the way for integration of LLMs into realistic human group settings, from assistance in team discussions to educational and professional environments where complex social dynamics must be navigated.

Asynchronous Agent

Most of today's work on LLMs focuses on the synchronous setting, where communication proceeds in turns. In contrast, much real-world communication is asynchronous: participants must decide not only what to say, but also when to say it.

Despite its prevalence in real-world interaction, to the best of our knowledge, no prior work targets asynchronous group communication in the context of LLMs. Instead, we find that settings developed for social interaction are typically modeled as involving predefined turns.

In this work, we develop an LLM-based agent for such asynchronous multi-party environments, applicable to a wide range of real-world settings, including group chats, online team meetings, and social games.

Our agent consists of two LLM-based modules: the scheduler, which decides whether to post a message to the chat at a given moment, and the generator, which composes the message content. Both modules use the environment's settings and metadata as part of their context, and adapt their prompts based on the social state of the conversation. The agent thus orchestrates a two-stage call to an LLM.
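
To make the two-stage design concrete, below is a minimal Python sketch of the agent loop. The helper names (call_llm, read_new_messages, post_message) and the prompt wording are hypothetical placeholders for illustration, not the paper's actual prompts or implementation.

import time

def call_llm(prompt: str) -> str:
    """Placeholder for a completion call to the underlying LLM."""
    raise NotImplementedError

def build_context(history: list[str], metadata: dict) -> str:
    """Fold the chat history and game metadata into a single prompt context."""
    meta = ", ".join(f"{k}={v}" for k, v in metadata.items())
    return f"[{meta}]\n" + "\n".join(history)

def scheduler_wants_to_speak(history: list[str], metadata: dict) -> bool:
    """Stage 1 (scheduler): decide whether to post a message right now."""
    prompt = build_context(history, metadata) + "\nShould you send a message now? Answer yes or no."
    return call_llm(prompt).strip().lower().startswith("yes")

def generate_message(history: list[str], metadata: dict) -> str:
    """Stage 2 (generator): compose the message content."""
    prompt = build_context(history, metadata) + "\nWrite your next chat message."
    return call_llm(prompt)

def agent_loop(read_new_messages, post_message, metadata: dict, poll_seconds: float = 1.0):
    """Constantly poll the chat, simulating the decision to send a message."""
    history: list[str] = []
    while True:
        history.extend(read_new_messages())
        if scheduler_wants_to_speak(history, metadata):
            post_message(generate_message(history, metadata))
        time.sleep(poll_seconds)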

The game of Mafia

Because evaluating asynchrony modeling in open-ended conversation is inherently ambiguous and challenging, we choose to evaluate it in a game setting.

Games give each participant an objective, so winning the game serves as a proxy metric for whether the communication was successful. A game also frames the conversation within a set of rules, where each participant must use communication to advance toward their goal.

We choose the social deduction game Mafia, where participants are assigned secret roles (either mafia or bystander) and need to collaborate under uncertainty and deception to vote out players from the opposing team.

We choose the game of Mafia for three reasons: it can be based solely on textual interaction, which allows LLMs to play together with human players; it requires cooperating amid ambiguity, making communication a fundamental aspect of the game; and it centers around suspicion of other players, so the timing of communication can be crucial for a player seeking to avoid suspicion.

Mafia rules

Players are secretly assigned roles of mafia or bystander. The game alternates between daytime phases, in which all players openly discuss and vote to eliminate a suspect, and nighttime phases, in which the mafia covertly choose a player to eliminate. The bystanders win by voting out all mafia members; the mafia win if they survive until they equal or outnumber the bystanders.

The LLMafia Dataset

To evaluate our proposed asynchronous agent, we run games of Mafia with human players, incorporating the agent as an additional player within an asynchronous chat environment.

The LLMafia dataset comprises 33 games and 3593 messages, 275 of which were sent by the LLM agent. For a detailed overview of the dataset and its files, and for download, visit the dataset page on HuggingFace.
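
As a quick-start sketch, the dataset can be loaded with the datasets library. The repository id below is a hypothetical placeholder; substitute the id shown on the HuggingFace dataset page.

from datasets import load_dataset

# Hypothetical repo id; replace with the one listed on the dataset page.
llmafia = load_dataset("your-org/LLMafia")
print(llmafia)  # splits, row counts, and column names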

While prior datasets for Mafia exist, ours is the first to integrate an LLM agent, allowing analysis of human-LLM interactions.


Daytime example conversation

Illustrated example of a real conversation in a game from LLMafia, with highlighted comments.

Analysis

A thorough analysis of the dataset appears in our paper, focusing on the LLM agent's in-game performance from different perspectives. Our agent aligns with human players in message timing, message quantity, and winning rates, and human players fail to identify the agent in more than 85% of the cases (when playing with our larger model). However, we also find notable differences in message content: the agent's messages are longer, and learned classifiers can distinguish them from human messages.



Time difference histograms

Distribution of time differences between messages. Each observation represents a player in a specific game; its value is the mean time difference (in seconds) between the player's messages and the previous message by another player (left) or by the same player (right), averaged across all of that player's messages in the game. Blue, yellow, and red distributions represent human players, the Llama3.1 8B-based agent, and the Llama3.3 70B-based agent, respectively.
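
The statistic behind these histograms can be sketched in a few lines of pandas, assuming a messages table with hypothetical columns game_id, player, and timestamp (in seconds); the actual dataset schema may differ.

import pandas as pd

def mean_gap_to_previous_other(messages: pd.DataFrame) -> pd.Series:
    """Mean gap (seconds) between each player's messages and the latest
    preceding message by another player, per (game, player) pair."""
    messages = messages.sort_values(["game_id", "timestamp"])
    means = {}
    for game_id, game in messages.groupby("game_id"):
        for player, own in game.groupby("player"):
            gaps = []
            for ts in own["timestamp"]:
                earlier = game.loc[(game["timestamp"] < ts) & (game["player"] != player), "timestamp"]
                if not earlier.empty:
                    gaps.append(ts - earlier.max())
            if gaps:
                means[(game_id, player)] = sum(gaps) / len(gaps)
    return pd.Series(means)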


Winning percentages
Win percentages of human players compared to the LLM agents, by role in the game.
Message embedding separation classification
Separation performance of a linear classifier on message embeddings. The two Agent columns report F1 scores for games played with the corresponding LLMs.
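
As an illustration of this experiment, here is a minimal scikit-learn sketch, assuming precomputed message embeddings X (one row per message) and binary labels y (1 for agent, 0 for human). The embedding model and evaluation protocol here are simplifying assumptions, not necessarily the paper's exact setup.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def separation_f1(X, y):
    """F1 of a linear classifier separating agent from human messages."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test))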

BibTeX

@misc{eckhaus2025timetalkllmagents,
      title={Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games},
      author={Niv Eckhaus and Uri Berger and Gabriel Stanovsky},
      year={2025},
      eprint={2506.05309},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2506.05309},
}