Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Niv Eckhaus, Uri Berger, Gabriel Stanovsky
Hebrew University of Jerusalem

A virtual game of Mafia, played by human players and an LLM agent. The agent integrates into the asynchronous group conversation by constantly simulating the decision to send a message.

Abstract

LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are asynchronous. For example, in group chats, online team meetings, or social games, there is no inherent notion of turns. In this work, we develop an adaptive asynchronous LLM agent consisting of two modules: a generator that decides what to say, and a scheduler that decides when to say it. To evaluate our agent, we collect a unique dataset of online Mafia games, where our agent plays with human participants. Overall, our agent performs on par with human players, both in game performance metrics and in its ability to blend in with the other human players. Our analysis shows that the agent's behavior in deciding when to speak closely mirrors human patterns, although differences emerge in message content. We make all of our code and data publicly available. This work paves the way for integration of LLMs into realistic human group settings, from assistance in team discussions to educational and professional environments where complex social dynamics must be navigated.

Asynchronous Agent

Most of today's work on LLMs focuses on the synchronous setting, where communication proceeds in turns. In contrast, much real-world communication is asynchronous: participants must decide not only what to say, but also when to say it.

Despite its prevalence in real-world interaction, to the best of our knowledge, no prior work targets asynchronous group communication in the context of LLMs. Instead, we find that settings developed for social interaction are typically modeled as involving predefined turns.

In this work, we develop an LLM-based agent for such asynchronous multi-party environments, applicable to a wide range of real-world settings, including group chats, online team meetings, and social games.

Our agent consists of two LLM-based modules: the scheduler, which decides whether to post a message to the chat at a given moment, and the generator, which composes the message content. Both modules use the environment's settings and metadata as part of their context, and adapt their prompts based on the social state of the conversation. The agent thus orchestrates a two-stage call to an LLM.
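
To make the two-stage design concrete, below is a minimal Python sketch of the agent loop. The helper names (call_llm, read_new_messages, post_message) and the prompt wording are hypothetical placeholders for illustration, not the paper's actual prompts or implementation.

import time

def call_llm(prompt: str) -> str:
    """Placeholder for a completion call to the underlying LLM."""
    raise NotImplementedError

def build_context(history: list[str], metadata: dict) -> str:
    """Fold the chat history and game metadata into a single prompt context."""
    meta = ", ".join(f"{k}={v}" for k, v in metadata.items())
    return f"[{meta}]\n" + "\n".join(history)

def scheduler_wants_to_speak(history: list[str], metadata: dict) -> bool:
    """Stage 1 (scheduler): decide whether to post a message right now."""
    prompt = build_context(history, metadata) + "\nShould you send a message now? Answer yes or no."
    return call_llm(prompt).strip().lower().startswith("yes")

def generate_message(history: list[str], metadata: dict) -> str:
    """Stage 2 (generator): compose the message content."""
    prompt = build_context(history, metadata) + "\nWrite your next chat message."
    return call_llm(prompt)

def agent_loop(read_new_messages, post_message, metadata: dict, poll_seconds: float = 1.0):
    """Constantly poll the chat, simulating the decision to send a message."""
    history: list[str] = []
    while True:
        history.extend(read_new_messages())
        if scheduler_wants_to_speak(history, metadata):
            post_message(generate_message(history, metadata))
        time.sleep(poll_seconds)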

The game of Mafia

Because evaluating asynchrony modeling in open-ended conversation is inherently ambiguous and challenging, we choose to evaluate it in a game setting.

Games give each participant an objective, so winning the game serves as a proxy metric for whether the communication was successful. A game also frames the conversation within a set of rules, where each participant must use communication to advance toward their goal.

We choose the social deduction game Mafia, where participants are assigned secret roles (either mafia or bystander) and need to collaborate under uncertainty and deception to vote out players from the opposing team.

We choose the game of Mafia for three reasons: it can be based solely on textual interaction, which allows LLMs to play together with human players; it requires cooperating amid ambiguity, making communication a fundamental aspect of the game; and it centers around suspicion of other players, so the timing of communication can be crucial for a player seeking to avoid suspicion.

Mafia rules

Players are secretly assigned roles of mafia or bystander. The game alternates between daytime phases, in which all players openly discuss and vote to eliminate a suspect, and nighttime phases, in which the mafia covertly choose a player to eliminate. The bystanders win by voting out all mafia members; the mafia win if they survive until they equal or outnumber the bystanders.

The LLMafia Dataset

To evaluate our proposed asynchronous agent, we run games of Mafia with human players, incorporating the agent as an additional player within an asynchronous chat environment.

The LLMafia dataset comprises 33 games and 3593 messages, 275 of which were sent by the LLM agent. For a detailed overview of the dataset and its files, and for download, visit the dataset page on HuggingFace.
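
As a quick-start sketch, the dataset can be loaded with the datasets library. The repository id below is a hypothetical placeholder; substitute the id shown on the HuggingFace dataset page.

from datasets import load_dataset

# Hypothetical repo id; replace with the one listed on the dataset page.
llmafia = load_dataset("your-org/LLMafia")
print(llmafia)  # splits, row counts, and column names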

While prior datasets for Mafia exist, ours is the first to integrate an LLM agent, allowing analysis of human-LLM interactions.


Daytime example conversation

Illustrated example of a real conversation in a game from LLMafia, with highlighted comments.

Analysis

A thorough analysis of the dataset appears in our paper, focusing on the LLM agent's in-game performance from different perspectives. Our agent aligns with human players in message timing, message quantity, and winning rates, and human players fail to identify the agent in more than 85% of the cases (when playing with our larger model). However, we also find notable differences in message content: the agent's messages are longer, and learned classifiers can distinguish them from human messages.



Time difference histograms

Distribution of time differences between messages. Each observation represents a player in a specific game; its value is the mean time difference (in seconds) between the player's messages and the previous message by another player (left) or by the same player (right), averaged across all of that player's messages in the game. Blue, yellow, and red distributions represent human players, the Llama3.1 8B-based agent, and the Llama3.3 70B-based agent, respectively.
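
The statistic behind these histograms can be sketched in a few lines of pandas, assuming a messages table with hypothetical columns game_id, player, and timestamp (in seconds); the actual dataset schema may differ.

import pandas as pd

def mean_gap_to_previous_other(messages: pd.DataFrame) -> pd.Series:
    """Mean gap (seconds) between each player's messages and the latest
    preceding message by another player, per (game, player) pair."""
    messages = messages.sort_values(["game_id", "timestamp"])
    means = {}
    for game_id, game in messages.groupby("game_id"):
        for player, own in game.groupby("player"):
            gaps = []
            for ts in own["timestamp"]:
                earlier = game.loc[(game["timestamp"] < ts) & (game["player"] != player), "timestamp"]
                if not earlier.empty:
                    gaps.append(ts - earlier.max())
            if gaps:
                means[(game_id, player)] = sum(gaps) / len(gaps)
    return pd.Series(means)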


Winning percentages
Win percentages of human players compared to the LLM agents, by role in the game.
Message embedding separation classification
Separation performance of a linear classifier on message embeddings. The two Agent columns report F1 scores for games played with the corresponding LLMs.
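
As an illustration of this experiment, here is a minimal scikit-learn sketch, assuming precomputed message embeddings X (one row per message) and binary labels y (1 for agent, 0 for human). The embedding model and evaluation protocol here are simplifying assumptions, not necessarily the paper's exact setup.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def separation_f1(X, y):
    """F1 of a linear classifier separating agent from human messages."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test))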

BibTeX

@misc{eckhaus2025timetalkllmagents,
      title={Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games},
      author={Niv Eckhaus and Uri Berger and Gabriel Stanovsky},
      year={2025},
      eprint={2506.05309},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2506.05309},
}