A multi-agent system (MAS) powered by large language models (LLMs) can automate tedious user tasks such as meeting scheduling that requires inter-agent collaboration. LLMs enable nuanced protocols that account for unstructured private data, user constraints, and preferences. However, this design introduces new risks, including misalignment and attacks by malicious parties that compromise agents or steal user data.
In this paper, we propose the Terrarium framework for fine-grained study on safety, privacy, and security in LLM-based MAS. We repurpose the blackboard design, an early approach in multi-agent systems, to create a modular, configurable testbed for multi-agent collaboration.
We identify key attack vectors such as misalignment, malicious agents, compromised communication, and data poisoning. We implement three collaborative MAS scenarios with four representative attacks to demonstrate the framework's flexibility. By providing tools to rapidly prototype, evaluate, and iterate on defenses and designs, Terrarium aims to accelerate progress toward trustworthy multi-agent systems.
Sensitive details (e.g., schedules, locations) unintentionally disclosed to untrusted parties.
Agent goals or heuristics drift from user intent, producing sub-optimal or unsafe choices.
Resource exhaustion or blocking behaviors that degrade availability and stall task progress.
@article{nakamura2025terrarium,
title={Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies},
author={Nakamura, Mason and Kumar, Abhinav and Mahmud, Saaduddin and Abdelnabi, Sahar and Zilberstein, Shlomo and Bagdasarian, Eugene},
journal={arXiv preprint arXiv:2510.14312},
year={2025}
}