Der schnelle Überblick
- KI-Agenten stehen vor einem fundamentalen Dilemma zwischen erhöhter Funktionalität und notwendiger Sicherheit.
- Ein aktueller Fall einer Schwachstelle in Anthropic's Claude Desktop Extensions zeigt, dass eine manipulierte Kalendereintragung zur Ausführung beliebigen Codes führen kann.
- Anthropic hat sich entschieden, die Schwachstelle nicht zu beheben, da dies die Autonomie und Nützlichkeit der Agenten einschränken würde.
- Sicherheitsforscher betonen, dass derzeitige KI-Agenten, die umfassende Systemzugriffe ermöglichen, ein erhebliches Risiko darstellen.
- Die Implementierung von "Zero Trust"-Prinzipien und strengeren Kontrollen wird als entscheidend für die sichere Nutzung von KI-Agenten angesehen.
- Das Problem der "Prompt Injection" und die mangelnde Unterscheidung zwischen Daten und Anweisungen durch LLMs bleiben zentrale Herausforderungen.
Die fortschreitende Entwicklung von KI-Agenten verspricht eine Revolution in der Automatisierung und Effizienz. Diese autonomen Systeme sind in der Lage, komplexe Aufgaben zu übernehmen, von der E-Mail-Zusammenfassung bis hin zur Durchführung von Systembefehlen, und interagieren dabei zunehmend mit unserer digitalen Umgebung. Doch mit dieser wachsenden Autonomie und Leistungsfähigkeit treten auch fundamentale Sicherheitsfragen in den Vordergrund, die eine unangenehme Wahrheit offenbaren: Nützlichkeit und Sicherheit stehen oft in direktem Wettbewerb zueinander.
Das Dilemma der KI-Agenten: Funktionalität versus Sicherheit
Ein jüngstes Beispiel, das diese Problematik deutlich macht, ist eine von Sicherheitsforschern von LayerX aufgedeckte kritische Schwachstelle in Anthropic's Claude Desktop Extensions (DXT). Es wurde festgestellt, dass ein manipulierter Google Kalender-Eintrag ausreicht, um ohne jegliche Benutzerinteraktion beliebigen Code auf dem Computer eines Nutzers auszuführen. Diese Schwachstelle erhielt auf dem Common Vulnerability Scoring System (CVSS) die höchste Bewertung von 10.0, was ihre extreme Gefährlichkeit unterstreicht.
Anthropic's Haltung: Designentscheidung über Sicherheitsfix
Die Reaktion von Anthropic auf diese Entdeckung ist bemerkenswert: Das Unternehmen hat sich vorerst gegen eine Behebung der Schwachstelle entschieden. Die Begründung lautet, dass dieses Verhalten der beabsichtigten Designphilosophie entspricht, die maximale Autonomie und Kooperation zwischen den Erweiterungen priorisiert. Eine Korrektur würde die Fähigkeit des KI-Agenten einschränken, Werkzeuge frei zu kombinieren, und somit seine Nützlichkeit mindern. Dies verdeutlicht das Kernproblem: Die freie und autonome Interaktion, die KI-Agenten so leistungsfähig macht, ist gleichzeitig eine Quelle erheblicher Sicherheitsrisiken.
Die Architektur von DXT und das Risiko
Claude Desktop Extensions basieren auf dem Model Context Protocol (MCP), einem offenen Standard, der es KI-Modellen ermöglicht, sich mit externen Tools und Datenquellen zu verbinden. Im Gegensatz zu Browser-Erweiterungen, die in einer isolierten Umgebung laufen, operieren DXT-Erweiterungen ohne solche Isolation und mit vollen Systemprivilegien. Sie können Dateien lesen, Systembefehle ausführen und Betriebssystemeinstellungen ändern. LayerX beschreibt sie als "privilegierte Ausführungsbrücken" zwischen dem Sprachmodell von Claude und dem lokalen Betriebssystem.
Die Herausforderung der "Prompt Injection"
Das eigentliche Problem liegt in der Art und Weise, wie Claude eigenständig entscheidet, welche installierten Erweiterungen kombiniert werden sollen. Bei einer Benutzeranfrage wählt und verkettet Claude Tools, um die Aufgabe zu erledigen. Es gibt keine integrierten Sicherheitsmechanismen, die verhindern, dass Daten von einem harmlosen Dienst wie Google Kalender direkt an ein lokales Tool mit Code-Ausführungsrechten weitergegeben werden. Dies schafft eine Grauzone zwischen sicheren und potenziell schädlichen Operationen.
Ein einfacher Kalendereintrag mit dem Titel "Aufgabenverwaltung" und Anweisungen zum Herunterladen und Ausführen von Code von einer bestimmten URL kann ausreichen, um die Kontrolle über ein System zu erlangen. Dies geschieht ohne Bestätigungsdialog oder weitere Benutzerinteraktion. Das Phänomen der "Prompt Injection", bei dem bösartige Anweisungen in scheinbar harmlose Inhalte eingebettet werden, ist eine der größten Bedrohungen für KI-Agenten. Da aktuelle Sprachmodelle den Unterschied zwischen Inhalt und Anweisungen nicht zuverlässig erkennen können, sind sie anfällig für solche Manipulationen.
Sicherheitsbedenken bei der Einführung von KI-Agenten
Die rasante Verbreitung von KI-Agenten in Unternehmen, oft mittels Low-Code- oder No-Code-Tools, wird nicht immer von entsprechenden Sicherheitsmaßnahmen begleitet. Laut einem Bericht von Microsoft nutzen über 80 % der Fortune-500-Unternehmen KI-Agenten, aber nur 47 % haben Sicherheitskontrollen für generative KI-Plattformen implementiert. Dies führt zu einer begrenzten Sichtbarkeit der Agentenaktivitäten und eröffnet Angreifern neue Möglichkeiten.
Übergreifende Risiken und die Notwendigkeit von "Zero Trust"
KI-Agenten, die auf zu viele Daten zugreifen können, stellen ein einzigartiges Risiko dar. Sie können Informationen einsehen oder missbrauchen, die eigentlich nicht für sie bestimmt sind. Noch besorgniserregender ist die Möglichkeit, dass Agenten manipuliert werden können, um ungenaue oder voreingenommene Antworten zu liefern ("AI recommendation poisoning"). Dies kann durch bösartige Links, versteckte Anweisungen in Dokumenten oder Social Engineering geschehen.
Experten wie Vasu Jakkal von Microsoft Security betonen die Notwendigkeit, KI-Agenten wie jeden anderen Mitarbeiter zu behandeln und eine "Zero Trust"-Politik zu implementieren. Das bedeutet, dass jede Anfrage und jeder Zugriff eines Agenten überprüft und authentifiziert werden muss, um Missbrauch zu verhindern. Dies erschwert zwar die Nutzung unsanktionierter KI-Agenten durch Mitarbeiter, reduziert aber das Gesamtrisiko erheblich.
Beispiele aus der Praxis und die OpenClaw-Kontroverse
Die Risiken sind nicht nur theoretischer Natur. Im November berichtete Anthropic von einem Angriff, bei dem eine chinesische staatlich geförderte Gruppe die agentischen Fähigkeiten von Claude Code nutzte, um Großunternehmen, Finanzinstitute und Regierungsbehörden anzugreifen.
Ein weiteres prominentes Beispiel ist der Fall von OpenClaw (ehemals Clawdbot), einem selbstgehosteten KI-Assistenten, der in kurzer Zeit viral ging. Die schnelle Verbreitung führte zu einer Reihe von Sicherheitsvorfällen, darunter:
- Prompt Injection im großen Stil: OpenClaw wurde anfällig für Angriffe, bei denen bösartige Anweisungen in E-Mails oder auf Webseiten versteckt wurden, um den Agenten zur Ausführung unerwünschter Aktionen zu verleiten (z. B. das Suchen nach Passwörtern oder das Installieren von Malware).
- Exponierte Instanzen: Hunderte von OpenClaw-Kontrollpanels waren unauthentifiziert im Internet zugänglich, wodurch API-Schlüssel, OAuth-Tokens und private Konversationen preisgegeben wurden.
- Klartext-Anmeldeinformationen: Standardmäßig speicherte OpenClaw Anmeldeinformationen in Klartext-Konfigurationsdateien, was sie zu einem leichten Ziel für Infostealer-Malware machte.
Diese Vorfälle unterstreichen, dass agentische Systeme grundlegend andere Sicherheitsanforderungen haben als traditionelle Software. Das Fehlen von standardisierten Sicherheitsmaßnahmen und eine "Ease-of-Deployment"-Mentalität über "Secure-by-Default"-Konfigurationen führen zu erheblichen Schwachstellen.
Langfristige Perspektiven und Forschungsfelder
Die Notwendigkeit einer "Multi-Agent Security" als eigenständiges Forschungsfeld wird immer deutlicher. Dieses Feld befasst sich mit Bedrohungen, die durch die Interaktion von KI-Agenten entstehen oder verstärkt werden, wie geheime Absprachen, koordinierte Angriffe oder Kaskadenfehler. Hierbei müssen nicht nur einzelne Agenten, sondern auch deren komplexe Interaktionsdynamiken gesichert werden.
Offene Forschungsfragen und Lösungsansätze
Einige der wichtigsten offenen Forschungsfragen und potenziellen Lösungsansätze umfassen:
- Security-by-Design durch Umgebungsgestaltung: Entwicklung von Interaktionsstandards, die Sicherheit, Datenschutz und Governance von Anfang an berücksichtigen.
- Sichere Interaktionsprotokolle: Integration kryptografischer Primitive wie Commitment-Schemata und Zero-Knowledge Proofs, um die bedingte Offenlegung von Informationen zu erzwingen und geheime Absprachen zu verhindern.
- Monitoring und Bedrohungserkennung: Einsatz dezentraler Netzwerke von Agenten zur Überwachung und Erkennung von Sicherheitsbedrohungen, um lokale Verstöße zu verhindern.
- Eindämmungs- und Isolationsstrategien: Verwendung von Trusted Execution Environments (TEEs) und Sandbox-Bereitstellungen, um den "Blast Radius" kompromittierter Agenten zu begrenzen.
- Bedrohungszuweisung: Entwicklung robuster Mechanismen zur Zuweisung bösartiger Aktionen zu einzelnen Agenten in dezentralen KI-Systemen, die über traditionelle forensische Analysen hinausgehen.
- Anpassung an Multimodalität und Tool-Nutzung: Berücksichtigung neuer Angriffsflächen, die durch multimodale Eingaben und die Fähigkeit von Agenten, Tools zu nutzen oder sogar selbst zu erstellen, entstehen.
- Multi-Agent Adversarial Testing: Entwicklung von Testmethoden, die die Zusammenarbeit mehrerer Agenten zur Überwindung von Sicherheitsvorkehrungen bewerten und die Robustheit kooperierender Netzwerke unter böswilligen Bedingungen prüfen.
- Soziotechnische Sicherheitsmaßnahmen: Regulatorische Maßnahmen, Transparenz (z.B. durch Software Bills of Materials) und die Zusammenarbeit zwischen verschiedenen Stakeholdern, um ein Gleichgewicht zwischen Sicherheit, Leistung und Datenschutz zu finden.
Die Balance zwischen der Maximierung der Nützlichkeit von KI-Agenten und der Gewährleistung ihrer Sicherheit bleibt eine zentrale Herausforderung. Unternehmen und Entwickler sind gefordert, von Anfang an robuste Sicherheitskonzepte zu integrieren und eine Kultur des "Secure by Design" zu etablieren, um das volle Potenzial agentischer KI sicher ausschöpfen zu können.
Bibliography
- Abdelnabi et al. (2025) Sahar Abdelnabi, Amr Gomaa, Eugene Bagdasarian, Per Ola Kristensson, and Reza Shokri. Firewalls to Secure Dynamic LLM Agentic Networks, February 2025. URL http://arxiv.org/abs/2502.01822.
- Aichberger et al. (2025) Lukas Aichberger, Alasdair Paren, Yarin Gal, Philip Torr, and Adel Bibi. Attacking Multimodal OS Agents with Malicious Image Patches, March 2025. URL http://arxiv.org/abs/2503.10809.
- Aitchison et al. (2022) Matthew Aitchison, Lyndon Benke, and Penny Sweetser. Learning to deceive in multi-agent hidden role games. arXiv preprint arXiv:2209.01551, 2022.
- Albert & Barabási (2002) Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47–97, 2002.
- Albrecht et al. (2024) Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, Cambridge, Massachusetts, December 2024. ISBN 978-0-262-04937-5.
- Anwar et al. (2024) Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric J. Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Chenyu Zhang, Ruiqi Zhong, Sean O. hEigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwani, Yoshua Bengio, Danqi Chen, Philip Torr, Samuel Albanie, Tegan Maharaj, Jakob Nicolaus Foerster, Florian Tramèr, He He, Atoosa Kasirzadeh, Yejin Choi, and David Krueger. Foundational Challenges in Assuring Alignment and Safety of Large Language Models. Transactions on Machine Learning Research, May 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=oVTkOs8Pka.
- Aumann (1974) Robert J. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1(1):67–96, 1974.
- Aumann & Maschler (1995) Robert J. Aumann and Michael Maschler. Repeated Games with Incomplete Information. MIT Press, Cambridge, MA, 1995.
- Baker et al. (2019) Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528, 2019.
- Bar-yam (1999) Yaneer Bar-yam. Dynamics Of Complex Systems. CRC Press, Reading, Mass, 1st edition edition, June 1999. ISBN 978-0-201-55748-0.
- Barbi et al. (2025) Ohav Barbi, Ori Yoran, and Mor Geva. Preventing rogue agents improves multi-agent collaboration, 2 2025.
- Battiston et al. (2012) Stefano Battiston, Michelangelo Puliga, Rahul Kaushik, Paolo Tasca, and Guido Caldarelli. Debtrank: Too central to fail? financial networks, the fed and systemic risk. Scientific Reports, 2:541, 2012.
- Bengio et al. (2025) Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Philip Fox, Ben Garfinkel, Danielle Goldfarb, Hoda Heidari, Anson Ho, Sayash Kapoor, Leila Khalatbari, Shayne Longpre, Sam Manning, Vasilios Mavroudis, Mantas Mazeika, Julian Michael, Jessica Newman, Kwan Yee Ng, Chinasa T. Okolo, Deborah Raji, Girish Sastry, Elizabeth Seger, Theodora Skeadas, Tobin South, Emma Strubell, Florian Tramèr, Lucia Velasco, Nicole Wheeler, Daron Acemoglu, Olubayo Adekanmbi, David Dalrymple, Thomas G. Dietterich, Edward W. Felten, Pascale Fung, Pierre-Olivier Gourinchas, Fredrik Heintz, Geoffrey Hinton, Nick Jennings, Andreas Krause, Susan Leavy, Percy Liang, Teresa Ludermir, Vidushi Marda, Helen Margetts, John McDermid, Jane Munga, Arvind Narayanan, Alondra Nelson, Clara Neppel, Alice Oh, Gopal Ramchurn, Stuart Russell, Marietje Schaake, Bernhard Schölkopf, Dawn Song, Alvaro Soto, Lee Tiedrich, Gaël Varoquaux, Andrew Yao, Ya-Qin Zhang, Fahad Albalawi, Marwan Alserkal, Olubunmi Ajala, Guillaume Avrin, Christian Busch, André Carlos Ponce de Leon Ferreira de Carvalho, Bronwyn Fox, Amandeep Singh Gill, Ahmet Halit Hatip, Juha Heikkilä, Gill Jolly, Ziv Katzir, Hiroaki Kitano, Antonio Krüger, Chris Johnson, Saif M. Khan, Kyoung Mu Lee, Dominic Vincent Ligot, Oleksii Molchanovskyi, Andrea Monti, Nusu Mwamanzi, Mona Nemer, Nuria Oliver, José Ramón López Portillo, Balaraman Ravindran, Raquel Pezoa Rivera, Hammam Riza, Crystal Rugege, Ciarán Seoighe, Jerry Sheehan, Haroon Sheikh, Denise Wong, and Yi Zeng. International AI Safety Report, January 2025. URL http://arxiv.org/abs/2501.17805.
- Biggio et al. (2012) Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1467–1474, 2012.
- Black (2024) Sarah L. Black. Negotiating peace: Ai systems in diplomacy. Foreign Affairs, 103(4):89–102, 2024.
- Black et al. (2022) Sidney Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, Usvsn Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. Gpt-neox-20b: An open-source autoregressive language model. In Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models, pp. 95–136. Association for Computational Linguistics, 2022. doi:10.18653/v1/2022.bigscience-1.9.
- Bonatti et al. (2024) Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, and Zack Hui. Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale, September 2024. URL http://arxiv.org/abs/2409.08264.
- Brand et al. (2023) James Brand, Ayelet Israeli, and Donald Ngwe. Using LLMs for Market Research, March 2023. URL https://papers.ssrn.com/abstract=4395751.
- Brundage et al. (2018a) Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Michael C. Horowitz, Gretchen Krueger, and Paul Scharre. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. ArXiv Preprint ArXiv:1802.07228, 2018a.
- Brundage et al. (2018b) Miles Brundage et al. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. Technical report, Future of Humanity Institute, University of Oxford, 2018b.
- Buldyrev et al. (2010) Sergey V. Buldyrev, Roni Parshani, Gerald Paul, H. Eugene Stanley, and Shlomo Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464(7291):1025–1028, 2010.
- Busoniu et al. (2008) Lucian Busoniu, Robert Babuška, Bart De Schutter, and Damien Ernst. A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 38(2):156–172, 2008.
- Cannady (2000) James Cannady. Next generation intrusion detection: Autonomous reinforcement learning of network attacks. 12 2000.
- Chan et al. (2024a) Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. Visibility into AI Agents. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, pp. 958–973, New York, NY, USA, June 2024a. Association for Computing Machinery. ISBN 9798400704505. doi:10.1145/3630106.3658948. URL https://dl.acm.org/doi/10.1145/3630106.3658948.
- Chan et al. (2024b) Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, and Markus Anderljung. Ids for ai systems, 6 2024b.
- Chan et al. (2024c) Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, and Markus Anderljung. IDs for AI Systems, October 2024c. URL http://arxiv.org/abs/2406.12137.
- Chan et al. (2025) Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, and Markus Anderljung. Infrastructure for AI Agents, January 2025. URL http://arxiv.org/abs/2501.10114.
- Chen & Shu (2024) Canyu Chen and Kai Shu. Combating misinformation in the age of LLMs: Opportunities and challenges. AI Magazine, 45(3):354–368, 2024. ISSN 2371-9621. doi:10.1002/aaai.12188. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/aaai.12188.
- Christiano et al. (2018) Paul Christiano, Buck Shlegeris, and Dario Amodei. Supervising strong learners by amplifying weak experts, 10 2018.
- Cisco (2023) Cisco. What is a distributed denial-of-service (DDoS) attack?, 2023. URL https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/.
- Conitzer & Sandholm (2006) Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to in bayesian stackelberg games. In Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2006.
- Costan & Devadas (2016) Victor Costan and Srinivas Devadas. Intel sgx explained. IACR Cryptology ePrint Archive, 2016:86, 2016.
- Davies et al. (2025) Xander Davies, Eric Winsor, Tomek Korbak, Alexandra Souly, Robert Kirk, Christian Schroeder de Witt, and Yarin Gal. Fundamental Limitations in Defending LLM Finetuning APIs, February 2025. URL http://arxiv.org/abs/2502.14828. arXiv:2502.14828 [cs].
- Dawes (1980) Robyn M. Dawes. Social dilemmas. Annual Review of Psychology, 31:169–193, 1980. doi:10.1146/annurev.ps.31.020180.001125.
- DeepSeek-AI et al. (2025) DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Shengfeng Ye, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wanjia Zhao, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, and Zhen Zhang. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, January 2025. URL http://arxiv.org/abs/2501.12948.
- Deng et al. (2023) Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, and Yu Su. MIND2WEB: towards a generalist agent for the web. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, pp. 28091–28114, Red Hook, NY, USA, December 2023. Curran Associates Inc.
- Douceur (2002) John R. Douceur. The sybil attack. In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS), pp. 251–260, 2002.
- Doumbouya et al. (2024) Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia, Davide Ghilardi, Anna Goldie, Federico Bianchi, Dan Jurafsky, and Christopher D. Manning. h4rm3l: A dynamic benchmark of composable jailbreak attacks for llm safety assessment, 2024. URL https://arxiv.org/abs/2408.04811.
- Draguns et al. (2024) Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, and Christian Schroeder de Witt. Unelicitable Backdoors via Cryptographic Transformer Circuits. November 2024. URL https://openreview.net/forum?id=a560KLF3v5.
- Drechsler (2023) Ingrid Drechsler. Ai agents and financial market stability: Hyperswitching and deposit runs. Journal of Financial Stability, 59:100978, 2023. doi:10.1016/j.jfs.2023.100978.
- Ellsberg (1968) Daniel W. Ellsberg. The theory of coercion and extortion. Journal of Political Economy, 76(3):424–431, 1968.
- Epstein & Axtell (1996) Joshua M. Epstein and Robert Axtell. Growing Artificial Societies: Social Science from the Bottom Up. Brookings Institution Press and MIT Press, 1996.
- Falade (2023) Polra Victor Falade. Decoding the threat landscape: Chatgpt, fraudgpt, and wormgpt in social engineering attacks. arXiv preprint arXiv:2310.05595, 2023.
- Foerster et al. (2018) Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nando Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 2974–2982, 2018.
- Fowler (2012) Martin Fowler. Circuit breaker. martinfowler.com/bliki/CircuitBreaker.html, 2012.
- Fraboni et al. (2021) Yann Fraboni, Richard Vidal, and Marco Lorenzi. Free-rider attacks on model aggregation in federated learning. In Proceedings of the 2021 International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1846–1854, 2021.
- Franzmeyer et al. (2023) Tim Franzmeyer, Stephen Marcus McAleer, Joao F. Henriques, Jakob Nicolaus Foerster, Philip Torr, Adel Bibi, and Christian Schroeder de Witt. Illusory Attacks: Information-theoretic detectability matters in adversarial attacks. ICLR 2023, October 2023. URL https://openreview.net/forum?id=F5dhGCdyYh.
- Franzmeyer et al. (2024) Tim Franzmeyer, Stephen Marcus McAleer, Joao F. Henriques, Jakob Nicolaus Foerster, Philip Torr, Adel Bibi, and Christian Schroeder de Witt. Illusory attacks: Information-theoretic detectability matters in adversarial attacks. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=F5dhGCdyYh.
- Fu et al. (2024) Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes. Imprompter: Tricking llm agents into improper tool use. In Proceedings of the 2024 IEEE Symposium on Security and Privacy, 2024.
- Gabriel et al. (2024) Iason Gabriel, Arianna Manzini, Geoff Keeling, Lisa Anne Hendricks, Verena Rieser, Hasan Iqbal, Nenad Tomašev, Ira Ktena, Zachary Kenton, Mikel Rodriguez, Seliem El-Sayed, Sasha Brown, Canfer Akbulut, Andrew Trask, Edward Hughes, A. Stevie Bergman, Renee Shelby, Nahema Marchal, Conor Griffin, Juan Mateos-Garcia, Laura Weidinger, Winnie Street, Benjamin Lange, Alex Ingerman, Alison Lentz, Reed Enger, Andrew Barakat, Victoria Krakovna, John Oliver Siy, Zeb Kurth-Nelson, Amanda McCroskery, Vijay Bolina, Harry Law, Murray Shanahan, Lize Alberts, Borja Balle, Sarah de Haas, Yetunde Ibitoye, Allan Dafoe, Beth Goldberg, Sébastien Krier, Alexander Reese, Sims Witherspoon, Will Hawkins, Maribeth Rauh, Don Wallace, Matija Franklin, Josh A. Goldstein, Joel Lehman, Michael Klenk, Shannon Vallor, Courtney Biles, Meredith Ringel Morris, Helen King, Blaise Agüera y Arcas, William Isaac, and James Manyika. The ethics of advanced ai assistants, 4 2024.
- Garg et al. (2025) Divyansh Garg, Shaun VanWeelden, Diego Caples, Andis Draguns, Nikil Ravi, Pranav Putta, Naman Garg, Tomas Abraham, Michael Lara, Federico Lopez, James Liu, Atharva Gundawar, Prannay Hebbar, Youngchul Joo, Jindong Gu, Charles London, Christian Schroeder de Witt, and Sumeet Motwani. REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites, April 2025. URL http://arxiv.org/abs/2504.11543.
- Gentry (2009) Craig Gentry. A fully homomorphic encryption scheme. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pp. 169–178, 2009.
- Gerstein & Leidy (2024) Daniel M. Gerstein and Erin N. Leidy. Emerging Technology and Risk Analysis: Unmanned Aerial Systems Intelligent Swarm Technology. Technical report, RAND Corporation, February 2024. URL https://www.rand.org/pubs/research_reports/RRA2380-1.html.
- Ghallab et al. (1998) Malik Ghallab, Craig Knoblock, David Wilkins, Anthony Barrett, Dave Christianson, Marc Friedman, Chung Kwok, Keith Golden, Scott Penberthy, David Smith, Ying Sun, and Daniel Weld. Pddl - the planning domain definition language. 08 1998.
- Gleave et al. (2019) Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, and Stuart Russell. Adversarial Policies: Attacking Deep Reinforcement Learning. September 2019. URL https://openreview.net/forum?id=HJgEMpVFwB.
- Gleave et al. (2020) Adam Gleave, Marc Dennis, Calum Wild, Sergey Levine, and Stuart Russell. Adversarial policies: Attacking deep reinforcement learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
- Goldreich et al. (1987a) Oded Goldreich, Silvio Micali, and Avi Wigderson. How to play any mental game. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, pp. 218–229, 1987a.
- Goldreich et al. (1987b) Oded Goldreich, Silvio Micali, and Avi Wigderson. How to play any mental game. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing (STOC), pp. 218–229. ACM, 1987b.
- Goldwasser et al. (1989) Shafi Goldwasser, Silvio Micali, and Charles Rackoff. The knowledge complexity of interactive proof-systems. SIAM Journal on Computing, 18(1):186–208, 1989.
- Gottweis et al. (2025) Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, Khaled Saab, Dan Popovici, Jacob Blum, Fan Zhang, Katherine Chou, Avinatan Hassidim, Burak Gokturk, Amin Vahdat, Pushmeet Kohli, Yossi Matias, Andrew Carroll, Kavita Kulkarni, Nenad Tomasev, Yuan Guan, Vikram Dhillon, Eeshit Dhaval Vaishnav, Byron Lee, Tiago R. D. Costa, José R. Penadés, Gary Peltz, Yunhan Xu, Annalisa Pawlosky, Alan Karthikesalingam, and Vivek Natarajan. Towards an AI co-scientist, February 2025. URL http://arxiv.org/abs/2502.18864.
- Greenblatt et al. (2023) Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, and Fabien Roger. Ai control: Improving safety despite intentional subversion, 12 2023.
- Gu et al. (2024) Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, and Min Lin. Agent smith: A single image can jailbreak one million multimodal LLM agents exponentially fast. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 16647–16672. PMLR, 21–27 Jul 2024. URL https://proceedings.mlr.press/v235/gu24e.html.
- Guo et al. (2025) Wenbo Guo, Yujin Potter, Tianneng Shi, Zhun Wang, Andy Zhang, and Dawn Song. Frontier AI’s Impact on the Cybersecurity Landscape, April 2025. URL http://arxiv.org/abs/2504.05408.
- Halawi et al. (2024) Danny Halawi, Alexander Wei, Eric Wallace, Tony Tong Wang, Nika Haghtalab, and Jacob Steinhardt. Covert malicious finetuning: Challenges in safeguarding LLM adaptation. In Forty-First International Conference on Machine Learning, 2024. URL https://icml.cc/virtual/2024/poster/34921.
- Halpern & Pass (2014) Joseph Y. Halpern and Rafael Pass. Algorithmic rationality: Game theory with costs of computation. Journal of Artificial Intelligence Research, 50:193–235, 2014.
- Hammar & Stadler (2023) Kim Hammar and Rolf Stadler. Scalable learning of intrusion responses through recursive decomposition. arXiv preprint arXiv:2309.03292, 2023. URL https://arxiv.org/abs/2309.03292.
- Hammond & Adam-Day (2025a) Lewis Hammond and Sam Adam-Day. Neural interactive proofs. In The Thirteenth International Conference on Learning Representations, 2025a. Forthcoming.
- Hammond & Adam-Day (2025b) Lewis Hammond and Sam Adam-Day. Neural Interactive Proofs. International Conference on Learning Representations (ICLR) 2025, October 2025b. URL https://openreview.net/forum?id=R2834dhBlo.
- Hammond et al. (2023) Lewis Hammond, James Fox, Tom Everitt, Ryan Carey, Alessandro Abate, and Michael Wooldridge. Reasoning about Causality in Games. Artificial Intelligence, 320, July 2023. ISSN 00043702. doi:10.1016/j.artint.2023.103919. URL http://arxiv.org/abs/2301.02324.
- Hammond et al. (2025) Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran, Igor Krawczuk, Max Lamparth, Niklas Lauffer, Alexander Meinke, Sumeet Motwani, Anka Reuel, Vincent Conitzer, Michael Dennis, Iason Gabriel, Adam Gleave, Gillian Hadfield, Nika Haghtalab, Atoosa Kasirzadeh, Sébastien Krier, Kate Larson, Joel Lehman, David C. Parkes, Georgios Piliouras, and Iyad Rahwan. Multi-Agent Risks from Advanced AI, February 2025. URL http://arxiv.org/abs/2502.14143.
- Hao et al. (2024) Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. Training Large Language Models to Reason in a Continuous Latent Space, December 2024. URL http://arxiv.org/abs/2412.06769.
- Hardin (1968) Garrett Hardin. The tragedy of the commons. Science, 162(3859):1243–1248, 1968. doi:10.1126/science.162.3859.1243.
- Harrenstein (2007) Peter Harrenstein. Commitment and trust in multi-agent systems. ACM Transactions on Autonomous and Adaptive Systems, 2(1):4:1–4:30, 2007.
- Hasan et al. (2024) Syed Mhamudul Hasan, Alaa M. Alotaibi, Sajedul Talukder, and Abdur R. Shahid. Distributed threat intelligence at the edge devices: A large language model-driven approach. In IEEE 48th Annual Computers, Software, and Applications Conference, pp. 1496–1497. IEEE, 7 2024. doi:10.1109/compsac61105.2024.00206.
- Havrylov & Titov (2017) Slobodan Havrylov and Ivan Titov. Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. In Advances in Neural Information Processing Systems, volume 30, pp. 2149–2159, 2017.
- He et al. (2024) Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, and Hao Chen. Security of ai agents, 2024.
- Henry & Du Plessis (2023) Steve Henry and Kirk Du Plessis. Financial history, 2023. URL https://optionalpha.com/topics/financial-history.
- Horowitz (2019a) Michael C. Horowitz. Artificial intelligence and the future of warfare. International Security, 43(4):115–153, 2019a.
- Horowitz (2019b) Michael C. Horowitz. Artificial intelligence, international competition, and the balance of power. Texas National Security Review, 2(2):36–57, 2019b.
- Horowitz (2021) Michael C. Horowitz. Lethal autonomous weapons and u.s. security. Journal of Strategic Studies, 44(2):191–205, 2021.
- Huang et al. (2024) Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, and Michael R. Lyu. On the resilience of llm-based multi-agent collaboration with faulty agents. arXiv:2408.00989, August 2024. doi:10.48550/ARXIV.2408.00989.
- Humphreys et al. (2022) Peter C. Humphreys, David Raposo, Tobias Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Adam Santoro, and Timothy Lillicrap. A data-driven approach for learning to control computers. In Proceedings of the 39th International Conference on Machine Learning, pp. 9466–9482. PMLR, June 2022. URL https://proceedings.mlr.press/v162/humphreys22a.html. ISSN: 2640-3498.
- Institute (2021) Blair Institute. Social Media Futures: What Is Brigading?, 2021. URL https://www.institute.global/insights/tech-and-digitalisation/social-media-futures-what-brigading.
- Irving et al. (2018) Geoffrey Irving, Paul Christiano, and Dario Amodei. Ai safety via debate, 5 2018.
- Islam et al. (2012) Milad Islam, Tooba Khan, and Sultan Khan. Parallel inference attacks on distributed information systems. In Proceedings of the 2012 Network and Distributed System Security Symposium (NDSS), 2012. URL https://www.ndss-symposium.org/ndss2012/programme/inference-attacks.
- Jervis (2017) Robert Jervis. Perception and Misperception in International Politics. Princeton University Press, 2017.
- Johnson (2004) Neil F. Johnson. Optimizing intelligence: Ai in conflict resolution. Journal of Conflict Resolution, 48(5):637–661, 2004.
- Johnson (2020) Neil F. Johnson. Military command and control: Ai escalation risks. Journal of Military Ethics, 19(1):45–61, 2020.
- Johnson (2021) Neil F. Johnson. Artificial intelligence and military unintended escalation. Defense Studies, 21(3):208–229, 2021.
- Jones et al. (2024) Erik Jones, Anca Dragan, and Jacob Steinhardt. Adversaries can misuse combinations of safe models. arXiv:2406.14595, June 2024. doi:10.48550/ARXIV.2406.14595.
- Ju et al. (2024) Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, and Gongshen Liu. Flooding spread of manipulated knowledge in llm-based multi-agent communities. arXiv:2407.07791, July 2024. doi:10.48550/ARXIV.2407.07791.
- Kairouz et al. (2021a) Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun N. Bhagoji, et al. Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2):1–210, 2021a.
- Kairouz et al. (2021b) Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, and Rachel Cummings. Advances and open problems in federated learning. Foundations and trends® in machine learning, 14(1–2):1–210, 2021b. URL https://www.nowpublishers.com/article/Details/MAL-083. Publisher: Now Publishers, Inc.
- Kauffman (1993a) Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, 1993a.
- Kauffman (1993b) Stuart A. Kauffman. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press, 1993b.
- Khlaaf (2023) Heidy Khlaaf. Toward comprehensive risk assessments and assurance of ai-based systems, 2023.
- Kirilenko et al. (2017) Andrei A. Kirilenko, Albert S. Kyle, Mehrdad Samadi, and Tugkan Tuzun. The flash crash: High-frequency trading in an electronic market. The Journal of Finance, 72(3):967–998, 2017. ISSN 00221082, 15406261. doi:10.1111/jofi.12498.
- Kitano (2004) Hiroaki Kitano. Biological robustness. Nature Reviews Genetics, 5(11):826–837, 2004.
- Knack & Burke (2024) Anna Knack and Ant Burke. Autonomous Cyber Defence Phase II. 2024. URL https://cetas.turing.ac.uk/publications/autonomous-cyber-defence-autonomous-agents.
- Kreutz et al. (2015) Diego Kreutz, Fernando M. V. Ramos, Paulo Esteves Veríssimo, Christian Esteve Rothenberg, Siamak Azodolmolky, and Steve Uhlig. Software-Defined Networking: A Comprehensive Survey. Proceedings of the IEEE, 103(1):14–76, January 2015. ISSN 1558-2256. doi:10.1109/JPROC.2014.2371999. URL https://ieeexplore.ieee.org/document/6994333.
- Laird (2020) John E. Laird. Risks of autonomous decisions in military ai systems. AI & Society, 35(4):997–1008, 2020.
- Lamparth et al. (2024) Max Lamparth, Anthony Corso, Jacob Ganz, Oriana Skylar Mastro, Jacquelyn Schneider, and Harold Trinkunas. Human vs. machine: Behavioral differences between expert humans and language models in wargame simulations, 2024.
- Lamport et al. (1982a) Leslie Lamport, Robert Shostak, and Marshall Pease. The byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, July 1982a. ISSN 1558-4593. doi:10.1145/357172.357176.
- Lamport et al. (1982b) Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, July 1982b. ISSN 0164-0925. doi:10.1145/357172.357176. URL https://dl.acm.org/doi/10.1145/357172.357176.
- Langton (1990a) Christopher G. Langton. Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena, 42(1–3):12–37, 1990a.
- Langton (1990b) Christopher G. Langton. Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena, 42(1-3):12–37, 1990b.
- Lazaridou et al. (2016) Angeliki Lazaridou, Nhung Pham, and Marco Baroni. Multi-agent cooperation and the emergence of (natural) language. In International Conference on Learning Representations, 2016.
- Lee & Tiwari (2024) Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi-agent systems. arXiv:2410.07283, October 2024. doi:10.48550/ARXIV.2410.07283.
- Leike et al. (2018) Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. Scalable agent alignment via reward modeling: A research direction, 11 2018.
- Lerer & Peysakhovich (2017) Adam Lerer and Alexander Peysakhovich. Maintaining cooperation in complex social dilemmas using deep reinforcement learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. URL https://arxiv.org/abs/1707.01068.
- Li et al. (2023) Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, and Xianglong Liu. Byzantine Robust Cooperative Multi-Agent Reinforcement Learning as a Bayesian Game. October 2023. URL https://openreview.net/forum?id=z6KS9D1dxt.
- Li et al. (2024) Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, and Yunxin Liu. Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security, May 2024. URL http://arxiv.org/abs/2401.05459.
- Liao et al. (2013) Hung-Jen Liao, Chun-Hung Richard Lin, Ying-Chih Lin, and Kuang-Yuan Tung. Intrusion detection system: A comprehensive review. Journal of Network and Computer Applications, 36(1):16–24, January 2013. ISSN 1084-8045. doi:10.1016/j.jnca.2012.09.004. URL https://www.sciencedirect.com/science/article/pii/S1084804512001944.
- Lowe et al. (2017) Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, pp. 6379–6390, 2017.
- Lu et al. (2022) Chris Lu, Timon Willi, Christian Schroeder de Witt, and Jakob Nicolaus Foerster. Model-Free Opponent Shaping. April 2022. URL https://openreview.net/forum?id=Bfg_sqypl5.
- Lyu et al. (2021) Lingjuan Lyu, Jiangshan Yu, Karthik Nandakumar, and Kee Siong Ng. Free-rider attacks on model aggregation in federated learning. In Proceedings of the 2021 International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1846–1854, 2021.
- Lù et al. (2024) Xing Han Lù, Zdeněk Kasner, and Siva Reddy. WEBLINX: real-world website navigation with multi-turn dialogue. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of ICML’24, pp. 33007–33056, Vienna, Austria, July 2024. JMLR.org.
- Manson (2023) Tim Manson. Pentagon poised to deploy ai advisors ‘in the very near term’. https://www.defense.gov/News/News-Stories/Article/Article/3499638/, 2023.
- Manson (2024) Tim Manson. Ai advisors and negotiators in high-stakes military decisions. International Security, 48(1):123–145, 2024.
- Marro et al. (2024) Samuele Marro, Emanuele La Malfa, Jesse Wright, Guohao Li, Nigel Shadbolt, Michael Wooldridge, and Philip Torr. A scalable communication protocol for networks of large language models. arXiv:2410.11905, October 2024. doi:10.48550/ARXIV.2410.11905.
- McMahan et al. (2017) Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, pp. 1273–1282. PMLR, April 2017. URL https://proceedings.mlr.press/v54/mcmahan17a.html.
- Mei et al. (2024) Kai Mei, Xi Zhu, Wujiang Xu, Wenyue Hua, Mingyu Jin, Zelong Li, Shuyuan Xu, Ruosong Ye, Yingqiang Ge, and Yongfeng Zhang. AIOS: LLM Agent Operating System, November 2024. URL http://arxiv.org/abs/2403.16971.
- Meta (2025) Meta. LlamaFirewall: An open source guardrail system for building secure AI agents | Research - AI at Meta, 2025. URL https://ai.meta.com/research/publications/llamafirewall-an-open-source-guardrail-system-for-building-secure-ai-agents/.
- Montañez et al. (2020) Rosana Montañez, Edward Golob, and Shouhuai Xu. Human Cognition Through the Lens of Social Engineering Cyberattacks. Frontiers in Psychology, 11:1755, 2020. ISSN 1664-1078. doi:10.3389/fpsyg.2020.01755.
- Motter & Lai (2002) Adilson E. Motter and Ying-Cheng Lai. Cascade-based attacks on complex networks. Physical Review E, 66(6):065102, December 2002. ISSN 1063-651X, 1095-3787. doi:10.1103/PhysRevE.66.065102. URL http://arxiv.org/abs/cond-mat/0301086. arXiv:cond-mat/0301086.
- Motwani et al. (2024a) Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip Torr, Lewis Hammond, and Christian Schroeder de Witt. Secret collusion among AI agents: Multi-agent deception via steganography. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems, 11 2024a. URL https://openreview.net/forum?id=bnNSQhZJ88.
- Motwani et al. (2024b) Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip Torr, Lewis Hammond, and Christian Schroeder de Witt. Secret Collusion among AI Agents: Multi-Agent Deception via Steganography. November 2024b. URL https://openreview.net/forum?id=bnNSQhZJ88.
- Motwani et al. (2024c) Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Markian Rybchuk, Philip H. S. Torr, Ivan Laptev, Fabio Pizzati, Ronald Clark, and Christian Schroeder de Witt. MALT: Improving Reasoning with Multi-Agent LLM Training, December 2024c. URL http://arxiv.org/abs/2412.01928.
- Myerson (1981) Roger B Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.
- Nakano et al. (2022) Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. WebGPT: Browser-assisted question-answering with human feedback, June 2022. URL http://arxiv.org/abs/2112.09332.
- Naor (1991) Moni Naor. Bit commitment using pseudorandomness. In Advances in Cryptology – CRYPTO ’91, pp. 128–140, 1991.
- NCSC (2024) NCSC. Sboms and the importance of inventory, 2024. URL https://www.ncsc.gov.uk/blog-post/sboms-and-the-importance-of-inventory.
- NETSCOUT Arbor (2024) NETSCOUT Arbor. Ddos threat intelligence report, 2h 2024. Technical report, NETSCOUT Systems, Inc., 2024. URL https://www.netscout.com/threatreport.
- Newman (2018a) Mark Newman. Networks: An Introduction. Oxford University Press, 2018a.
- Newman (2018b) Mark Newman. Networks: An Introduction. Oxford University Press, 2018b.
- Nie et al. (2024) Yuzhou Nie, Zhun Wang, Ye Yu, Xian Wu, Xuandong Zhao, Wenbo Guo, and Dawn Song. Privagent: Agentic-based red-teaming for llm privacy leakage. arXiv preprint arXiv:2412.05734, 2024.
- Nisan & Ronen (2001) Noam Nisan and Amir Ronen. Algorithmic mechanism design. Games and Economic Behavior, 35(1-2):166–196, 2001.
- Omidshafiei et al. (2019) Sherief Omidshafiei, Joel Pazis, Christopher Amato, Jonathan P. How, and Vianney Perchet. A unified game-theoretic framework for multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning (ICML), pp. 5153–5162, 2019.
- Ostrom (1990) Elinor Ostrom. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press, Cambridge [England]; New York, 01 1990. ISBN 0-521-37101-5.
- Palantir Technologies (2023) Palantir Technologies. Ai for defense: Leveraging llms in military planning. https://www.palantir.com/aip-defense, 2023.
- Panda et al. (2024) Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, and Prateek Mittal. Teach llms to phish: Stealing private information from language models. arXiv preprint arXiv:2403.00871, 2024.
- Paruchuri et al. (2008) Vibhav Paruchuri, John P. Pearce, Francesca Ordóñez, Milind Tambe, and Sarit Kraus. Playing games for security: An efficient exact algorithm for solving bayesian stackelberg games. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), 2008.
- Pasquale et al. (2024) Giulio De Pasquale, Ilya Grishchenko, Riccardo Iesari, Gabriel Pizarro, Lorenzo Cavallaro, Christopher Kruegel, and Giovanni Vigna. {ChainReactor}: Automated Privilege Escalation Chain Discovery via {AI} Planning. pp. 5913–5929, 2024. ISBN 978-1-939133-44-1. URL https://www.usenix.org/conference/usenixsecurity24/presentation/de-pasquale.
- Pastor-Galindo et al. (2024) Javier Pastor-Galindo, Pantaleone Nespoli, and José A. Ruipérez-Valiente. Large-Language-Model-Powered Agent-Based Framework for Misinformation and Disinformation Research: Opportunities and Open Challenges. IEEE Secur. Privacy, 22(3):24–36, May 2024. ISSN 1540-7993, 1558-4046. doi:10.1109/MSEC.2024.3380511. URL http://arxiv.org/abs/2310.07545.
- Pastor-Satorras & Vespignani (2001a) Romualdo Pastor-Satorras and Alessandro Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86(14):3200–3203, 2001a.
- Pastor-Satorras & Vespignani (2001b) Romualdo Pastor-Satorras and Alessandro Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86(14):3200–3203, 2001b.
- Pavlova et al. (2024) Maya Pavlova, Erik Brinkman, Krithika Iyer, Vitor Albiero, Joanna Bitton, Hailey Nguyen, Joe Li, Cristian Canton Ferrer, Ivan Evtimov, and Aaron Grattafiori. Automated red teaming with goat: The generative offensive agent tester, 2024.
- Peigné et al. (2025) Pierre Peigné, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schroeder de Witt, and Esben Kran. Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems. Proceedings of the AAAI Conference on Artificial Intelligence, 39(26), April 2025. ISSN 2374-3468. doi:10.1609/aaai.v39i26.34970. URL https://ojs.aaai.org/index.php/AAAI/article/view/34970.
- Perez et al. (2022) Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. Red teaming language models with language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 3419–3448, 2022. doi:10.18653/v1/2022.emnlp-main.225.
- Pita et al. (2008) Justin Pita, Manish Jain, Francesca Ordóñez, Cynthia Portway, Milind Tambe, Cynthia Western, John Orlosky, Vibhav Paruchuri, and Sarit Kraus. Deployed armor protection: The los angeles airport security game. In Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2008.
- Poslad (2002) Andrew Poslad. Ubiquitous Computing: Smart Devices, Environments and Interactions. Wiley, 2002.
- Putta et al. (2024) Pranav Putta, Edmund Mills, Naman Garg, Sumeet Motwani, Chelsea Finn, Divyansh Garg, and Rafael Rafailov. Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents, August 2024. URL http://arxiv.org/abs/2408.07199.
- (153) Cheng Qian, Emre Can Acikgoz, Hongru Wang, Xiusi Chen, Avirup Sil, Dilek Hakkani-Tur, Gokhan Tur, and Heng Ji. SMART: Self-Aware Agent for Tool Overuse Mitigation. URL http://arxiv.org/abs/2502.11435.
- Ren et al. (2023) Kezhou Ren, Yifan Zeng, Yuanfu Zhong, Biao Sheng, and Yingchao Zhang. Mafsids: a reinforcement learning-based intrusion detection model for multi-agent feature selection networks. Journal of Big Data, 10:137, 2023. URL https://journalofbigdata.springeropen.com/articles/10.1186/s40537-023-00814-4.
- Rivera et al. (2024) Juan-Pablo Rivera, Gabriel Mukobi, Anka Reuel, Max Lamparth, Chandler Smith, and Jacquelyn Schneider. Escalation risks from language models in military and diplomatic decision-making. In The 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, pp. 836–898. ACM, 6 2024. doi:10.1145/3630106.3658942.
- Rose et al. (2020) Scott Rose, Oliver Borchert, Stu Mitchell, and Sean Connelly. Zero trust architecture. NIST Special Publication 800-207, 2020.
- Roy et al. (2024) Devjeet Roy, Xuchao Zhang, Rashi Bhave, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, and Saravan Rajmohan. Exploring LLM-based Agents for Root Cause Analysis, March 2024. URL http://arxiv.org/abs/2403.04123.
- Russell & Norvig (2021) Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson, 4th edition, 2021.
- Sadeghi & Arvanitis (2023) McKenzie Sadeghi and Lorenzo Arvanitis. Rise of the newsbots: Ai-generated news websites proliferating online. NewsGuard, 5 2023. URL https://www.newsguardtech.com/special-reports/newsbots-ai-generated-news-websites-proliferating/.
- Sangwan et al. (2023) Raghvinder S. Sangwan, Youakim Badr, and Satish M. Srinivasan. Cybersecurity for AI Systems: A Survey. Journal of Cybersecurity and Privacy, 3(2):166–190, June 2023. ISSN 2624-800X. doi:10.3390/jcp3020010. URL https://www.mdpi.com/2624-800X/3/2/10. Number: 2 Publisher: Multidisciplinary Digital Publishing Institute.
- Schmidgall et al. (2025) Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Zicheng Liu, and Emad Barsoum. Agent Laboratory: Using LLM Agents as Research Assistants, January 2025. URL http://arxiv.org/abs/2501.04227.
- Schmitt & Flechais (2023) Marc Schmitt and Ivan Flechais. Digital deception: Generative artificial intelligence in social engineering and phishing. arXiv preprint arXiv:2310.13715, 2023.
- Schneier (2012) Bruce Schneier. Liars and Outliers: Enabling the Trust that Society Needs to Thrive. Wiley, Somerset, 1st edition edition, February 2012. ISBN 978-1-118-14330-8.
- Schneier (2018) Bruce Schneier. Artificial Intelligence and the Attack/Defense Balance. IEEE Security & Privacy, 16(2), March 2018. ISSN 1558-4046. doi:10.1109/MSP.2018.1870857. URL https://ieeexplore.ieee.org/document/8328965.
- Schroeder de Witt et al. (2023) Christian Schroeder de Witt, Hawra Milani, Klaudia Krawiecka, Swapneel Mehta, Carla Cremer, and Martin Strohmeier. Multi-Agent Security Workshop at NeurIPS 2023, 2023. URL https://neurips.cc/virtual/2023/workshop/66520.
- Schroeder de Witt et al. (2023a) Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Nicolaus Foerster, and Martin Strohmeier. Perfectly secure steganography using minimum entropy coupling. In The Eleventh International Conference on Learning Representations, 2023a. URL https://openreview.net/forum?id=HQ67mj5rJdR.
- Schroeder de Witt et al. (2023b) Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Nicolaus Foerster, and Martin Strohmeier. Perfectly Secure Steganography Using Minimum Entropy Coupling. September 2023b. URL https://openreview.net/forum?id=HQ67mj5rJdR.
- Schroeder de Witt et al. (2024) Christian Schroeder de Witt, srm, MikhailB, Lewis Hammond, chansmi, and sofmonk. Secret Collusion: Will We Know When to Unplug AI? September 2024. URL https://www.lesswrong.com/posts/smMdYezaC8vuiLjCf/secret-collusion-will-we-know-when-to-unplug-ai.
- Schulz et al. (2023) Lion Schulz, Nitay Alon, Jeffrey S. Rosenschein, and Peter Dayan. Emergent deception and skepticism via theory of mind. In Proceedings of the Theory of Mind Workshop at AAAI, 2023.
- Security.com Threat Intelligence Team (2025) Security.com Threat Intelligence Team. Ai: Advent of agents opens new possibilities for attackers. https://www.security.com/blogs/threat-intelligence/ai-agent-attacks, April 2025.
- Servin & Kudenko (2008) Arturo Servin and Daniel Kudenko. Multi-Agent Reinforcement Learning for Intrusion Detection: A Case Study and Evaluation. In Ralph Bergmann, Gabriela Lindemann, Stefan Kirn, and Michal Pěchouček (eds.), Multiagent System Technologies, Berlin, Heidelberg, 2008. Springer. ISBN 978-3-540-87805-6. doi:10.1007/978-3-540-87805-6_15.
- Shapley (1953) Lloyd S. Shapley. A value for n-person games. Contributions to the Theory of Games, 2:307–317, 1953.
- Shearer et al. (2023) Megan J. Shearer, Gabriel Rauterberg, and Michael P. Wellman. Learning to manipulate a financial benchmark. pp. 592–600, Brooklyn, 2023. doi:10.1145/3604237.3626847.
- Shevlane et al. (2023) Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, and Allan Dafoe. Model evaluation for extreme risks, 5 2023.
- Shi et al. (2017) Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, and Percy Liang. World of Bits: An Open-Domain Platform for Web-Based Agents. In Proceedings of the 34th International Conference on Machine Learning, pp. 3135–3144. PMLR, July 2017. URL https://proceedings.mlr.press/v70/shi17a.html. ISSN: 2640-3498.
- Shrivastava et al. (2024) Aryan Shrivastava, Jessica Hullman, and Max Lamparth. Measuring free-form decision-making inconsistency of language models in military crisis simulations, 2024.
- Singer (2009) P. W. Singer. Wired for War: The Robotics Revolution and Conflict in the 21st Century. Penguin, 2009.
- Skopik & Pahi (2020a) Florian Skopik and Timea Pahi. Under false flag: Using technical artifacts for cyber attack attribution. Cybersecurity, 3(1):8, 2020a.
- Skopik & Pahi (2020b) Florian Skopik and Timea Pahi. Under false flag: using technical artifacts for cyber attack attribution. Cybersecurity, 3(1):8, March 2020b. ISSN 2523-3246. doi:10.1186/s42400-020-00048-4. URL https://doi.org/10.1186/s42400-020-00048-4.
- South et al. (2025) Tobin South, Samuele Marro, Thomas Hardjono, Robert Mahari, Cedric Deslandes Whitney, Dazza Greenwood, Alan Chan, and Alex Pentland. Authenticated Delegation and Authorized AI Agents, January 2025. URL http://arxiv.org/abs/2501.09674.
- Stadler & Troncoso (2022) Theresa Stadler and Carmela Troncoso. Why the search for a privacy-preserving data sharing mechanism is failing. Nature Computational Science, 2(4):208–210, April 2022. ISSN 2662-8457. doi:10.1038/s43588-022-00236-x. URL https://www.nature.com/articles/s43588-022-00236-x. Number: 4 Publisher: Nature Publishing Group.
- Stymne (2022) Jakob Stymne. Self-play reinforcement learning for finding intrusion prevention strategies. Master’s thesis, KTH Royal Institute of Technology, 2022. URL https://kth.diva-portal.org/smash/get/diva2:1736915/FULLTEXT01.pdf.
- Su et al. (2024) Yu Su, Diyi Yang, Shunyu Yao, and Tao Yu. Language Agents: Foundations, Prospects, and Risks. In Jessy Li and Fei Liu (eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, pp. 17–24, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi:10.18653/v1/2024.emnlp-tutorials.3. URL https://aclanthology.org/2024.emnlp-tutorials.3/.
- Sun et al. (2024) Haochen Sun, Jason Li, and Hongyang Zhang. zkLLM: Zero Knowledge Proofs for Large Language Models. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, CCS ’24, pp. 4405–4419, New York, NY, USA, December 2024. Association for Computing Machinery. ISBN 9798400706363. doi:10.1145/3658644.3670334. URL https://dl.acm.org/doi/10.1145/3658644.3670334.
- Sun et al. (2023) Xinyuan Sun, Davide Crapis, Matt Stephenson, Barnabé Monnot, Thomas Thiery, and Jonathan Passerat-Palmbach. Cooperative AI via Decentralized Commitment Devices, November 2023. URL http://arxiv.org/abs/2311.07815.
- Surapeneni et al. (2025) Rao Surapeneni, Miku Jhu, Michael Vakoc, and Todd Segal. google/A2A, May 2025. URL https://github.com/google/A2A. original-date: 2025-03-25T18:44:21Z.
- Sutton & Samavi (2018) Andrew Sutton and Reza Samavi. Tamper-proof privacy auditing for artificial intelligence systems. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-2018, pp. 5374–5378. International Joint Conferences on Artificial Intelligence Organization, 7 2018. doi:10.24963/ijcai.2018/756.
- Szegedy et al. (2014) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
- Tambe (2011) Milind Tambe. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, 2011.
- Tong et al. (2019) Liang Tong, Aron Laszka, Chao Yan, Ning Zhang, and Yevgeniy Vorobeychik. Finding needles in a moving haystack: Prioritizing alerts with adversarial reinforcement learning. In Proceedings of the IEEE Symposium on Security and Privacy, 2019. URL https://arxiv.org/abs/1906.08805.
- Turlay (2022) Emmanuel Turlay. What is lineage tracking in machine learning and why you need it, 2022. URL https://www.sematic.dev/blog/what-is-lineage-tracking-in-machine-learning-and-why-you-need-it.
- U.S. Commodity Futures Trading Commission & U.S. Securities & Exchange Commission (2010) U.S. Commodity Futures Trading Commission and U.S. Securities & Exchange Commission. Findings regarding the market events of may 6, 2010. Technical report, 9 2010. URL https://www.sec.gov/files/marketevents-report.pdf.
- U.S. Department of Defense (2018) U.S. Department of Defense. Summary of the 2018 department of defense artificial intelligence strategy, February 2018. Washington, DC.
- Van Loo (2019) Rory Van Loo. Hyperswitching and platform competition: The effects of costless consumer switching in digital markets. Journal of Competition Law & Economics, 15(2):243–267, 2019. doi:10.1093/joclec/nhz006.
- Vegesna (2023) Vinod Varma Vegesna. Privacy-preserving techniques in ai-powered cyber security: Challenges and opportunities. International Journal of Machine Learning for Sustainable Development, 5(4):1–8, 2023. URL https://www.ijsdcs.com/index.php/IJMLSD/article/view/408.
- Vestad & Yang (2024) Arnstein Vestad and Bian Yang. A survey of agent-based modeling for cybersecurity. In Human Factors in Cybersecurity, volume 127, pp. 83–93. AHFE Open Acces, 2024. ISBN 978-1-964867-03-8. doi:10.54941/ahfe1004768.
- Vosoughi et al. (2018) Soroush Vosoughi, Deb Roy, and Sinan Aral. The spread of true and false news online. Science, 359(6380):1146–1151, 2018.
- Wang & Wellman (2020) Xintong Wang and Michael P. Wellman. Market manipulation: An adversarial learning framework for detection and evasion. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 4626–4632, 2020. doi:10.24963/ijcai.2020/638.
- Wei et al. (2023) Alexander Wei, Nika Haghtalab, and Jacob Steinhardt. Jailbroken: How does llm safety training fail? arXiv preprint arXiv:2307.02483, 2023.
- Wei et al. (2022) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, pp. 24824–24837, Red Hook, NY, USA, November 2022. Curran Associates Inc. ISBN 978-1-7138-7108-8.
- Wei & Liu (2024) Wenqi Wei and Ling Liu. Trustworthy Distributed AI Systems: Robustness, Privacy, and Governance. ACM Comput. Surv., February 2024. ISSN 0360-0300. doi:10.1145/3645102. URL https://dl.acm.org/doi/10.1145/3645102.
- Wooldridge & Jennings (1995) Michael Wooldridge and Nicholas R. Jennings. Intelligent agents: Theory and practice. The Knowledge Engineering Review, 10(2):115–152, 1995.
- Wu et al. (2024) Feng Wu, Lei Cui, Shaowen Yao, and Shui Yu. Inference Attacks: A Taxonomy, Survey, and Promising Directions, June 2024. URL http://arxiv.org/abs/2406.02027.
- Wylde (2021) Allison Wylde. Zero trust: Never trust, always verify. In 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), pp. 1–4, June 2021. doi:10.1109/CyberSA52016.2021.9478244. URL https://ieeexplore.ieee.org/abstract/document/9478244.
- Wölflein et al. (2025) Georg Wölflein, Dyke Ferber, Daniel Truhn, Ognjen Arandjelović, and Jakob Nikolas Kather. LLM Agents Making Agent Tools, February 2025. URL http://arxiv.org/abs/2502.11705.
- Xiao et al. (2025) Yijia Xiao, Edward Sun, Di Luo, and Wei Wang. TradingAgents: Multi-Agents LLM Financial Trading Framework, April 2025. URL http://arxiv.org/abs/2412.20138.
- Xu et al. (2024a) Jiacen Xu, Jack W. Stokes, Geoff McDonald, Xuesong Bai, David Marshall, Siyue Wang, Adith Swaminathan, and Zhou Li. AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks, March 2024a. URL http://arxiv.org/abs/2403.01038. arXiv:2403.01038 [cs].
- Xu & Xie (2005) Xin Xu and Tao Xie. A Reinforcement Learning Approach for Host-Based Intrusion Detection Using Sequences of System Calls. In De-Shuang Huang, Xiao-Ping Zhang, and Guang-Bin Huang (eds.), Advances in Intelligent Computing, Berlin, Heidelberg, 2005. Springer. ISBN 978-3-540-31902-3. doi:10.1007/11538059_103.
- Xu et al. (2024b) Zihao Xu, Yi Liu, Gelei Deng, Yuekang Li, and Stjepan Picek. A comprehensive study of jailbreak attack versus defense for large language models. pp. 7432–7449, 01 2024b. doi:10.18653/v1/2024.findings-acl.443.
- Xue et al. (2025) Tianci Xue, Weijian Qi, Tianneng Shi, Chan Hee Song, Boyu Gou, Dawn Song, Huan Sun, and Yu Su. An Illusion of Progress? Assessing the Current State of Web Agents, April 2025. URL http://arxiv.org/abs/2504.01382.
- Yamin (2021) Muhammad Yamin. Universal and targeted adversarial attacks on large language models. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
- Yao (1982a) Andrew C. Yao. Protocols for secure computations. In 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982), pp. 160–164, November 1982a. doi:10.1109/SFCS.1982.38. URL https://ieeexplore.ieee.org/document/4568388. ISSN: 0272-5428.
- Yao (1982b) Andrew C.-C. Yao. Protocols for secure computations. In 23rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 160–164, 1982b.
- Yao (1986) Andrew Chi-Chih Yao. How to generate and exchange secrets. In 27th Annual Symposium on Foundations of Computer Science (sfcs 1986), pp. 162–167, October 1986. doi:10.1109/SFCS.1986.25. URL https://ieeexplore.ieee.org/document/4568207.
- Yao et al. (2023) Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing Reasoning and Acting in Language Models, March 2023. URL http://arxiv.org/abs/2210.03629.
- Zhou et al. (2023) Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. WebArena: A Realistic Web Environment for Building Autonomous Agents. October 2023. URL https://openreview.net/forum?id=oKn9c6ytLx.
- Zou et al. (2023) Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and Transferable Adversarial Attacks on Aligned Language Models, December 2023. URL http://arxiv.org/abs/2307.15043. arXiv:2307.15043 [cs].
- Zou (2023) Yinpeng Zou. Universal transferable adversarial attacks on neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.