Kale-ab Tessera

Open to research internships from July 2026 onwards in LLM agents, multi-agent systems, open-endedness, cooperative AI, and reinforcement learning. kaleabtessera@gmail.com · resumé

I am a third-year PhD candidate at the University of Edinburgh studying how foundation-model agents reason, coordinate, and fail in dynamic, open-ended multi-agent environments. My work sits at the intersection of reinforcement learning, multi-agent systems, and agentic models, with a focus on long-horizon interaction, coordination, robustness, and failure modes.

More broadly, I study a core question: as agents are deployed in increasingly open-ended settings, when and why does cooperation break down, and how can we make it robust?

I am advised by Amos Storkey, Tim Rocktäschel (UCL), and Aris Filos-Ratsikas, and I am affiliated with MARBLE (Multi-Agent, Reinforcement, Behaviour and Learning), where I co-organise the 🤖 RL & Agents Reading Group.

Before starting my PhD, I spent 2.5 years as a Research Engineer on the MARL team at InstaDeep, alongside broader experience in machine learning and software engineering. My background combines research on multi-agent reinforcement learning with current work on LLM agents in open-ended settings.

Recent work includes HyperMARL, accepted at NeurIPS 2025, which studies adaptive cooperation in MARL, and Probing Dec-POMDP Reasoning, accepted as an Oral at AAMAS 2026, which develops information-theoretic tools for testing whether multi-agent benchmarks require genuine decentralised reasoning.

Research Interests:

LLM agents: evaluating coordination, reasoning, and robustness in open-ended multi-agent settings.
Reinforcement learning for agents: training and evaluating adaptive policies for long-horizon interaction, tool use, and open-ended environments.
Multi-agent learning and cooperation: understanding when tasks require genuine decentralised reasoning and when coordination breaks down.

For more information, see my resumé and Google Scholar.

news

May, 2026	🌟 Presenting Probing Dec-POMDP Reasoning in Cooperative MARL at AAMAS 2026 in Paphos, Cyprus.
Dec, 2025	🌟 Presented HyperMARL: Adaptive Hypernetworks for Multi-Agent RL at NeurIPS 2025 in San Diego, US.
Sep, 2025	🗣️ Talk on “Algorithms and Benchmarks for Robust Multi-Agent Coordination” at the RAIL Lab, University of the Witwatersrand.
Aug, 2025	🏅 Remembering the Markov Property in Cooperative MARL won best poster (1st place) out of 278 submissions at the Deep Learning Indaba in Kigali, Rwanda.
Aug, 2025	📅 Co-Programme Chair for the Deep Learning Indaba and Head of Practicals and Tutorials in Kigali, Rwanda.
Aug, 2025	Our reading group is back – 🤖 RL & Agents Reading Group.
Aug, 2025	🌟 Presented Remembering the Markov Property in Cooperative MARL and HyperMARL: Adaptive Hypernetworks for Multi-Agent RL at RLC workshops in Edmonton, Canada.
Mar, 2025	🌟 Attended UK Multi-Agent Systems Symposium 2025 at King’s College London.
Aug, 2024	🗣️ Taught “Introduction to ML” at DLI.
Jul, 2024	🏅 Awarded a scholarship to attend the CIFAR Deep Learning and Reinforcement Learning (DLRL) Summer School in Toronto, Canada.
Jan, 2024	🗣️ Began co-hosting the UOE RL reading group, YouTube.
Sep, 2023	🎓 Started my PhD at the University of Edinburgh (UOE), through the Informatics Global PhD Scholarship.
Aug, 2023	🛠️ PC member and Practicals Chair of DLI - notebooks 2023, RL Prac.
May, 2023	🗣️ Talk on “Introduction to Deep Reinforcement Learning” at the University of Pretoria and Indaba X Ghana.
Apr, 2023	🌟 Attended ICLR in Kigali, Rwanda.
Aug, 2022	🛠️ Co-Organiser of the ML Efficiency Workshop at the DLI.
Aug, 2022	🛠️ Programme committee member and Practicals Chair of Deep Learning Indaba (DLI) – notebooks 2022, ML Prac, RL Prac.
Jun, 2022	🗣️ Taught an “Introduction to Machine Learning” course at Africa to Silicon Valley.
Mar, 2021	🤖 Joined the Multi-Agent RL research team at InstaDeep.
Dec, 2019	🌟 Attended NeurIPS in Vancouver, Canada.
Aug, 2019	🏆 Won Best Poster (1 out of 194) at the Deep Learning Indaba, sponsored by Microsoft.

selected publications

AAMAS

Probing Dec-POMDP Reasoning in Cooperative MARL

Kale-ab Abebe Tessera, Leonard Hinckeldey, Riccardo Zamboni, and 2 more authors

In The 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Oral, 2026


@inproceedings{tessera2026probing,
  title = {Probing Dec-{POMDP} Reasoning in Cooperative {MARL}},
  author = {Tessera, Kale-ab Abebe and Hinckeldey, Leonard and Zamboni, Riccardo and Abel, David and Storkey, Amos},
  booktitle = {The 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Oral},
  year = {2026},
  url = {https://openreview.net/forum?id=gSK8tR7du3},
}

AAMAS

Fairness over Equality: Correcting Social Incentives in Asymmetric Sequential Social Dilemmas

Alper Demir, Hüseyin Aydın, Kale-ab Abebe Tessera, and 2 more authors

In The 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Oral, 2026


@inproceedings{demir2025fairness,
  title = {Fairness over Equality: Correcting Social Incentives in Asymmetric Sequential Social Dilemmas},
  author = {Demir, Alper and Ayd{\i}n, H{\"u}seyin and Tessera, Kale-ab Abebe and Abel, David and Albrecht, Stefano V.},
  booktitle = {The 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Oral},
  year = {2026},
  url = {https://openreview.net/forum?id=byUCRyFvSZ},
}

NeurIPS
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL

Kale-ab Abebe Tessera, Arrasy Rahman, Amos Storkey, and 1 more author

In The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025


Adaptive cooperation in multi-agent reinforcement learning (MARL) requires policies to express homogeneous, specialised, or mixed behaviours, yet achieving this adaptivity remains a critical challenge. While parameter sharing (PS) is standard for efficient learning, it notoriously suppresses the behavioural diversity required for specialisation. This failure is largely due to cross-agent gradient interference, a problem we find is surprisingly exacerbated by the common practice of coupling agent IDs with observations. Existing remedies typically add complexity through altered objectives, manual preset diversity levels, or sequential updates – raising a fundamental question: can shared policies adapt without these intricacies? We propose a solution built on a key insight: an agent-conditioned hypernetwork can generate agent-specific parameters and decouple observation- and agent-conditioned gradients, directly countering the interference from coupling agent IDs with observations. Our resulting method, HyperMARL, avoids the complexities of prior work and empirically reduces policy gradient variance. Across diverse MARL benchmarks (22 scenarios, up to 30 agents), HyperMARL achieves performance competitive with six key baselines while preserving behavioural diversity comparable to non-parameter sharing methods, establishing it as a versatile and principled approach for adaptive MARL. The code is publicly available at https://github.com/KaleabTessera/HyperMARL.
@inproceedings{tessera2025hypermarl,
  title = {HyperMARL: Adaptive Hypernetworks for Multi-Agent {RL}},
  author = {Tessera, {Kale-ab} Abebe and Rahman, Arrasy and Storkey, Amos and Albrecht, Stefano V},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS)},
  year = {2025},
  url = {https://openreview.net/forum?id=56CgYnf9Dr},
}

NeurIPS Workshop

Are we going MAD? Benchmarking Multi-Agent Debate between Language Models for Medical Q&A

Andries Smit, Paul Duckworth, Nathan Grinsztajn, and 3 more authors

In Deep Generative Models for Health Workshop NeurIPS 2023, Nov 2023


Recent advancements in large language models (LLMs) underscore their potential for responding to medical inquiries. However, ensuring that generative agents provide accurate and reliable answers remains an ongoing challenge. In this context, multi-agent debate (MAD) has emerged as a prominent strategy for enhancing the truthfulness of LLMs. In this work, we provide a comprehensive benchmark of MAD strategies for medical Q&A, along with open-source implementations. This sheds light on the effective utilization of various strategies including the trade-offs between cost, time, and accuracy. We build upon these insights to provide a novel debate-prompting strategy based on agent agreement that outperforms previously published strategies on medical Q&A tasks.
arXiv

Mava: a research framework for distributed multi-agent reinforcement learning

Arnu Pretorius*, Kale-ab Abebe Tessera*, Andries P Smit*, and 8 more authors

arXiv preprint arXiv:2107.01460v1, Jul 2021


Breakthrough advances in reinforcement learning (RL) research have led to a surge in the development and application of RL. To support the field and its rapid growth, several frameworks have emerged that aim to help the community more easily build effective and scalable agents. However, very few of these frameworks exclusively support multi-agent RL (MARL), an increasingly active field in itself, concerned with decentralised decision-making problems. In this work, we attempt to fill this gap by presenting Mava: a research framework specifically designed for building scalable MARL systems. Mava provides useful components, abstractions, utilities and tools for MARL and allows for simple scaling for multi-process system training and execution, while providing a high level of flexibility and composability. Mava is built on top of DeepMind’s Acme (Hoffman et al., 2020), and therefore integrates with, and greatly benefits from, a wide range of already existing single-agent RL components made available in Acme. Several MARL baseline systems have already been implemented in Mava. These implementations serve as examples showcasing Mava’s reusable features, such as interchangeable system architectures, communication and mixing modules. Furthermore, these implementations allow existing MARL algorithms to be easily reproduced and extended. We provide experimental results for these implementations on a wide range of multi-agent environments and highlight the benefits of distributed system training.