publications
For the latest publications, please visit my Google Scholar page.
2024
December
- HyperMARL: Adaptive Hypernetworks for Multi-Agent RL. Kale-ab Abebe Tessera, Arrasy Rahman, and Stefano V. Albrecht. arXiv preprint arXiv:2412.04233, Dec 2024.
Balancing individual specialisation and shared behaviours is a critical challenge in multi-agent reinforcement learning (MARL). Existing methods typically focus on encouraging diversity or leveraging shared representations. Full parameter sharing (FuPS) improves sample efficiency but struggles to learn diverse behaviours when required, while no parameter sharing (NoPS) enables diversity but is computationally expensive and sample inefficient. To address these challenges, we introduce HyperMARL, a novel approach using hypernetworks to balance efficiency and specialisation. HyperMARL generates agent-specific actor and critic parameters, enabling agents to adaptively exhibit diverse or homogeneous behaviours as needed, without modifying the learning objective or requiring prior knowledge of the optimal diversity. Furthermore, HyperMARL decouples agent-specific and state-based gradients, which empirically correlates with reduced policy gradient variance, potentially offering insights into its ability to capture diverse behaviours. Across MARL benchmarks requiring homogeneous, heterogeneous, or mixed behaviours, HyperMARL consistently matches or outperforms FuPS, NoPS, and diversity-focused methods, achieving NoPS-level diversity with a shared architecture. These results highlight the potential of hypernetworks as a versatile approach to the trade-off between specialisation and shared behaviours in MARL.
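As a rough illustration of the core idea (not the paper's implementation), the sketch below shows a single shared hypernetwork that maps an agent embedding to the weights of a small per-agent policy head, so agents share one set of learned parameters yet act with agent-specific ones. The class name and dimensions are made up for the example.

```python
import torch
import torch.nn as nn

class AgentConditionedHypernetwork(nn.Module):
    """Illustrative hypernetwork: maps an agent embedding to the weights of a
    small per-agent policy head. A sketch only, not the HyperMARL code."""

    def __init__(self, n_agents, embed_dim, obs_dim, act_dim):
        super().__init__()
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.agent_embed = nn.Embedding(n_agents, embed_dim)
        # One shared generator produces all per-agent head parameters.
        self.weight_gen = nn.Linear(embed_dim, obs_dim * act_dim)
        self.bias_gen = nn.Linear(embed_dim, act_dim)

    def forward(self, agent_id, obs):
        e = self.agent_embed(agent_id)                       # (embed_dim,)
        W = self.weight_gen(e).view(self.act_dim, self.obs_dim)
        b = self.bias_gen(e)
        # Agent-specific logits from generated parameters; the agent-conditioned
        # path and the observation path contribute through separate routes.
        return obs @ W.T + b

# Usage: two agents share the hypernetwork but act with distinct parameters.
net = AgentConditionedHypernetwork(n_agents=2, embed_dim=8, obs_dim=4, act_dim=3)
obs = torch.randn(4)
logits_agent_0 = net(torch.tensor(0), obs)
logits_agent_1 = net(torch.tensor(1), obs)
```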
June
- Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning. Aditya Kapoor, Sushant Swamy, Kale-ab Tessera, and 4 more authors. In Coordination and Cooperation for Multi-Agent Reinforcement Learning Methods Workshop at RLC 2024, Jun 2024.
The ability of agents to learn optimal policies is hindered in multi-agent environments where all agents receive a global reward signal sparsely or only at the end of an episode. The delayed nature of these rewards, especially in long-horizon tasks, makes it challenging for agents to evaluate their actions at intermediate time steps. In this paper, we propose Agent-Temporal Reward Redistribution (ATRR), a novel approach to tackle the agent-temporal credit assignment problem by redistributing sparse environment rewards both temporally and at the agent level. ATRR first decomposes the sparse global rewards into rewards for each time step and then calculates agent-specific rewards by determining each agent’s relative contribution to these decomposed temporal rewards. We theoretically prove that there exists a redistribution method equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirically, we demonstrate that ATRR stabilizes and expedites the learning process. We also show that ATRR, when used alongside single-agent reinforcement learning algorithms, performs as well as or better than their multi-agent counterparts.
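A minimal sketch of the two-stage idea, with the learned redistribution replaced by fixed softmax weights purely for illustration: the episodic team return is first spread over timesteps and then split across agents, while its total is preserved. The function name and inputs are hypothetical, not the paper's ATRR networks.

```python
import numpy as np

def redistribute_sparse_reward(episode_return, temporal_scores, agent_scores):
    """Toy two-stage redistribution of a single episodic team reward.
    temporal_scores: (T,) unnormalised scores for each timestep.
    agent_scores:    (T, N) unnormalised per-agent contribution scores.
    Returns a (T, N) array of dense per-agent rewards that sums back to the
    original return, so the overall reward signal is preserved."""
    t_w = np.exp(temporal_scores - temporal_scores.max())
    t_w /= t_w.sum()                              # temporal decomposition
    a_w = np.exp(agent_scores - agent_scores.max(axis=1, keepdims=True))
    a_w /= a_w.sum(axis=1, keepdims=True)         # agent-level split per step
    return episode_return * t_w[:, None] * a_w

# Example: an episodic return of 10.0 spread over 3 steps and 2 agents.
dense = redistribute_sparse_reward(10.0, np.zeros(3),
                                   np.array([[1.0, 0.0],
                                             [0.0, 1.0],
                                             [0.5, 0.5]]))
assert np.isclose(dense.sum(), 10.0)
```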
February
- Efficiently Quantifying Individual Agent Importance in Cooperative MARL. Omayma Mahjoub, Ruan Kock, Siddarth Singh, and 4 more authors. In eXplainable AI approaches for deep reinforcement learning (XAI4DRL) Workshop @ AAAI (Oral), Feb 2024.
Measuring the contribution of individual agents is challenging in cooperative multi-agent reinforcement learning (MARL). In cooperative MARL, team performance is typically inferred from a single shared global reward. Arguably, among the best current approaches to effectively measure individual agent contributions is to use Shapley values. However, calculating these values is expensive as the computational complexity grows exponentially with respect to the number of agents. In this paper, we adapt difference rewards into an efficient method for quantifying the contribution of individual agents, referred to as Agent Importance, offering a linear computational complexity relative to the number of agents. We show empirically that the computed values are strongly correlated with the true Shapley values, as well as the true underlying individual agent rewards, used as the ground truth in environments where these are available. We demonstrate how Agent Importance can be used to help study MARL systems by diagnosing algorithmic failures discovered in prior MARL benchmarking work. Our analysis illustrates Agent Importance as a valuable explainability component for future MARL benchmarks.
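To make the linear-cost intuition concrete, here is a hedged, difference-reward-style sketch (not the paper's code): each agent's importance is estimated from a single counterfactual in which its action is replaced by a default one, so only one extra evaluation per agent is needed, rather than the exponentially many coalitions Shapley values require. The reward function and actions are toy stand-ins.

```python
def agent_importance(team_reward_fn, joint_action, default_action):
    """Difference-reward-style importance (illustrative only): how much does
    the team reward drop when agent i's action is replaced by a default
    (e.g. no-op) action? One counterfactual per agent, so the cost is
    linear in the number of agents."""
    base = team_reward_fn(joint_action)
    importances = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        importances.append(base - team_reward_fn(counterfactual))
    return importances

# Toy team reward: agents 0 and 1 must both act (1) for the team to score.
reward = lambda a: 1.0 if a[0] == 1 and a[1] == 1 else 0.0
print(agent_importance(reward, [1, 1, 0], default_action=0))  # [1.0, 1.0, 0.0]
```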
- How much can change in a year? Revisiting Evaluation in Multi-Agent Reinforcement Learning. Siddarth Singh, Omayma Mahjoub, Ruan Kock, and 4 more authors. In eXplainable AI approaches for deep reinforcement learning (XAI4DRL) Workshop @ AAAI, Feb 2024.
Establishing sound experimental standards and rigour is important in any growing field of research. Deep Multi-Agent Reinforcement Learning (MARL) is one such nascent field. Although exciting progress has been made, MARL has recently come under scrutiny for replicability issues and a lack of standardised evaluation methodology, specifically in the cooperative setting. Although protocols have been proposed to help alleviate the issue, it remains important to actively monitor the health of the field. In this work, we extend the database of evaluation methodology previously published by Gorsane et al. (2022), containing meta-data on MARL publications from top-rated conferences, and compare the findings extracted from this updated database to the trends identified in their work. Our analysis shows that many of the worrying trends in performance reporting remain. These include the omission of uncertainty quantification, not reporting all relevant evaluation details, and a narrowing of algorithmic development classes. Promisingly, we do observe a trend towards more difficult scenarios in SMAC-v1, which, if continued into SMAC-v2, will encourage novel algorithmic development. Our data indicate that replicability needs to be approached more proactively by the MARL community to ensure trust in the field as we move towards exciting new frontiers.
2023
November
- Are we going MAD? Benchmarking Multi-Agent Debate between Language Models for Medical Q&A. Andries Smit, Paul Duckworth, Nathan Grinsztajn, and 3 more authors. In Deep Generative Models for Health Workshop, NeurIPS 2023, Nov 2023.
Recent advancements in large language models (LLMs) underscore their potential for responding to medical inquiries. However, ensuring that generative agents provide accurate and reliable answers remains an ongoing challenge. In this context, multi-agent debate (MAD) has emerged as a prominent strategy for enhancing the truthfulness of LLMs. In this work, we provide a comprehensive benchmark of MAD strategies for medical Q&A, along with open-source implementations. This sheds light on the effective utilization of various strategies including the trade-offs between cost, time, and accuracy. We build upon these insights to provide a novel debate-prompting strategy based on agent agreement that outperforms previously published strategies on medical Q&A tasks.
October
- Generalisable Agents for Neural Network Optimisation. Kale-ab Tessera*, Callum Tilbury*, Sasha Abramowitz*, and 5 more authors. In Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@NeurIPS 2023) and OPT 2023: Optimization for Machine Learning, Oct 2023.
Optimising deep neural networks is a challenging task due to complex training dynamics, high computational requirements, and long training times. To address this difficulty, we propose the framework of Generalisable Agents for Neural Network Optimisation (GANNO) – a multi-agent reinforcement learning (MARL) approach that learns to improve neural network optimisation by dynamically and responsively scheduling hyperparameters during training. GANNO utilises an agent per layer that observes localised network dynamics and accordingly takes actions to adjust these dynamics at a layerwise level to collectively improve global performance. In this paper, we use GANNO to control the layerwise learning rate and show that the framework can yield useful and responsive schedules that are competitive with handcrafted heuristics. Furthermore, GANNO is shown to perform robustly across a wide variety of unseen initial conditions, and can successfully generalise to harder problems than it was trained on. Our work presents an overview of the opportunities that this paradigm offers for training neural networks, along with key challenges that remain to be overcome.
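A rough sketch of the per-layer control loop described above, assuming one optimiser parameter group per layer and a stub policy in place of the learned MARL agents; the function and statistic names are invented for the example, not GANNO's interface.

```python
import torch

def layerwise_lr_step(optimizer, policies, layer_stats):
    """Illustrative per-layer learning-rate control (not the GANNO code).
    `optimizer` has one param group per layer; `policies[i]` maps that layer's
    local statistics to a multiplicative learning-rate action."""
    for i, group in enumerate(optimizer.param_groups):
        action = policies[i](layer_stats[i])      # e.g. 0.5, 1.0 or 2.0
        group["lr"] = float(group["lr"] * action)

# Toy usage: two "layers" and a fixed stub policy that halves both rates.
params = [torch.nn.Parameter(torch.randn(3)) for _ in range(2)]
opt = torch.optim.SGD([{"params": [p], "lr": 0.1} for p in params])
stats = [{"grad_norm": 1.0}, {"grad_norm": 0.2}]
layerwise_lr_step(opt, [lambda s: 0.5, lambda s: 0.5], stats)
print([g["lr"] for g in opt.param_groups])        # [0.05, 0.05]
```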
March
- Reduce, Reuse, Recycle: Selective Reincarnation in Multi-Agent Reinforcement Learning. Juan Claude Formanek, Callum Rhys Tilbury, Jonathan Phillip Shock, and 2 more authors. In Workshop on Reincarnating Reinforcement Learning at ICLR 2023 (Oral), Mar 2023.
‘Reincarnation’ in reinforcement learning has been proposed as a formalisation of reusing prior computation from past experiments when training an agent in an environment. In this paper, we present a brief foray into the paradigm of reincarnation in the multi-agent (MA) context. We consider the case where only some agents are reincarnated, whereas the others are trained from scratch – selective reincarnation. In the fully-cooperative MA setting with heterogeneous agents, we demonstrate that selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation. However, the choice of which agents to reincarnate in a heterogeneous system is vitally important to the outcome of the training – in fact, a poor choice can lead to considerably worse results than the alternatives. We argue that a rich field of work exists here, and we hope that our effort catalyses further energy in bringing the topic of reincarnation to the multi-agent realm.
2022
July
- Just-in-Time Sparsity: Learning Dynamic Sparsity Schedules. Kale-ab Tessera, Chiratidzo Matowe, Arnu Pretorius, and 2 more authors. In Dynamic Neural Networks, ICML Workshop, Jul 2022.
Sparse neural networks have various computational benefits while often being able to maintain or improve the generalization performance of their dense counterparts. Popular sparsification methods have focused on what to sparsify, i.e., which redundant components to remove from neural networks, while when to sparsify has received less attention and is usually handled using heuristics or simple schedules. In this work, we focus on learning sparsity schedules from scratch using reinforcement learning. On simple CNNs and ResNet-18, we show that our learned schedules are diverse across layers and training steps, while achieving competitive performance when compared to naive handcrafted schedules. Our methodology is general-purpose and can be applied to learning effective sparsity schedules across any pruning implementation.
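As a hedged illustration of what "applying a sparsity schedule" can look like in practice, the snippet below performs global magnitude pruning to a target sparsity at the current step; the per-step schedule choosing that target is what the paper learns with RL and is simply supplied by the caller here.

```python
import torch

def apply_sparsity(model, sparsity):
    """Globally prune the smallest-magnitude weights to the target sparsity.
    The schedule deciding `sparsity` at each step is the learned part in the
    paper; this helper is an illustrative pruning step only."""
    weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * weights.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(weights, k).values
    with torch.no_grad():
        for p in model.parameters():
            p.mul_((p.abs() > threshold).float())   # zero out small weights

# Toy usage: prune a small MLP to 50% sparsity at the current training step.
model = torch.nn.Sequential(torch.nn.Linear(8, 16),
                            torch.nn.ReLU(),
                            torch.nn.Linear(16, 2))
apply_sparsity(model, sparsity=0.5)
```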
2021
November
- On pseudo-absence generation and machine learning for locust breeding ground prediction in Africa. Ibrahim Salihu Yusuf, Kale-ab Tessera, Thomas Tumiel, and 2 more authors. In AI + HADR and ML4D NeurIPS Workshops, Nov 2021.
Desert locust outbreaks threaten the food security of a large part of Africa and have affected the livelihoods of millions of people over the years. Machine learning (ML) has been demonstrated as an effective approach to locust distribution modelling, which could assist in early warning. ML requires a significant amount of labelled data to train. Most publicly available labelled data on locusts are presence-only data, where only the sightings of locusts being present at a location are recorded. Therefore, prior work using ML has resorted to pseudo-absence generation methods as a way to circumvent this issue. The most commonly used approach is to randomly sample points in a region of interest while ensuring that these sampled pseudo-absence points are at least a specific distance away from true presence points. In this paper, we compare this random sampling approach to more advanced pseudo-absence generation methods, such as environmental profiling and optimal background extent limitation, specifically for predicting desert locust breeding grounds in Africa. Interestingly, we find that for the algorithms we tested, namely logistic regression (LR), gradient boosting, random forests and maximum entropy, all popular in prior work, the logistic model performed significantly better than the more sophisticated ensemble methods, both in terms of prediction accuracy and F1 score. Although background extent limitation combined with random sampling boosted performance for ensemble methods, for LR this was not the case, and instead, a significant improvement was obtained when using environmental profiling. In light of this, we conclude that a simpler ML approach such as logistic regression combined with more advanced pseudo-absence generation, specifically environmental profiling, can be a sensible and effective approach to predicting locust breeding grounds across Africa.
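For concreteness, here is a toy version of the baseline random-sampling approach described above: candidate points are drawn uniformly in a bounding box and kept only if they are at least a minimum distance from every presence point. Distances are planar rather than geodesic and the environmental-profiling variant is not shown; all names are illustrative.

```python
import numpy as np

def sample_pseudo_absences(presence_xy, n_samples, bbox, min_dist, seed=0):
    """Random pseudo-absence generation (the baseline described above):
    uniform points inside `bbox` = (xmin, xmax, ymin, ymax), rejected if
    closer than `min_dist` to any presence point. Planar distances only,
    purely illustrative."""
    rng = np.random.default_rng(seed)
    xmin, xmax, ymin, ymax = bbox
    accepted = []
    while len(accepted) < n_samples:
        pt = rng.uniform([xmin, ymin], [xmax, ymax])
        if np.min(np.linalg.norm(presence_xy - pt, axis=1)) >= min_dist:
            accepted.append(pt)
    return np.array(accepted)

# Toy usage: 5 pseudo-absences at least 1.0 away from two presence points.
presences = np.array([[0.0, 0.0], [2.0, 2.0]])
print(sample_pseudo_absences(presences, 5, bbox=(-5, 5, -5, 5), min_dist=1.0))
```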
July
- Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization. Kale-ab Tessera, Sara Hooker, and Benjamin Rosman. In Sparsity in Neural Networks Workshop, Jul 2021.
Training sparse networks to converge to the same performance as dense neural architectures has proven to be elusive. Recent work suggests that initialization is the key. However, while this direction of research has had some success, focusing on initialization alone appears to be inadequate. In this paper, we take a broader view of training sparse networks and consider the role of regularization, optimization, and architecture choices on sparse models. We propose a simple experimental framework, Same Capacity Sparse vs Dense Comparison (SC-SDC), that allows for a fair comparison of sparse and dense networks. Furthermore, we propose a new measure of gradient flow, Effective Gradient Flow (EGF), that better correlates to performance in sparse networks. Using top-line metrics, SC-SDC and EGF, we show that default choices of optimizers, activation functions and regularizers used for dense networks can disadvantage sparse networks. Based upon these findings, we show that gradient flow in sparse networks can be improved by reconsidering aspects of the architecture design and the training regime. Our work suggests that initialization is only one piece of the puzzle and taking a wider view of tailoring optimization to sparse networks yields promising results.
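The snippet below is not the paper's EGF definition, only a generic example of the kind of gradient-flow statistic one might track when comparing sparse and dense training: the mean squared gradient computed over the currently non-zero ("active") weights of a network.

```python
import torch

def gradient_flow_over_active_weights(model, loss):
    """A generic gradient-flow statistic (NOT the paper's EGF): mean squared
    gradient over the non-zero weights of a (possibly sparse) network."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    total, count = 0.0, 0
    for p, g in zip(model.parameters(), grads):
        mask = p.detach() != 0                  # active connections only
        total += (g[mask] ** 2).sum().item()
        count += int(mask.sum())
    return total / max(count, 1)

# Toy usage on a tiny model and batch.
model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).pow(2).mean()
print(gradient_flow_over_active_weights(model, loss))
```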
- Mava: a research framework for distributed multi-agent reinforcement learning. Arnu Pretorius*, Kale-ab Tessera*, Andries P. Smit*, and 8 more authors. arXiv preprint arXiv:2107.01460v1, Jul 2021.
Breakthrough advances in reinforcement learning (RL) research have led to a surge in the development and application of RL. To support the field and its rapid growth, several frameworks have emerged that aim to help the community more easily build effective and scalable agents. However, very few of these frameworks exclusively support multi-agent RL (MARL), an increasingly active field in itself, concerned with decentralised decision-making problems. In this work, we attempt to fill this gap by presenting Mava: a research framework specifically designed for building scalable MARL systems. Mava provides useful components, abstractions, utilities and tools for MARL and allows for simple scaling for multi-process system training and execution, while providing a high level of flexibility and composability. Mava is built on top of DeepMind’s Acme (Hoffman et al., 2020), and therefore integrates with, and greatly benefits from, a wide range of already existing single-agent RL components made available in Acme. Several MARL baseline systems have already been implemented in Mava. These implementations serve as examples showcasing Mava’s reusable features, such as interchangeable system architectures, communication and mixing modules. Furthermore, these implementations allow existing MARL algorithms to be easily reproduced and extended. We provide experimental results for these implementations on a wide range of multi-agent environments and highlight the benefits of distributed system training.