- International Conference on Learning Representations (ICLR) 2025 at the Tackling Climate Change with Machine Learning (CCAI) workshop
- Autonomous Agents and Multiagent Systems (AAMAS) 2025 at the Workshop on Autonomous Agents for Social Good
Paper: link
Repository: link
Currently, no open-source benchmark for multi-agent reinforcement learning (MARL) closely mimics real-world scenarios focused on critical ecological challenges while offering sub-tasks, fine-grained terrain elevation or varied layout patterns, supporting open-ended learning through procedurally generated environments, and providing visual richness. The most common benchmarks with direct real-world applications fall into the following domains: 1. intelligent machines and devices, 2. chemical engineering, biotechnology, and medical treatment, 3. humans and society, and 4. social dilemmas (Ning & Xie, 2024).
The main HIVEX environment features are either procedurally generated or sampled from a random distribution. Training and evaluation are therefore differentiated by seed values, ensuring that testing scenarios are never seen during training. We aim to assess and compare MARL algorithms, focusing on test-time evaluation with zero-shot test scenarios. Where applicable, a scenario consists of an environment combined with a task pattern or terrain elevation. Each environment has a main end-to-end task and isolated sub-tasks that are either independent of or part of the main task. Environments have between two and nine tasks, various layout patterns, or terrain elevation levels. The environments described below are ordered by increasing complexity in observation size and type, action count and type, and reward granularity, including individual and collective rewards. We introduce combinations of vector and visual observations and of discrete and continuous actions.
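Differentiating training and evaluation by seed value can be sketched as below; the seed counts and ranges are illustrative, not the benchmark's actual values:

```python
# Sketch: deterministic, disjoint train/test seed split so that zero-shot
# test scenarios are never generated during training. Counts are assumptions.

def split_seeds(num_train: int, num_test: int, offset: int = 0):
    """Return disjoint seed lists for training and held-out testing."""
    train = list(range(offset, offset + num_train))
    test = list(range(offset + num_train, offset + num_train + num_test))
    assert not set(train) & set(test), "train/test seeds must be disjoint"
    return train, test

train_seeds, test_seeds = split_seeds(num_train=100, num_test=20)
```

Because procedural generation is driven entirely by the seed, holding out a seed range is sufficient to guarantee unseen layouts at test time.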
[Figure: Drone-Based Reforestation environment (Oral at ICLR CCAI 2025)]
Wind Farm Control
Wind Farm Control – Main environment features: Main wind direction, wind noise field sample, agent-controlled wind turbine.
The default layout of the WFC environment consists of eight wind turbines. Each turbine receives six vector inputs: its position (x, y), its orientation (x, y), and the local wind direction (x, y). The agent controlling each turbine has three discrete actions: do nothing, turn left, or turn right. The primary reward is based on the amount of wind energy generated when the turbine is optimally aligned with the wind direction.
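A simple baseline for this action space is a heuristic that turns the turbine toward the local wind direction. This is a sketch, not the benchmark's API: the vector layouts, the action encoding (0 = do nothing, 1 = turn left, 2 = turn right), and the left/right sign convention are assumptions.

```python
import math

# Hypothetical alignment heuristic for the WFC discrete action space.
# Orientation and wind are assumed to be 2D direction vectors (x, y).

def align_action(orientation, wind, deadband=0.05):
    """Pick the discrete action that rotates the turbine toward the wind."""
    # Signed angle from orientation to wind, wrapped into (-pi, pi].
    ang = math.atan2(wind[1], wind[0]) - math.atan2(orientation[1], orientation[0])
    ang = (ang + math.pi) % (2 * math.pi) - math.pi
    if abs(ang) < deadband:
        return 0                    # already aligned: do nothing
    return 1 if ang > 0 else 2      # positive angle: turn left, else right
```

Such a hand-coded policy is useful as a sanity-check baseline against which learned MARL policies can be compared.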
Wildfire Resource Management
Wildfire Resource Management – Main environment features: Wind field sample, overcast field sample, temperature field sample, humidity field sample, growing wildfire.
The WRM environment consists of nine agents, each managing one of nine watchtowers. Each agent observes three environmental factors: temperature, humidity, and cloud cover, as well as whether a fire has been detected within 600 meters and the current resource level of its watchtower. Each watchtower starts with 1.0 resources, which can be allocated in 0.1 increments to either the agent’s own tower or neighboring towers. Agents receive maximum rewards when their watchtower is well-resourced and a fire is approaching. For each step where the fire approaches and the watchtower is adequately prepared, the agent receives a high reward.
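The resource bookkeeping described above can be sketched as follows; the function name, the clipping behaviour, and the no-op rule for insufficient resources are assumptions for illustration, not the benchmark's exact rules.

```python
# Illustrative sketch of WRM resource transfers: nine watchtowers start
# with 1.0 resources each, and allocations happen in 0.1 increments.

def transfer(resources, src, dst, amount=0.1):
    """Move `amount` resources from tower `src` to tower `dst`, if available."""
    if resources[src] + 1e-9 < amount:
        return resources            # not enough to send: no-op
    out = list(resources)
    out[src] = round(out[src] - amount, 1)
    out[dst] = round(out[dst] + amount, 1)
    return out

towers = [1.0] * 9                  # nine watchtowers, 1.0 resources each
towers = transfer(towers, src=0, dst=1)
```

Note that transfers conserve the total resource pool, which is what makes allocation a genuinely cooperative problem as the fire approaches.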
Ocean Plastic Collection
Ocean Plastic Collection – Main environment features: Agent-controlled ocean plastic collection vessel, trash field sample, nearest neighbours, trash population map.
The default OPC environment includes three agents, each controlling a plastic collection vessel. Agents receive a 25×25 visual grid, where each cell represents 2 meters, along with vector observations such as their position (x, y), forward direction (x, y), and the position of the nearest agent (x, y). Agents can move forward, turn left, or turn right. Rewards are granted for each plastic pebble successfully collected from the ocean.
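With 25×25 cells at 2 m per cell, the visual grid covers a 50 m × 50 m area around the vessel. A sketch of mapping a world-space offset into that grid, assuming the grid is centred on the agent (the centring convention is an assumption):

```python
# Sketch: metre offset from the agent -> (row, col) in the OPC observation
# grid, assuming a 25x25 agent-centred grid with 2 m per cell.

GRID = 25
CELL_M = 2.0

def world_to_cell(dx, dy):
    """Convert a metre offset from the agent to (row, col), or None if outside."""
    half = GRID // 2
    col = half + int(round(dx / CELL_M))
    row = half + int(round(dy / CELL_M))
    if 0 <= row < GRID and 0 <= col < GRID:
        return row, col
    return None                     # outside the 50 m x 50 m field of view

world_to_cell(0.0, 0.0)             # agent's own cell: (12, 12)
```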
Drone-Based Reforestation
Drone-Based Reforestation – Main environment features: Terrain sample, forest sample, optimal reforestation area (not visible to the agent), height map (not visible to the agent).
The default DBR environment features three agents, each controlling a drone. Each agent’s observations include a vector with data such as the drone’s distance to the ground, position (x, y, z), spawn height, whether it’s carrying a seed, battery levels, and terrain, forest, and height maps. Additionally, agents receive a 32×32 grayscale visual observation. Agents can perform actions such as moving forward, backward, up, down, rotating left or right, saving optimal positions, and dropping a seed if carrying one. Rewards are given for successful seed drops, with bonuses for drops in highly fertile areas.
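The shape of the seed-drop reward, with its fertility bonus, can be sketched as below; the constants and the linear bonus are assumptions for illustration, not the environment's actual reward function.

```python
# Hypothetical DBR seed-drop reward: a base reward per successful drop
# plus a bonus scaled by local fertility in [0, 1].

def seed_drop_reward(dropped: bool, fertility: float,
                     base: float = 1.0, bonus: float = 1.0) -> float:
    """Reward a successful seed drop, boosted in highly fertile areas."""
    if not dropped:
        return 0.0
    fertility = min(max(fertility, 0.0), 1.0)   # clamp to [0, 1]
    return base + bonus * fertility
```

Scaling the bonus with fertility is what pushes agents to use the "save optimal position" action rather than dropping seeds anywhere.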
Aerial Wildfire Suppression
Aerial Wildfire Suppression – Environment overview: (1) water collection area, (2) agent-controlled wildfire suppression aeroplanes, (3) village. Main environment features: Wind field sample, overcast field sample, temperature field sample, humidity field sample.
The default AWS environment consists of three agents, each controlling an aeroplane. Each agent receives both vector and visual observations. The vector observations include position (x, y), forward direction (x, y), the position of the nearest tree (x, y), and the tree's state: either [burning] or [not burning]. The visual observation is a 42×42×3 RGB grid. Agents can steer left, steer right, or release water. Rewards are given for extinguishing burning trees, with smaller rewards for preparing non-burning but alive trees. A small reward is also granted for picking up water.
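Unpacking the flat vector observation into named fields makes agent code easier to read; a minimal sketch, assuming the ordering position, forward direction, nearest-tree position, tree state (the ordering and field names are assumptions):

```python
from typing import NamedTuple

# Sketch: structuring the 7-value AWS vector observation. Field names
# and ordering are assumptions, not the benchmark's documented layout.

class AwsObs(NamedTuple):
    pos_x: float
    pos_y: float
    fwd_x: float
    fwd_y: float
    tree_x: float
    tree_y: float
    tree_burning: bool

def parse_obs(vec):
    """Turn the flat 7-value vector observation into a named structure."""
    assert len(vec) == 7, "expected 7 vector observation values"
    return AwsObs(*vec[:6], tree_burning=bool(vec[6]))
```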