HIVEX: A High-Impact Environment Suite for Multi-Agent Research

Currently, no open-source benchmark for multi-agent reinforcement learning (MARL) closely mimics real-world scenarios centered on critical ecological challenges while also offering sub-tasks, fine-grained terrain elevation or varied layout patterns, supporting open-ended learning through procedurally generated environments, and providing visual richness. Most common benchmarks with direct real-world applications fall into the following domains: (1) intelligent machines and devices; (2) chemical engineering, biotechnology, and medical treatment; (3) humans and society; and (4) social dilemmas (Ning & Xie, 2024).

The main HIVEX environment features are either procedurally generated or sampled from a random distribution. Training and evaluation are therefore differentiated by seed values, ensuring that testing scenarios are never seen during training. We aim to assess and compare MARL algorithms with a focus on test-time evaluation in zero-shot test scenarios. Where applicable, a scenario consists of an environment combined with a task pattern or a terrain elevation. Each environment has a main end-to-end task and isolated sub-tasks that are either independent of or part of the main task. Environments have between two and nine tasks, various layout patterns, or terrain elevation levels. The environments described below are ordered by increasing complexity in observation size and type, action count and type, and reward granularity, including individual and collective rewards. We introduce combinations of vector and visual observations and of discrete and continuous actions.
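The seed-based train/test split described above can be sketched as follows. This is a minimal illustration, not the suite's actual API: the seed ranges and the `scenario_seed` helper are assumptions chosen only to show how disjoint seed sets keep zero-shot evaluation scenarios out of the training distribution.

```python
# Hypothetical sketch: disjoint train/test seed ranges guarantee that
# procedurally generated evaluation scenarios were never seen in training.
TRAIN_SEEDS = range(0, 1000)
TEST_SEEDS = range(1000, 1100)

def scenario_seed(split: str, episode: int) -> int:
    """Pick a deterministic generation seed for the given split and episode."""
    seeds = TRAIN_SEEDS if split == "train" else TEST_SEEDS
    return seeds[episode % len(seeds)]

# The two seed pools never overlap, so test scenarios are truly zero-shot.
assert set(TRAIN_SEEDS).isdisjoint(TEST_SEEDS)
```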


Oral at ICLR CCAI 2025

Wind Farm Control

Wind Farm Control – Main environment features: Main wind direction, wind noise field sample, agent-controlled wind turbine.

The default layout of the WFC environment consists of eight wind turbines. Each turbine receives six vector inputs: its position (x, y), its orientation (x, y), and the local wind direction (x, y). The agent controlling each turbine has three discrete actions: do nothing, turn left, or turn right. The primary reward is based on the amount of wind energy generated when the turbine is optimally aligned with the wind direction.
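One natural reading of the WFC reward is that the energy a turbine generates grows with its alignment to the local wind direction. The following sketch makes that concrete; the cosine-alignment formula and the clamping to zero are assumptions, since the paper does not specify the exact reward function.

```python
import math

def wfc_reward(turbine_dir, wind_dir):
    """Illustrative reward: energy captured scales with how well the
    turbine's orientation (x, y) aligns with the local wind direction (x, y).
    The cosine form is an assumption, not the suite's actual formula."""
    dot = turbine_dir[0] * wind_dir[0] + turbine_dir[1] * wind_dir[1]
    norm = math.hypot(*turbine_dir) * math.hypot(*wind_dir)
    alignment = dot / norm          # cosine of the misalignment angle, in [-1, 1]
    return max(0.0, alignment)      # no energy when facing away from the wind

# The three discrete actions described above (index assignment is a placeholder):
ACTIONS = {0: "do nothing", 1: "turn left", 2: "turn right"}
```

Perfect alignment yields the maximum reward, while a turbine facing across or away from the wind earns nothing.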

Wildfire Resource Management

Wildfire Resource Management – Main environment features: Wind field sample, overcast field sample, temperature field sample, humidity field sample, growing wildfire.

The WRM environment consists of nine agents, each managing one of nine watchtowers. Each agent observes three environmental factors: temperature, humidity, and cloud cover, as well as whether a fire has been detected within 600 meters and the current resource level of its watchtower. Each watchtower starts with 1.0 resources, which can be allocated in 0.1 increments to either the agent's own tower or neighboring towers. Agents receive the maximum reward when their watchtower is well-resourced while a fire is approaching: for each step in which the fire draws closer and the watchtower is adequately prepared, the agent receives a high reward.
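The per-agent observation and the 0.1-increment resource transfers can be sketched as below. The observation ordering and the `transfer` helper are illustrative assumptions; only the field names and the increment size come from the description above.

```python
def wrm_observation(temperature, humidity, cloud_cover,
                    fire_within_600m, resource_level):
    """Hypothetical observation layout for one watchtower agent; the field
    ordering is an assumption, the fields themselves follow the text."""
    return [temperature, humidity, cloud_cover,
            1.0 if fire_within_600m else 0.0, resource_level]

def transfer(own, target, amount=0.1):
    """Move one 0.1 increment of resources from the agent's own tower to a
    neighboring tower, never transferring more than is available."""
    amount = min(amount, own)
    return own - amount, target + amount
```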

Ocean Plastic Collection

Ocean Plastic Collection – Main environment features: Agent-controlled ocean plastic collection vessel, trash field sample, nearest neighbours, trash population map.

The default OPC environment includes three agents, each controlling a plastic collection vessel. Agents receive a 25×25 visual grid, where each cell represents 2 meters, along with vector observations such as their position (x, y), forward direction (x, y), and the position of the nearest agent (x, y). Agents can move forward, turn left, or turn right. Rewards are granted for each plastic pebble successfully collected from the ocean.
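The OPC observation layout can be summarized in a short sketch. The helper name and flat vector ordering are assumptions; the grid size, the 2 m cell scale, and the three actions are taken from the description above.

```python
# Illustrative observation/action layout for one OPC vessel agent.
VISUAL_SHAPE = (25, 25)  # grid cells
CELL_SIZE_M = 2          # each cell covers 2 m

def opc_vector_obs(pos, forward, nearest_agent_pos):
    """Vector part of the observation: own position (x, y), forward
    direction (x, y), and the nearest agent's position (x, y)."""
    return [*pos, *forward, *nearest_agent_pos]

# The three discrete actions described above:
ACTIONS = ("move forward", "turn left", "turn right")

# Ground area covered by the visual grid: 50 m x 50 m.
coverage_m = (VISUAL_SHAPE[0] * CELL_SIZE_M, VISUAL_SHAPE[1] * CELL_SIZE_M)
```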


Drone-Based Reforestation

Drone-Based Reforestation – Main environment features: Terrain sample, forest sample, optimal reforestation area (not visible to the agent), height map (not visible to the agent).

The default DBR environment features three agents, each controlling a drone. Each agent's observations include a vector with data such as the drone's distance to the ground, position (x, y, z), spawn height, whether it is carrying a seed, battery level, and terrain, forest, and height maps. Additionally, agents receive a 32×32 grayscale visual observation. Agents can perform actions such as moving forward, backward, up, and down, rotating left or right, saving optimal positions, and dropping a seed if carrying one. Rewards are given for successful seed drops, with bonuses for drops in highly fertile areas.
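The seed-drop reward with a fertility bonus can be sketched as follows. The constants and the linear bonus are placeholders; only the structure (a base reward per successful drop, increased in highly fertile areas, and no reward without a seed) follows the description above.

```python
# Hedged sketch of the DBR seed-drop reward; magnitudes are assumptions.
BASE_DROP_REWARD = 1.0   # placeholder value for any successful drop
FERTILITY_BONUS = 1.0    # placeholder scale for the fertility bonus

def dbr_drop_reward(carrying_seed: bool, fertility: float) -> float:
    """Reward for a drop attempt: zero without a seed, otherwise a base
    reward plus a bonus that grows with local fertility (clamped to [0, 1])."""
    if not carrying_seed:
        return 0.0  # dropping is only rewarded while carrying a seed
    return BASE_DROP_REWARD + FERTILITY_BONUS * max(0.0, min(1.0, fertility))
```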

Aerial Wildfire Suppression

Aerial Wildfire Suppression – Main environment features: Water collection area, agent-controlled wildfire suppression aeroplanes, village, wind field sample, overcast field sample, temperature field sample, humidity field sample.

The default AWS environment consists of three agents, each controlling an aeroplane. Each agent receives both vector and visual observations. The vector observations include position (x, y), forward direction (x, y), the position of the nearest tree (x, y), and the tree's state: either [burning] or [not burning]. The visual observation is a 42×42×3 RGB grid. Agents can steer left, steer right, or release water. Rewards are given for extinguishing burning trees, with smaller rewards for preparing non-burning but alive trees. A small reward is also granted for picking up water.
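The AWS reward structure can be sketched as an event-based sum. The magnitudes below are placeholders; only their ordering (extinguishing a burning tree earns the most, preparing an alive tree less, picking up water a small amount) follows the description above.

```python
# Illustrative AWS reward shaping; values are assumptions, the ordering
# extinguish > prepare > water pickup follows the text.
REWARDS = {
    "extinguish_burning_tree": 1.0,
    "prepare_alive_tree": 0.25,
    "pick_up_water": 0.05,
}

def aws_step_reward(events):
    """Sum the rewards for all events an agent triggered in one step."""
    return sum(REWARDS[e] for e in events)
```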