HERCULES: An Open-Source Simulation Framework for Heterogeneous Multi-Robot SLAM, Collaborative Perception, and Exploration

¹Georgia Institute of Technology, ²Georgia Tech Research Institute
System overview of HERCULES.

An open-source Unreal Engine 5 simulator for heterogeneous UAV–UGV teams in photorealistic, dynamic, large-scale environments.

Abstract

We present HERCULES, a simulation platform for heterogeneous multi-robot autonomy in which unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) collaboratively explore and understand large-scale, complex environments. Built as an extended fork of AirSim on Unreal Engine 5 (UE5), HERCULES provides a unified autonomy pipeline that supports concurrent UAV–UGV operation, high-fidelity sensing, and large-scale environmental interaction. The pipeline includes offline trajectory generation, kinodynamically feasible motion for both platforms, and complementary coverage and leader–follower trajectory patterns for dataset collection.

A novel autonomous waypoint-tracking UGV controller mirrors the UAV command interfaces, so exploration and multi-robot coordination algorithms can drive both vehicle types through the same high-level API. To demonstrate the simulator's utility as a benchmark sandbox, we evaluate state-of-the-art baselines in collaborative SLAM (ROMAN) and collaborative perception (multi-view 3D object detection baselines from DAIR-V2X). Optimized ROS 2 wrappers and lightweight APIs allow SLAM, perception, and learning-based algorithms to be developed directly on top of this infrastructure.
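The internals of the HERCULES UGV controller are not described on this page. As a hedged illustration of what waypoint tracking for a ground robot can look like, the sketch below steers a unicycle (differential-drive) model along the pure-pursuit arc toward the active waypoint and switches waypoints once the robot comes within a lookahead radius. The function names, lookahead, speed limits, and goal tolerance are all hypothetical values chosen for illustration, not HERCULES parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float
    y: float
    yaw: float  # heading in radians, counter-clockwise from +x


def prune_reached(pose, waypoints, reach_radius):
    """Drop leading waypoints already within reach_radius (the final goal is always kept)."""
    while len(waypoints) > 1 and math.hypot(waypoints[0][0] - pose.x,
                                            waypoints[0][1] - pose.y) < reach_radius:
        waypoints = waypoints[1:]
    return waypoints


def pursuit_cmd(pose, waypoints, v_max=1.0, w_max=1.5, goal_tol=0.2):
    """Return (v, w) steering a unicycle along the arc through the active waypoint."""
    tx, ty = waypoints[0]
    if len(waypoints) == 1 and math.hypot(tx - pose.x, ty - pose.y) < goal_tol:
        return 0.0, 0.0  # final goal reached
    # Rotate the target offset into the robot frame.
    dx, dy = tx - pose.x, ty - pose.y
    lx = math.cos(pose.yaw) * dx + math.sin(pose.yaw) * dy
    ly = -math.sin(pose.yaw) * dx + math.cos(pose.yaw) * dy
    curvature = 2.0 * ly / (lx * lx + ly * ly)  # pure-pursuit arc curvature
    v, w = v_max, v_max * curvature
    if abs(w) > w_max:  # respect the turn-rate limit while keeping the same arc
        v, w = v * w_max / abs(w), math.copysign(w_max, w)
    return v, w


def simulate(pose, waypoints, lookahead=2.0, dt=0.05, steps=2000):
    """Integrate unicycle kinematics under the controller until the goal is reached."""
    for _ in range(steps):
        waypoints = prune_reached(pose, waypoints, lookahead)
        v, w = pursuit_cmd(pose, waypoints)
        if v == 0.0 and w == 0.0:
            break
        pose = Pose2D(pose.x + v * math.cos(pose.yaw) * dt,
                      pose.y + v * math.sin(pose.yaw) * dt,
                      pose.yaw + w * dt)
    return pose
```

Scaling v and w together when the turn-rate limit is hit keeps the commanded curvature unchanged, so the robot still follows the same arc, only more slowly. A real UGV controller would also need to handle acceleration limits and obstacles, which this sketch ignores.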

Three representative environments (desert, forest, and city), along with the option to import georeferenced real-world environments, enable testing algorithms under sparse landmarks, perceptual aliasing, and dynamic obstacles. In addition to the standard AirSim and Cosys-AirSim sensor suite, HERCULES adds thermal and low-light sensing through a physics-based long-wave infrared (LWIR) camera and a configurable night-vision mode. Advanced UE5 assets, such as realistic agents, traffic, wildlife, and dynamic environmental phenomena (fire, flooding, and crop disease spread), are exposed through plug-and-play interfaces for procedural scenario generation. Our open-source code and datasets provide a versatile testbed for advancing heterogeneous multi-robot SLAM, perception, planning, and exploration.

Environments

HERCULES ships with three photorealistic worlds—desert, forest, and city—each engineered to stress a different class of perception and planning failure: sparse landmarks and long-range visibility (desert), perceptual aliasing from repetitive geometry (forest), and dense occlusions with dynamic agents (city). Georeferenced real-world scenes can be imported via Cesium for Unreal.
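A georeferenced scene places robots at WGS84 latitude, longitude, and height, whereas planners and SLAM usually work in a local metric frame. This page does not say how HERCULES or Cesium for Unreal performs that mapping internally. The sketch below shows the standard geodetic to ECEF to local east-north-up (ENU) conversion. The origin coordinates in the usage note are example values near Atlanta, not a HERCULES default.

```python
import math

# WGS84 ellipsoid constants
A = 6378137.0                # semi-major axis [m]
F = 1.0 / 298.257223563      # flattening
E2 = F * (2.0 - F)           # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h):
    """WGS84 latitude/longitude [deg] and ellipsoidal height [m] to ECEF [m]."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime-vertical radius of curvature
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + h) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, h, lat0_deg, lon0_deg, h0):
    """Express a geodetic point in the ENU frame anchored at (lat0, lon0, h0)."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h)
    x0, y0, z0 = geodetic_to_ecef(lat0_deg, lon0_deg, h0)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    sl, cl = math.sin(lat0), math.cos(lat0)
    so, co = math.sin(lon0), math.cos(lon0)
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up
```

For example, `geodetic_to_enu(33.750, -84.388, 300.0, 33.749, -84.388, 300.0)` moves 0.001° north of the origin and yields a north offset of roughly 111 m with essentially zero east offset.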

Australian Outback

Wildlife: Kangaroos roaming the outback world.
UAV + UGV: Heterogeneous robot team traversing the outback.

Forest

Wildlife: Deer in the forest world.
UAV + UGV: Heterogeneous robot team traversing the forest.

City

Traffic: Vehicle traffic through the urban city world.
UAV + UGV: Heterogeneous robot team traversing the city.

Overview

  • Integrated heterogeneous autonomy stack. We re-architect the AirSim/Cosys-AirSim SimMode layer to enable concurrent UAV–UGV operation within a single simulation session, resolving a fundamental physics-engine conflict that previously restricted each session to a single vehicle type. On top of this, a unified waypoint-level command interface, an autonomous UGV controller, an end-to-end planning pipeline, and coordinated multi-robot sensor logging make large-scale outdoor heterogeneous experiments practical.
  • New sensor modalities and dynamic environments. Two new sensors not present in Cosys-AirSim: an LWIR thermal camera based on Planck-law spectral radiance, and a night-vision camera with empirical photometric transfer. Parameterized dynamic-environment modules (wildfire spread, flood inundation, crop disease transmission) and dynamic-agent Blueprints (MetaHuman pedestrians, VehicleAI traffic, AnimalAI wildlife) update the shared world state at runtime.
  • Open benchmarks and reproducible release. Ready-to-run evaluation suites for collaborative SLAM (ROMAN) and multi-view 3D object detection (DAIR-V2X-style baselines), a dataset collection pipeline that exports synchronized multimodal data in standard formats (KITTI-style layouts, ROS 2 bags), and performance-optimized ROS 2 wrappers for direct integration with existing autonomy stacks.
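The exact HERCULES export schema is not specified on this page. For orientation, the KITTI odometry layout stores each LiDAR sweep as `sequences/<seq>/velodyne/<frame>.bin`, a flat little-endian float32 array with four values (x, y, z, reflectance) per point. The sketch below writes and reads that layout using only the standard library. Treating the simulated LiDAR intensity as KITTI's reflectance channel is an assumption, and the helper names are hypothetical.

```python
import os
import struct
import tempfile

POINT = struct.Struct("<4f")  # x, y, z, reflectance as little-endian float32


def scan_path(root, sequence, frame):
    """Build sequences/<seq:02d>/velodyne/<frame:06d>.bin, creating directories as needed."""
    d = os.path.join(root, "sequences", f"{sequence:02d}", "velodyne")
    os.makedirs(d, exist_ok=True)
    return os.path.join(d, f"{frame:06d}.bin")


def write_kitti_scan(path, points):
    """Write an iterable of (x, y, z, intensity) tuples as a KITTI-style .bin file."""
    with open(path, "wb") as f:
        for p in points:
            f.write(POINT.pack(*p))


def read_kitti_scan(path):
    """Read a KITTI-style .bin file back into a list of 4-tuples."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % POINT.size:
        raise ValueError("file size is not a multiple of 16 bytes")
    return [POINT.unpack_from(data, off) for off in range(0, len(data), POINT.size)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = scan_path(root, 0, 42)
        write_kitti_scan(path, [(1.5, -2.25, 0.5, 0.75), (10.0, 0.0, -1.0, 0.0)])
        print(os.path.relpath(path, root), read_kitti_scan(path))
```

Because the format has no header, the point count is implied by the file size (16 bytes per point), which is why the reader rejects files whose size is not a multiple of 16.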

Multimodal Dataset

HERCULES exports time-synchronized multimodal data for heterogeneous robot teams. Each dashboard shows, per robot, the RGB, depth, semantic segmentation, and LiDAR streams captured along a coverage or leader–follower trajectory in one of our environments.
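This page names coverage and leader–follower trajectory patterns but does not define them. As a hedged sketch of the two ideas, the code below generates a boustrophedon ("lawnmower") coverage sweep over a rectangle and derives a follower path that holds a fixed offset in the leader's local frame. The rectangle bounds, pass spacing, and offsets are hypothetical inputs. HERCULES's own center- and perimeter-coverage patterns may be constructed differently.

```python
import math


def lawnmower(x_min, x_max, y_min, y_max, spacing):
    """Boustrophedon sweep: parallel passes along x, stepping `spacing` metres in y."""
    n_passes = int(math.floor((y_max - y_min) / spacing + 1e-9)) + 1
    waypoints = []
    for k in range(n_passes):
        y = y_min + k * spacing
        # Alternate pass direction so consecutive passes connect end to end.
        xs = (x_min, x_max) if k % 2 == 0 else (x_max, x_min)
        waypoints += [(xs[0], y), (xs[1], y)]
    return waypoints


def follower_path(leader, offset_back, offset_left):
    """Place a follower `offset_back` behind and `offset_left` left of each leader waypoint."""
    out = []
    for i, (x, y) in enumerate(leader):
        # Heading of the outgoing segment (incoming one for the final waypoint).
        j = i + 1 if i + 1 < len(leader) else i
        k = j - 1
        yaw = math.atan2(leader[j][1] - leader[k][1], leader[j][0] - leader[k][0])
        fx = x - offset_back * math.cos(yaw) - offset_left * math.sin(yaw)
        fy = y - offset_back * math.sin(yaw) + offset_left * math.cos(yaw)
        out.append((fx, fy))
    return out
```

For instance, `lawnmower(0.0, 10.0, 0.0, 4.0, 2.0)` yields three passes at y = 0, 2, and 4, alternating direction, and a UAV could fly the same waypoints at altitude while a UGV follows an offset copy from `follower_path`.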

City Block: Urban City Block sequence with RGB, depth, semantic segmentation, and LiDAR for every robot in the UAV–UGV team.
Australia (Center Coverage): Australian outback center-coverage trajectory with RGB, depth, semantic, and LiDAR streams across the team.
Australia (Perimeter Coverage): Australian outback perimeter-coverage trajectory with RGB, depth, semantic, and LiDAR streams across the team.
Forest: Forest sequence with RGB, depth, semantic segmentation, and LiDAR for every robot in the team.
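Sensors in a simulator rarely publish at identical instants, so "time-synchronized" export implies some frame-matching rule. This page does not state which rule HERCULES uses. The sketch below assumes a simple one: each reference stamp (for example, a LiDAR sweep) is paired with the nearest stamp in every other stream, and the frame is dropped if any stream has no sample within a tolerance. ROS 2 users may prefer the ApproximateTime policy in `message_filters`. The function names and tolerance here are hypothetical.

```python
import bisect


def nearest(stamps, t):
    """Index of the timestamp in sorted `stamps` closest to t."""
    i = bisect.bisect_left(stamps, t)
    if i == 0:
        return 0
    if i == len(stamps):
        return len(stamps) - 1
    return i if stamps[i] - t < t - stamps[i - 1] else i - 1


def synchronize(reference, others, tol):
    """Pair each reference stamp with the nearest stamp of every other stream.

    reference: sorted list of timestamps [s].
    others: dict name -> sorted list of timestamps [s].
    Frames where any stream has no sample within `tol` seconds are dropped.
    """
    frames = []
    for t in reference:
        match = {"reference": t}
        for name, stamps in others.items():
            if not stamps:
                break
            j = nearest(stamps, t)
            if abs(stamps[j] - t) > tol:
                break
            match[name] = stamps[j]
        else:  # runs only if no stream triggered a break
            frames.append(match)
    return frames
```

Dropping incomplete frames keeps every exported sample fully multimodal, at the cost of discarding frames where one stream lagged.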

Collaborative SLAM

We benchmark ROMAN collaborative SLAM on a City Block sequence with two UAVs and two UGVs. Each robot runs LIO-SAM odometry with an open-set ROMAN object map; inter-robot loop closures are registered pairwise via CLIPPER.
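The idea behind CLIPPER-style registration is that a rigid transform preserves distances. If two candidate object matches are both correct, the distance between the two objects must be the same in each robot's map. The sketch below illustrates this on 2D object centroids. It builds a binary pairwise-consistency graph, keeps the largest mutually consistent set, and fits a rigid transform with closed-form 2D Procrustes. This is a deliberate simplification: CLIPPER itself solves a relaxed densest-subgraph problem on a weighted graph, and ROMAN works with 3D open-set segments. Exact maximum clique (Bron-Kerbosch) is swapped in here and is only practical for small candidate sets. The threshold `eps` and all function names are hypothetical.

```python
import itertools
import math


def consistency_graph(assoc, src, dst, eps):
    """Link candidate matches (i, j) whose pairwise distances agree within eps."""
    adj = {k: set() for k in range(len(assoc))}
    for a, b in itertools.combinations(range(len(assoc)), 2):
        (i1, j1), (i2, j2) = assoc[a], assoc[b]
        if abs(math.dist(src[i1], src[i2]) - math.dist(dst[j1], dst[j2])) <= eps:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def max_clique(adj):
    """Exact maximum clique via Bron-Kerbosch (small graphs only)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = r
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def fit_rigid_2d(ps, qs):
    """Closed-form 2D Procrustes: yaw, t minimizing sum |R(yaw) p + t - q|^2."""
    n = len(ps)
    pcx, pcy = sum(p[0] for p in ps) / n, sum(p[1] for p in ps) / n
    qcx, qcy = sum(q[0] for q in qs) / n, sum(q[1] for q in qs) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(ps, qs):
        ax, ay, bx, by = px - pcx, py - pcy, qx - qcx, qy - qcy
        s_cos += ax * bx + ay * by  # sum of dot products
        s_sin += ax * by - ay * bx  # sum of 2D cross products
    yaw = math.atan2(s_sin, s_cos)
    c, s = math.cos(yaw), math.sin(yaw)
    t = (qcx - (c * pcx - s * pcy), qcy - (s * pcx + c * pcy))
    return yaw, t


def register(src, dst, assoc, eps=0.3):
    """Reject inconsistent matches, then align src onto dst using the inliers."""
    inliers = sorted(max_clique(consistency_graph(assoc, src, dst, eps)))
    ps = [src[assoc[k][0]] for k in inliers]
    qs = [dst[assoc[k][1]] for k in inliers]
    return fit_rigid_2d(ps, qs), inliers
```

Because the consistency check uses only distances, it rejects wrong matches before any transform is estimated, which is what makes this family of methods robust to heavy outlier rates in inter-robot associations.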

Live Mapping

UAV 1 (Live Mapping): Drone 1 live trajectory, camera pose, and object map (City Block).
UAV 2 (Live Mapping): Drone 2 live trajectory, camera pose, and object map (City Block).
UGV 1 (Live Mapping): Husky 1 live trajectory, camera pose, and object map (City Block).
UGV 2 (Live Mapping): Husky 2 live trajectory, camera pose, and object map (City Block).

Final Map

UAV 1 (Final Map): Drone 1 LIO-SAM final map with ROMAN object map (City Block).
UAV 2 (Final Map): Drone 2 LIO-SAM final map with ROMAN object map (City Block).
UGV 1 (Final Map): Husky 1 LIO-SAM final map with ROMAN object map (City Block).
UGV 2 (Final Map): Husky 2 LIO-SAM final map with ROMAN object map (City Block).

Loop-Closure Alignment

ROMAN loop closure: UGV 1 and UGV 2 submaps aligned pairwise via CLIPPER.

Sensors & Phenomena

Beyond the inherited Cosys-AirSim sensor suite, HERCULES adds physics-based long-wave infrared (LWIR) and night-vision (NVG) cameras, and three classes of dynamic environmental phenomena that update the shared world state at runtime.

New Sensor Modalities

Night-vision (NVG): night-vision rendering of the desert scene.
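The NVG mode is described only as using an "empirical photometric transfer", and its calibrated curve is not given here. To show the general shape such a mapping can take, the sketch below uses a purely hypothetical image-intensifier-style curve. It linearly amplifies scene luminance, applies exponential soft saturation and a display gamma, and tints the result with a greenish phosphor color. The gain, gamma, and tint weights are illustrative assumptions, not HERCULES values, and sensor noise is omitted.

```python
import math


def nvg_transfer(luminance, gain=2000.0, gamma=0.8):
    """Map scene luminance [cd/m^2] to a display level in [0, 1).

    Hypothetical curve: amplify by `gain`, saturate smoothly, apply gamma.
    """
    amplified = gain * max(luminance, 0.0)
    return (1.0 - math.exp(-amplified)) ** gamma


def phosphor_rgb(level, tint=(0.3, 1.0, 0.35)):
    """Render a display level as an 8-bit RGB triple with an assumed green tint."""
    return tuple(round(255 * level * c) for c in tint)
```

The soft saturation means bright sources such as fire or streetlights bloom toward full brightness instead of clipping abruptly, which is qualitatively how intensified imagery behaves.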

LWIR thermal: thermal rendering of kangaroos near a fire.
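The LWIR camera is described as based on Planck-law spectral radiance. The sketch below shows how that law can turn a surface temperature into a pixel value. It evaluates Planck's law B(λ, T) = 2hc²/λ⁵ · 1/(exp(hc/(λk_BT)) − 1), integrates it over an assumed 8 to 14 µm band with the trapezoid rule, scales by an assumed emissivity, and maps the result linearly between two reference temperatures. The band limits, emissivity handling, and 250 to 400 K display range are illustrative assumptions. The sketch also ignores reflected ambient radiation and atmospheric attenuation, which a full thermal model would include.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m/s]
KB = 1.380649e-23    # Boltzmann constant [J/K]


def planck(wavelength_m, temp_k):
    """Blackbody spectral radiance B(lambda, T) in W / (m^2 sr m)."""
    a = 2.0 * H * C ** 2 / wavelength_m ** 5
    return a / math.expm1(H * C / (wavelength_m * KB * temp_k))


def band_radiance(temp_k, emissivity=1.0, lo=8e-6, hi=14e-6, n=200):
    """Emissivity-scaled radiance over [lo, hi] via the trapezoid rule [W/(m^2 sr)]."""
    step = (hi - lo) / n
    total = 0.5 * (planck(lo, temp_k) + planck(hi, temp_k))
    total += sum(planck(lo + k * step, temp_k) for k in range(1, n))
    return emissivity * total * step


def to_pixel(temp_k, emissivity=1.0, t_min=250.0, t_max=400.0):
    """Map band radiance linearly between two blackbody reference temperatures to 0..255."""
    r = band_radiance(temp_k, emissivity)
    r_lo, r_hi = band_radiance(t_min), band_radiance(t_max)
    x = (r - r_lo) / (r_hi - r_lo)
    return max(0, min(255, round(255 * x)))
```

Under these assumptions, an animal near body temperature (around 310 K) appears brighter than terrain at 290 K, while flames far above 400 K saturate at 255. Working in radiance rather than raw temperature also lets low-emissivity surfaces appear darker than their true temperature would suggest.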

Dynamic Environmental Phenomena

Wildfire Spread: Simulated wildfire propagating through a forest environment.
Atlanta Flood: Flood inundation over a Cesium 3D model of Atlanta.
Jungle Flood: Flood inundation in a dense jungle environment.
Crop Disease: Crop disease transmission across agricultural terrain.
Crop Disease (still): Crop disease transmission across agricultural terrain, still frame.
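The wildfire module is described as parameterized, but its update rule is not given on this page. A common simplified model for this kind of runtime spread is a probabilistic cellular automaton, sketched below under that assumption. Each burning cell burns for one step and ignites each unburned 4-neighbor with probability `p_spread`. The grid size, ignition cell, `p_spread`, and function names are hypothetical. A richer model would make the probability depend on fuel, wind, and slope, and the same neighbor-spread structure can be adapted to crop-disease transmission.

```python
import random

UNBURNED, BURNING, BURNT = 0, 1, 2


def step(grid, p_spread, rng):
    """One synchronous update of a probabilistic fire-spread cellular automaton."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != BURNING:
                continue
            new[r][c] = BURNT  # each cell burns for exactly one step
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNBURNED:
                    if rng.random() < p_spread:
                        new[nr][nc] = BURNING
    return new


def run_fire(rows, cols, ignition, p_spread=0.6, seed=0, max_steps=1000):
    """Run until no cell is burning; return the final grid and the step count."""
    rng = random.Random(seed)  # seeded for reproducible scenarios
    grid = [[UNBURNED] * cols for _ in range(rows)]
    grid[ignition[0]][ignition[1]] = BURNING
    for t in range(max_steps):
        if not any(BURNING in row for row in grid):
            return grid, t
        grid = step(grid, p_spread, rng)
    return grid, max_steps
```

Seeding the random generator makes a given fire scenario repeatable across runs, which matters when the same world state is used to benchmark different planners.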