Beyond Topology: A Morphological Symmetry Graph Representation for Locomotion Policy Learning
Abstract
Reinforcement learning has enabled impressive locomotion skills on articulated robots, but common policy representations remain only weakly aligned with robot physics. Generic networks ignore kinematic organization, while graph-based policies encode connectivity without specifying how physical quantities transform across symmetric body parts. We introduce a morphological symmetry graph representation for locomotion policy learning and instantiate it in MS-PPO. Starting from the robot's topological graph, our representation augments each observation and action space with the permutation and sign transformations induced by morphological symmetry. This yields a symmetry-equivariant graph actor and a symmetry-invariant graph critic, enforcing the desired policy and value constraints by construction rather than through reward shaping or data augmentation. We evaluate MS-PPO on Unitree Go2 quadruped and Unitree G1 humanoid locomotion tasks, including command tracking, asymmetric joint failures, training-efficiency comparisons, and zero-shot sim-to-real deployment. Experiments show improved symmetry generalization, robustness, sample efficiency, and model efficiency over topology- and symmetry-aware baselines.
Quadrupedal Figure-8 Walking (Unitree Go2)
The walk-figure-8 task requires tracking a symmetric trajectory with alternating clockwise and counterclockwise turns, which directly tests left-right angular velocity tracking. MS-PPO achieves the lowest angular velocity tracking error and the highest reward while using only 34.6% of PPO's parameters. Its symmetry-equivariant actor couples mirrored observations and actions by construction, so left- and right-turn behaviors are shared rather than learned independently. The baselines are PPO (generic MLP), PPO-EMLP (flat $\mathbb{C}_2$-equivariant MLP), PPO-Aug (symmetry via data augmentation), and MI-PPO (topology-aware GNN without symmetry).
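To make the phrase "couples mirrored observations and actions" concrete, the sketch below writes the left-right reflection as a permutation plus sign flip on a 12-dimensional joint vector and checks the equivariance condition $\pi(g \cdot o) = g \cdot \pi(o)$. The joint ordering, the sign convention, and the helper names `mirror_joints` and `symmetrize` are illustrative assumptions, not taken from the paper. The sketch obtains equivariance by group averaging an arbitrary policy, which is a generic technique and not the graph architecture used by MS-PPO.

```python
# Hypothetical joint order (an assumption for illustration):
# legs FL, FR, RL, RR, each with (hip, thigh, calf).
LEG_SWAP = [1, 0, 3, 2]        # reflection swaps FL<->FR and RL<->RR
JOINT_SIGN = [-1.0, 1.0, 1.0]  # assumed convention: hip abduction flips sign


def mirror_joints(q):
    """Apply the left-right reflection g to a 12-dim joint vector."""
    out = [0.0] * 12
    for leg in range(4):
        src = LEG_SWAP[leg]
        for j in range(3):
            out[3 * leg + j] = JOINT_SIGN[j] * q[3 * src + j]
    return out


def symmetrize(policy):
    """Group-average a policy over {identity, g} so it commutes with g.

    This works because g is linear and applying it twice gives the identity.
    """
    def eq_policy(obs):
        direct = policy(obs)
        mirrored = mirror_joints(policy(mirror_joints(obs)))
        return [0.5 * (a + b) for a, b in zip(direct, mirrored)]
    return eq_policy
```

For simplicity the observation here is just the joint vector; a full observation would also need the corresponding transformation for base velocity, orientation, and commands. Group averaging costs two forward passes per step, whereas an architecture that is equivariant by construction needs only one.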
Symmetry Generalization — Walk to One Side (Unitree Go2)
Policies are trained using one-sided lateral and yaw-rate commands (left-turn only: $c_y \in [0, 0.6]$ m/s, $c_\omega \in [0, 1]$ rad/s), then evaluated on both the trained direction and the mirrored right-turn direction, which is out-of-distribution (OOD). A policy with perfect $\mathbb{C}_2$ equivariance generalizes to the mirrored command without retraining. On the trot gait, MS-PPO achieves 51.8%, 46.6%, and 61.4% lower OOD tracking error (RMSE-O) than PPO, PPO-EMLP, and MI-PPO, respectively. On the pronk gait, which demands tighter whole-body coordination, MS-PPO is the only method that maintains low error in both directions. Each row below shows one method; columns show in-distribution (left turn) and OOD (right turn).
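The evaluation protocol above can be sketched as follows: a trained-direction command is reflected to produce its OOD counterpart, tracking error is measured as a root-mean-square error, and the improvement over a baseline is reported as a percentage. The command layout `(c_x, c_y, c_omega)` and the function names are assumptions made for illustration, and the exact definition of RMSE-O used in the paper may differ from this plain RMSE.

```python
import math


def mirror_command(cmd):
    """Reflect a (c_x, c_y, c_omega) command across the sagittal plane.

    Forward velocity is unchanged; lateral velocity and yaw rate flip sign.
    """
    c_x, c_y, c_omega = cmd
    return (c_x, -c_y, -c_omega)


def rmse(targets, measured):
    """Root-mean-square error between commanded and measured values."""
    n = len(targets)
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(targets, measured)) / n)


def relative_reduction(baseline_err, method_err):
    """Percentage by which method_err is lower than baseline_err."""
    return 100.0 * (baseline_err - method_err) / baseline_err
```

Under this reading, a figure such as "51.8% lower" corresponds to `relative_reduction(baseline_rmse, ms_ppo_rmse)` evaluated on rollouts driven by `mirror_command` applied to the training commands.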
Joint Failure Tolerance (Unitree Go2)
The joint zero-torque experiment evaluates robustness to localized actuator failures. The robot is commanded to walk forward at 1 m/s while one specified joint receives zero torque, forcing the policy to compensate with the remaining joints. MS-PPO achieves the best average reward under hip and calf failures and remains competitive under thigh failures. Unlike a flat $\mathbb{C}_2$-equivariant MLP, which globally couples both sides of the robot, MS-PPO localizes the disabled joint to its graph node and propagates compensation through the kinematic topology. The hardware and simulation videos below show the rear-right calf failure case.
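A minimal sketch of how such a zero-torque fault could be injected between the policy and the actuators is shown below. The joint indexing (legs FL, FR, RL, RR, each with hip, thigh, calf) and the names `joint_index` and `apply_zero_torque` are hypothetical; the source does not specify how the fault is implemented in simulation or on hardware.

```python
LEGS = ["FL", "FR", "RL", "RR"]      # assumed leg order
JOINTS = ["hip", "thigh", "calf"]    # assumed per-leg joint order


def joint_index(leg, joint):
    """Map a (leg, joint) name pair to its position in a 12-dim vector."""
    return 3 * LEGS.index(leg) + JOINTS.index(joint)


def apply_zero_torque(torques, failed_joint):
    """Return a copy of the commanded torques with one joint forced to zero."""
    out = list(torques)
    out[failed_joint] = 0.0
    return out
```

For the rear-right calf case, the fault would be applied at every control step as `apply_zero_torque(torques, joint_index("RR", "calf"))`, leaving the other eleven commands untouched.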
Dual-Robot Collaboration under Joint Failure (Unitree Go2)
Two Go2 robots are connected in a lead-follower configuration; the lead robot's rear-right calf joint is disabled. This setup combines actuator failure with interaction disturbance, since the rear robot can drag or perturb the faulted lead. PPO and PPO-EMLP lose balance by around 6 seconds. PPO-Aug survives longer but develops large rotational drift that destabilizes both robots. MI-PPO drifts laterally despite its graph backbone. MS-PPO maintains forward heading, speed, and stable pulling behavior throughout, demonstrating that jointly encoding kinematic topology and morphological symmetry produces the most robust behavior under combined failure and interaction disturbances.
Humanoid Locomotion (Unitree G1)
MS-PPO is also evaluated on the Unitree G1 humanoid, whose morphology is significantly more complex than that of Go2, with more degrees of freedom and a different kinematic topology. Policies are deployed zero-shot on the physical robot without fine-tuning. Among the methods that deploy, MS-PPO achieves the lowest forward velocity tracking MSE ($0.5671 \pm 0.1418$ m/s).
Related Research
MS-PPO builds upon our previous work on morphological structure and symmetry in robotic learning:
MS-HGNN: Morphological-Symmetry-Equivariant Heterogeneous Graph Neural Network
L4DC 2025
A morphological-symmetry-equivariant heterogeneous graph neural network for robotic dynamics learning that integrates robotic kinematic structures and morphological symmetries into a single graph network. MS-PPO extends this work to policy learning for legged robot locomotion.
MI-HGNN: Morphology-Informed Heterogeneous Graph Neural Network
ICRA 2025
A morphology-informed heterogeneous graph neural network that leverages robot kinematic structures for dynamics learning. This work laid the foundation for incorporating morphological priors into graph-based learning architectures.
BibTeX
@article{wei2025msppo,
title={Beyond Topology: A Morphological Symmetry Graph Representation for Locomotion Policy Learning},
author={Wei, Sizhe and Chen, Xulin and Xie, Fengze and Katz, Garrett Ethan and Gan, Zhenyu and Gan, Lu},
journal={arXiv preprint arXiv:2512.00727},
year={2025}
}