MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion
Abstract
Reinforcement learning has recently enabled impressive locomotion capabilities on legged robots; however, most policy architectures remain morphology- and symmetry-agnostic, leading to inefficient training and limited generalization. This work introduces MS-PPO, a morphological-symmetry-equivariant policy learning framework that encodes robot kinematic structure and morphological symmetries directly into the policy network. We construct a morphology-informed graph neural architecture that is provably equivariant with respect to the robot's morphological symmetry group actions, ensuring consistent policy responses under symmetric states while maintaining invariance in value estimation. This design eliminates the need for tedious reward shaping or costly data augmentation, which are typically required to enforce symmetry. We evaluate MS-PPO in simulation on Unitree Go2 and Xiaomi CyberDog2 robots across diverse locomotion tasks, including trotting, pronking, slope walking, and bipedal turning, and further deploy the learned policies on hardware. Extensive experiments show that MS-PPO achieves superior training stability, command generalization ability, and sample efficiency in challenging locomotion tasks, compared to state-of-the-art baselines. These findings demonstrate that embedding both kinematic structure and morphological symmetry into policy learning provides a powerful inductive bias for legged robot locomotion control.
Method
Overall framework of MS-PPO: Common and privileged observations are first converted into a graph data structure whose topology follows the robot's kinematic connectivity. These are fed into the MS-GNN-Inv critic network to predict the state value, while the common observations alone are input to the MS-GNN-Equ actor network, which outputs the mean target position for each joint. By design, MS-PPO imposes hard symmetry constraints on policy learning.
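As a minimal sketch of this construction (illustrative assumptions only, not the exact MS-GNN architecture), the following PyTorch code builds a kinematic-tree graph for a 12-DoF quadruped and runs shared-weight message passing over it; the node ordering, edge list, and layer sizes are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical kinematic graph for a 12-DoF quadruped such as Go2:
# node 0 is the base link; each leg contributes hip -> thigh -> calf.
EDGES = []
for leg in range(4):
    hip, thigh, calf = 1 + 3 * leg, 2 + 3 * leg, 3 + 3 * leg
    EDGES += [(0, hip), (hip, thigh), (thigh, calf)]
# Message passing runs along both directions of each kinematic link.
SRC, DST = map(list, zip(*(EDGES + [(j, i) for i, j in EDGES])))


class GraphLayer(nn.Module):
    """One round of mean-aggregated message passing over the morphology graph."""

    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, h):  # h: (num_nodes, dim)
        src, dst = torch.tensor(SRC), torch.tensor(DST)
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)
        deg = torch.zeros(h.size(0)).index_add_(0, dst, torch.ones(len(DST)))
        return self.upd(agg / deg.clamp(min=1.0).unsqueeze(-1), h)


class MorphologyActor(nn.Module):
    """Maps per-node observation features to a mean target position per joint.
    Because weights are shared across nodes, a symmetry that permutes legs
    acts on the network simply by permuting node indices."""

    def __init__(self, in_dim, dim=64, num_layers=2):
        super().__init__()
        self.enc = nn.Linear(in_dim, dim)
        self.layers = nn.ModuleList([GraphLayer(dim) for _ in range(num_layers)])
        self.dec = nn.Linear(dim, 1)

    def forward(self, x):  # x: (13, in_dim) node features, base node first
        h = torch.relu(self.enc(x))
        for layer in self.layers:
            h = layer(h)
        return self.dec(h[1:]).squeeze(-1)  # 12 joint targets, base excluded
```

An invariant critic can reuse the same backbone by pooling node features with a symmetric aggregation (e.g., a mean over nodes) before a scalar value head, since symmetric pooling is unaffected by the node permutations that symmetry actions induce.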
Walk-to-One-Side with Go2 Robot
In this task, we evaluate the task-level symmetry performance of the learned policies on the Go2 robot. The locomotion policy is trained with commands restricted to one side, i.e., $c_x \in [-1.0, 1.0]\,\mathrm{m/s}$, $c_y \in [0.0, 0.6]\,\mathrm{m/s}$, and $c_{\text{yaw}} \in [0.0, 1.0]\,\mathrm{rad/s}$, and evaluated under commands in both directions to validate symmetry generalization.
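Concretely, the left-right reflection $g$ in the morphological symmetry group maps a command $(c_x, c_y, c_{\text{yaw}})$ to $(c_x, -c_y, -c_{\text{yaw}})$, and an equivariant policy satisfies $\pi(g \cdot s) = g \cdot \pi(s)$. Below is a minimal sketch of this group action; the joint indices and sign conventions are hypothetical and depend on the robot's URDF conventions:

```python
import numpy as np

# Hypothetical 12-dim joint ordering: [FL, FR, RL, RR] legs, each with
# (hip abduction, hip flexion, knee) joints.
LEFT = [0, 1, 2, 6, 7, 8]      # FL, RL joint indices
RIGHT = [3, 4, 5, 9, 10, 11]   # FR, RR joint indices
HIP_AA = [0, 3, 6, 9]          # abduction angles flip sign under reflection


def reflect_command(c):
    """Sagittal reflection g acting on the command (c_x, c_y, c_yaw)."""
    cx, cy, cyaw = c
    return np.array([cx, -cy, -cyaw])


def reflect_joints(q):
    """The same reflection acting on a joint vector: swap left and right
    legs, then negate the hip abduction angles."""
    g = q.copy()
    g[LEFT], g[RIGHT] = q[RIGHT], q[LEFT]
    g[HIP_AA] *= -1.0
    return g
```

Under this action, equivariance implies that the policy's response to a $c_y < 0$ command is exactly the mirror of its response to the corresponding $c_y > 0$ command, even though only one side appears during training.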
We visualize the Go2 robot walking to one side with a trotting gait in the simulator. The first row shows an in-distribution command, and the second row shows an out-of-distribution command.
We show the same evaluation with a pronking gait in the simulator; again, the first row shows an in-distribution command and the second row an out-of-distribution command.
Bipedal Locomotion with Cyberdog2 Robot
Walk-on-Slope: This task requires the Cyberdog2 robot to stand up and walk on an inclined flat surface using its two rear feet.
Stand-and-Turn: This task requires the Cyberdog2 robot to stand up on two feet and follow the input heading commands.
We show the evaluation of the Cyberdog2 robot on the Walk-on-Slope task (first row) and the Stand-and-Turn task (second row) in the simulator.
Real-World Experiments with Go2 Robot
We deployed the trot gaits of the four policies on a physical Unitree Go2 robot to test sim-to-real transfer. Each policy was run for 15 seconds with $c_{\text{yaw}} = \pm 1\,\mathrm{rad/s}$ and $c_x = c_y = 0\,\mathrm{m/s}$ commanded via a remote controller, so the desired behavior is in-place rotation with no drift. MS-PPO achieved nearly in-place turning for both commands; PPO-EMLP exhibited moderate drift, consistent with its behavior on the training-side commands, where the robot turns left. PPO-MLP and MI-PPO, however, showed significant drift, and MI-PPO additionally failed to maintain a stable gait or accurately track the commanded velocities. These results are consistent with our observations in the simulation tests and demonstrate MS-PPO's superior out-of-distribution generalization on physical hardware.
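Drift can be quantified, for example, as the planar displacement of the base from its starting position. A minimal sketch, assuming the base $(x, y)$ position is logged over each 15-second trial (a hypothetical logging setup, not a prescribed protocol):

```python
import numpy as np


def planar_drift(base_xy):
    """Drift during an in-place rotation trial.

    base_xy: (T, 2) array of logged base positions over the trial.
    Returns (final displacement, maximum excursion) from the start, in meters.
    """
    disp = np.linalg.norm(base_xy - base_xy[0], axis=1)
    return disp[-1], disp.max()
```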
We show the evaluation of the Go2 robot walking to one side with a trotting gait. The command is in-distribution (first row) and out-of-distribution (second row).