Physics-regulated Deep Reinforcement Learning: Invariant Embeddings

1 Technical University of Munich (TUM)
2 Wayne State University
3 University of Illinois Urbana-Champaign (UIUC)
ICLR 2024 (Spotlight)

*Equal contribution

Email: cao.hongpeng@tum.de; maoyanbing.eth@gmail.com

Abstract

This paper proposes Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. Phy-DRL has three distinctive invariant-embedding designs: i) a residual action policy (i.e., integrating a data-driven DRL action policy with a physics-model-based action policy), ii) an (automatically constructed) safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of the critic and actor networks with physics knowledge about the action-value function and the action policy. Finally, we evaluate Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL provides guaranteed safety compared with purely data-driven DRL and a solely model-based design, while requiring remarkably fewer learning parameters and training rapidly toward the safety guarantee.
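To make the residual action policy concrete, here is a minimal sketch of how a physics-model-based action and a data-driven DRL action could be combined. It assumes a linear state-feedback term for the model-based part (e.g., from an LQR design on a linearized model); the gain matrix `F` and the stand-in policy are hypothetical placeholders, not the paper's exact components.

```python
import numpy as np

def model_based_action(state, F):
    """Physics-model-based action: linear state feedback a_phy = F @ state.
    F is a placeholder gain (e.g., obtained from an LQR design)."""
    return F @ state

def residual_action(state, drl_policy, F):
    """Residual action policy: the data-driven DRL action is added on top of
    the physics-model-based action (the first invariant-embedding design)."""
    return model_based_action(state, F) + drl_policy(state)

# Hypothetical usage on a 4-dimensional state (e.g., cart-pole):
F = np.array([[1.0, 1.5, 18.0, 3.0]])       # placeholder feedback gain
drl_policy = lambda s: np.zeros(1)           # stand-in for the trained actor
state = np.array([0.1, 0.0, 0.05, 0.0])      # [x, x_dot, theta, theta_dot]
action = residual_action(state, drl_policy, F)
```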

Overview

[Figure: overview of the Phy-DRL framework]

We propose the Phy-DRL: a physics-regulated deep reinforcement learning framework with enhanced safety assurance.
Phy-DRL has three novel (invariant-embedding) architectural designs:
(1) Residual Action Policy, (2) Safety-Embedded Reward, (3) Physics-Knowledge-Enhanced Critic and Actor Networks.
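As a hedged illustration of design (2), the sketch below shows one plausible way a safety-embedded reward could be built from an ellipsoidal safety envelope {s : sᵀPs ≤ 1}: the reward is positive when a transition moves the state deeper into the envelope. The matrix `P` and the exact reward terms are illustrative assumptions; the paper's automatic construction may differ in detail.

```python
import numpy as np

def envelope_value(state, P):
    """Quadratic safety measure V(s) = s^T P s; V(s) <= 1 means the state
    lies inside the ellipsoidal safety envelope."""
    return float(state @ P @ state)

def safety_embedded_reward(state, next_state, P):
    """One plausible safety-embedded reward: positive when the transition
    decreases V(s), i.e., pulls the state toward the envelope's center."""
    return envelope_value(state, P) - envelope_value(next_state, P)

# Hypothetical usage with a 2-D state and a placeholder P:
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
r = safety_embedded_reward(np.array([0.4, -0.2]), np.array([0.3, -0.1]), P)
```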

Experimental Results

Experiment on the cart-pole system

[Figure: phase plots on the cart-pole system]

The plot shows that Phy-DRL (left) renders the safety envelope invariant, whereas the model-based policy (middle) and the standard DRL policy (right) do not, on the cart-pole problem. The rectangular area denotes the safety set, and the ellipse denotes the safety envelope.
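For readers reproducing such a plot, the sketch below checks whether a state lies in a box-shaped safety set and in an ellipsoidal safety envelope sᵀPs ≤ 1. The bounds and `P` are placeholder values, not the experimental settings used in the paper.

```python
import numpy as np

def in_safety_set(state, low, high):
    """Box-shaped safety set: every state component within [low, high]."""
    return bool(np.all(state >= low) and np.all(state <= high))

def in_safety_envelope(state, P):
    """Ellipsoidal safety envelope: s^T P s <= 1."""
    return float(state @ P @ state) <= 1.0

# Placeholder cart-pole bounds on [x, x_dot, theta, theta_dot] and a sample P:
low  = np.array([-0.9, -2.0, -0.8, -2.5])
high = np.array([ 0.9,  2.0,  0.8,  2.5])
P = np.diag([1.5, 0.3, 2.0, 0.3])
s = np.array([0.2, 0.1, -0.05, 0.0])
print(in_safety_set(s, low, high), in_safety_envelope(s, P))
```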

Experiment on the quadruped locomotion

[Figure: phase plots of quadruped locomotion under different policies and environments]

This figure shows phase plots of the different policies running in different environments, given different velocity commands. As shown, Phy-DRL successfully constrains the robot's states to the safety set. Given more reasonable velocity commands in environments b) and d), Phy-DRL also successfully constrains the system states to the safety envelope. The Linear and PD policies constrain the system states to the safety envelope only in environment d). The DRL policy violates the safety requirements in all environments, which implies that purely data-driven DRL needs more training steps to find a safe and robust policy.

Demonstration on a quadruped robot

BibTeX

@inproceedings{Phydrl1,
  title     = {Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings},
  author    = {Cao, Hongpeng and Mao, Yanbing and Sha, Lui and Caccamo, Marco},
  booktitle = {The Twelfth International Conference on Learning Representations},
  year      = {2024},
  url       = {https://openreview.net/forum?id=5Dwqu5urzs},
  note      = {Spotlight}
}