Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving

University of Wisconsin-Madison
*Corresponding Author

The effectiveness of traditional RLHF methods depends heavily on the quality of human feedback, and expert-level feedback is usually required. When human mentors make mistakes, training may oscillate or even fail.

The PE-RLHF method retains the strengths of RLHF while leveraging physics knowledge to guarantee a trustworthy lower bound on safety performance, even when the quality of human feedback deteriorates due to factors such as distraction or fatigue.

Summary

Physics-enhanced Reinforcement Learning with Human Feedback (PE-RLHF) is designed for safe and trustworthy autonomous driving.

  • PE-RLHF is a novel framework that synergistically integrates human feedback with physics knowledge into the RL training loop.
  • PE-RLHF introduces a new human-AI collaborative paradigm that ensures a trustworthy safety performance lower bound.
  • PE-RLHF employs a reward-free approach with a proxy value function to represent human preferences and guide the training process (a minimal sketch follows below).
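
As a rough illustration of the reward-free idea above, the sketch below (PyTorch) trains a proxy value network so that, on steps where the mentor intervenes, the mentor's action is scored above the agent's proposed action. The class and function names, the hinge-style loss, and the network sizes are illustrative assumptions, not the exact formulation in the paper.

import torch
import torch.nn as nn

class ProxyValueNet(nn.Module):
    """Illustrative proxy value network: scores state-action pairs without
    any hand-crafted reward function."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def preference_loss(q_net, obs, agent_act, mentor_act, intervened, margin=1.0):
    """On takeover steps (intervened == 1.0), push the proxy value of the
    mentor's action above that of the agent's proposed action, encoding the
    human preference without defining a numeric reward."""
    q_agent = q_net(obs, agent_act).squeeze(-1)
    q_mentor = q_net(obs, mentor_act).squeeze(-1)
    hinge = torch.clamp(margin - (q_mentor - q_agent), min=0.0)
    return (hinge * intervened).mean()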

In extensive experiments across various driving scenarios:

  • PE-RLHF significantly reduces safety violations, demonstrating superior performance compared to traditional methods.
  • PE-RLHF significantly improves learning efficiency, enabling faster policy development compared to traditional RL methods.
  • PE-RLHF adapts to different levels of human feedback quality, ensuring robust performance.

Demo

Training phase (professional mentor)

Testing phase (professional mentor)

Training phase (amateur mentor)

Testing phase (amateur mentor)

Despite the decline in the amateur mentor's feedback quality, PE-RLHF still outperforms the other methods, which we attribute to the injection of physics knowledge.

Overview



(a) Imitation learning (IL) faces two significant issues in practical applications: distribution shift and limited asymptotic performance. For RL methods, training safety and sampling efficiency are the two major bottlenecks, and designing a reward function that captures all desired driving behaviors is challenging. (b) Reinforcement Learning from Human Feedback (RLHF) has proven effective in enhancing both training safety and sampling efficiency in RL, but most studies require expert-level human feedback, and few have examined the trustworthiness of RLHF-enabled methods. (c) In this work, we propose a novel framework, Physics-enhanced Reinforcement Learning with Human Feedback (PE-RLHF), to bridge this gap.

Motivation



(a) When learning a foreign language, a student may be guided by two mentors: a native speaker and a grammar book. When the native speaker's explanations are unclear, the learner can fall back on the grammar book as a reliable reference, and the student's language skills improve by learning from both mentors. Similarly, in transportation science there is well-established physics knowledge (e.g., traffic flow models). (b) Inspired by this human learning process, we propose the Physics-enhanced Human-AI (PE-HAI) Collaborative Paradigm. When the quality of human feedback deteriorates due to fatigue or distraction, we fall back on the actions of a physics-based model, and an action selection mechanism determines whether the human action or the physics-based action is applied to the environment.
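
As an example of such physics knowledge, the snippet below sketches the Intelligent Driver Model (IDM), the longitudinal part of the IDM-MOBIL combination referenced in the experiments. The parameter values are common defaults chosen for illustration, not necessarily those used in the paper.

import math

def idm_acceleration(v, v_lead, gap,
                     v_desired=30.0,  # desired speed [m/s]
                     t_headway=1.5,   # desired time headway [s]
                     s0=2.0,          # minimum standstill gap [m]
                     a_max=1.5,       # maximum acceleration [m/s^2]
                     b_comf=2.0,      # comfortable deceleration [m/s^2]
                     delta=4.0):      # acceleration exponent
    """Intelligent Driver Model: longitudinal acceleration of the ego vehicle
    given its speed v, the leader's speed v_lead, and the bumper-to-bumper gap."""
    dv = v - v_lead  # closing speed
    s_star = s0 + max(0.0, v * t_headway + v * dv / (2.0 * math.sqrt(a_max * b_comf)))
    return a_max * (1.0 - (v / v_desired) ** delta - (s_star / max(gap, 1e-3)) ** 2)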

Action Selection Mechanism



In a roadblock avoidance scenario, traditional RL methods would collide with the roadblock before learning to avoid it. In PE-HAI, humans perceive danger and take over, making a left lane change. However, humans may subsequently make erroneous maneuvers such as deviating from the road. In this case, PE-HAI switches to the physics-based action, thus ensuring training safety. Finally, the agent learns a safe and efficient obstacle avoidance strategy from this hybrid policy.
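
One plausible way to implement this arbitration is to compare the learned proxy value of the two candidate actions, as sketched below. The function signature and the value-comparison criterion are assumptions made for illustration and may differ from the exact rule used in PE-RLHF; the physics-based action could come from the IDM sketch above, and the proxy value from the preference-trained network in the Summary section.

def select_action(q_net, obs, agent_act, human_act=None, physics_act=None):
    """Illustrative PE-HAI-style arbitration for a single decision step
    (an assumption, not necessarily the paper's exact rule)."""
    if human_act is None:       # no intervention: the agent acts freely
        return agent_act
    if physics_act is None:     # takeover, but no physics-based fallback available
        return human_act
    # On takeover, apply whichever candidate the learned proxy value scores higher.
    q_human = q_net(obs, human_act).mean().item()
    q_physics = q_net(obs, physics_act).mean().item()
    return human_act if q_human >= q_physics else physics_act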

Framework



The PE-RLHF framework consists of five parts: (a) Observation space and action space, (b) Reward-free actor-critic architecture, (c) Learning from hybrid intervention action, (d) Learning from exploration with entropy regularization, and (e) Reducing the human mentor’s cognitive load.
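
To make parts (b)-(e) more concrete, the sketch below outlines one possible actor update: imitate the hybrid intervention action on takeover steps, raise the proxy value with entropy regularization on free-exploration steps, and discourage the agent actions that forced a takeover so that the mentor's cognitive load decreases over training. The loss terms, their weights, and the interface are illustrative assumptions rather than the paper's exact objective.

def actor_loss(policy, q_net, obs, agent_act, hybrid_act, intervened,
               alpha=0.2, beta=0.1):
    """Schematic PE-RLHF-style actor update (a sketch under assumptions).
    policy(obs) is assumed to return a torch.distributions.Distribution;
    intervened is a float mask with 1.0 on takeover steps."""
    dist = policy(obs)
    new_act = dist.rsample()
    logp_new = dist.log_prob(new_act).sum(-1)

    # (c) Learn from the hybrid intervention action (human or physics-based).
    bc_term = (-dist.log_prob(hybrid_act).sum(-1) * intervened).mean()

    # (d) Reward-free exploration: raise the proxy value, keep entropy high.
    explore_term = ((alpha * logp_new - q_net(obs, new_act).squeeze(-1))
                    * (1.0 - intervened)).mean()

    # (e) Reduce cognitive load: make actions that triggered a takeover less likely.
    takeover_term = beta * (dist.log_prob(agent_act).sum(-1) * intervened).mean()

    return bc_term + explore_term + takeover_term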

Experiment



PE-RLHF demonstrated trustworthy safety improvement, achieving an episodic safety cost of 0.47 and significantly outperforming both the standalone IDM-MOBIL model and other advanced RLHF methods. Moreover, PE-RLHF exhibited high sampling efficiency, reducing the required training data by 74% and cutting training time from over 30 hours to just 1 hour compared with traditional RL methods.

Introduction Video


BibTeX

@article{huang2024trustworthy,
  title={Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving},
  author={Huang, Zilin and Sheng, Zihao and Chen, Sikai},
  journal={arXiv preprint arXiv:2409.00858},
  year={2024}
}