
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

Xiaoji Zheng1*, Ziyuan Yang2*, Yanhao Chen3, Yuhang Peng4, Yuanrong Tang1, Gengyuan Liu1, Bokui Chen1‡ and Jiangtao Gong1‡

1 Tsinghua University 2 Washington University
3 Beijing Jiaotong University 4 The Hong Kong Polytechnic University
* Equal Contribution ‡ Corresponding Author

Overview

CoIRL-AD introduces a unified framework that integrates imitation learning and reinforcement learning through a collaborative-competitive paradigm within a latent world model. The approach improves policy robustness, enables self-improvement beyond expert demonstrations, and achieves consistent performance across complex driving scenarios.

Figure 1. Overview of the CoIRL-AD framework. CoIRL-AD adopts a dual-policy architecture that integrates imitation learning (IL) and reinforcement learning (RL) through a shared latent world model. In each iteration, the IL actor and the RL actor are trained in parallel. The latent world model is learned during the IL phase and then reused in the RL phase, where only the RL actor and critic are updated. For exploration, the RL actor samples multiple action sequences, predicts future states with the latent world model, and evaluates them with rule-based reward functions. The critic assigns an advantage to each sequence based on the imagined trajectories and rewards. To promote interaction, a competitive learning mechanism exchanges knowledge between the IL and RL actors.
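To make the RL phase described in the caption more concrete, the PyTorch sketch below illustrates one possible shape of the loop: sample noisy action sequences, imagine futures with the latent world model, score them with rule-based rewards, assign critic-based advantages, and periodically exchange knowledge between the two actors. All names (`WorldModel`, `Actor`, `Critic`, `rule_based_reward`, `rl_phase`, `competitive_exchange`) and hyperparameters are hypothetical placeholders, not the released CoIRL-AD implementation; the IL phase (training the world model and IL actor on expert data) and the sensor encoder are omitted.

```python
import torch
import torch.nn as nn

# Illustrative dimensions only; not taken from the paper.
LATENT, ACTION, HORIZON, N_SAMPLES = 64, 2, 6, 8

class WorldModel(nn.Module):
    """Latent dynamics: predicts the next latent state from (state, action)."""
    def __init__(self):
        super().__init__()
        self.dyn = nn.Sequential(nn.Linear(LATENT + ACTION, 128), nn.ReLU(),
                                 nn.Linear(128, LATENT))
    def forward(self, z, a):
        return self.dyn(torch.cat([z, a], dim=-1))

class Actor(nn.Module):
    """Maps a latent state to an action; used for both the IL and RL actors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                                 nn.Linear(128, ACTION))
    def forward(self, z):
        return self.net(z)

class Critic(nn.Module):
    """Scores a latent state; used to compute advantages over imagined rollouts."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, z):
        return self.net(z)

def rule_based_reward(z, a):
    """Stand-in for rule-based rewards (e.g. collision / deviation penalties)."""
    return -(a ** 2).sum(-1)

world_model, il_actor, rl_actor, critic = WorldModel(), Actor(), Actor(), Critic()
opt = torch.optim.Adam([*rl_actor.parameters(), *critic.parameters()], lr=1e-4)

def rl_phase(z0):
    """RL phase: imagine rollouts in latent space, then update RL actor and critic."""
    z = z0.expand(N_SAMPLES, -1)                      # branch into N_SAMPLES rollouts
    states, actions, total_reward = [], [], torch.zeros(N_SAMPLES)
    with torch.no_grad():                             # imagination: no gradients here
        for _ in range(HORIZON):
            a = rl_actor(z) + 0.1 * torch.randn(N_SAMPLES, ACTION)  # exploration noise
            states.append(z); actions.append(a)
            total_reward += rule_based_reward(z, a)
            z = world_model(z, a)                     # imagined next latent state

    value = critic(z0.expand(N_SAMPLES, -1)).squeeze(-1)
    advantage = total_reward - value                  # critic-based advantage per sequence
    critic_loss = advantage.pow(2).mean()             # regress V(z0) toward imagined returns

    # Advantage-weighted regression: pull the actor toward high-advantage sequences.
    weights = torch.softmax(advantage.detach(), dim=0)
    actor_loss = sum(((rl_actor(zt) - at).pow(2).sum(-1) * weights).sum()
                     for zt, at in zip(states, actions)) / HORIZON

    opt.zero_grad()
    (actor_loss + critic_loss).backward()
    opt.step()

def competitive_exchange(il_score, rl_score, tau=0.05):
    """Stand-in for the competitive knowledge exchange: softly pull the weaker
    actor's parameters toward the stronger one. The scores are placeholders
    (e.g. validation metrics or critic evaluations)."""
    src, dst = (il_actor, rl_actor) if il_score > rl_score else (rl_actor, il_actor)
    with torch.no_grad():
        for p_dst, p_src in zip(dst.parameters(), src.parameters()):
            p_dst.mul_(1 - tau).add_(tau * p_src)

z0 = torch.randn(1, LATENT)   # latent state from the (omitted) sensor encoder
rl_phase(z0)
competitive_exchange(il_score=0.7, rl_score=0.5)
```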

Method Demonstrations

Good Case: Going Straight

Good Case: Turning Left

Good Case: Turning Right

Bad Case

Results on nuScenes

Results on the nuScenes Eval Set

Results of Cross-City Generalization

Results on High L2 Scenarios

Results on High Collision Rate Scenarios

Summary (Average over 3 seconds)

Competition Details

Score Comparison

Score Diff Comparison

Win Comparison
