Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
Xiaoji Zheng¹*, Ziyuan Yang²*, Yanhao Chen³, Yuhang Peng⁴, Yuanrong Tang¹, Gengyuan Liu¹, Bokui Chen¹‡ and Jiangtao Gong¹‡
¹ Tsinghua University  ² Washington University

CoIRL-AD introduces a unified framework that integrates imitation and reinforcement learning through a collaborative-competitive paradigm within latent world models. The approach enhances policy robustness, enables self-improvement beyond expert data, and achieves consistent performance across complex driving scenarios.
Figure 1. Overview of the CoIRL-AD framework. CoIRL-AD adopts a dual-policy architecture that integrates imitation learning (IL) and reinforcement learning (RL) through a shared latent world model. In each iteration, the IL actor and RL actor are trained in parallel. The latent world model is learned during the IL phase and then frozen for the RL phase, where only the RL actor and critic are updated. For exploration, the RL actor samples multiple action sequences, predicts the resulting future states via the latent world model, and evaluates them with rule-based reward functions. The critic assigns an advantage to each sequence based on the imagined trajectories and rewards. To promote interaction, a competitive learning mechanism exchanges knowledge between the IL and RL actors.
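To make the training loop described in the caption concrete, the following is a minimal PyTorch sketch of one iteration. Everything here is assumed for illustration: the toy network shapes (a GRU latent transition, linear actor/critic heads), the `rule_based_reward` stand-in, undiscounted returns, and a simple distillation rule for the competitive exchange. The paper's actual architectures, losses, and exchange mechanism may differ.

```python
import torch
import torch.nn as nn

LATENT, ACT, HORIZON, K = 32, 2, 5, 8    # latent size, action dim, horizon, sampled sequences (all assumed)

world_model = nn.GRUCell(ACT, LATENT)    # latent transition: z' = f(z, a)
il_actor = nn.Linear(LATENT, ACT)        # imitation policy head
rl_actor = nn.Linear(LATENT, ACT * 2)    # Gaussian RL policy head (mean, log-std)
critic = nn.Linear(LATENT, 1)            # value baseline

opt_il = torch.optim.Adam(
    list(world_model.parameters()) + list(il_actor.parameters()), lr=1e-3)
opt_rl = torch.optim.Adam(
    list(rl_actor.parameters()) + list(critic.parameters()), lr=1e-3)

def rule_based_reward(z, a):
    """Toy stand-in for the paper's rule-based reward functions."""
    return -(a ** 2).sum(-1) - 0.01 * (z ** 2).sum(-1)

z0 = torch.randn(1, LATENT)              # encoded scene latent (encoder omitted)
z1_target = torch.randn(1, LATENT)       # next-step latent target (assumed given)
expert_action = torch.zeros(1, ACT)      # expert label from the driving log

# IL phase: the world model and IL actor are trained on expert data.
wm_loss = ((world_model(expert_action, z0) - z1_target) ** 2).mean()
il_loss = ((il_actor(z0) - expert_action) ** 2).mean()
opt_il.zero_grad(); (il_loss + wm_loss).backward(); opt_il.step()

# RL phase: sample K action sequences, imagine rollouts in the latent
# world model, score them with rule-based rewards, and update only the
# RL actor and critic.
z = z0.detach().expand(K, LATENT).contiguous()
log_probs, rewards = [], []
for _ in range(HORIZON):
    mean, log_std = rl_actor(z).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.exp())
    a = dist.sample()
    log_probs.append(dist.log_prob(a).sum(-1))
    rewards.append(rule_based_reward(z, a))
    with torch.no_grad():                # world model is frozen in this phase
        z = world_model(a, z)
returns = torch.stack(rewards).sum(0)    # per-sequence return (undiscounted here)
advantage = returns - critic(z0).squeeze(-1)
actor_loss = -(advantage.detach() * torch.stack(log_probs).sum(0)).mean()
value_loss = advantage.pow(2).mean()
opt_rl.zero_grad(); (actor_loss + value_loss).backward(); opt_rl.step()

# Competitive exchange (one assumed realization): score both actors under
# the same reward (a crude one-step comparison for brevity) and distill
# the weaker actor toward the stronger one's actions.
with torch.no_grad():
    il_score = rule_based_reward(z0, il_actor(z0)).mean()
    rl_score = returns.mean()
if rl_score > il_score:
    mean = rl_actor(z0).chunk(2, dim=-1)[0].detach()
    distill_loss = ((il_actor(z0) - mean) ** 2).mean()
    opt_il.zero_grad(); distill_loss.backward(); opt_il.step()
```

Because the K candidate sequences are rolled out and scored entirely inside the learned latent model, the RL phase can explore beyond the expert data without additional environment interaction, which is the mechanism the caption attributes to the exploration step.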