DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving


Evaluate Driving Agents on Worldwide Roads!




¹Shanghai Artificial Intelligence Laboratory, ²Zhejiang University, ³Shanghai Jiao Tong University,
⁴Technical University of Munich, ⁵East China Normal University
Equal contribution, Corresponding author


Closed-loop simulations with UniAD in generated environments: Singapore One-North (106 frames), Boston Seaport (118 frames), Boston Thomas Park (200 frames), and CARLA Town05 (26 frames), all at 2 Hz.


...and DriveArena supports any street map worldwide!

Abstract

This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating real scenarios. DriveArena features a flexible, modular architecture that allows its core components to be interchanged seamlessly: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any street map worldwide, and World Dreamer, a high-fidelity conditional generative model with infinite autoregression. Together, these components enable any driving agent capable of processing real-world images to navigate DriveArena's simulated environment. The agent perceives its surroundings through images generated by World Dreamer and outputs trajectories; these trajectories are fed into Traffic Manager, which simulates realistic interactions with other vehicles and produces a new scene layout. Finally, the latest scene layout is relayed back to World Dreamer, perpetuating the simulation cycle. This iterative process enables closed-loop exploration within a highly realistic environment, providing a valuable platform for developing and evaluating driving agents across diverse and challenging scenarios. DriveArena represents a substantial step forward in leveraging generative image data for driving simulation, opening new avenues for closed-loop autonomous driving.

DriveArena Architecture


The framework of our proposed DriveArena consists of two key components: a Traffic Manager that functions as the backend physics engine and a World Dreamer that serves as the real-world image renderer. Unlike conventional approaches, DriveArena does not rely on pre-built digital assets or reconstructed 3D road models. Instead, the Traffic Manager adapts to the road network of any city in OpenStreetMap (OSM) format, which can be downloaded directly from the internet. This flexibility enables closed-loop traffic simulation on diverse urban layouts.
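OSM data is XML: `<node>` elements carry `lat`/`lon` attributes, and `<way>` elements list node references via `<nd ref="...">` together with `<tag k="..." v="...">` pairs such as `highway=residential`. As a minimal sketch of the first step in turning such a download into a drivable road graph, the snippet below keeps only `highway` ways and builds a directed adjacency list. The text does not describe how DriveArena's Traffic Manager actually converts OSM data; the helper `parse_osm_roads`, the tag filter, and the simplified `oneway` handling are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set, Tuple


def parse_osm_roads(osm_xml: str) -> Tuple[Dict[str, Tuple[float, float]], Dict[str, Set[str]]]:
    """Hypothetical helper: build a directed road graph from OSM XML text."""
    root = ET.fromstring(osm_xml)
    # Node id -> (lat, lon)
    nodes = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.iter("node")
    }
    graph: Dict[str, Set[str]] = defaultdict(set)
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if "highway" not in tags:
            continue  # skip non-road ways such as buildings or rivers
        refs = [nd.get("ref") for nd in way.findall("nd") if nd.get("ref") in nodes]
        # Simplification: only "oneway=yes" is honoured; anything else is two-way
        oneway = tags.get("oneway") == "yes"
        for a, b in zip(refs, refs[1:]):
            graph[a].add(b)
            if not oneway:
                graph[b].add(a)
    return nodes, dict(graph)


# Toy OSM snippet with made-up ids and coordinates
SAMPLE = """<osm version="0.6">
  <node id="1" lat="0.0000" lon="0.0000"/>
  <node id="2" lat="0.0005" lon="0.0005"/>
  <node id="3" lat="0.0010" lon="0.0010"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/>
    <tag k="highway" v="residential"/>
    <tag k="oneway" v="yes"/>
  </way>
  <way id="11">
    <nd ref="2"/><nd ref="3"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""

nodes, graph = parse_osm_roads(SAMPLE)
print(graph)  # {'1': {'2'}, '2': {'3'}}
```

A usable simulation map would additionally need lane counts, junction geometry, and a projection from latitude/longitude to metres; this sketch only shows how road connectivity can be read straight from a downloaded OSM file.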

The Traffic Manager in DriveArena receives the ego trajectories output by the autonomous driving agent and manages the movement of all background vehicles. It uses explicit traffic flow generation algorithms, which allow it to generate a wider range of uncommon and potentially unsafe traffic scenarios, and it performs real-time collision detection between vehicles.
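Real-time collision detection between vehicles can be approximated by checking whether their bird's-eye-view footprints overlap. The sketch below applies the separating axis theorem (SAT) to oriented rectangles; it is a generic technique chosen for illustration, not necessarily what DriveArena's Traffic Manager uses, and the `Box` fields (center, length, width, heading) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical BEV vehicle footprint: center (m), size (m), heading (rad)."""
    x: float
    y: float
    length: float
    width: float
    yaw: float

    def corners(self):
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        hl, hw = self.length / 2, self.width / 2
        # Rotate local corner offsets into the world frame, then translate
        return [
            (self.x + dx * c - dy * s, self.y + dx * s + dy * c)
            for dx, dy in ((hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw))
        ]


def _project(corners, axis):
    """Interval covered by the corners when projected onto a unit axis."""
    dots = [px * axis[0] + py * axis[1] for px, py in corners]
    return min(dots), max(dots)


def boxes_collide(a: Box, b: Box) -> bool:
    """SAT: two convex polygons are disjoint iff some edge normal separates
    their projections. For rectangles the candidate axes are each box's
    heading direction and its perpendicular. Touching boxes count as colliding."""
    ca, cb = a.corners(), b.corners()
    for yaw in (a.yaw, b.yaw):
        for axis in ((math.cos(yaw), math.sin(yaw)), (-math.sin(yaw), math.cos(yaw))):
            min_a, max_a = _project(ca, axis)
            min_b, max_b = _project(cb, axis)
            if max_a < min_b or max_b < min_a:
                return False  # found a separating axis
    return True


ego = Box(0.0, 0.0, 4.5, 1.8, 0.0)
cut_in = Box(4.0, 0.5, 4.5, 1.8, math.pi / 6)  # angled vehicle clipping the ego's front
far_ahead = Box(10.0, 0.0, 4.5, 1.8, 0.0)
print(boxes_collide(ego, cut_in), boxes_collide(ego, far_ahead))
```

With only four candidate axes per pair, the test is cheap enough to run for every nearby vehicle pair at each simulation step; a broad-phase filter (for example, a distance threshold) would typically prune distant pairs first.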

The World Dreamer in DriveArena generates realistic camera images that precisely correspond to the Traffic Manager's output. It also allows for user-defined prompts to control various elements of the generated images, such as street view style, time of day, and weather conditions, enhancing the diversity of the generated scenes. Specifically, it employs a diffusion-based model that utilizes the current map and vehicle layouts as control conditions to produce surround-view images with cross-view and temporal consistency.
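The interaction between the driving agent, Traffic Manager, and World Dreamer forms a render, plan, simulate loop. The sketch below expresses that loop with minimal Python protocols. The interfaces (`Agent.plan`, `TrafficManager.step`, `WorldDreamer.render`), the `Layout` fields, and the 0.5 s step (matching the 2 Hz rate of the demo videos) are illustrative assumptions, not DriveArena's actual API; toy stand-ins replace the diffusion model, driving policy, and traffic simulator so the loop runs end to end.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

Waypoint = Tuple[float, float]


@dataclass
class Layout:
    """Hypothetical scene layout handed from Traffic Manager to World Dreamer."""
    t: float                                              # simulation time (s)
    ego: Waypoint                                         # ego position (m)
    others: List[Waypoint] = field(default_factory=list)  # background vehicles
    collision: bool = False


class Agent(Protocol):
    def plan(self, images: list) -> List[Waypoint]: ...


class TrafficManager(Protocol):
    def step(self, ego_traj: List[Waypoint], dt: float) -> Layout: ...


class WorldDreamer(Protocol):
    def render(self, layout: Layout, history: list, prompt: str) -> list: ...


def run_closed_loop(agent: Agent, tm: TrafficManager, dreamer: WorldDreamer,
                    init: Layout, prompt: str, steps: int, dt: float = 0.5) -> List[Layout]:
    """One episode of the render -> plan -> simulate cycle (dt = 0.5 s, i.e. 2 Hz)."""
    layout, history, log = init, [], [init]
    for _ in range(steps):
        # 1. Render surround-view images for the current layout, conditioned on
        #    previously generated frames (autoregressive generation).
        images = dreamer.render(layout, history, prompt)
        history.append(images)
        # 2. The driving agent perceives the images and outputs an ego trajectory.
        traj = agent.plan(images)
        # 3. Traffic Manager moves all vehicles, checks collisions, and returns
        #    the new layout that conditions the next render.
        layout = tm.step(traj, dt)
        log.append(layout)
        if layout.collision:
            break
    return log


# Toy stand-ins; real components would be a diffusion model, a driving
# policy such as UniAD, and a traffic simulator.
class ToyDreamer:
    def render(self, layout, history, prompt):
        return [f"cam{i}@t={layout.t:.1f}" for i in range(6)]  # six placeholder "images"


class ToyAgent:
    def plan(self, images):
        return [(1.0, 0.0), (2.0, 0.0)]  # waypoints relative to ego: straight ahead


class ToyTrafficManager:
    def __init__(self):
        self.t, self.ego = 0.0, (0.0, 0.0)

    def step(self, ego_traj, dt):
        # Move the ego to its first planned waypoint (no background traffic here)
        self.t += dt
        dx, dy = ego_traj[0]
        self.ego = (self.ego[0] + dx, self.ego[1] + dy)
        return Layout(self.t, self.ego)


log = run_closed_loop(ToyAgent(), ToyTrafficManager(), ToyDreamer(),
                      Layout(0.0, (0.0, 0.0)), "daytime, sunny", steps=4)
print([(entry.t, entry.ego) for entry in log])
```

Because each component is reached only through its interface, any one of them can be swapped (a different agent, traffic backend, or generative renderer) without touching the loop, which mirrors the modular design described above.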

Generation Results based on nuScenes Data

The videos below present generation results using different text prompts on the same road network, with the layout conditions projected onto the surround-view images. The four videos differ markedly in weather and lighting, and each maintains its own style throughout continuous iterative generation. In every frame, the road structure and vehicles strictly adhere to the given control conditions while remaining consistent across the surrounding camera views.

"daytime, sunny, downtown, red buildings, cars......"
"daytime, rainy, suburban, low buildings, wet surface......"
"daytime, cloudy, nature, green trees....."
"night, clear, suburban, streetlights......"

Generation Results Using Different Road Networks as Input

We randomly select one frame of images from the nuScenes dataset as reference images and choose three scenes from OSM and CARLA. We then run DriveArena inference on each of them. As shown below, the generated vehicles and road networks closely follow the control conditions, demonstrating strong controllability. The style and weather of the generated images also remain consistent with the reference images.


Generation Results Based on nuPlan Roadmaps

To validate scalability, we also ran World Dreamer inference directly on the nuPlan dataset. World Dreamer is trained entirely on the nuScenes dataset, whereas nuPlan data originates from different cities and uses different camera counts and parameters. We select 6 cameras whose layout is similar to that of nuScenes and use nuPlan's camera parameters to project object boxes and lane lines onto the corresponding images as control conditions. As shown below, World Dreamer adheres well to these conditions and generates coherent images in new cities, even with novel camera configurations.
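Projecting object boxes onto a camera image amounts to transforming each 3D point from the ego frame into the camera frame with the extrinsics, then applying the pinhole intrinsics. The sketch below does this for the eight corners of a 3D box. The conventions (ego frame x forward, y left, z up; camera frame x right, y down, z forward; a 4×4 ego-to-camera matrix; points behind the camera dropped) and the toy camera values are assumptions for illustration; the text does not specify how World Dreamer rasterizes the projected boxes and lane lines.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def box_corners_3d(cx, cy, cz, l, w, h, yaw) -> List[Tuple[float, float, float]]:
    """Eight corners of a 3D box in the ego frame (x forward, y left, z up)."""
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for dx in (l / 2, -l / 2):
        for dy in (w / 2, -w / 2):
            for dz in (h / 2, -h / 2):
                out.append((cx + dx * c - dy * s, cy + dx * s + dy * c, cz + dz))
    return out


def project_points(points, ego_to_cam: Matrix, K: Matrix) -> List[Optional[Tuple[float, float]]]:
    """Pinhole projection of ego-frame points; None for points behind the camera."""
    pixels: List[Optional[Tuple[float, float]]] = []
    for x, y, z in points:
        # Homogeneous rigid transform into the camera frame (first three rows)
        xc, yc, zc = (
            ego_to_cam[r][0] * x + ego_to_cam[r][1] * y + ego_to_cam[r][2] * z + ego_to_cam[r][3]
            for r in range(3)
        )
        if zc <= 1e-6:
            pixels.append(None)
            continue
        # Apply intrinsics, then divide by depth
        u = (K[0][0] * xc + K[0][1] * yc + K[0][2] * zc) / zc
        v = (K[1][0] * xc + K[1][1] * yc + K[1][2] * zc) / zc
        pixels.append((u, v))
    return pixels


# Toy front camera 1.5 m above the ego origin, looking along ego +x
EGO_TO_FRONT_CAM = [
    [0.0, -1.0, 0.0, 0.0],  # cam x (right)   = -ego y
    [0.0, 0.0, -1.0, 1.5],  # cam y (down)    = -(ego z - 1.5)
    [1.0, 0.0, 0.0, 0.0],   # cam z (forward) =  ego x
    [0.0, 0.0, 0.0, 1.0],
]
K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]

corners = box_corners_3d(10.0, 0.0, 0.8, 4.5, 1.8, 1.6, 0.0)  # car 10 m ahead
print(project_points(corners, EGO_TO_FRONT_CAM, K))
```

For a multi-camera rig such as the 6 selected nuPlan cameras, the same function would simply be called once per camera with that camera's own extrinsic and intrinsic matrices.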

Zero-shot inference on nuPlan data.

Comparison with MagicDrive on CARLA Road Networks

We used both MagicDrive and our World Dreamer to generate realistic images on the same CARLA road network, whose road style differs significantly from that of nuScenes. Under these conditions, MagicDrive performs slightly worse: as indicated by the yellow arrow, it struggles to generate curved roads and to fit wide roads accurately. DriveArena, by contrast, produces plausible images that follow the road structure.

Closed-Loop Driving in Simulation

Vision-based autonomous driving approaches, such as UniAD, FusionAD, and Drive Like A Human, are typically trained and evaluated on open-loop datasets. However, these algorithms cannot generalize directly to simulators for closed-loop evaluation, which prevents their true driving performance from being assessed. We therefore deploy UniAD in DriveArena to examine its driving ability in closed-loop simulation.

Performance of UniAD in a closed-loop simulation with DriveArena.

Corner cases generated through planning failures of UniAD in the closed-loop simulation.

Long Multi-View Video Generation (DreamForge)

BibTeX

@article{yang2024drivearena,
  title={DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving}, 
  author={Xuemeng Yang and Licheng Wen and Yukai Ma and Jianbiao Mei and Xin Li and Tiantian Wei and Wenjie Lei and Daocheng Fu and Pinlong Cai and Min Dou and Botian Shi and Liang He and Yong Liu and Yu Qiao},
  journal={arXiv preprint arXiv:2408.00415},
  year={2024}
}