DiLu🐴: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

1 Shanghai AI Laboratory, Shanghai, China
2 Department of Computer Science, East China Normal University
3 The Chinese University of Hong Kong

Equal Contribution
Corresponding author

The DiLu framework excels at handling closed-loop driving tasks in challenging driving environments. It produces comprehensible reasoning and consistently makes safe, logical decisions.

Abstract

Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore how to instill similar capabilities into autonomous driving systems and, to this end, summarize a paradigm that integrates an interactive environment, a driver agent, and a memory component. Leveraging large language models with emergent abilities, we propose the DiLu framework, which combines a Reasoning module and a Reflection module so that the system can make decisions based on common-sense knowledge and evolve continuously. Extensive experiments demonstrate DiLu's capability to accumulate experience and show a significant advantage in generalization ability over reinforcement learning-based methods. Moreover, DiLu can directly acquire experience from real-world datasets, which highlights its potential for deployment in practical autonomous driving systems. To the best of our knowledge, we are the first to instill knowledge-driven capability into autonomous driving systems from the perspective of how humans drive.

Knowledge-driven Paradigm

The knowledge-driven paradigm for autonomous driving systems includes three components:
  1. An environment with which an agent can interact;
  2. A driver agent with recall, reasoning, and reflection abilities;
  3. A memory component to persist experiences.
As the system evolves continuously, the driver agent observes the environment, queries and updates experiences in the memory component, and makes decisions.
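The loop below is a minimal Python sketch of how the three components could interact. It is not DiLu's implementation: ToyEnvironment, DriverAgent, Memory, and run_episode are hypothetical names, recall simply returns the most recent experiences, and reasoning and reflection are stubbed out so that only the observe, query, decide, update cycle remains.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Component 3: persists (scenario, decision) experiences."""
    experiences: list = field(default_factory=list)

    def query(self, scenario: str, k: int = 3) -> list:
        # Placeholder recall: the k most recent experiences (no similarity search).
        return self.experiences[-k:]

    def update(self, scenario: str, decision: str) -> None:
        self.experiences.append((scenario, decision))


class ToyEnvironment:
    """Component 1 (hypothetical): a fixed-horizon environment with text observations."""

    def __init__(self, horizon: int = 3):
        self.t = 0
        self.horizon = horizon

    def observe(self) -> str:
        return f"step {self.t}: lead vehicle ahead in the same lane"

    def step(self, decision: str) -> bool:
        self.t += 1
        return self.t >= self.horizon  # True when the episode is over


class DriverAgent:
    """Component 2: recalls, reasons, and reflects (all stubbed here)."""

    def __init__(self, memory: Memory):
        self.memory = memory

    def decide(self, scenario: str) -> str:
        recalled = self.memory.query(scenario)  # recall
        # Stub reasoning: reuse the latest recalled decision, else keep the lane.
        return recalled[-1][1] if recalled else "IDLE"

    def reflect(self, scenario: str, decision: str) -> None:
        self.memory.update(scenario, decision)  # stub reflection: store as-is


def run_episode(env: ToyEnvironment, agent: DriverAgent) -> list:
    decisions, done = [], False
    while not done:
        scenario = env.observe()           # observe the environment
        decision = agent.decide(scenario)  # query memory and decide
        agent.reflect(scenario, decision)  # update memory
        decisions.append(decision)
        done = env.step(decision)
    return decisions
```

Running run_episode(ToyEnvironment(horizon=3), DriverAgent(memory)) on an empty Memory returns three IDLE decisions and leaves three experiences in memory. In a real system, the stubbed decide and reflect methods would be replaced by LLM-based reasoning and reflection.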

Framework

Based on the knowledge-driven paradigm for autonomous driving systems introduced above, we propose a practical framework called DiLu.
DiLu consists of four core modules: Environment, Reasoning, Reflection, and Memory. In particular, the Reasoning module begins by observing the environment and obtaining a description of the current scenario. Concurrently, a prompt generator combines this scenario description with few-shot experiences of similar situations, which are retrieved from the Memory module. These prompts are then fed into an off-the-shelf Large Language Model (LLM), and the decision decoder produces an action by decoding the LLM's response.
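Retrieving few-shot experiences of similar situations amounts to a nearest-neighbor search over stored scenario descriptions. The sketch below assumes a bag-of-words cosine similarity as a simple stand-in for whatever text embedding a real system would use, together with a hypothetical prompt layout; retrieve, build_prompt, and the scenario/decision fields are illustrative names, not DiLu's API.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory: list, scenario: str, k: int = 2) -> list:
    """Return the k stored experiences whose scenario text is most similar."""
    query = bag_of_words(scenario)
    ranked = sorted(
        memory,
        key=lambda exp: cosine(query, bag_of_words(exp["scenario"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(scenario: str, shots: list) -> str:
    """Prepend retrieved experiences as few-shot examples (hypothetical layout)."""
    parts = ["You are driving the ego vehicle. Choose one action."]
    for i, shot in enumerate(shots, start=1):
        parts.append(f"Example {i}:\nScenario: {shot['scenario']}\nDecision: {shot['decision']}")
    parts.append(f"Current scenario: {scenario}\nDecision:")
    return "\n\n".join(parts)


memory = [
    {"scenario": "slow truck ahead in the same lane, left lane clear", "decision": "LANE_LEFT"},
    {"scenario": "exit ramp approaching on the right", "decision": "LANE_RIGHT"},
    {"scenario": "empty road, speed below the limit", "decision": "FASTER"},
]
query = "slow car ahead in the same lane, left lane clear"
shots = retrieve(memory, query, k=1)
prompt = build_prompt(query, shots)
```

Here the stored "slow truck ahead" experience ranks above the unrelated ones and is placed before the current scenario in the prompt. Swapping bag_of_words for a learned embedding would change only the similarity function, not the retrieval or prompt logic.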


Reasoning module


In the Reasoning module, we use the experiences derived from the Memory module and the common-sense knowledge of the LLM to make decisions for the current traffic scenario. Specifically, the reasoning procedure consists of the following steps:
  1. Encode the scenario with a descriptor;
  2. Recall several experiences from the Memory module;
  3. Generate the prompt;
  4. Feed the prompt into the LLM;
  5. Decode the action from the LLM's response.
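The five steps can be chained as in the following Python sketch, which takes a caller-supplied llm function (in the usage below, a stub that returns fixed text). The state fields, the ACTIONS table (loosely modeled on highway-env's discrete meta-actions), the word-overlap recall, and the convention that the model ends its answer with "#### <action_id>" are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Assumed discrete action set, loosely modeled on highway-env's meta-actions.
ACTIONS: Dict[int, str] = {0: "LANE_LEFT", 1: "IDLE", 2: "LANE_RIGHT", 3: "FASTER", 4: "SLOWER"}


def describe(ego: dict, others: List[dict]) -> str:
    """Step 1: encode the scenario as text (state fields are hypothetical)."""
    lines = [f"Ego car is in lane {ego['lane']} at {ego['speed']:.1f} m/s."]
    for v in others:
        gap = v["x"] - ego["x"]
        where = "ahead" if gap > 0 else "behind"
        lines.append(f"Vehicle {v['id']} is in lane {v['lane']}, {abs(gap):.1f} m {where}.")
    return " ".join(lines)


def recall(memory: List[dict], scenario: str, k: int = 2) -> List[dict]:
    """Step 2: return the k experiences sharing the most words with the scenario."""
    words = set(scenario.lower().split())
    return sorted(memory, key=lambda e: len(words & set(e["scenario"].lower().split())), reverse=True)[:k]


def make_prompt(scenario: str, shots: List[dict]) -> str:
    """Step 3: few-shot examples, the action menu, then the current scenario."""
    examples = "\n".join(f"Scenario: {s['scenario']}\nAnswer: #### {s['action_id']}" for s in shots)
    menu = ", ".join(f"{i}={name}" for i, name in ACTIONS.items())
    return f"{examples}\n\nActions: {menu}\nScenario: {scenario}\nEnd your answer with '#### <action_id>'."


def decode(response: str) -> str:
    """Step 5: read the action ID after the last '####'; fall back to IDLE."""
    ids = re.findall(r"####\s*(\d+)", response)
    action_id = int(ids[-1]) if ids else 1
    return ACTIONS.get(action_id, "IDLE")


def reason(ego: dict, others: List[dict], memory: List[dict], llm: Callable[[str], str]) -> str:
    scenario = describe(ego, others)       # 1. encode the scenario
    shots = recall(memory, scenario)       # 2. recall experiences
    prompt = make_prompt(scenario, shots)  # 3. generate the prompt
    response = llm(prompt)                 # 4. feed the prompt into the LLM
    return decode(response)                # 5. decode the action
```

With a stub such as lambda prompt: "Lane 0 is free. #### 0", reason returns "LANE_LEFT". The decoder falls back to IDLE when the response contains no valid action ID, so a malformed answer never yields an undefined action; whether that fallback is appropriate is a design choice for a real system.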


Reflection module



The Reflection module in DiLu is designed to improve autonomous driving capability by continuously learning from past experiences. It records the driving scenarios and the decisions made by the Reasoning module during closed-loop driving tasks. Successful sessions enrich the Memory module with key decision frames, while sessions that end in a hazard prompt the module to analyze the mistakes, correct the faulty decisions, and propose strategies for safer driving; both paths contribute to the system's continuous improvement.
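A hypothetical Python sketch of the two branches described above follows. Selecting every stride-th frame as a key decision frame and revisiting the last lookback decisions before a hazard are heuristics assumed for illustration; the correct callback stands in for the LLM-driven analysis that produces a corrected decision.

```python
from typing import Callable, Dict, List

Frame = Dict[str, object]  # e.g. {"scenario": "...", "decision": "FASTER"}


def reflect_on_session(
    frames: List[Frame],
    ended_in_hazard: bool,
    memory: List[Frame],
    correct: Callable[[Frame], Frame],
    stride: int = 5,
    lookback: int = 3,
) -> int:
    """Update memory from one closed-loop session; return how many frames were added."""
    if not ended_in_hazard:
        # Successful session: keep every stride-th frame as a key decision frame (assumed heuristic).
        new_frames = frames[::stride]
    else:
        # Hazard: revisit the last few decisions and keep corrected versions
        # (correct() stands in for LLM-based analysis of the mistake).
        new_frames = [correct(f) for f in frames[-lookback:]]
    memory.extend(new_frames)
    return len(new_frames)
```

For a 12-frame successful session with the default stride of 5, frames 0, 5, and 10 are stored. For a session that ends in a hazard, corrected copies of the last three frames are stored instead; if correct returns new dictionaries, the original session records stay untouched.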

Results

4 lanes, medium traffic

5 lanes, medium traffic

5 lanes, heavy traffic

Directly using experience from CitySim

Closed-loop driving tasks in Highway-env under different settings. The ego car is controlled by DiLu, and the decision-making process is fully text-based. DiLu learns to follow cars and make sensible lane changes from past experiences stored in the Memory module.

See what DiLu is thinking:

Frame description

Decisions generated by DiLu🐴

BibTeX

@misc{wen2023dilu,
    title={DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models}, 
    author={Licheng Wen and Daocheng Fu and Xin Li and Xinyu Cai and Tao Ma and Pinlong Cai and Min Dou and Botian Shi and Liang He and Yu Qiao},
    year={2023},
    eprint={2309.16292},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}

@misc{fu2023drive,
    title={Drive Like a Human: Rethinking Autonomous Driving with Large Language Models},
    author={Daocheng Fu and Xin Li and Licheng Wen and Min Dou and Pinlong Cai and Botian Shi and Yu Qiao},
    year={2023},
    eprint={2307.07162},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}