Large Language Models Powered Context-aware Motion Prediction

Xiaoji Zheng, Lixiu Wu, Zhijie Yan, Yuanrong Tang, Hao Zhao, Chen Zhong, Bokui Chen, Jiangtao Gong

Institution for AI Industry Research, Tsinghua University

Southeast University

IROS 2024


Abstract

Motion prediction is among the most fundamental tasks in autonomous driving. Traditional motion-forecasting methods primarily encode vectorized map information and the historical trajectories of traffic participants, and thus lack a comprehensive understanding of overall traffic semantics, which in turn limits prediction performance. In this paper, we utilize Large Language Models (LLMs) to enhance global traffic-context understanding for motion prediction tasks. We first conduct systematic prompt engineering, visualizing complex traffic environments and the historical trajectories of traffic participants as image prompts — Transportation Context Maps (TC-Maps) — accompanied by corresponding text prompts. Through this approach, we obtain rich traffic-context information from the LLM. By integrating this information into the motion prediction model, we demonstrate that such context can enhance prediction accuracy. Furthermore, considering the cost of LLM queries, we propose a cost-effective deployment strategy: improving motion prediction accuracy at scale while augmenting only 0.7% of the dataset with LLM-generated context. Our research offers valuable insights into improving LLMs' understanding of traffic scenes and the motion prediction performance of autonomous driving systems.

LLM-Augmented MTR

Figure: Main idea of our work.

We start by converting structured data from a motion prediction dataset into bird's-eye-view-style images, termed Transportation Context Maps (TC-Maps). These TC-Maps, together with our carefully crafted text prompts, are then processed by GPT-4V, which outputs context information reflecting its comprehension of the scenario. This context information is subsequently incorporated into an encoder-decoder motion prediction framework.
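The text-prompt half of this pipeline can be assembled programmatically from the same structured data used to draw the TC-Map. Below is a minimal, stdlib-only sketch under assumed field names (`color`, `speed`, `pos` in the ego frame) — the real pipeline operates on Waymo Open Motion Dataset records and also renders the image prompt:

```python
import math

ORDINALS = ["closest", "second closest", "third closest"]

def describe_neighbors(ego, neighbors):
    """Build the neighbor-description part of the text prompt.

    `ego` and each entry of `neighbors` are dicts with hypothetical keys:
    'color', 'speed' (m/s), and 'pos' ((x, y) in the ego frame).
    """
    # Rank surrounding vehicles by Euclidean distance to the ego vehicle.
    ranked = sorted(neighbors, key=lambda n: math.dist(ego["pos"], n["pos"]))
    lines = [f"The ego_red_vehicle's speed is {ego['speed']:.3f}m/s."]
    for rank, n in enumerate(ranked[:3]):  # the prompt describes up to 3 neighbors
        d = math.dist(ego["pos"], n["pos"])
        lines.append(
            f"The {n['color']}_vehicle is the {ORDINALS[rank]} to ego_red_vehicle, "
            f"its speed is {n['speed']:.3f}m/s, about {d:.3f}m away."
        )
    return " ".join(lines)

ego = {"speed": 13.184, "pos": (0.0, 0.0)}
neighbors = [
    {"color": "green", "speed": 11.281, "pos": (4.0, -5.54)},
    {"color": "orange", "speed": 12.427, "pos": (-3.0, -4.07)},
]
prompt = describe_neighbors(ego, neighbors)
```

The sorted ranking mirrors the "closest / second closest / third closest" phrasing used in the scenario captions below.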

Prompt Engineering

Figure: Original scenario 1 presented to the LLM.

Text prompt:

1. The ego_red_vehicle is driving in the FORWARD lane; it may go STRAIGHT, STRAIGHT-LEFT, or STRAIGHT-RIGHT in the next 8s. If there is a parking lot nearby, the ego_red_vehicle may also make a LEFT-TURN or RIGHT-TURN in the next 8s.
2. The ego_red_vehicle's speed is 13.184m/s. The orange_vehicle is the closest to the ego_red_vehicle; its speed is 12.427m/s, and it is to the lower left of the ego_red_vehicle, about 5.057m away. The green_vehicle is the second closest; its speed is 11.281m/s, and it is to the lower right of the ego_red_vehicle, about 6.832m away. The blue_vehicle is the third closest; its speed is 9.771m/s, and it is to the left of the ego_red_vehicle, about 11.114m away.
3. The black map icon in the upper right corner of the 'Motion Prediction Map' identifies the right, lower, left, and upper directions of the 'Motion Prediction Map'; the default driving direction of the ego_red_vehicle is 'upper'. White striped areas represent crosswalks. Determine the style of the image before answering the question. Please answer in the FORMAT given, based on the 'Motion Prediction Map' provided and the information above.

LLM's Response:

```json
{
  "Situation Understanding": "The ego_red_vehicle labeled '0' is proceeding in the FORWARD lane with a speed of 13.184m/s, suggesting it is moving swiftly on a main road or highway. The black_trajectory indicates it has been maintaining a straight path. The orange_vehicle labeled '1' is slightly slower at 12.427m/s and is located in the lower left, which is behind and to the left of the ego vehicle, potentially preparing to overtake. The green_vehicle labeled '2' is also slower at 11.281m/s and positioned in the lower right, which is behind and to the right of the ego vehicle, indicating it is following at a safe distance. The blue_vehicle labeled '3' is further away on the left at 11.114m distance and traveling at 9.771m/s, which could indicate it is in a different lane or preparing to merge. The white striped areas suggest the presence of crosswalks, requiring additional vigilance for pedestrians.",
  "Reasoning": "Given the high speed of the ego_red_vehicle and its position in the FORWARD lane, it should primarily continue STRAIGHT, especially if there are no immediate hazards or intersections. If approaching a parking lot, the vehicle has the option to make a LEFT-TURN or RIGHT-TURN, but this should be done with caution considering the proximity of the orange and green vehicles. The speeds of the orange and green vehicles suggest they are not immediate threats, but the ego vehicle should be prepared to adjust its path if they attempt to overtake.",
  "Actions": "STRAIGHT, LEFT-TURN, RIGHT-TURN",
  "Affordance": "ACCELERATE-ALLOW, LEFT-ALLOW, RIGHT-ALLOW",
  "Scenario_name": "ON-STRAIGHT-ROAD, PARKING-LOT"
}
```
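Because the response is constrained to a JSON FORMAT, the machine-readable fields can be parsed with the standard library before being handed to the prediction model. A minimal sketch (the response text is abbreviated here; field names follow the response format above):

```python
import json

# Abbreviated GPT-4V response; only the structured fields are kept verbatim.
response = '''{
  "Situation Understanding": "The ego_red_vehicle is proceeding in the FORWARD lane...",
  "Reasoning": "Given the high speed of the ego_red_vehicle...",
  "Actions": "STRAIGHT, LEFT-TURN, RIGHT-TURN",
  "Affordance": "ACCELERATE-ALLOW, LEFT-ALLOW, RIGHT-ALLOW",
  "Scenario_name": "ON-STRAIGHT-ROAD, PARKING-LOT"
}'''

fields = json.loads(response)
# The categorical fields are comma-separated strings; split them into lists.
actions = [a.strip() for a in fields["Actions"].split(",")]
affordances = [a.strip() for a in fields["Affordance"].split(",")]
scenarios = [s.strip() for s in fields["Scenario_name"].split(",")]
```

In practice a `try/except json.JSONDecodeError` guard is advisable, since the LLM occasionally deviates from the requested format (see Failure Cases below).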

Motion Prediction Improvement

Transportation Context (generated by the cost-effective deployment strategy):

```json
{
  "track_intentions": [["STRAIGHT", "LEFT_TURN", "RIGHT_TURN"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"], ["STRAIGHT", "STRAIGHT_RIGHT", "RIGHT_TURN"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"], ["STRAIGHT", "STRAIGHT_RIGHT", "STRAIGHT_LEFT"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"]],
  "track_affordances": [["ACCELERATE_ALLOW", "LEFT_ALLOW", "RIGHT_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["ACCELERATE_ALLOW", "LEFT_ALLOW", "RIGHT_ALLOW"], ["ACCELERATE_ALLOW", "RIGHT_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"]],
  "track_scenarios": [["ON_ROADSIDE"], ["ON_STRAIGHT_ROAD"], ["ON_STRAIGHT_ROAD"], ["ON_STRAIGHT_ROAD"], ["INTERSECTION", "ON_STRAIGHT_ROAD"], ["INTERSECTION"], ["ON_STRAIGHT_ROAD"], ["INTERSECTION", "ON_STRAIGHT_ROAD"]]
}
```

Transportation Context (generated by the cost-effective deployment strategy):

```json
{
  "track_intentions": [["STRAIGHT", "LEFT_TURN", "RIGHT_TURN"], ["STRAIGHT", "STRAIGHT_RIGHT"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"], ["STRAIGHT", "STRAIGHT_LEFT", "LEFT_TURN"], ["STRAIGHT", "STRAIGHT_LEFT"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"], ["RIGHT_TURN", "STRAIGHT", "STRAIGHT_RIGHT"], ["RIGHT_TURN", "STRAIGHT", "STRAIGHT_RIGHT"]],
  "track_affordances": [["ACCELERATE_ALLOW", "LEFT_ALLOW", "RIGHT_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["ACCELERATE_ALLOW", "LEFT_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["ACCELERATE_ALLOW", "RIGHT_ALLOW"], ["SLOW_ALLOW", "RIGHT_ALLOW"]],
  "track_scenarios": [["ON_STRAIGHT_ROAD", "PARKING_LOT"], ["ON_STRAIGHT_ROAD"], ["ON_STRAIGHT_ROAD"], ["INTERSECTION"], ["ON_STRAIGHT_ROAD"], ["INTERSECTION"], ["INTERSECTION"], ["INTERSECTION"]]
}
```

Transportation Context (generated by the cost-effective deployment strategy):

```json
{
  "track_intentions": [["STRAIGHT", "LEFT_TURN", "RIGHT_TURN"], ["STRAIGHT", "STATIONARY"], ["STRAIGHT", "RIGHT_TURN"], ["STRAIGHT", "LEFT_TURN", "RIGHT_TURN"], ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT"], ["STRAIGHT"], ["STRAIGHT", "STATIONARY"], ["STRAIGHT", "LEFT_TURN", "RIGHT_TURN"]],
  "track_affordances": [["ACCELERATE_ALLOW", "LEFT_ALLOW", "RIGHT_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["ACCELERATE_ALLOW", "RIGHT_ALLOW"], ["ACCELERATE_ALLOW", "LEFT_ALLOW", "RIGHT_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["ACCELERATE_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW"], ["SLOW_ALLOW", "ACCELERATE_ALLOW", "LEFT_ALLOW", "RIGHT_ALLOW"]],
  "track_scenarios": [["INTERSECTION"], ["INTERSECTION"], ["INTERSECTION"], ["PARKING_LOT"], ["INTERSECTION", "ON_ROADSIDE"], ["INTERSECTION"], ["INTERSECTION"], ["INTERSECTION"]]
}
```
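Before the per-track context above can condition an encoder-decoder predictor, the categorical labels must become fixed-length numeric features. One common option, sketched here with a vocabulary inferred from the examples (the actual encoding in our model may differ), is a multi-hot vector per track:

```python
# Assumed intention vocabulary, inferred from the transportation-context examples.
INTENTIONS = ["STRAIGHT", "STRAIGHT_LEFT", "STRAIGHT_RIGHT",
              "LEFT_TURN", "RIGHT_TURN", "STATIONARY"]

def multi_hot(labels, vocab):
    """Map a set of categorical labels to a fixed-length 0/1 vector."""
    return [1.0 if v in labels else 0.0 for v in vocab]

# Two tracks from the examples above.
track_intentions = [
    ["STRAIGHT", "LEFT_TURN", "RIGHT_TURN"],
    ["STRAIGHT", "STATIONARY"],
]
intention_features = [multi_hot(t, INTENTIONS) for t in track_intentions]
```

The same scheme extends to `track_affordances` and `track_scenarios` with their own vocabularies; the resulting vectors can then be concatenated with each agent's trajectory features.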

Failure Cases

Prompt

our prompt

Tools

Visualization Tools

Single Agent | Single Agent (Compare) | Multi Agent | Multi Agent (Compare)

WOMD LeaderBoard Submit Script

Waymo Submit

Tutorial

Prompt | Model

Bibtex

```bibtex
@misc{zheng2024large,
  title={Large Language Models Powered Context-aware Motion Prediction},
  author={Xiaoji Zheng and Lixiu Wu and Zhijie Yan and Yuanrong Tang and Hao Zhao and Chen Zhong and Bokui Chen and Jiangtao Gong},
  year={2024},
  eprint={2403.11057},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

Website template borrowed from VoxPoser