ELM: Embodied Understanding of Driving Scenarios

Revive driving scene understanding by delving into the embodiment philosophy

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, and Hongyang Li

Presented by OpenDriveLab and Shanghai AI Lab

📬 Primary contact: Yunsong Zhou

arXiv paper | Blog TODO | Slides

CVPR 2024 Autonomous Driving Challenge - Driving with Language

Highlights

🔥 The first embodied language model for understanding long-horizon driving scenarios in space and time.

🌟 ELM introduces a wide spectrum of new tasks that fully leverage the capability of large language models in an embodied setting, and achieves significant improvements in various applications.

🏆 An interpretable driving model based on language prompting will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!

News

🔥 The interpretable driving model is launched. Please refer to the link for more details.
[2024/03] ELM paper released.
[2024/03] ELM code and data initially released.

Table of Contents

Highlights
News
TODO List
Installation
Dataset
Training and Inference
License and Citation
Related Resources

TODO List

Release fine-tuning code and data
Release reference checkpoints
Toolkit for label generation

Installation

(Optional) Create a conda environment:

conda create -n elm python=3.8
conda activate elm

Install from PyPI:

pip install salesforce-lavis

Or, for development, build from source:

git clone https://github.com/OpenDriveLab/ELM.git
cd ELM
pip install -e .

Dataset

Pre-training data. We collect driving videos from YouTube, nuScenes, Waymo, and Ego4D. We provide a sample of the 🔗 YouTube video list we used. For privacy reasons, the complete data labels are temporarily kept private.

Fine-tuning data. The full set of question-answer pairs for the benchmark can be obtained through this 🔗 data link. You may need to download the corresponding image data from the official nuScenes and Ego4D channels. For a quick verification of the pipeline, we recommend downloading the DriveLM subset and organizing the data in line with the format.

Make sure to soft-link the nuScenes and Ego4D datasets under the data/xx folder. You may need to run tools/video_clip_processor.py to pre-process the data first. We also provide some scripts used during auto-labeling, which you may use as a reference if you want to customize the data.
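The soft-linking step above can be scripted. The following is a minimal sketch, not part of the ELM repository: the helper name `link_dataset` is hypothetical, and both paths are left for you to supply, since the exact sub-folder under `data/` is not spelled out here.

```python
import os
from pathlib import Path


def link_dataset(source: Path, link_path: Path) -> Path:
    """Create a symbolic link at link_path that points to the dataset at source.

    Hypothetical helper for illustration; it is not shipped with ELM.
    """
    source = source.resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")

    # Make sure the parent folder (for example, the repo's data/ folder) exists.
    link_path.parent.mkdir(parents=True, exist_ok=True)

    if link_path.is_symlink():
        # Replace a stale link; this removes only the link, never real data.
        link_path.unlink()
    elif link_path.exists():
        raise FileExistsError(f"{link_path} exists and is not a symlink")

    os.symlink(source, link_path, target_is_directory=True)
    return link_path


# Example call (both paths are placeholders you must adapt):
# link_dataset(Path("/path/to/nuscenes"), Path("data") / "<subfolder>" / "nuscenes")
```

The helper refuses to overwrite a real directory, so re-running it after moving a dataset only refreshes the link.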

Training

# Optionally edit lavis/projects/blip2/train/advqa_t5_elm.yaml before training
bash scripts/train.sh

Inference

Set evaluate to True in advqa_t5_elm.yaml, then run the same script:

bash scripts/train.sh
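If you switch between training and inference often, the config change above can be automated. This is a rough sketch under an assumption: it expects the YAML file to contain an `evaluate:` key on its own line and rewrites only that value with a regular expression, leaving comments and other keys untouched. The function name `set_evaluate_flag` is hypothetical and not part of ELM or LAVIS.

```python
import re

# Matches a line such as "  evaluate: False" and captures everything up to the value.
_EVALUATE_LINE = re.compile(r"^([ \t]*evaluate[ \t]*:[ \t]*)\S+", re.MULTILINE)


def set_evaluate_flag(config_text: str, enabled: bool) -> str:
    """Return config_text with the value of the 'evaluate' key set to True/False.

    Assumes the key sits on its own line; raises ValueError if it is absent.
    """
    new_text, count = _EVALUATE_LINE.subn(
        lambda m: m.group(1) + ("True" if enabled else "False"),
        config_text,
    )
    if count == 0:
        raise ValueError("no 'evaluate:' key found in the config text")
    return new_text


# Example usage (run from the repository root):
# from pathlib import Path
# cfg = Path("lavis/projects/blip2/train/advqa_t5_elm.yaml")
# cfg.write_text(set_evaluate_flag(cfg.read_text(), enabled=True))
```

A text-level edit is used instead of a YAML round trip so that existing comments and formatting in the file are preserved.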

To evaluate the generated answers, use scripts/qa_eval.py:

python scripts/qa_eval.py <data_root> <log_name>

License and Citation

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes and Ego4D) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

@article{zhou2024embodied,
  title={Embodied Understanding of Driving Scenarios},
  author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
  journal={arXiv preprint arXiv:2403.04593},
  year={2024}
}

Related Resources

We thank all the open-source contributors to the following projects, which made this work possible:

Lavis | DriveLM

DriveAGI | Survey on BEV Perception | Survey on E2EAD
UniAD | OpenLane-V2 | OccNet | OpenScene
