ELM: Embodied Understanding of Driving Scenarios

Revive driving scene understanding by delving into the embodiment philosophy

ELM: v1.0 License: Apache2.0

Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, and Hongyang Li

Highlights

🔥 The first embodied language model for understanding long-horizon driving scenarios in space and time.

🌟 ELM expands a wide spectrum of new tasks to fully leverage the capability of large language models in an embodiment setting and achieves significant improvements in various applications.

๐Ÿ† Interpretable driving model, on the basis of language prompting, will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!

News

  • 🔥 Interpretable driving model is launched. Please refer to the link for more details.
  • [2024/03] ELM paper released.
  • [2024/03] ELM code and data initially released.

Table of Contents

  1. Highlights
  2. News
  3. TODO List
  4. Installation
  5. Dataset
  6. Training and Inference
  7. License and Citation
  8. Related Resources

TODO List

  • Release fine-tuning code and data
  • Release reference checkpoints
  • Toolkit for label generation

Installation

  1. (Optional) Create a conda environment:
     conda create -n elm python=3.8
     conda activate elm
  2. Install from PyPI:
     pip install salesforce-lavis
  3. Or, for development, build from source:
     git clone https://github.com/OpenDriveLab/ELM.git
     cd ELM
     pip install -e .
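
After either install route, a quick sanity check can confirm that the interpreter and package are visible. The following is a minimal sketch, not part of the ELM repository: `check_environment` is a hypothetical helper, it assumes the installed package exposes a top-level module named `lavis`, and it treats the Python 3.8 used in the conda step as a minimum version.

```python
import importlib.util
import sys


def check_environment(module_name="lavis", min_python=(3, 8)):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    # Assumption: the Python version from the conda step is the minimum supported one.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {found} is older than {wanted}")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"module '{module_name}' is not importable")
    return problems
```

Calling `check_environment()` inside the activated environment and printing the returned list shows whether anything is missing before a training run is started.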

Dataset

Pre-training data. We collect driving videos from YouTube, nuScenes, Waymo, and Ego4D. We provide a sample of the 🔗 YouTube video list we used. For privacy considerations, we are temporarily keeping the complete data labels private.

Fine-tuning data. The full set of question-answer pairs for the benchmark can be obtained through this 🔗 data link. You may need to download the corresponding image data from the official nuScenes and Ego4D channels. For a quick verification of the pipeline, we recommend downloading the DriveLM subset and organizing the data in line with the format.

Please make sure to soft link the nuScenes and Ego4D datasets under the data/xx folder. You may need to run tools/video_clip_processor.py to pre-process the data first. We also provide some scripts used during auto-labeling, which you may use as a reference if you want to customize the data.
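
The soft-linking step can be scripted. This is a minimal sketch rather than a tool shipped with ELM: `link_dataset` is a hypothetical helper, and the link names passed to it are placeholders, since the exact folder names expected under data/ are not spelled out here.

```python
import os
from pathlib import Path


def link_dataset(source, data_root, name):
    """Create data_root/name as a symbolic link pointing at the source directory."""
    src = Path(source).expanduser().resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {src}")
    root = Path(data_root)
    root.mkdir(parents=True, exist_ok=True)
    link = root / name
    if link.is_symlink():
        # Replace an existing link so the function can be rerun safely.
        link.unlink()
    elif link.exists():
        # Never overwrite a real file or directory.
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(src, link, target_is_directory=True)
    return link
```

For example, `link_dataset("/path/to/nuscenes", "data", "nuscenes")` would create `data/nuscenes` pointing at the download location; both the source path and the link name are illustrative and should be adjusted to the layout the config files expect.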

Training

# Optional: adjust the settings in lavis/projects/blip2/train/advqa_t5_elm.yaml first
bash scripts/train.sh

Inference

Set evaluate to True in advqa_t5_elm.yaml, then run the same script:

bash scripts/train.sh
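
Switching between training and inference only requires flipping the evaluate flag, so the edit can be automated. The sketch below is illustrative, not part of the repository: `set_evaluate` is a hypothetical helper, and it assumes the flag appears in the YAML as a plain `evaluate: <bool>` line, which should be checked against the actual config file.

```python
import re


def set_evaluate(config_text, enabled=True):
    """Return config_text with the value of every `evaluate:` line set to enabled.

    Works on raw text so that comments and key order in the YAML are preserved.
    """
    # Match only within a single line: indentation, key, value, trailing text.
    pattern = re.compile(r"^([ \t]*evaluate[ \t]*:[ \t]*)(\S+)(.*)$", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{enabled}{m.group(3)}", config_text
    )
    if count == 0:
        raise ValueError("no `evaluate:` key found in the config")
    return new_text
```

A typical use would read the config file into a string, pass it through `set_evaluate`, and write the result back before invoking `bash scripts/train.sh`.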

To evaluate the generated answers, use scripts/qa_eval.py:

python scripts/qa_eval.py <data_root> <log_name>

License and Citation

All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes and Ego4D) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.

@article{zhou2024embodied,
  title={Embodied Understanding of Driving Scenarios},
  author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
  journal={arXiv preprint arXiv:2403.04593},
  year={2024}
}

Related Resources

We acknowledge all the open-source contributors for the following projects to make this work possible:
