wayveai / driving-with-llms

PyTorch implementation for the paper "Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving"

License: Apache License 2.0


driving-with-llms's Introduction

banner

This is the PyTorch implementation for inference and training of the LLM-Driver described in:

Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton

ICRA 2024
[preprint] [arxiv]

LLM-Driver
The LLM-Driver utilises object-level vector input from our driving simulator to predict explainable actions using pretrained Language Models, providing a robust and interpretable solution for autonomous driving.
The LLM-Driver running open-loop prediction from vector inputs (top-left BEV view), showing the results of action prediction (steering angle and acceleration/brake pedals), action justification (captions on the rendered video), and Driving Question Answering (table at the bottom).

News

  • [2024/01/29] Thrilled to share that our paper has been accepted at ICRA 2024!
  • [2023/12/21] Please check out our follow-up work LingoQA: [code] [arxiv]
  • [2023/10/03] The paper is now available on [arxiv]
  • [2023/07/06] The paper and code have been made available under the paper_code branch for anonymous submission.

Getting Started

Prerequisites

  • Python 3.x
  • pip
  • Minimum of 20GB VRAM for running evaluations
  • Minimum of 40GB VRAM for training (default setting)

⚙ Setup

  1. Set up a virtual environment (tested with Python 3.8-3.11)

    python3 -m venv env
    source env/bin/activate
  2. Install required dependencies

    pip install -r requirements.txt.lock

Note: requirements.txt.lock is generated with pip-compile from the original requirements.txt for reproducibility.

  3. Set up WandB API key

    Set up your WandB API key for training and evaluation logging.

    export WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
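
If you prefer to verify the key from Python rather than rely on the environment variable alone, here is a minimal sketch (assuming the wandb package from the requirements is installed):

    import wandb

    # wandb.login() reads WANDB_API_KEY from the environment (or prompts
    # interactively) and verifies that the key is accepted.
    wandb.login()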

💿 Dataset

  • Training/testing data: The datasets are already checked into the codebase. To extract them, use the following commands (a loading sketch follows this list):

    tar -xzvf data/vqa_train_10k.tar.gz -C data/
    tar -xzvf data/vqa_test_1k.tar.gz -C data/
    
  • Re-collect DrivingQA data: While the training and evaluation datasets already include pre-collected DrivingQA data, we also offer a script that illustrates how to collect DrivingQA data using the OpenAI ChatGPT API. If you wish to re-collect the DrivingQA data, simply run the following command with your OpenAI API key:

    python scripts/collect_vqa.py -i data/vqa_test_1k.pkl -o output_folder/ --openai_api xxxxxxxx
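
To get a feel for the data format, you can load one of the extracted pickle files directly. This is only a sketch: it assumes the pickle holds a sequence of per-frame records, and the exact field names are not documented in this README.

    import pickle

    # Load the extracted evaluation split and inspect its structure
    # without assuming a particular schema.
    with open("data/vqa_test_1k.pkl", "rb") as f:
        samples = pickle.load(f)

    first = samples[0]
    # Print whatever fields are present in the first record.
    print(first.keys() if hasattr(first, "keys") else first)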

🏄 Evaluation

  1. Evaluate for Perception and Action Prediction

    Run the following command:

    python train.py \
        --mode eval \
        --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl \
        --eval_items caption,action \
        --vqa
  2. Evaluate for DrivingQA

    Run the following command:

    python train.py \
        --mode eval \
        --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl \
        --eval_items vqa \
        --vqa
  3. View Results

    The results can be viewed on the WandB project "llm-driver".

  4. Grade DrivingQA Results with GPT API

    To grade the results with GPT API, run the following command:

    python scripts/grade_vqa.py \
        -i data/vqa_test_1k.pkl \
        -o results/10k_ft.pkl \
        -r results/10k_ft.json \
        --openai_api xxxxxxxx

    Replace results/10k_ft.json with the val_results.table.json file downloaded from WandB to grade your own results (see the sketch below).
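
If you want to sanity-check the downloaded table locally before grading, the sketch below may help; it assumes the exported JSON follows the usual wandb.Table layout with "columns" and "data" keys.

    import json

    # Assumed layout of a table exported from WandB:
    # {"columns": [...], "data": [[...], ...]}
    with open("results/10k_ft.json") as f:
        table = json.load(f)

    columns = table["columns"]
    print("Columns:", columns)
    print("Rows:", len(table["data"]))

    # Show the first row as a column -> value mapping.
    print(dict(zip(columns, table["data"][0])))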

🏊 Training

  1. Run LLM-Driver Training

    Execute the following command to start training:

    python train.py \
        --mode train \
        --eval_steps 50 \
        --val_set_size 32 \
        --num_epochs 5 \
        --resume_from_checkpoint models/weights/stage1_pretrained_model/ \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl \
        --vqa
  2. Follow the previous section to evaluate the trained LLM-Driver

  3. [optional] Train and evaluate Perceiver-BC

    Execute the following command to start training and evaluation:

    python train_bc.py \
        --num_epochs 25 \
        --data_path data/vqa_train_10k.pkl \
        --val_data_path data/vqa_test_1k.pkl
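
If you want to script several training runs (for example a small sweep over --num_epochs), one option is a thin Python wrapper around the LLM-Driver training command from step 1. This is only a sketch and not part of the repository; it simply re-issues the documented CLI flags.

    import subprocess

    # Sketch: launch a few training runs with different epoch budgets,
    # reusing the CLI flags documented in step 1 above.
    for num_epochs in (3, 5):
        cmd = [
            "python", "train.py",
            "--mode", "train",
            "--eval_steps", "50",
            "--val_set_size", "32",
            "--num_epochs", str(num_epochs),
            "--resume_from_checkpoint", "models/weights/stage1_pretrained_model/",
            "--data_path", "data/vqa_train_10k.pkl",
            "--val_data_path", "data/vqa_test_1k.pkl",
            "--vqa",
        ]
        subprocess.run(cmd, check=True)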

📝 Citation

If you find our work useful in your research, please consider citing:

@inproceedings{chen2024drivingwithllms,
  title={Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving},
  author={Long Chen and Oleg Sinavski and Jan Hünermann and Alice Karnsund and Andrew James Willmott and Danny Birch and Daniel Maund and Jamie Shotton},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2024}
}
@article{marcu2023lingoqa,
  title={LingoQA: Video Question Answering for Autonomous Driving}, 
  author={Ana-Maria Marcu and Long Chen and Jan Hünermann and Alice Karnsund and Benoit Hanotte and Prajwal Chidananda and Saurabh Nair and Vijay Badrinarayanan and Alex Kendall and Jamie Shotton and Oleg Sinavski},
  journal={arXiv preprint arXiv:2312.14115},
  year={2023},
}

🙌 Acknowledgements

This project has drawn inspiration from the Alpaca LoRA repository. We would like to express our appreciation for their contributions to the open-source community.


driving-with-llms's Issues

Dataset and LLM

Hello, your work is very exciting. First, I would like to know whether your model produces a control signal output, and whether the dataset contains control signals. Second, if I want to replace the LLM with the pre-trained model from the first stage, how can I do that?

The inference generation is very slow

The inference process is currently quite slow. Are there any methods available to accelerate it?
For the action task, it takes about 9 seconds per sample.

Are Driving with LLMs and LingoQA related?

Hello! Amazing work, really.
I was wondering whether this work and the follow-up (LingoQA) are comparable, and whether LingoQA is a real improvement over this work or just another approach, since it seems to be more of a driving-scene commentator and less of a control model that predicts future control signals.
Thank you!

decapoda-research/llama-7b-hf/ Not Found

Thank you for your very nice work!
I was trying to run the program and realized that the pre-trained model decapoda-research/llama-7b-hf is not downloadable from Hugging Face.
Can you provide an alternative link for that model?

Paper citations

My work is closely related to yours and I would like to cite the ICRA 2024 version, but all the citations of this paper I can find online point to arXiv. Do you have a BibTeX entry for the ICRA 2024 publication? I hope you achieve more and better results. Thanks again!

View Results

Thanks for your great work!
I want to view the results after evaluation; where can I find the WandB project "llm-driver"?

decapoda-research/llama-7b-hf

decapoda-research/llama-7b-hf is no longer accessible on Hugging Face.
I get the following error when using “baffo32/decapoda-research-llama-7B-hf”:
ValueError: The device_map provided does not give any device for the following parameters: base_model.model.weighted_mask

real vehicles

Hi, thanks for your great work! I would like to ask: based on your experience and research, do you believe it is feasible to deploy this work on real vehicles, particularly given that our current computational resources are 2 A800 GPUs (80 GB)? How long do you think it might take to achieve this goal, and are there any key technical challenges or issues that need to be addressed?

Training with real world datasets

Hello, congrats on your work!

I was curious to know whether you have thought about including real-world datasets as part of fine-tuning the LLM for decision making.

The environment

I'm grateful for your contribution. Could you please share your environment setup (e.g. CUDA and cuDNN versions)? I have tried running 'pip install -r requirements.txt.lock' many times, but I keep running into different problems and errors.

Thanks for your work. I have a question: since the prompt in the model's input already describes the content represented by the vectors, why is it necessary to align the vectors with the LLM during the pre-training process? Are the vectors used to help the model further understand the driving scenario based on the prompt? Are the labels in the pre-training process the prompts generated by lanGen? And what is the purpose of the 100k question-answer pairs in the pre-training process?

Could you explain what the model's input and the corresponding labels are during the pre-training stage to help me better understand this process?

Does anyone encounter 'Parameter' object has no attribute 'CB'?

Hi, I tried to install the requirements file as the author listed, but it first raised an error about bitsandbytes not supporting the GPU, which I solved by installing bitsandbytes==0.43.3. After that there is an error: 'Parameter' object has no attribute 'CB'. Does anyone have a solution?
