ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries (CVPR 2023)

  • This is the official repository of the paper: ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries (CVPR 2023).

Installation

Use the following commands to prepare the Python environment.

1) Create conda environment

conda create -n vip3d python=3.6

Supported Python versions are 3.6, 3.7, and 3.8.

2) Install pytorch

conda activate vip3d
pip install torch==1.10+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html

3) Install mmcv, mmdet

pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
pip install mmdet==2.24.1

4) Install other packages

pip install -r requirements.txt

5) Install mmdet3d

cd ~
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
pip install -e .  # Alternatively: python setup.py install
pip install -r requirements/runtime.txt  # Install packages for mmdet3d
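
After installation, a quick check of the pinned versions can catch mismatches early. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and the expected version prefixes are copied from the commands above. It assumes the distributions are registered under the names `torch`, `torchvision`, `mmcv-full`, `mmdet`, and `mmdet3d`.

```python
# Hypothetical helper (not part of ViP3D): compares installed package
# versions against the versions pinned in the installation steps.

EXPECTED = {
    "torch": "1.10",
    "torchvision": "0.11.1",
    "mmcv-full": "1.4.0",
    "mmdet": "2.24.1",
    "mmdet3d": "0.17.1",
}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for Python 3.6 and 3.7, which lack importlib.metadata.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Return a list of human-readable problems (empty if all match)."""
    problems = []
    for name, prefix in expected.items():
        found = lookup(name)
        if found is None:
            problems.append("{}: not installed (expected {})".format(name, prefix))
        elif not found.startswith(prefix):
            # Simple prefix match, so "1.10" accepts "1.10.0+cu111".
            problems.append("{}: found {} (expected {})".format(name, found, prefix))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "All pinned packages match.")
```

Run it inside the activated `vip3d` environment; an empty problem list means every pinned package was found with a matching version prefix.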

Quick start with Docker (Optional)

We also provide a Docker image of ViP3D with all required packages installed. The image is built from the NVIDIA container image for PyTorch. Make sure Docker and NVIDIA Docker (the NVIDIA Container Toolkit) are installed.

docker pull gentlesmile/vip3d
docker run --name vip3d_container -it --gpus all --ipc=host gentlesmile/vip3d

Prepare Dataset

1) Download the nuScenes full dataset (v1.0) and the map expansion here.

You only need to download the Keyframe blobs and Radar blobs.

2) Structure

After downloading, the structure is as follows:

ViP3D
├── mmdet3d/
├── plugin/
├── tools/
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── v1.0-trainval/
│   │   ├── lidarseg/
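
Before generating the data infos, it can help to confirm that the folders shown in the tree above are in place. This is a small sketch under the assumption that the dataset root is `data/nuscenes/`. The function `missing_dirs` is a hypothetical helper, not a script shipped with the repository, and the list of subdirectories is copied from the tree.

```python
from pathlib import Path

# Subdirectories copied from the tree above; adjust if your layout differs.
REQUIRED_SUBDIRS = ["maps", "samples", "v1.0-trainval", "lidarseg"]


def missing_dirs(root, required=REQUIRED_SUBDIRS):
    """Return the required subdirectories that do not exist under root."""
    root = Path(root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("data/nuscenes")
    if missing:
        print("Missing under data/nuscenes: " + ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the ViP3D repository root so that the relative path `data/nuscenes` resolves correctly.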

3) Prepare data infos

Suppose nuScenes data is saved at data/nuscenes/.

python tools/data_converter/nusc_tracking.py

Training and Evaluation

Training

Train ViP3D using 3 historical frames and the ResNet50 backbone. It will load a pre-trained detector for weight initialization. Suppose the detector is at ckpts/detr3d_resnet50.pth. It can be downloaded from here.

bash tools/dist_train.sh plugin/vip3d/configs/vip3d_resnet50_3frame.py 8 --work-dir=work_dirs/vip3d_resnet50_3frame.1
python tools/train.py plugin/hivt/configs/hivt_resnet50_3frame.py --work-dir=hivt_results.1 --gpus=1 

Training requires about 17 GB of GPU memory and takes about 3 days for 24 epochs on 8× RTX 3090 GPUs.

Evaluation

Run evaluation using the following command:

PYTHONPATH=. python tools/test.py plugin/vip3d/configs/vip3d_resnet50_3frame.py work_dirs/vip3d_resnet50_3frame.1/epoch_24.pth --eval bbox

python tools/test.py plugin/vip3d/configs/vip3d_resnet50_3frame.py ./ckpts/epoch_24.pth --eval bbox

The checkpoint epoch_24.pth can be downloaded from here.

Expected AMOTA using ResNet50 as backbone: 0.291

Then test prediction metrics:

unzip ./nuscenes_prediction_infos_val.zip
python tools/prediction_eval.py --result_path 'work_dirs/vip3d_resnet50_3frame.1/results_nusc.json'

Expected results: minADE: 1.47, minFDE: 2.21, MR: 0.237, EPA: 0.245
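
For readers unfamiliar with the first two metrics, the sketch below illustrates the standard definitions of minADE and minFDE for a single agent with several predicted trajectory modes. It is an illustration only and is not the code in tools/prediction_eval.py, which also handles matching predictions to ground-truth agents and computes MR and EPA. The function names and the toy trajectories are assumptions made for this example.

```python
import math


def displacement_errors(pred, gt):
    """Per-timestep Euclidean distance between a prediction and ground truth."""
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def min_ade_fde(modes, gt):
    """Return (minADE, minFDE) over all predicted modes for one agent.

    modes: list of trajectories, each a list of (x, y) positions.
    gt:    ground-truth trajectory with the same number of timesteps.
    """
    ades, fdes = [], []
    for pred in modes:
        errs = displacement_errors(pred, gt)
        ades.append(sum(errs) / len(errs))  # average displacement error
        fdes.append(errs[-1])               # final displacement error
    # Each minimum is taken independently over the modes.
    return min(ades), min(fdes)


if __name__ == "__main__":
    # Toy data: ground truth moves along the x-axis; two predicted modes.
    gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    modes = [
        [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # off by 1 m at the last step
        [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)],  # off by 1 m at every step
    ]
    min_ade, min_fde = min_ade_fde(modes, gt)
    print("minADE = {:.3f}, minFDE = {:.3f}".format(min_ade, min_fde))
```

In this toy case the first mode gives the lower average error (1/3 m), while both modes end 1 m from the ground truth.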

License

The code and assets are under the Apache 2.0 license.

Citation

If you find our work useful for your research, please consider citing the paper:

@inproceedings{vip3d,
  title={ViP3D: End-to-end visual trajectory prediction via 3d agent queries},
  author={Gu, Junru and Hu, Chenxu and Zhang, Tianyuan and Chen, Xuanyao and Wang, Yilun and Wang, Yue and Zhao, Hang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5496--5506},
  year={2023}
}
