
This project is forked from derrickxunu/cobevt.


[CoRL2022] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

License: Apache License 2.0

Python 99.11% Cython 0.89%


CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers [CORL2022]

paper supplement video

This is the official implementation of the CoRL 2022 paper "CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers" by Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, and Jiaqi Ma.

UCLA, UT-Austin


Overview of CoBEVT

Introduction

CoBEVT is the first generic multi-agent multi-camera perception framework that can cooperatively generate BEV map predictions. Its core component, the fused axial attention (FAX) module, sparsely captures both local and global spatial interactions across views and agents. CoBEVT achieves state-of-the-art performance on both the OPV2V and nuScenes datasets while running in real time.
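To make the distinction between local and sparse global interactions concrete, here is a minimal pure-Python sketch. It assumes the two attention patterns follow a MaxViT-style block/grid partitioning (MaxViT is cited below as an inspiration). The function names `local_window_groups` and `sparse_global_groups` are hypothetical and do not come from this repository, and the sketch only groups 2D cell coordinates on a single feature map. It does not model the view and agent dimensions and is not the FAX implementation itself.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def local_window_groups(height: int, width: int, window: int) -> List[List[Coord]]:
    """Group cells into non-overlapping window x window blocks.

    Cells in the same group are spatial neighbours (local interactions).
    """
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    groups = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            groups.append([(top + dy, left + dx)
                           for dy in range(window)
                           for dx in range(window)])
    return groups


def sparse_global_groups(height: int, width: int, grid: int) -> List[List[Coord]]:
    """Group cells sampled on a strided grid x grid lattice.

    Cells in the same group are spread over the whole map, so each group
    covers global context with only grid * grid cells (sparse interactions).
    """
    if height % grid or width % grid:
        raise ValueError("height and width must be divisible by grid")
    stride_y, stride_x = height // grid, width // grid
    groups = []
    for offset_y in range(stride_y):
        for offset_x in range(stride_x):
            groups.append([(offset_y + gy * stride_y, offset_x + gx * stride_x)
                           for gy in range(grid)
                           for gx in range(grid)])
    return groups


if __name__ == "__main__":
    # On a 4 x 4 map, the first local group is a contiguous 2 x 2 block,
    # while the first global group holds cells two steps apart.
    print(local_window_groups(4, 4, 2)[0])
    print(sparse_global_groups(4, 4, 2)[0])
```

In both cases attention would be computed only within each group, so the cost per group stays fixed while the two patterns together cover nearby and distant cells.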


nuScenes demo: CoBEVT can be used for single-vehicle multi-camera semantic BEV segmentation.


OPV2V demo: CoBEVT can also be used for multi-agent BEV map prediction.

Installation

The pipelines for the nuScenes and OPV2V datasets are different. Refer to the folder that matches your research purpose for details.

👉 nuScenes Users
👉 OPV2V Users

Models

Fused Axial Attention Module (FAX)
SinBEVT (single-agent multi-view fusion) and FuseBEVT (multi-agent BEV fusion)
CoBEVT Architecture

Results

Main results (OPV2V-camera, OPV2V-LiDAR, and nuScenes)
Qualitative results on OPV2V-camera
Qualitative results on OPV2V-LiDAR
Qualitative results on nuScenes
Ablation study

Citation

@inproceedings{xu2022cobevt,
 author = {Xu, Runsheng and Tu, Zhengzhong and Xiang, Hao and Shao, Wei and Zhou, Bolei and Ma, Jiaqi},
 title = {CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers},
 booktitle = {Conference on Robot Learning (CoRL)},
 year = {2022}
}
@article{xu2022v2x,
 title={V2X-ViT: Vehicle-to-everything cooperative perception with vision transformer},
 author={Xu, Runsheng and Xiang, Hao and Tu, Zhengzhong and Xia, Xin and Yang, Ming-Hsuan and Ma, Jiaqi},
 journal={Proceedings of the European Conference on Computer Vision (ECCV)},
 year={2022}
}
@inproceedings{tu2022maxim,
 title={Maxim: Multi-axis mlp for image processing},
 author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
 booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
 pages={5769--5780},
 year={2022}
}
@article{tu2022maxvit,
 title={Maxvit: Multi-axis vision transformer},
 author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
 journal={Proceedings of the European Conference on Computer Vision (ECCV)},
 year={2022}
}

Acknowledgement

CoBEVT is built upon OpenCOOD, the first Open Cooperative Detection framework for autonomous driving.

Our nuScenes experiments use the training pipeline from CVT (CVPR 2022).

CoBEVT is partly inspired by V2X-ViT, MAXIM, and MaxViT.

Contributors

derrickxunu, vztu
