
This project is forked from bosszhe/emiff.


EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

License: Apache License 2.0

Shell 0.19% Python 94.41% Jupyter Notebook 5.40%


EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection. Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang. ICRA 2024.

This repository contains the official PyTorch implementation of EMIFF/VIMI, including the training and evaluation code and the pretrained models.

Abstract

In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: (1) inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; and (2) information loss during transmission, resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules, which enhance infrastructure and vehicle features at the scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves state-of-the-art (SOTA) results on the DAIR-V2X-C dataset, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
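To make the transmission-efficiency argument concrete, the sketch below estimates how much a feature map shrinks when both its channels and its spatial resolution are reduced, which is the kind of saving the FC module targets. It is only back-of-the-envelope arithmetic, not the repository's FC implementation: the `FeatureMap` class, the `compress` helper, the feature-map shape, and the compression factors are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    """Shape of one feature map. All sizes used below are hypothetical."""
    channels: int
    height: int
    width: int

    def num_bytes(self, bytes_per_value: int = 4) -> int:
        # Assumes float32 values (4 bytes each) unless told otherwise.
        return self.channels * self.height * self.width * bytes_per_value


def compress(fm: FeatureMap, channel_ratio: int, spatial_stride: int) -> FeatureMap:
    """Shape after dividing channels by channel_ratio and each side by spatial_stride."""
    return FeatureMap(
        channels=fm.channels // channel_ratio,
        height=fm.height // spatial_stride,
        width=fm.width // spatial_stride,
    )


# Hypothetical infrastructure-side feature map and compression factors.
original = FeatureMap(channels=64, height=96, width=160)
compressed = compress(original, channel_ratio=4, spatial_stride=2)

ratio = original.num_bytes() / compressed.num_bytes()
print(f"{original.num_bytes()} B -> {compressed.num_bytes()} B ({ratio:.0f}x smaller)")
```

With these assumed numbers, a 4x channel reduction combined with a stride-2 spatial reduction cuts the payload by 4 x 2 x 2 = 16x. The actual factors used by EMIFF are defined in the repository's configs and may differ.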

Methods

Architecture

Get Started

Benchmark and Model Zoo

Modality: Image

| Fusion | Method | Dataset | AP-3D (IoU=0.5) | AP-BEV (IoU=0.5) | Config | Download |
| --- | --- | --- | --- | --- | --- | --- |
| Only-Veh | ImVoxelNet | VIC-Sync | 7.29 | 8.85 | config | - |
| Only-Inf | ImVoxelNet | VIC-Sync | 8.66 | 14.41 | config | - |
| Late-Fusion | ImVoxelNet | VIC-Sync | 11.08 | 14.76 | - | - |
| Early-Fusion | BEVFormer_S | VIC-Sync | 8.80 | 13.45 | config | model/log |
| Early-Fusion | ImVoxelNet | VIC-Sync | 12.72 | 18.17 | config | model/log |
| Intermediate-Fusion | EMIFF | VIC-Sync | 15.61 | 21.44 | config | model/log |

We evaluate the Only-Veh, Only-Inf, and Late-Fusion models following OpenDAIRV2X.
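As a quick way to read the benchmark, the snippet below computes EMIFF's absolute gain over each baseline. The scores are copied from the table above; the variable and function names are illustrative and not part of the repository.

```python
# (AP-3D, AP-BEV) at IoU=0.5 on VIC-Sync, copied from the benchmark table.
results = {
    "Only-Veh ImVoxelNet": (7.29, 8.85),
    "Only-Inf ImVoxelNet": (8.66, 14.41),
    "Late-Fusion ImVoxelNet": (11.08, 14.76),
    "Early-Fusion BEVFormer_S": (8.80, 13.45),
    "Early-Fusion ImVoxelNet": (12.72, 18.17),
    "Intermediate-Fusion EMIFF": (15.61, 21.44),
}

EMIFF_KEY = "Intermediate-Fusion EMIFF"


def gains_over_baselines(scores: dict) -> dict:
    """Absolute AP gain (in points) of EMIFF over every other entry."""
    emiff_3d, emiff_bev = scores[EMIFF_KEY]
    return {
        name: (round(emiff_3d - ap_3d, 2), round(emiff_bev - ap_bev, 2))
        for name, (ap_3d, ap_bev) in scores.items()
        if name != EMIFF_KEY
    }


gains = gains_over_baselines(results)

# Strongest baseline, ranked by AP-3D.
best_baseline = max(
    (name for name in results if name != EMIFF_KEY),
    key=lambda name: results[name][0],
)

for name, (d3d, dbev) in gains.items():
    print(f"{name:26s} AP-3D +{d3d:.2f}  AP-BEV +{dbev:.2f}")
print("Strongest baseline by AP-3D:", best_baseline)
```

Against the strongest baseline in the table (Early-Fusion ImVoxelNet), the gain is 2.89 points in AP-3D and 3.27 points in AP-BEV.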

Acknowledgement

This project would not be possible without the following codebases.

Citation

If you find our work useful in your research, please consider citing:

@misc{wang2023vimi,
      title={VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection}, 
      author={Zhe Wang and Siqi Fan and Xiaoliang Huo and Tongda Xu and Yan Wang and Jingjing Liu and Yilun Chen and Ya-Qin Zhang},
      year={2023},
      eprint={2303.10975},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@inproceedings{wang2024emiff,
      title={EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection}, 
      author={Zhe Wang and Siqi Fan and Xiaoliang Huo and Tongda Xu and Yan Wang and Jingjing Liu and Yilun Chen and Ya-Qin Zhang},
      booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
      year = {2024}
}

