ale9806 / fomo

This project forked from orrzohar/fomo

Official Pytorch code for Open World Object Detection in the Era of Foundation Models

License: Apache License 2.0

Open World Object Detection in the Era of Foundation Models

If you like our project, please give us a star ⭐ on GitHub for the latest updates!

📰 News

  • [2024.1.5] Initial release of the RWD dataset. I will be updating the arXiv paper because a bug was found that causes some results to differ from the originally reported numbers.

🔥 Highlights

The proposed Real-World Object Detection (RWD) benchmark consists of five real-world, application-driven datasets.

FOMO is a novel approach in Open World Object Detection (OWOD), harnessing foundation models to detect unknown objects by their shared attributes with known classes. It generates and refines attributes using language models and known class exemplars, enabling effective identification of novel objects. Benchmarked on diverse real-world datasets, FOMO significantly outperforms existing baselines, showcasing the potential of foundation models in complex detection tasks.

🛠️ Requirements and Installation

We have trained and tested our models on Ubuntu 20.04, CUDA 12.2, and Python 3.7.16.

conda create --name fomo python==3.7.16
conda activate fomo
pip install -r requirements.txt
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
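
After installation, it can help to confirm that the active environment matches the versions pinned above. The following is a minimal sketch, not part of the FOMO repository: the helper names `version_matches` and `check_environment` are hypothetical, and the check assumes that PyTorch may report a local build tag (for example `1.13.1+cu116`), which is ignored in the comparison.

```python
import platform

# Versions pinned in the install commands above.
PINNED = {"python": "3.7.16", "torch": "1.13.1", "torchvision": "0.14.1"}


def version_matches(installed, pinned):
    """True if `installed` equals `pinned`, ignoring a local build tag such as '+cu116'."""
    return installed.split("+")[0] == pinned


def check_environment():
    """Return {name: (installed_version_or_None, matches_pin)} for each pinned entry."""
    found = {"python": platform.python_version()}
    for name in ("torch", "torchvision"):
        try:
            found[name] = __import__(name).__version__
        except ImportError:
            found[name] = None  # package is not installed in this environment
    return {
        name: (ver, ver is not None and version_matches(ver, PINNED[name]))
        for name, ver in found.items()
    }


if __name__ == "__main__":
    for name, (ver, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:12s} installed={ver} pinned={PINNED[name]} {status}")
    try:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed in this environment")
```

Run it inside the activated `fomo` environment. A `MISMATCH` line only means the installed version differs from the pinned one; it does not by itself indicate that the code will fail.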

📸 Dataset Setup

Dataset setup instructions are in DATASET_SETUP.md.

🏗️ Training and Evaluation

Note: you may need to give execute permissions to the .sh files under the 'configs' and 'tools' directories by running chmod +x *.sh in each directory.

To run the OWOD baselines, use the configurations defined in configs:

OWOD

  1. run_owd.sh - evaluation of tasks 1-4 on the SOWOD/MOWOD Benchmark.
  2. run_owd_baseline.sh - evaluation of tasks 1-4 on the SOWOD Benchmark.

RWD

To run FOMO:

  1. run_rwd.sh - evaluation of all datasets on task 1 RWD Benchmark.
  2. run_rwd_t2.sh - evaluation of all datasets on task 2 RWD Benchmark.

To run baselines:

  3. run_rwd_baselines.sh - evaluation of all datasets on task 1 RWD Benchmark.
  4. run_rwd_t2_baselines.sh - evaluation of all datasets on task 2 RWD Benchmark.
  5. run_rwd_fs_baseline.sh - evaluation of the few-shot baseline on all datasets on task 1 RWD Benchmark.
  6. run_rwd_t2_fs_baseline.sh - evaluation of the few-shot baseline on all datasets on task 2 RWD Benchmark.

Note: Please check the Deformable DETR repository for more evaluation details.
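
The permission step and a script launch described in this section can also be driven from Python. This is a minimal sketch under stated assumptions, not code from the repository: the helpers `make_executable` and `run_script` are hypothetical, the scripts are assumed to live under `configs/`, and the sketch is assumed to be run from the repository root.

```python
import stat
import subprocess
from pathlib import Path


def make_executable(directory):
    """Add execute permission to every .sh file in `directory` (roughly `chmod a+x *.sh`)."""
    changed = []
    for script in sorted(Path(directory).glob("*.sh")):
        mode = script.stat().st_mode
        script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        changed.append(script.name)
    return changed


def run_script(path):
    """Run one shell script with bash and return its exit code."""
    return subprocess.run(["bash", str(path)], check=False).returncode


if __name__ == "__main__":
    for directory in ("configs", "tools"):
        if Path(directory).is_dir():
            print(directory, make_executable(directory))
    # Assumption: run_rwd.sh is located in configs/ and started from the repo root.
    script = Path("configs") / "run_rwd.sh"
    if script.is_file():
        raise SystemExit(run_script(script))
    print("configs/run_rwd.sh not found; run this from the repository root")
```

Swap `run_rwd.sh` for any of the script names listed above to launch a different evaluation.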

✍️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.

@InProceedings{zohar2023open,
    author    = {Zohar, Orr and Lozano, Alejandro and Goel, Shelly and Yeung, Serena and Wang, Kuan-Chieh},
    title     = {Open World Object Detection in the Era of Foundation Models},
    booktitle = {arXiv preprint arXiv:2312.05745},
    year      = {2023},
}

📧 Contact

Should you have any questions, please contact 📧 [email protected]

👍 Acknowledgements

FOMO builds on other code bases such as:

  • PROB - PROB: Probabilistic Objectness for Open World Object Detection codebase.
  • OWL-ViT - The Transformers library implementation of OWL-ViT.

If you found FOMO useful, please consider citing these works:

@InProceedings{Zohar_2023_CVPR,
    author    = {Zohar, Orr and Wang, Kuan-Chieh and Yeung, Serena},
    title     = {PROB: Probabilistic Objectness for Open World Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {11444-11453}
}
@article{minderer2022simple,
    title   = {Simple Open-Vocabulary Object Detection with Vision Transformers},
    author  = {Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
    journal = {ECCV},
    year    = {2022},
}
