
salesforce / pb-ovd


A PyTorch implementation of Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

License: BSD 3-Clause "New" or "Revised" License



Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Introduction

This is the official PyTorch implementation of Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.

Environment

UBUNTU="18.04"
CUDA="11.0"
CUDNN="8"

Installation

conda create --name ovd

conda activate ovd

cd $INSTALL_DIR

bash ovd_install.sh

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

cd ../
cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu
python setup.py build develop

Data Preparation

Inference

python -m torch.distributed.launch --nproc_per_node=8 tools/test_net.py \
--config-file configs/eval.yaml \
MODEL.WEIGHT $PATH_TO_FINAL_MODEL \
OUTPUT_DIR $OUTPUT_DIR
  • For LVIS, use their official API to get evaluated numbers
python evaluate_lvis_official.py --coco_anno_path datasets/lvis_v0.5_val_all_clipemb.json \
--result_dir $OUTPUT_DIR/inference/lvis_v0.5_val_all_cocostyle/

Pretrain with Pseudo Labels

python -m torch.distributed.launch --nproc_per_node=16 tools/train_net.py  --distributed \
--config-file configs/pretrain_1m.yaml \
OUTPUT_DIR $OUTPUT_DIR

Finetune

python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py  --distributed \
--config-file configs/finetune.yaml \
MODEL.WEIGHT $PATH_TO_PRETRAIN_MODEL \
OUTPUT_DIR $OUTPUT_DIR

Generate Your Own Pseudo Box Labels


Installation

conda create --name gen_plabels

conda activate gen_plabels

bash gen_plabel_install.sh

Preparation

Generate Pseudo Labels

  • Get pseudo labels based on ALBEF
python pseudo_bbox_generation.py
  • Organize dataset in COCO format
python prepare_coco_dataset.py
  • Extract text embedding using CLIP
# pip install git+https://github.com/openai/CLIP.git

python prepare_clip_embedding_for_open_vocab.py
  • Check your final pseudo labels by visualization
python visualize_coco_style_dataset.py
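
For orientation, the sketch below shows what a COCO-style detection file generally looks like: three lists (`images`, `annotations`, `categories`), with each box stored as `[x, y, width, height]`. This is a minimal illustration of the standard COCO layout, not the output of `prepare_coco_dataset.py`; the function name `make_coco_dataset` and the input record keys are hypothetical, and the actual script may write additional fields.

```python
import json


def make_coco_dataset(records, category_names):
    """Build a minimal COCO-style dict from per-image box records.

    `records` is a hypothetical input: a list of dicts with keys
    file_name, width, height, and boxes, where each box is
    (category_name, x, y, w, h) in pixels.
    """
    # COCO category ids are integers; here they simply start at 1.
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(category_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}

    images, annotations = [], []
    for image_id, rec in enumerate(records, start=1):
        images.append({
            "id": image_id,
            "file_name": rec["file_name"],
            "width": rec["width"],
            "height": rec["height"],
        })
        for name, x, y, w, h in rec["boxes"]:
            annotations.append({
                "id": len(annotations) + 1,   # unique across the whole file
                "image_id": image_id,
                "category_id": name_to_id[name],
                "bbox": [x, y, w, h],         # top-left corner plus size
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


if __name__ == "__main__":
    example = make_coco_dataset(
        [{"file_name": "a.jpg", "width": 640, "height": 480,
          "boxes": [("dog", 10, 20, 100, 50)]}],
        ["cat", "dog"],
    )
    print(json.dumps(example, indent=2))
```

Opening the generated file and comparing it against this layout is a quick sanity check before running the visualization step.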

Citation

  • If you find this code helpful, please cite our paper:
@article{gao2021towards,
  title={Open Vocabulary Object Detection with Pseudo Bounding-Box Labels},
  author={Gao, Mingfei and Xing, Chen and Niebles, Juan Carlos and Li, Junnan and Xu, Ran and Liu, Wenhao and Xiong, Caiming},
  journal={arXiv preprint arXiv:2111.09452},
  year={2021}
}

Contact

Notes


pb-ovd's Issues

How to generate proposals pkl files?

Hi,

I'm confused about the following:
How do I generate the proposals (numpy.ndarray) of a certain category with the proposal detector, i.e. the *.pkl and *_infor.pkl files?

Thanks

error loading the model

Hi,

I get this error when trying to load the model:

magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

I had to change some imports from transformers to match the new version of transformers, and change PY3 references to PY37.
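
The `'<'` in the message is the first byte that pickle read from the file, so the file begins with `<`. That often indicates an HTML or XML page (for example a saved download or error page) rather than a checkpoint, though this thread does not confirm the cause. A minimal sketch for inspecting the first bytes of a file before loading it; the function name `describe_checkpoint_header` and the returned labels are hypothetical:

```python
def describe_checkpoint_header(path, num_bytes=64):
    """Guess what kind of file `path` is from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(num_bytes)

    if head.lstrip().startswith(b"<"):
        # HTML/XML text, which matches "invalid load key, '<'".
        return "markup"
    if head.startswith(b"version https://git-lfs"):
        # A Git LFS pointer file instead of the real binary.
        return "git-lfs-pointer"
    if head.startswith(b"PK"):
        # Zip archive, the container used by recent torch.save.
        return "zip"
    if head.startswith(b"\x80"):
        # Pickle protocol marker, as in legacy torch.save files.
        return "pickle"
    return "unknown"
```

If the result is `"markup"`, re-downloading the weights from the direct file link is a reasonable first step.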

Error! Package conflicts!

Description

When I run ovd_install.sh, the packages in requirements.txt conflict.
transformers 3.4.0 requires tqdm>=4.27, while torchvision 0.2.2 requires tqdm==4.19.9.
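
The two constraints cannot hold at once, which can be checked by testing a candidate version against both. The sketch below is a simplified illustration that handles only `>=` and `==` on purely numeric versions; it is not how pip resolves dependencies, and the helper names are hypothetical.

```python
def parse_version(text):
    """Turn a purely numeric version such as '4.19.9' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a single '>=' or '==' constraint."""
    for op in (">=", "=="):
        if spec.startswith(op):
            have = parse_version(version)
            want = parse_version(spec[len(op):])
            # Pad with zeros so that 4.27 and 4.27.0 compare as equal.
            width = max(len(have), len(want))
            have += (0,) * (width - len(have))
            want += (0,) * (width - len(want))
            return have >= want if op == ">=" else have == want
    raise ValueError(f"unsupported constraint: {spec}")


# The two tqdm constraints quoted in the report above.
constraints = [">=4.27", "==4.19.9"]

for candidate in ("4.19.9", "4.27.0"):
    ok = all(satisfies(candidate, c) for c in constraints)
    print(candidate, "satisfies both" if ok else "violates at least one constraint")
```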

Solution

How can this be solved? By changing ovd_install.sh like this?
from

for line in $(cat requirements.txt)
do
  pip install $line
done

to

for line in $(cat requirements.txt)
do
  conda install $line
done

Visualization of Activation Map?

Dear Sir:
Your team did a great job, and thank you for open-sourcing the code. I have recently been doing research based on your project, and I saw the activation maps in your paper. Could you provide the code for this visualization?
Thanks again.

Pseudo Label Generation in unsupervised proposal generator (Selective Search)

Dear author, thanks for your work! I'm interested in the pseudo label generation part.
I have generated pseudo labels with a supervised proposal generator, Mask R-CNN with ResNet50 trained on COCO2017. We get good results with the supervised proposal generator (example image: 000000444028).

I wonder how I can reproduce the results with an unsupervised proposal generator (Selective Search). Can you release this part of the code? Thanks!
