
gpv2's Introduction

Webly Supervised Concept Expansion for General Purpose Vision Models

This is the codebase for GPV 2 from our paper Webly Supervised Concept Expansion for General Purpose Vision Models. Code for the web 10k dataset is in a separate repo.

Installation

Code

Clone the repo with --recurse-submodules

git clone [email protected]:allenai/gpv2.git --recurse-submodules

Create conda environment

conda create -n gpv2 python=3.6 -y
conda activate gpv2

Install PyTorch. I have been using PyTorch 1.8.1; other versions might work but are not tested. For example, on Linux:

conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=11.2 -c pytorch -c conda-forge

You might need to change that command depending on your operating system/GPU setup.

Finally, install libraries:

conda install -c cyclus java-jdk=8.45.14 -y  
pip3 install -r requirements.txt

Data

Download data for COCO, DCE, and Web, as well as the pre-computed VinVL features for these datasets (note that you cannot use the VinVL features provided by the VinVL authors, since we need features for the cropped images and for the all-image boxes):

python gpv2/download_data.py 

The data is saved to the locations specified in file_paths.py; by default, source data is saved into ~/data/gpv while the features are stored in ./data-cache/precomputed-features/vinvl. The script's command-line arguments can be used to download particular subsets if you don't need everything.

Models

We have currently released three GPV 2 models:

  • With web: s3://ai2-prior-gpv/public/gpv2-models/gpv2
  • Without web: s3://ai2-prior-gpv/public/gpv2-models/gpv2-noweb
  • CC pre-training only (not fine-tuning): s3://ai2-prior-gpv/public/gpv2-models/cc-pretrained/

To download, use aws s3 cp with --recursive:

mkdir -p models
aws s3 cp --recursive s3://ai2-prior-gpv/public/gpv2-models/gpv2 models/gpv2

Training

The repo is currently set up to train the basic model on COCO data; training with web data will be added as we complete the release process.

To train on devices 0 and 1 of your machine without web data:

python gpv2/experiments/train_gpv2.py --device 0 1 --task all --output_dir /path/to/output/dir

For debugging purposes, I recommend using the --debug flag and reducing the number of devices and workers, which will get you much faster startup times and better error messages:

python gpv2/experiments/train_gpv2.py --device 0 1 --task all --output_dir /path/to/output/dir --debug small

which will run the model on a small sample of the data and without complicated distributed training.

To run from our CC pre-trained checkpoint, download the cc-pretrained model and use the --init_from flag:

python gpv2/experiments/train_gpv2.py --device 0 1 --task all --output_dir /path/to/output/dir --init_from models/cc-pretrained/r0/state-ep8.pth

Eval

Single Image

Run the model on a single image using run_on_image_id:

python gpv2/eval/run_on_image_id.py models/gpv2 dce/test/nocaps/0003d84e0165d630.jpg "What is this?"

Here "What is this?" is the prompt, and dce/test/nocaps/0003d84e0165d630.jpg is an image_id (not a filepath) that is used to look up the needed VinVL features in the HDF5 feature files. See GpvDataset or DceDataset for the format of the image_ids.
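The repo's dataset classes own the actual lookup logic; as a rough sketch of the convention (the helper below is hypothetical, not code from the codebase), an image_id of this form splits into a dataset prefix and a relative key used to index the matching HDF5 feature file:

```python
def split_image_id(image_id: str):
    """Split an image_id like 'dce/test/nocaps/0003d84e0165d630.jpg'
    into its dataset prefix and the key used inside that dataset's
    feature file. Hypothetical helper illustrating the convention;
    the real logic lives in classes like DceDataset."""
    prefix, _, key = image_id.partition("/")
    return prefix, key

print(split_image_id("dce/test/nocaps/0003d84e0165d630.jpg"))
# → ('dce', 'test/nocaps/0003d84e0165d630.jpg')
```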

For a Dataset

To compute predictions for a dataset, use:

python gpv2/eval/compute_topn_predictions.py models/gpv2 --datasets dce-vqa --part val --eval --output_name default

The predictions for VQA will be saved to models/gpv2/r0/eval/{dataset-name}--default; an evaluation file with the results will be saved there as eval.json.
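If you run evaluations for several datasets, the per-dataset eval.json files can be gathered with a short script. This is a sketch assuming the layout described above (one subdirectory per dataset under the r0/eval directory, each holding an eval.json whose exact schema is defined by the repo's evaluation code):

```python
import json
from pathlib import Path

def collect_eval_results(eval_root):
    """Gather eval.json files under a directory like models/gpv2/r0/eval.
    Returns {eval-dir-name: parsed eval.json contents}. Assumes the layout
    described above; not a utility from the repo itself."""
    results = {}
    for eval_file in sorted(Path(eval_root).glob("*/eval.json")):
        results[eval_file.parent.name] = json.loads(eval_file.read_text())
    return results
```

For example, `collect_eval_results("models/gpv2/r0/eval")` would return one entry per completed evaluation.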

See the command-line flags on compute_topn_predictions.py to run on other datasets or to use multiple GPUs.

Test server evaluations

The script gpv2/eval/build_submission_files.py will construct the submission files needed to evaluate on the VQA test, COCO test, and nocaps val/test servers, assuming the needed predictions have already been saved using compute_topn_predictions.py.

Precomputing features for new images

GPV-2 uses VinVL pre-computed image features. If you want to run the model on a new dataset, you will need to pre-compute the image features for that dataset. We provide our script for doing this, which saves the results in an HDF5 file the codebase can use; the results are compatible with the features we distribute. There are three steps:

  1. Gather your images into one directory. It may include subdirectories, but it should not contain any files other than images.

  2. Run:

    python gpv2/build_image_features/precompute_image_features.py /path/to/image_directory your_dataset_name --output features.hdf5
    

    where /path/to/image_directory points to your image directory and your_dataset_name is a name for the set of images you are adding. The script has parameters to control the batch size and to run across multiple devices, which can be used to tune the process. This will produce the HDF5 file features.hdf5 (or whatever name you pass to --output).

  3. Move the HDF5 file into the vinvl directory under file_paths.PRECOMPUTED_FEATURES_DIR, for example:

    mkdir -p data-cache/precomputed-features/vinvl
    mv features.hdf5 data-cache/precomputed-features/vinvl/your_dataset_name.hdf5
    

Now the model will support image_ids with the format your_dataset_name/path/to/image_file/in/your/directory. For example, if your directory contained the image val/dog/001.jpg and your dataset_name was "pets", the image_id "pets/val/dog/001.jpg" will now be recognized by the model and will load the pre-computed features for that image. Image ids of that format can now be passed to run_on_image_id.py or used in GPVExample objects with VinVL models.
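Following that convention, building image_ids for a whole directory is just a matter of prefixing the dataset name onto each file's path relative to the image directory. A minimal sketch (this helper is illustrative, not part of the repo):

```python
from pathlib import Path

def to_image_id(dataset_name, image_dir, image_path):
    """Build an image_id like 'pets/val/dog/001.jpg' for a file under
    image_dir. Illustrates the convention described above; not code
    from the repo."""
    rel = Path(image_path).relative_to(image_dir).as_posix()
    return f"{dataset_name}/{rel}"

print(to_image_id("pets", "/data/pets", "/data/pets/val/dog/001.jpg"))
# → pets/val/dog/001.jpg
```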

Features for the web/COCO/DCE datasets can be re-computed using gpv2/build_image_features/precompute_dataset_features.py, but by default download_data.py will download them automatically.
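The precompute scripts can run across multiple devices. One generic way such a job is split, shown here as a sketch rather than the repo's actual implementation, is round-robin sharding of the image list, with each device processing its own shard:

```python
def shard_for_device(items, device_rank, n_devices):
    """Return the round-robin shard of a work list for one device.
    Generic sketch; the repo's actual multi-device logic may differ."""
    return items[device_rank::n_devices]

images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
print(shard_for_device(images, 0, 2))  # → ['a.jpg', 'c.jpg', 'e.jpg']
print(shard_for_device(images, 1, 2))  # → ['b.jpg', 'd.jpg']
```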

gpv2's Issues

Issue regarding downloading the pre-trained models from AWS

When I run the following command:
aws s3 cp --recursive s3://ai2-prior-gpv/public/gpv2-models/gpv2 models/gpv2

I face the following error:
fatal error: Unable to locate credentials

So, I was wondering if someone could help me out on that.

Can I train gpv2 on Visual Grounding via loc?

Great work here!
I find that gpv2 can be transferred well to Visual Grounding, as is shown in the Supplementary Material and in UnifiedIO. However, I encountered some performance-related problems when I used Localization to train gpv2 on RefCOCO. Specifically, I used the checkpoint trained with web data to initialize the model and used the default hyperparameters.
Could you please tell me how to train gpv2 on Visual Grounding?
Thanks in advance!

Confusion regarding how classification accuracy is evaluated

From the description of GPV-2, it seems that for the classification task, the model output is simply the decoder output, which could be arbitrary text. How, then, is classification performance evaluated? For example, "bee" and "bees" seem equally correct as an image label. Or perhaps I misunderstood the paper?
