Moe-VRD

A framework for building large neural networks for video visual relationship detection.

Forked from shibshib/moe-vrd · License: MIT · Python 98.55%, Cython 1.45%

This is the source code for the Moe-VRD project, maintained by the VIP lab at the University of Waterloo. It builds a mixture-of-experts framework: the work of Shang et al.'s VidVRD-II is encapsulated as a single expert, which can then be combined with others in the proposed mixture of experts for video relationship detection.

Note: This work is in progress, and since the project is relatively new on GitHub, expect frequent changes to the code.
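At a high level, the mixture-of-experts combination described above weights each expert's prediction by a learned gate. Below is a minimal, self-contained sketch of that idea in plain Python; the gating scores, vector shapes, and helper names are illustrative, not this project's actual API.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_predict(expert_outputs, gate_scores):
    """Combine per-expert prediction vectors with softmax gating weights.

    expert_outputs: equal-length prediction vectors, one per expert.
    gate_scores: raw gating scores, one per expert.
    """
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]

# Toy example: two experts scoring three hypothetical relation classes.
combined = moe_predict(
    expert_outputs=[[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
    gate_scores=[2.0, 0.5],
)
```

Because each expert's toy output sums to one, the gated combination does too; the gate simply decides how much each expert's opinion counts.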

Environment

The setup closely follows that of Shang et al.:

  1. Download ImageNet-VidVRD dataset and VidOR dataset. Then, place the data under the same parent folder as this repository.

  2. Install the dependencies (tested on TITAN Xp and Nvidia RTX A6000 GPUs):

conda create -n moe-vrd -c conda-forge python=3.7 Cython tqdm scipy "h5py>=2.9=mpi*" ffmpeg=3.4 cudatoolkit=10.1 cudnn "pytorch>=1.7.0=cuda101*" "tensorflow>=2.0.0=gpu*"
conda activate moe-vrd
python setup.py build_ext --inplace
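As a quick sanity check after activating the environment, a small helper (hypothetical, not part of this repository) can confirm that the pinned dependencies are importable under their Python module names:

```python
import importlib.util

# Python module names for the conda packages pinned above
# (note conda package names and import names can differ, e.g. pytorch -> torch).
REQUIRED_MODULES = ["Cython", "tqdm", "scipy", "h5py", "torch", "tensorflow"]

def missing_modules(modules):
    """Return the subset of module names that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]
```

For example, `missing_modules(REQUIRED_MODULES)` should return an empty list in a correctly built environment.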

Quick Start

  1. Download the precomputed object tracklets and features for ImageNet-VidVRD (437MB) and VidOR (32GB: part1, part2, part3, part4), and extract them under imagenet-vidvrd-baseline-output and vidor-baseline-output as above, respectively.
  2. Run python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --train --cuda to train the model for ImageNet-VidVRD. Use --cfg config/vidor_3step_prop_wd1.json for VidOR.
  3. Run python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --detect --cuda to detect video relations (inference) and the results will be output to ../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json.
  4. Run python evaluate.py imagenet-vidvrd test relation ../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json to evaluate the results.
  5. To visualize the results, add the option --visualize to the above command (this invokes visualize.py, so make sure the environment is set up as described in the last section). For the better visualization mentioned in the paper, change association_algorithm to graph in the configuration JSON and then re-run Steps 3 and 5.
  6. To automatically run the whole training and test pipeline multiple times, run python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --pipeline 5 --cuda --no_cache to obtain a mean/std result.

Object Tracklet Extraction (optional)

  1. We extract frame-level object proposals using an off-the-shelf tool. First download and install the TensorFlow models library. Then, run python -m video_object_detection.tfmodel_image_detection [imagenet-vidvrd/vidor] [train/test/training/validation]. You can also download our precomputed results for ImageNet-VidVRD (6GB).
  2. To obtain object tracklets based on the frame-level proposals, run python -m video_object_detection.object_tracklet_proposal [imagenet-vidvrd/vidor] [train/test/training/validation].
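To illustrate what "tracklet proposal from frame-level boxes" means, here is a deliberately simplified greedy linker that chains per-frame boxes into tracklets by IoU with the previous frame. The actual object_tracklet_proposal implementation is more sophisticated; every name below is illustrative only.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def link_tracklets(frames, thresh=0.5):
    """Greedily link per-frame boxes into tracklets.

    frames: list of per-frame box lists. A box joins the tracklet whose
    last box (in the immediately preceding frame) overlaps it most,
    provided IoU exceeds thresh; otherwise it starts a new tracklet.
    Returns a list of tracklets, each a list of (frame_index, box) pairs.
    """
    tracklets = []
    for t, boxes in enumerate(frames):
        for box in boxes:
            best, best_iou = None, thresh
            for tr in tracklets:
                last_t, last_box = tr[-1]
                if last_t == t - 1:  # only extend tracklets still "open"
                    o = iou(last_box, box)
                    if o > best_iou:
                        best, best_iou = tr, o
            if best is not None:
                best.append((t, box))
            else:
                tracklets.append([(t, box)])
    return tracklets
```

For example, a box that drifts slightly between consecutive frames extends the same tracklet, while a box appearing elsewhere in the frame starts a fresh one.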

Acknowledgement

This repository is built on VidVRD-helper and VidVRD-II. If this repo is helpful in your research, please cite both their paper and our repository with the following BibTeX entries:

@misc{sha2021moe,
    title={Video Relationship Detection using Mixture of Experts},
    author={Shaabana, Ala and Fieguth, Paul and Luo, Chong and Lan, Cuiling},
    howpublished={https://github.com/shibshib/moe-vrd.git},
    year={2021}
}

@inproceedings{shang2021video,
    author={Shang, Xindi and Li, Yicong and Xiao, Junbin and Ji, Wei and Chua, Tat-Seng},
    title={Video Visual Relation Detection via Iterative Inference},
    booktitle={ACM International Conference on Multimedia},
    year={2021}
}

Contributors

shibshib, xdshang
