LOCATE

[BMVC 2023] Official repository for "LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training"
Silky Singh, Shripad Deshmukh, Mausoom Sarkar, Balaji Krishnamurthy.

project page | arXiv: https://arxiv.org/abs/2308.11239 | bibtex

[Figure: qualitative results]

Our self-supervised framework LOCATE, trained on video datasets, can perform object segmentation on standalone images.

Installation

Create a conda environment:

conda create -n locate python=3.8
conda activate locate

The code has been tested with python=3.8, pytorch=1.12.1, torchvision=0.13.1 and cudatoolkit=11.3 on an NVIDIA A100 machine.

Install PyTorch following the official PyTorch installation instructions. The other dependencies follow the guess-what-moves repository; the commands are listed below for completeness.

conda install -y pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
conda install -y kornia jupyter tensorboard timm einops scikit-learn scikit-image openexr-python tqdm gcc_linux-64=11 gxx_linux-64=11 fontconfig -c conda-forge
pip install cvbase opencv-python wandb 
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
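After installation, a quick check can confirm that the environment matches the tested versions stated above. This is a minimal sketch using Python's standard `importlib.metadata`; it assumes the conda-installed PyTorch registers distribution metadata under the names `torch` and `torchvision` (usually the case, but an assumption here), and it does not check cudatoolkit.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions reported as tested in this README.
TESTED = {"torch": "1.12.1", "torchvision": "0.13.1"}


def find_mismatches(installed, tested=TESTED):
    """Return {package: (installed, tested)} for every package whose version differs.

    A missing package is reported with installed=None. Local build suffixes
    such as "+cu113" are ignored when comparing.
    """
    mismatches = {}
    for pkg, want in tested.items():
        have = installed.get(pkg)
        base = have.split("+")[0] if have else None
        if base != want:
            mismatches[pkg] = (have, want)
    return mismatches


def installed_versions(packages):
    """Look up installed distribution versions; None if a package is absent."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


if __name__ == "__main__":
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"python: {py} (tested: 3.8)")
    for pkg, (have, want) in find_mismatches(installed_versions(TESTED)).items():
        print(f"{pkg}: installed {have}, tested {want}")
```

A mismatch is not necessarily an error; it only flags that you are outside the tested configuration.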

Datasets

We have tested our method on video object segmentation datasets (DAVIS 2016, FBMS59, SegTrackv2), image saliency detection (DUTS, ECSSD, OMRON) and object segmentation (CUB, Flowers-102) benchmarks.

Training

Step 1. Graph Cut

We use the MaskCut algorithm from the CutLER repository with N=1 (a single mask per frame) to obtain a segmentation mask for the salient object in each video frame independently. We modify the pipeline to also take in optical flow features of the frame, and combine the image-feature and flow-feature similarities in a linear combination to produce the graph's edge weights. The modified code can be found in the CutLER directory.
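To make the edge-weight idea concrete, here is a simplified sketch of one Normalized Cut bipartition over per-patch features, in the spirit of MaskCut with N=1. It is not the repository's code. Assumptions: the mixing weight `alpha` is a hypothetical name, `tau=0.15` and `eps=1e-5` are MaskCut-style values chosen for illustration, the mix is applied after thresholding each affinity (the modified pipeline may mix raw similarities instead), and the foreground-selection rule is reduced to one heuristic (CutLER applies further checks).

```python
import numpy as np


def cosine_affinity(feats, tau=0.15, eps=1e-5):
    """Binarised cosine-similarity graph: 1 where similarity >= tau, else eps."""
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T
    return np.where(sim >= tau, 1.0, eps)


def flow_guided_ncut(img_feats, flow_feats, alpha=0.5, tau=0.15):
    """Split N patches into foreground/background with a single Normalized Cut.

    img_feats: (N, C_img) image features; flow_feats: (N, C_flow) flow features.
    alpha: hypothetical weight of the image affinity in the linear combination.
    Returns a boolean (N,) foreground mask.
    """
    W = alpha * cosine_affinity(img_feats, tau) + (1 - alpha) * cosine_affinity(flow_feats, tau)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # (D - W) x = lambda D x is solved through the symmetric normalised Laplacian
    # L_sym = I - D^-1/2 W D^-1/2, whose eigenvectors y give x = D^-1/2 y.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    x = d_inv_sqrt * vecs[:, 1]  # second-smallest generalised eigenvector
    part = x > x.mean()
    # Simplified choice of foreground: the side holding the largest |x| entry.
    if not part[np.argmax(np.abs(x))]:
        part = ~part
    return part
```

In the real pipeline the per-patch features come from a self-supervised ViT (MaskCut uses DINO) applied to the frame and to its optical flow; the resulting patch mask is then upsampled and refined with a CRF.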

We apply a single round of post-processing with Conditional Random Fields (CRF) to obtain pixel-level segmentation masks. The graph-cut masks for all the datasets are released here. Optical flow between video frames is computed with ARFlow trained on the synthetic Sintel dataset.

Step 2. Bootstrapped Self-training

Using the segmentation masks from the previous step as pseudo-ground-truth, we train a segmentation network. To train, run train.sh from the root directory.
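The bootstrapping idea can be sketched as a generic loop: train on pseudo-labels, relabel every frame with the trained network, and train again. This is a minimal illustration under stated assumptions, not what train.sh does: `train_fn` and `predict_fn` are hypothetical hooks standing in for the actual network training and inference, and the number of rounds is a hypothetical parameter.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Mask = TypeVar("Mask")
Model = TypeVar("Model")


def bootstrap_self_training(
    frames: Sequence[Frame],
    initial_masks: Sequence[Mask],
    train_fn: Callable[[Sequence[Frame], Sequence[Mask]], Model],
    predict_fn: Callable[[Model, Frame], Mask],
    rounds: int = 2,
) -> Model:
    """Self-training loop seeded with graph-cut pseudo-labels.

    Round 1 trains on initial_masks; each later round trains on the previous
    model's predictions. train_fn and predict_fn are hypothetical hooks.
    """
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    masks: List[Mask] = list(initial_masks)
    model = train_fn(frames, masks)
    for _ in range(rounds - 1):
        # The previous model's predictions become the next pseudo-ground-truth.
        masks = [predict_fn(model, f) for f in frames]
        model = train_fn(frames, masks)
    return model
```

Keeping training and prediction as injected callables keeps the loop independent of any particular network or data loader.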

Inference

Use the test script for running inference: python test.py

Model Checkpoints

| Dataset | Checkpoint path |
|---|---|
| DAVIS16 | locate_checkpoints/davis2016.pth |
| SegTrackv2 | locate_checkpoints/segtrackv2.pth |
| FBMS59 (graph-cut masks) | locate_checkpoints/fbms59_graphcut.pth |
| FBMS59 (zero-shot) | locate_checkpoints/fbms59_zero_shot.pth |
| DAVIS16+STv2+FBMS | locate_checkpoints/combined.pth |

The checkpoints are released here. The combined.pth checkpoint is a single model trained jointly on all three video datasets (DAVIS16, SegTrackv2, FBMS59).
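If you script evaluations over several datasets, a small lookup over the checkpoint table avoids typos in paths. This is a hypothetical helper: the short keys are illustrative labels rather than names used by the repository, and how test.py receives the checkpoint path is not covered here, so check its own arguments.

```python
from pathlib import Path

# Paths copied from the checkpoint table; the short keys are hypothetical labels.
CHECKPOINTS = {
    "davis16": "locate_checkpoints/davis2016.pth",
    "segtrackv2": "locate_checkpoints/segtrackv2.pth",
    "fbms59_graphcut": "locate_checkpoints/fbms59_graphcut.pth",
    "fbms59_zero_shot": "locate_checkpoints/fbms59_zero_shot.pth",
    "combined": "locate_checkpoints/combined.pth",  # DAVIS16 + SegTrackv2 + FBMS59
}


def checkpoint_for(name: str, root: str = ".") -> Path:
    """Resolve a checkpoint path under root, failing loudly on unknown names."""
    key = name.lower()
    if key not in CHECKPOINTS:
        raise KeyError(f"unknown checkpoint {name!r}; choose from {sorted(CHECKPOINTS)}")
    return Path(root) / CHECKPOINTS[key]
```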

Acknowledgments

This repository is heavily based on guess-what-moves and CutLER. We thank all the respective authors for open-sourcing their amazing work!

Citation

If you find this work useful, please consider citing:

@inproceedings{Singh_2023_BMVC,
author    = {Silky Singh and Shripad V Deshmukh and Mausoom Sarkar and Balaji Krishnamurthy},
title     = {LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},
url       = {https://papers.bmvc2023.org/0295.pdf}
}
@article{singh2023locate,
  title={LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training},
  author={Singh, Silky and Deshmukh, Shripad and Sarkar, Mausoom and Krishnamurthy, Balaji},
  journal={arXiv preprint arXiv:2308.11239},
  year={2023}
}
