gsfm's Introduction

Global Spectral Filter Memory Network for Video Object Segmentation

ECCV 2022

Abstract

This paper studies semi-supervised video object segmentation through boosting intra-frame interaction. Recent memory network-based methods focus on exploiting inter-frame temporal reference while paying little attention to intra-frame spatial dependency. Specifically, these segmentation models tend to be susceptible to interference from unrelated non-target objects in a given frame. To this end, we propose the Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction by learning long-term spatial dependencies in the spectral domain. The key component of GSFM is the 2D (inverse) discrete Fourier transform for spatial information mixing. Besides, we empirically find that low-frequency features should be enhanced in the encoder (backbone) and high-frequency features in the decoder (segmentation head). We attribute this to the encoder's role of extracting semantic information and the decoder's role of highlighting fine-grained details. The Low-Frequency and High-Frequency Modules are therefore proposed to fit these two roles.
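As an illustration of the underlying idea (not the paper's actual modules), filtering a feature map in the spectral domain amounts to a 2D FFT, masking a frequency band, and an inverse FFT. The function name, the radial mask shape, and the cutoff radius below are arbitrary assumptions for this sketch:

```python
import numpy as np

def spectral_filter(feat, keep_low=True, radius=0.25):
    """Filter a 2D feature map in the frequency domain.

    keep_low=True keeps only low frequencies (encoder-style smoothing);
    keep_low=False keeps only high frequencies (decoder-style detail).
    """
    h, w = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat))  # move low freqs to center
    yy, xx = np.mgrid[:h, :w]
    # normalized radial distance from the spectrum center
    dist = np.sqrt(((yy - h / 2) / h) ** 2 + ((xx - w / 2) / w) ** 2)
    mask = dist <= radius if keep_low else dist > radius
    return np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
```

Because the low- and high-pass masks are complementary, the two filtered outputs sum back to the original map, which makes the split easy to sanity-check.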

Framework

Demo

Results (Trained Without COCO or BL30K)

| Dataset | Split | J&F | J | F |
| --- | --- | --- | --- | --- |
| DAVIS 2016 | val | 91.4 | 90.1 | 92.7 |
| DAVIS 2017 | val | 86.2 | 83.1 | 89.3 |
| DAVIS 2017 | test-dev | 77.5 | 74.0 | 80.9 |

| Dataset | Split | Overall Score | J-Seen | F-Seen | J-Unseen | F-Unseen |
| --- | --- | --- | --- | --- | --- | --- |
| YouTubeVOS 18 | validation | 83.8 | 82.8 | 87.5 | 78.3 | 86.5 |

Requirements

The following packages are used in this project.

For installing PyTorch and torchvision, please refer to the official guideline.

For others, you can install them by pip install -r requirements.txt.

Data Preparation

Please refer to STCN to prepare the datasets and put them all in /data. Note that this project uses only the static datasets, DAVIS, and YouTubeVOS (BL30K is not used).

Code Structure

├── data/: train and test datasets
│   ├── static
│   ├── DAVIS
│   ├── YouTube
│   ├── YouTube2018
├── datasets/: transforms and dataloaders for the train and test datasets
├── model/: network code and the training engine (model.py)
├── saves/: checkpoints obtained from training
├── scripts/: functions used to process the datasets
├── util/: config (hyper_para.py) and utilities
├── inference_memory_bank.py: the memory bank used at test time
├── train.py
├── inference_core.py: test engine for DAVIS
├── inference_core_yv.py: test engine for YouTubeVOS
├── eval_*.py
├── requirements.txt

Training

For pretraining:

To train on the static image datasets, use the following command:

CUDA_VISIBLE_DEVICES=[GPU_ids] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=GPU_num train.py --id [save_name] --stage 0

For example, if we use 8 GPUs for training and 's0-GSFM' as the checkpoint name, the command is:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=8 train.py --id s0-GSFM --stage 0

For main training:

To train on DAVIS and YouTube, use this command:

CUDA_VISIBLE_DEVICES=[GPU_ids] OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port [cccc] --nproc_per_node=GPU_num train.py --id [save_name] --stage 3 --load_network path_to_pretrained_ckpt

Similarly, if using 8 GPUs, the command is:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=8 train.py --id s03-GSFM --stage 3 --load_network saves/s0-GSFM/**.pth

Resume training

Besides, if you want to resume interrupted training, run the command with --load_model and the *_checkpoint.pth file, for example:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 12345 --nproc_per_node=8 train.py --id s03-GSFM --stage 0 --load_model saves/s0-GSFM/s0-GSFM_checkpoint.pth

Inference

Run the corresponding script to perform inference on each dataset:

  • eval_davis_2016.py: DAVIS 2016 val set.
  • eval_davis.py: DAVIS 2017 val and test-dev sets (controlled by --split).
  • eval_youtube.py: YouTubeVOS 2018/19 val and test sets.

Evaluation

For the evaluation metrics on the DAVIS 2016/2017 val sets, we refer to the repository DAVIS_val. For the DAVIS 2017 test-dev set, you can get the metric results by submitting masks to the CodaLab website DAVIS_test. For the YouTube2019 val set, please submit your results to YouTube19. For the YouTube2018 val set, please submit to YouTube18.
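For reference, the J metric reported in the results above is the region similarity: the mean intersection-over-union between predicted and ground-truth masks (F, the boundary measure, is more involved and is computed by the benchmark toolkits). A minimal sketch of J for a single frame, with an illustrative function name:

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: IoU of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return np.logical_and(pred, gt).sum() / union
```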

Acknowledgement

Code in this repository is built upon several public repositories. Thanks to STCN, MiVOS, FFC, and BMaskR-CNN for sharing their code.

gsfm's People

Contributors

yongliu20

gsfm's Issues

Loss becomes NaN after training for too long

Hello, I really appreciate your work, but the following problem came up when I tried to reproduce it; I hope you can help me resolve it.
Configuration:
RTX 3090;
torch 1.11.0;
stage 3 parameter changes (since only a single GPU is used, the number of iterations was extended)
(screenshot)

When loading the model, a torch version issue caused:
torch._C._LinAlgError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling cusolverDnXgeqrf( handle, params, m, n, CUDA_R_32F, reinterpret_cast<void*>(A), lda, CUDA_R_32F, reinterpret_cast<void*>(tau), CUDA_R_32F, reinterpret_cast<void*>(bufferOnDevice), workspaceInBytesOnDevice, reinterpret_cast<void*>(bufferOnHost), workspaceInBytesOnHost, info). This error may appear if the input matrix contains NaN.
The code was changed as follows:
(screenshot)
The orthogonal initialization of the matrix was moved to the CPU and the result uploaded to the GPU; nothing else was changed.
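The CPU-side workaround described above amounts to computing the orthogonal matrix with an ordinary QR factorization before uploading it to the GPU. A hedged NumPy sketch of that idea (the function name and tensor shape are arbitrary examples, not the repo's actual code):

```python
import numpy as np

def orthogonal_init(n, seed=0):
    """Orthogonal initialization computed on the CPU via QR factorization."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # In PyTorch one would then do e.g. torch.from_numpy(q).cuda()
    return q
```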

During stage 3 training, the loss decreases normally at first, but after many iterations all losses become NaN, and running inference with the 10000-iteration weights produces no segmentation results.

About GSFM .pth or results

Hi! Thank you for your excellent work. It's really helpful for my research.
Could you share the model parameter file (.pth) or the segmentation results of GSFM?
I would like to perform visual comparisons with GSFM in my paper.
Please accept my apologies for any inconvenience. ^ ^
Best wishes!

How to make the illustration shown in Fig. 1 of your paper?

Hi there,
I have been reading your paper recently, and the core LFM module really sheds some light on this field!
I am also interested in your analysis of the problem, especially the way you compare GSFM with other SOTA methods, as shown in Figure 1.
Could you explain how to make such an illustration? How do you find the matched pixels for a given query pixel?
Best regards!
