guanrunwei / mcn

This project forked from luogen1996/mcn

[CVPR2020] Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation, CVPR2020 (oral)

Home Page: https://arxiv.org/abs/2003.08813

License: MIT License

Python 100.00%

Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation

《Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation》

by Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng and Rongrong Ji.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, Oral

Introduction

This repository is a Keras implementation of MCN, a multimodal, multi-task collaborative learning framework. In MCN, referring expression segmentation (RES) helps referring expression comprehension (REC) achieve better language-vision alignment, while REC helps RES locate the referent more accurately. In addition, we address a key challenge in this multi-task setup, namely prediction conflict, with two designs: Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS). The network structure is illustrated below:

[Figure: network structure of MCN]
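As a rough illustration of how a REC box can help RES locate the referent (the idea behind ASNLS mentioned above), the sketch below rescales mask probabilities depending on whether a pixel falls inside a predicted box and then binarizes them. It is a fixed-factor simplification and not the repository's implementation: the function name `soft_suppress`, the scaling factors, and the threshold are hypothetical, and the adaptive part of ASNLS is not modeled.

```python
def soft_suppress(mask_probs, box, inside_scale=1.5, outside_scale=0.5, threshold=0.5):
    """Rescale per-pixel mask probabilities with a predicted box, then binarize.

    mask_probs: 2-D list of floats in [0, 1].
    box: (x1, y1, x2, y2) in pixel coordinates, x2 and y2 exclusive.
    The scale factors and threshold are illustrative values (assumptions).
    """
    x1, y1, x2, y2 = box
    result = []
    for y, row in enumerate(mask_probs):
        out_row = []
        for x, p in enumerate(row):
            inside = x1 <= x < x2 and y1 <= y < y2
            # Boost responses inside the box, damp (but do not zero) those outside.
            scaled = p * (inside_scale if inside else outside_scale)
            out_row.append(1 if scaled >= threshold else 0)
        result.append(out_row)
    return result
```

Because responses outside the box are only damped rather than removed, a strong segmentation response can still survive when the box is slightly off, which is the motivation for a soft rather than a hard suppression.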

Citation

@InProceedings{Luo_2020_CVPR,
author = {Luo, Gen and Zhou, Yiyi and Sun, Xiaoshuai and Cao, Liujuan and Wu, Chenglin and Deng, Cheng and Ji, Rongrong},
title = {Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Prerequisites

  • Python 3.6

  • tensorflow-1.9.0 for CUDA 9 or tensorflow-1.14.0 for CUDA 10

  • keras-2.2.4

  • spacy (download the GloVe embeddings by running spacy download en_vectors_web_lg)

  • Others (progressbar2, opencv, etc. see requirement.txt)
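Before training, it can help to confirm that the installed packages match the list above. The following is a minimal sketch, not part of the repository: the helper name `check_environment` and the `EXPECTED` table are hypothetical, and the check assumes each package exposes a `__version__` attribute.

```python
import importlib
import sys

# Versions taken from the prerequisites above; an empty tuple means no version is pinned.
EXPECTED = {
    "tensorflow": ("1.9.0", "1.14.0"),  # 1.9.0 for CUDA 9, 1.14.0 for CUDA 10
    "keras": ("2.2.4",),
    "spacy": (),
}

def check_environment(expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != (3, 6):
        problems.append("Python 3.6 expected, found %d.%d" % sys.version_info[:2])
    for name, versions in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("%s is not installed" % name)
            continue
        found = getattr(module, "__version__", "unknown")
        if versions and found not in versions:
            problems.append("%s %s found, expected one of: %s"
                            % (name, found, ", ".join(versions)))
    return problems
```

Calling `check_environment()` with no arguments checks the three packages listed in `EXPECTED`; printing the returned list shows what, if anything, needs to be installed or downgraded.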

Data preparation

  • Follow the instructions in DATA_PRE_README.md to generate the training and testing data for RefCOCO, RefCOCO+ and RefCOCOg.

  • Download the pretrained backbone weights (vgg and darknet). We provide pretrained weights in Keras format for this repo, and a darknet version to facilitate research based on PyTorch or other frameworks. All pretrained backbones are trained on the COCO 2014 train+val set, with the images that appear in the val+test sets of RefCOCO, RefCOCO+ and RefCOCOg removed (nearly 6500 images). Please follow the instructions in DATA_PRE_README.md to download them.

Training

  1. Prepare your settings. To train a model, modify ./config/config.json to adjust the settings you want. The default settings are for RefCOCO and readily reach about 80.0 and 62.0 accuracy for REC and RES respectively on the val set. We also provide example configs for reproducing our results on RefCOCO+ and RefCOCOg.
  2. Train the model. Run train.py under the main folder to start training:
     python train.py
  3. Test the model. Modify the settings JSON to set the model path evaluate_model and the dataset evaluate_set used for evaluation. Then run test.py:
     python test.py
     After the evaluation finishes, a result file is generated in the ./result folder.
  4. Training log. Logs are stored in the ./log directory and record the detailed training curves and the accuracy per epoch. To log visualizations, set log_images to 1 in config.json. With TensorBoard you can inspect the training details.

Notably, running this code can achieve better performance than the results reported in our paper (roughly 1~4% improvement on each dataset). This is because we have recently made many optimizations, such as carefully tuning some training hyperparameters, optimizing the training code, and selecting a better checkpoint of the pre-trained backbone. In addition, it is fine if the losses do not decline when you use vgg16 as the backbone; this may be a display problem and does not affect performance.
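The config edits described in the training and testing steps can also be scripted. The sketch below assumes that config.json is a flat JSON object whose top-level keys include evaluate_model, evaluate_set and log_images; the helper name `update_config` is hypothetical, and the values in the usage comment are placeholders rather than paths or names taken from the repository.

```python
import json

def update_config(path, **overrides):
    """Load a JSON config file, apply keyword overrides, and write it back."""
    with open(path, "r") as f:
        config = json.load(f)
    config.update(overrides)  # only top-level keys are replaced
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config

# Example usage (placeholder values, adjust to your own run):
# update_config("./config/config.json",
#               evaluate_model="path/to/your/checkpoint",
#               evaluate_set="val",
#               log_images=1)
```

Keeping a copy of the original config.json before overwriting it makes it easier to return to the default RefCOCO settings.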

Pre-trained Models and Logs

Following the steps in Data preparation and Training, you can reproduce, and even improve on, the results in our paper. We provide the pre-trained models and training logs for RefCOCO, RefCOCO+, RefCOCOg and Referit.

  1. RefCOCO: Darknet (312M), vgg16 (214M).

     Detection/Segmentation:
     Backbone   val              test A           test B
     Darknet    80.61%/63.12%    83.38%/65.05%    75.51%/60.99%
     vgg16      79.68%/61.51%    81.49%/63.25%    75.30%/60.46%

  2. RefCOCO+: Darknet (312M), vgg16 (214M).

     Detection/Segmentation:
     Backbone   val              test A           test B
     Darknet    69.10%/53.00%    74.17%/57.00%    59.75%/46.96%
     vgg16      64.67%/49.04%    69.25%/51.94%    57.01%/44.31%

  3. RefCOCOg: Darknet (312M), vgg16 (214M).

     Detection/Segmentation:
     Backbone   val              test
     Darknet    68.95%/50.65%    67.88%/50.62%
     vgg16      63.50%/47.81%    63.32%/47.94%

  4. Referit: Darknet (312M), vgg16 (214M).

     Detection/Segmentation:
     Backbone   val              test
     Darknet    69.29%/57.00%    67.65%/55.42%
     vgg16      68.28%/56.19%    65.49%/53.68%
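The README does not spell out how the two numbers in each cell are computed. The sketch below assumes the conventions commonly used for these tasks: detection (REC) accuracy as the fraction of predicted boxes whose IoU with the ground truth exceeds 0.5, and segmentation (RES) quality as mask IoU. The function names are hypothetical and this is not the evaluation code in test.py.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detection_accuracy(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth exceeds the threshold (assumed 0.5)."""
    hits = [box_iou(p, g) > threshold for p, g in zip(pred_boxes, gt_boxes)]
    return sum(hits) / len(hits)

def mask_iou(pred, gt):
    """IoU of two binary masks given as 2-D lists of 0/1 values with the same shape."""
    inter = union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += bool(p) and bool(g)
            union += bool(p) or bool(g)
    return inter / union if union else 0.0
```

For example, two 2x2 boxes offset by one pixel in each direction overlap in a 1x1 region, giving an IoU of 1/7, which would count as a miss under the 0.5 threshold.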

Acknowledgement

Thanks to keras-yolo3 and keras-retinanet, from which much of the code is borrowed, and to the darknet framework used for backbone pretraining.
