

PFENet

This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).

Pinned

Our latest works are available at:

Hierarchical Dense Correlation Distillation for Few-Shot Segmentation (CVPR 2023): https://github.com/Pbihao/HDMNet.

Generalized Few-shot Semantic Segmentation (CVPR 2022): https://github.com/dvlab-research/GFS-Seg.

Get Started

Environment

  • torch==1.4.0 (torch version >= 1.0.1.post2 should be okay to run this repo)
  • numpy==1.18.4
  • tensorboardX==1.8
  • cv2==4.2.0

Datasets and Data Preparation

Please download the following datasets:

  • PASCAL-5i is based on PASCAL VOC 2012 and SBD, where the val images should be excluded from the list of training samples.

Images are available at: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

annotations: https://drive.google.com/file/d/1ikrDlsai5QSf2GiSUR3f8PZUzyTubcuF/view?usp=sharing

Note: If you wish to reproduce the results presented in the paper, please follow the provided data lists. Following Shaban's OSLSM work, we only use a subset of the full dataset rather than all ~12,000 images. Different studies may have different data requirements, so to ensure a fair comparison, please select the data according to your specific needs.

This code reads data from .txt list files in which each line contains the path of an image and the path of its corresponding label, separated by a space. An example is as follows (a minimal parsing sketch is shown after it):

image_path_1 label_path_1
image_path_2 label_path_2
image_path_3 label_path_3
...
image_path_n label_path_n
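
Only as a reference, here is a minimal sketch of how such a list file could be parsed; the repo's actual dataset loader may read the lists differently.

# Illustrative sketch of parsing an "image_path label_path" list file.
# The repo's own dataset code may differ.
def read_data_list(list_path):
    pairs = []
    with open(list_path, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_path, label_path = line.split()
            pairs.append((image_path, label_path))
    return pairs

# Example: pairs = read_data_list('voc_sbd_merge_noduplicate.txt')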

Then update the train/val/test list paths in the config files.

[Update] We have uploaded the lists we use in our paper.

  • The train/val lists for COCO contain 82081 and 40137 images respectively. They are the default train/val splits of COCO.
  • The train/val lists for PASCAL-5i contain 5953 and 1449 images respectively. The train list should be voc_sbd_merge_noduplicate.txt and the val list is the original val list of PASCAL VOC (val.txt).
To get voc_sbd_merge_noduplicate.txt:
  • We first merge the original VOC (voc_original_train.txt) and SBD (sbd_data.txt) training data.
  • [Important] sbd_data.txt does not overlap with the PASCAL VOC 2012 validation data.
  • The merged list (voc_sbd_merge.txt) is then processed by the script (duplicate_removal.py) to remove duplicate images and labels; a minimal sketch of this step is shown below.
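
The following is only an illustrative sketch of the merge-and-deduplicate step, not the repo's actual duplicate_removal.py; details such as which label path is kept when an image appears in both lists may differ.

# Illustrative sketch only; the repo's duplicate_removal.py may behave differently.
def merge_and_deduplicate(voc_list, sbd_list, out_list):
    seen = set()
    merged = []
    for list_path in (voc_list, sbd_list):
        with open(list_path, 'r') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                image_path = line.split()[0]  # deduplicate by image path
                if image_path not in seen:
                    seen.add(image_path)
                    merged.append(line)
    with open(out_list, 'w') as f:
        f.write('\n'.join(merged) + '\n')

# merge_and_deduplicate('voc_original_train.txt', 'sbd_data.txt', 'voc_sbd_merge_noduplicate.txt')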

Run Demo / Test with Pretrained Models

  • Please download the pretrained models.

  • We provide 8 pre-trained models: 4 ResNet-50 based models for PASCAL-5i and 4 VGG-16 based models for COCO.

  • Update the config file by specifying the target split and path (weights) for loading the checkpoint.

  • Execute mkdir initmodel at the root directory.

  • Download the ImageNet pretrained backbones and put them into the initmodel directory.

  • Then execute the command:

    sh test.sh {*dataset*} {*model_config*}

Example: Test PFENet with ResNet50 on the split 0 of PASCAL-5i:

sh test.sh pascal split0_resnet50

Train

Execute this command at the root directory:

sh train.sh {*dataset*} {*model_config*}

Related Repositories

This project is built upon a very early version of SemSeg: https://github.com/hszhao/semseg.

Other projects in few-shot segmentation:

Many thanks to their great work!

Citation

If you find this project useful, please consider citing:

@article{tian2020pfenet,
  title={Prior Guided Feature Enrichment Network for Few-Shot Segmentation},
  author={Tian, Zhuotao and Zhao, Hengshuang and Shu, Michelle and Yang, Zhicheng and Li, Ruiyu and Jia, Jiaya},
  journal={TPAMI},
  year={2020}
}

@InProceedings{peng2023hierarchical,
  title={Hierarchical Dense Correlation Distillation for Few-Shot Segmentation},
  author={Peng, Bohao and Tian, Zhuotao and Wu, Xiaoyang and Wang, Chenyao and Liu, Shu and Su, Jingyong and Jia, Jiaya},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

@InProceedings{tian2022gfsseg,
    title={Generalized Few-shot Semantic Segmentation},
    author={Zhuotao Tian and Xin Lai and Li Jiang and Shu Liu and Michelle Shu and Hengshuang Zhao and Jiaya Jia},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}

pfenet's People

Contributors

guspan-tanadi, tianzhuotao


pfenet's Issues

About the crop size in COCO

Hi, I found in your paper that the crop size for training on COCO is 473 x 473 with a learning rate of 0.005.
However, the crop size in the COCO config is 641 x 641 with a learning rate of 0.02.
I just want to know the exact crop size and learning rate for the COCO dataset, thank you!

VOC dataset question

Hello! Thanks for sharing your code. I noticed that your setting on the VOC dataset is different from PANet (https://github.com/kaixin96/PANet). Their training data and test data both come from the training set, while your training data are 15 classes from the training set and your test data are 5 classes from the validation set. The numbers of images in each class are also different. May I ask why there are two different settings? Are the performances on the two settings comparable? Thank you in advance.

COCO dataset

Hello, thank you very much for your work. Can you tell me how to get the ground truth of the COCO dataset? Can you share it with me? Thank you.

Results fluctuate greatly even with all random seeds set

Hi, thanks for sharing your work!
There's a question that troubles me: I set the same random seed every time, but the results fluctuate greatly (by about 1%).
Is this due to the "Dropout" operation, or to something I neglected?

Regards,
Lang

COCO2014 dataset label

Hi!

I notice that you used *.png files for the COCO2014 label images. Can you provide them?

I'm trying to do it myself, and I find that when I convert the COCO .json annotations to an image, one pixel can correspond to multiple categories. How can I handle this situation to ensure that I get the same results as you?

Experiment detail

Hi, this project is very helpful. But I would like to know some details:

  1. The backbones you provided are named resnet50_v2.pth and resnet101_v2.pth. Did you train these backbones from scratch for the few-shot segmentation task?
  2. I get the same results with the pre-trained models, but the model I train myself is about 2% mIoU lower. Is the config the same as the paper setting?

Thanks.

Questions about the paper

Hello, I want to try few-shot segmentation for medical images. There are two questions I could not understand; I hope you can help me.

  1. The ground truth MQ of the query image is invisible to the model, so how did you use the cross-entropy loss for training?
  2. In the experiments part, what do 1-shot and 5-shot mean in Table 2? Could you please give me more implementation details?

Possible bug in the code?

In train.py, in the validate function, I find that len(subcls) == 5 and len(subcls[0]) == 20 (I set 5-shot, batch_size_val = 20, and split 1). I can't work out the purpose of 'subcls[0].cpu().numpy()[0]'; I think this operation is wrong.

Preprocessing

Hi, I'm reading your code and I have a question:

Is there a particular reason why you decided to implement the transformations (in transform.py) by hand, instead of relying on the torchvision.transforms package?
Do you think it could be a source of slowdowns in the code?

Thanks in advance!

Weighted_GAP

Does this function return a number? Shouldn't it be a tensor?

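Weighted_GAP is a masked global-average-pooling operation over the support feature, so its output should be a tensor (a per-channel prototype of shape [B, C, 1, 1]) rather than a single number. Below is a minimal sketch of masked average pooling; it is an assumption-based illustration and may differ from the repo's exact Weighted_GAP.

import torch

def weighted_gap_sketch(supp_feat, mask):
    # supp_feat: [B, C, H, W] support feature; mask: [B, 1, H, W] binary foreground mask.
    # Masked global average pooling: average the feature over foreground pixels only.
    # Illustrative sketch; the repo's Weighted_GAP may be implemented differently.
    masked = supp_feat * mask
    area = mask.sum(dim=(2, 3), keepdim=True).clamp(min=1e-4)  # [B, 1, 1, 1]
    return masked.sum(dim=(2, 3), keepdim=True) / area          # [B, C, 1, 1]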

What does the "split" means

In README

Update the config file by specifying the target split and path (weights) for loading the checkpoint.

What does the split mean?
What role does 'split' play in the program?
e.g.:

assert args.split in [0, 1, 2, 3, 999]

Thank you .

time

Hello, thanks for your work. By the way, when will PFENet++ be open-sourced?

dataset composition

Sorry to bother you again. The train_list you provide consists of 'voc_train' and 'sbd_train'. I wonder if 'sbd_val' should be included too (like 'trainaug' in DeepLab). I did not find a detailed description in OSLSM.
I am also still a little confused about the image count: 'sbd_train' + 'voc_train' = 8284, so why is the number only about 5900 in your setting?
I am not sure if I missed some preprocessing.

question about resnet

Good work for FSS.

  1. I have a question about resnet-v2: you download it from your own path. Where did you download it, or did you pre-train it yourself? Since resnet-v2 is different from the original resnet, have you noticed any effect of using the former? Previous work like CANet uses the original resnet.
  2. The support feature from layer3 is multiplied with the mask:

    supp_feat_4 = self.layer4(supp_feat_3 * mask)
    final_supp_list.append(supp_feat_4)
    for i, tmp_supp_feat in enumerate(final_supp_list):
        tmp_supp_feat_4 = tmp_supp_feat * tmp_mask

    I notice the mask operation is applied twice, but there is only one in your paper. Please let me know if I missed some details. Thanks!

Problem about reproduced accuracy

Hi, thanks for sharing the code! I have trained the model without any modification, but the results are always about 1% worse than the reported accuracy.
Here are some reproduced results, with the reported results in parentheses, on the PASCAL VOC dataset: Fold-0 60.8 (61.7), Fold-1 68.5 (69.5), Fold-2 53.9 (55.4).
So I wonder if I missed some tricks to reach the reported results? Do I need to keep fine-tuning the model?


@Saralyliu
Hi,

Thanks for your attention.

The pre-trained weights of resnet-v2 are obtained from the official repo of PSPNet (https://github.com/hszhao/semseg). The only difference from the original resnet lies in layer0, where the v2 version applies the deep-stem strategy. We used resnet-v2 to reproduce CANet and got results rather comparable to the ones reported in the CANet paper.

The mask used in "supp_feat_4 = self.layer4(supp_feat_3*mask)" is for screening out the redundant background region, and I remember that it does not affect the performance much; you can try it out by sending feat_3 to layer4 without the masking operation.

The other mask, used in "tmp_supp_feat_4 = tmp_supp_feat * tmp_mask", is more important, since it is used for the prior calculation.

Thank you for your reply. If I understand correctly, resnet-v2 and resnet-50 are equivalent as feature extractors? Recently we ran VOC group 0 with your code (5955 training images, 1449 val images), and the best mIoU we tested is 58.57 at epoch 124 without any modifications. We cannot reach your 61.7 mIoU in the 1-shot case. Waiting for your suggestion, thank you!

remove small objects

Thanks for sharing your work!
I noticed that you remove all objects whose area is smaller than 2x32x32, following the practice of OSLSM. However, some recent works, such as PANet (ICCV'19) and CANet (CVPR'19), include all objects in their source code regardless of size. So I am wondering whether it is somewhat unfair to compare with these methods directly, or are there any details that I have neglected?

n way setting

Hello, thanks for your excellent work. I didn't find any parameter setting the number of ways; does the batch size of the support images actually define n_way? Thanks.

Problem about dataset setting and baseline

Thanks for sharing your work! I would like to know why the PASCAL training set only contains about 5900 images. I mean, in the setting of CANet or OSLSM, the number of training images is larger than that, right?

Question about the number of test samples

Issue #6 mentions and provides the training log, where count=2000 is used for evaluation during the training stage, but in the code the value during training is 5000. Does this affect the evaluation results?

Hi, I noticed that the script at this link was modified on 2021/6/21. Now when I use the modified script to generate the COCO label images, they differ from the ones generated before. Is the problem on my side or in the modified script? Can you provide the previous version that you used? Thanks!


Originally posted by @JJ-res101 in #20 (comment)
Sorry, I have the same problem. Could you please provide the original version? Thank you a lot.

Question regarding evaluation with ignore_label

Hello, I have a question regarding evaluation

In lines 52-64 of PFENet/util/util.py (function intersectionAndUnion), the function computes the intersection & union between the gt mask and the predicted mask. I found that the code uses ignore_label=255 to refine the prediction right before computing the intersection & union. Why did you adopt this kind of refinement (using the mask boundary, i.e., ignore_label)?

This refinement would make sense if the code only tried to ignore the zero-padded regions (which are set to ignore_label, e.g., 255) in both the predicted and gt masks. However, because the object boundary is also set to ignore_label=255, the prediction is refined further.

In the paper, the mIoU of PFENet in the 1-shot setting (with the resnet50 backbone) is 60.8% on the PASCAL-5i dataset, but when I reproduce the model without using the object boundary (ignore label), the mIoU drops to 56.2%. May I ask why you adopted such prediction refinement using the gt boundary?


Here are example predictions: the top image visualizes the naive prediction of the model, while the bottom image visualizes the refined prediction using the gt boundary.
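
For context, this is roughly how a semseg-style intersection/union computation with an ignore_label looks; it is an illustrative sketch and may differ from the repo's actual util.intersectionAndUnion.

import numpy as np

def intersection_and_union_sketch(pred, target, num_classes, ignore_label=255):
    # Illustrative sketch of a typical semantic-segmentation IoU computation;
    # the repo's util.intersectionAndUnion may differ in details.
    pred = pred.reshape(-1).copy()
    target = target.reshape(-1)
    # Ground-truth pixels marked ignore_label (padded regions and, for PASCAL VOC,
    # the boundary band around objects) are excluded from both intersection and
    # union by overwriting the prediction there.
    pred[target == ignore_label] = ignore_label
    intersection = pred[pred == target]
    area_inter, _ = np.histogram(intersection, bins=np.arange(num_classes + 1))
    area_pred, _ = np.histogram(pred, bins=np.arange(num_classes + 1))
    area_target, _ = np.histogram(target, bins=np.arange(num_classes + 1))
    area_union = area_pred + area_target - area_inter
    return area_inter, area_union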

PFENet++ code

Thanks for your work. By the way, when will you open the PFENet++ source code?

train list and test list

Can you share the train_list.txt and val_list.txt files? I'd like to know the size and division of your training set and test set.

Comparison of the number of model parameters

Hi,

Good work for FSS. I have a question about how the number of model parameters is counted. The number of model parameters is compared with that of other methods in Table 1 of the paper.

It seems that only the number of trainable parameters (10.8M for the proposed modules) is counted. The number of fixed backbone parameters (23.6M) is not included.

But the 19.0M for CANet contains both the number of fixed backbone parameters and the number of trainable head parameters.

Is this a fair comparison? Or if I missed some details, please let me know.

Thanks!

Train / Val / Test split

Hi,

Thank you for the great work. I'm new to few-shot segmentation and I was just trying to get my head around how the data split is made. From the code, I seem to understand that the validation and test sets are the same, i.e. the best model during training is picked based on its performance on the test set (with the novel classes). Am I missing something here?

Thanks in advance,
Malik

about scale_lr

In the experimental setting of PASCAL-5i, the initial lr is set to 0.0025. But I saw that there is a scale_lr factor in poly_learning_rate(), and its default value is 10, so the initial lr is actually 0.025, right?
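
To illustrate the arithmetic in question, here is a minimal sketch of a poly learning-rate schedule with a scaling factor for selected parameter groups; this is an assumption-based sketch, not the repo's exact poly_learning_rate.

def poly_learning_rate_sketch(base_lr, curr_iter, max_iter, power=0.9, scale_lr=10.0):
    # Illustrative sketch (an assumption, not the repo's exact function):
    # "poly" decay of the base learning rate, with some parameter groups
    # (e.g. the newly added modules) using a rate multiplied by scale_lr.
    lr = base_lr * (1 - float(curr_iter) / max_iter) ** power
    return lr, lr * scale_lr

# With base_lr = 0.0025 and scale_lr = 10, the scaled groups start at 0.0025 * 10 = 0.025.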

question about train.py

Why subtract one here (subcls - 1)?
I understand that subcls here is the index of class_chosen in the sublist, so it should start from 0, but subtracting one makes it a negative number:

subcls = subcls[0].cpu().numpy()[0]
class_intersection_meter[(subcls-1) % split_gap] += intersection[1]
class_union_meter[(subcls-1) % split_gap] += union[1]

about coco-dataset

In the COCO txt lists, I see ".png" paths for the annotations. How did you get the COCO .png annotations? Thanks.

Question about test episodes

Thank you for sharing! I'm confused about the set from which the test episodes are sampled. In this work, the dataset is divided into a training set and a validation set. As far as I understand, both the train and val sets contain 4 folds. If you test on fold 0, the training episodes of folds 1, 2, 3 are sampled from the training set, and the test episodes of fold 0 are sampled from the val set? Do I have any misunderstanding? Thank you.

The gpu setting

I have changed train_gpu: to 4, but the model still runs on GPU 0. Could you give me some advice?

Performance inconsistency between paper and reproduce.

Thank you for your great work. I learned a lot from your paper.

I tested the pre-trained models you provided (ResNet50-based for PASCAL VOC, 1-shot), but I got better performance than reported in your paper.

Method Split0 Split1 Split2 Split3 Mean
Reproduce 61.8 69.9 56.3 56.6 61.2
Paper 61.7 69.5 55.4 56.3 60.8

Is this performance fluctuation within the normal range? I used the same code and settings as in your GitHub repo.

I also tried to train another baseline experiment (ResNet50-based for PASCAL VOC, 5-shot) by myself using your configs.

Method Split0 Split1 Split2 Split3 Mean
Reproduce 64.7 71.5 55.5 60.6 63.1
Paper 63.1 70.7 55.8 57.9 61.9

There seems to be a greater fluctuation.

Colormap

Hello!

Thank you for this work. Could you provide the colormap you use to visualize segmentation results?

Thank you!

scripts for processing coco dataset

It is written that pairs of "image_path_1 label_path_1" are needed for training the network, but the annotations downloaded from COCO are in .json format. May I ask if you can provide the script to generate semantic segmentation masks in .png format from the COCO .json annotations?

Thanks
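
For what it is worth, here is a minimal sketch of how per-image semantic .png masks could be generated from the COCO .json annotations with pycocotools; this is not the authors' script, and their preprocessing (category-id remapping, handling of overlapping or crowd instances) may well differ.

import os
import numpy as np
from PIL import Image
from pycocotools.coco import COCO

def coco_json_to_png_sketch(ann_file, out_dir):
    # Illustrative sketch, not the authors' script: rasterize COCO instance
    # annotations into one semantic .png mask per image. When instances overlap,
    # later annotations simply overwrite earlier ones here, which may differ
    # from the preprocessing actually used for the paper.
    coco = COCO(ann_file)
    os.makedirs(out_dir, exist_ok=True)
    for img_id in coco.getImgIds():
        info = coco.loadImgs(img_id)[0]
        mask = np.zeros((info['height'], info['width']), dtype=np.uint8)
        for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id, iscrowd=None)):
            binary = coco.annToMask(ann)            # [H, W] 0/1 mask for this annotation
            mask[binary == 1] = ann['category_id']  # raw COCO category id (1-90)
        out_name = os.path.splitext(info['file_name'])[0] + '.png'
        Image.fromarray(mask).save(os.path.join(out_dir, out_name))

# Example: coco_json_to_png_sketch('annotations/instances_train2014.json', 'coco_masks/train2014')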

similarity matrix

The paper says: "For each xq ∈ XQ, we take the maximum similarity among all support pixels as the correspondence value cq."
Working through the calculation on paper, I think this means taking the maximum over each column, but in the code I find it is the converse:

similarity = similarity.max(1)[0].view(bsize, sp_sz*sp_sz)
I think it should be max(0)[0].
What's your opinion?
Thanks for your reply.
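
For reference, here is an illustrative reimplementation of the prior computation described in the paper, showing how the layout of the similarity matrix determines which dimension the max runs over; it is a sketch, not a verbatim copy of the repo's code.

import torch

def prior_mask_sketch(query_feat, supp_feat, eps=1e-7):
    # query_feat, supp_feat: [B, C, H, W]; supp_feat is assumed to be already
    # multiplied by the (downsampled) support mask. Illustrative sketch only.
    b, c, h, w = query_feat.shape
    q = query_feat.view(b, c, h * w)                   # [B, C, Q]
    s = supp_feat.view(b, c, h * w).permute(0, 2, 1)   # [B, S, C]
    q_norm = torch.norm(q, 2, dim=1, keepdim=True)     # [B, 1, Q]
    s_norm = torch.norm(s, 2, dim=2, keepdim=True)     # [B, S, 1]
    sim = torch.bmm(s, q) / (torch.bmm(s_norm, q_norm) + eps)  # [B, S, Q]
    # With this [B, S, Q] layout, dim 1 indexes the support pixels, so the
    # per-query-pixel maximum over support pixels is sim.max(1); with a
    # [B, Q, S] layout it would be the max over dim 2 instead. Whether max(1)
    # or another dim is right therefore depends on how the similarity matrix
    # is constructed, not on the formula alone.
    prior = sim.max(1)[0].view(b, 1, h, w)             # [B, 1, H, W]
    return prior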

model parameters in optimizer

Hello, thank you for the excellent work. I noticed that when setting up the optimizer, the modules of the model are all listed separately, while the backbone layers' requires_grad is already set to False. Why not use model.parameters() instead? Will it influence the training procedure?

how to generate SegmentationClassAug?

I have downloaded the PASCAL VOC 2012 dataset and the SBD dataset from their official websites, but I do not know how to use them in the code. Could you show me your dataset folder structure or the method to generate SegmentationClassAug?
Thanks, bubble from HFUT!
