chunbolang / bam

Official PyTorch Implementation of Learning What Not to Segment: A New Perspective on Few-Shot Segmentation (CVPR'22 Oral & TPAMI'23).

License: MIT License

Python 98.77% Shell 1.23%
computer-vision few-shot-segmentation

bam's Introduction

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

This repo contains the code for our CVPR 2022 Oral paper "Learning What Not to Segment: A New Perspective on Few-Shot Segmentation" by Chunbo Lang, Gong Cheng, Binfei Tu, and Junwei Han.

Abstract: Recently few-shot segmentation (FSS) has been extensively developed. Most previous works strive to achieve generalization through the meta-learning framework derived from classification tasks; however, the trained models are biased towards the seen classes instead of being ideally class-agnostic, thus hindering the recognition of new concepts. This paper proposes a fresh and straightforward insight to alleviate the problem. Specifically, we apply an additional branch (base learner) to the conventional FSS model (meta learner) to explicitly identify the targets of base classes, i.e., the regions that do not need to be segmented. Then, the coarse results output by these two learners in parallel are adaptively integrated to yield precise segmentation prediction. Considering the sensitivity of meta learner, we further introduce an adjustment factor to estimate the scene differences between the input image pairs for facilitating the model ensemble forecasting. The substantial performance gains on PASCAL-5i and COCO-20i verify the effectiveness, and surprisingly, our versatile scheme sets a new state-of-the-art even with two plain learners. Moreover, in light of the unique nature of the proposed approach, we also extend it to a more realistic but challenging setting, i.e., generalized FSS, where the pixels of both base and novel classes are required to be determined.
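For intuition, here is a minimal sketch of the two-branch idea described above; the module names, tensor shapes, and fusion rule are illustrative assumptions, not the official implementation.

import torch
import torch.nn as nn

class TwoBranchEnsemble(nn.Module):
    """Toy fusion of a base learner (what NOT to segment) and a meta learner."""
    def __init__(self):
        super().__init__()
        self.merge = nn.Conv2d(2, 1, kernel_size=1)  # adaptive integration

    def forward(self, meta_pred, base_pred):
        # meta_pred, base_pred: (B, 2, H, W) background/foreground probability maps
        meta_bg = meta_pred[:, 0:1]   # meta learner's background estimate
        base_fg = base_pred[:, 1:2]   # base-class regions, i.e., not the target
        # Regions claimed by the base learner reinforce the final background;
        # the 1x1 convolution learns how strongly.
        final_bg = self.merge(torch.cat([meta_bg, base_fg], dim=1))
        return torch.cat([final_bg, meta_pred[:, 1:2]], dim=1)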

✨ News

[April 5, 2023]

  • The extended version of this work is accepted to TPAMI 2023.

[Jun 16, 2022]

  • The generated base annotations are available.

[May 23, 2022]

  • We release all the trained models to facilitate validation.

[Mar 29, 2022]

  • Our paper is selected for an oral presentation.

[Mar 2, 2022]

  • BAM is accepted to CVPR 2022.

🔧 Usage

Dependencies

  • Python 3.8
  • PyTorch 1.7.0
  • cuda 11.0
  • torchvision 0.8.1
  • tensorboardX 2.14
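
A quick sanity check that the environment matches the pinned versions above (assuming the packages are importable under these names):

import torch
import torchvision
import tensorboardX

print(torch.__version__)         # expect 1.7.0
print(torch.version.cuda)        # expect 11.0
print(torchvision.__version__)   # expect 0.8.1
print(tensorboardX.__version__)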

Datasets

  • PASCAL-5i: VOC2012 + SBD

  • COCO-20i: COCO2014

    Download the data lists (.txt files) and put them into the BAM/lists directory.

  • Run util/get_mulway_base_data.py to generate base annotations for stage 1, or directly use the trained weights (a conceptual sketch of the relabeling follows).
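
For reference, a conceptual sketch of what a "base annotation" is; this relabeling scheme is an assumption drawn from the paper's setup, not the contents of util/get_mulway_base_data.py. Classes of the novel fold fall through to background, and the remaining base classes are renumbered contiguously:

import numpy as np

def to_base_annotation(mask, base_classes):
    """mask: (H, W) array of original class ids; base_classes: ids kept for stage 1."""
    out = np.zeros_like(mask)                # novel classes default to 0 (background)
    for new_id, cls in enumerate(base_classes, start=1):
        out[mask == cls] = new_id            # base class -> contiguous label
    out[mask == 255] = 255                   # preserve ignore pixels
    return out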

Models

  • Download the pre-trained backbones from here and put them into the BAM/initmodel directory.
  • Download our trained base learners from OneDrive and put them under initmodel/PSPNet.
  • We provide all trained BAM models for performance evaluation. Backbone: VGG16 & ResNet50; Dataset: PASCAL-5i & COCO-20i; Setting: 1-shot & 5-shot.

Scripts

  • Change configuration via the .yaml files in BAM/config, then run the .sh scripts for training and testing.

  • Stage1 Pre-training

    Train the base learner within the standard learning paradigm.

    sh train_base.sh
    
  • Stage2 Meta-training

    Train the meta learner and ensemble module within the meta-learning paradigm.

    sh train.sh
    
  • Stage3 Meta-testing

    Test the proposed model under the standard few-shot setting.

    sh test.sh
    
  • Stage4 Generalized testing

    Test the proposed model under the generalized few-shot setting.

    sh test_GFSS.sh
    

Performance

Performance comparison with the state-of-the-art approaches (i.e., HSNet and PFENet) in terms of average mIoU across all folds.

  1. PASCAL-5i

     Backbone   Method       1-shot          5-shot
     VGG16      HSNet        59.70           64.10
                BAM (ours)   64.41 (+4.71)   68.76 (+4.66)
     ResNet50   HSNet        64.00           69.50
                BAM (ours)   67.81 (+3.81)   70.91 (+1.41)

  2. COCO-20i

     Backbone   Method       1-shot          5-shot
     VGG16      PFENet       36.30           40.40
                BAM (ours)   43.50 (+7.20)   49.34 (+8.94)
     ResNet50   HSNet        39.20           46.90
                BAM (ours)   46.23 (+7.03)   51.16 (+4.26)

Visualization

To-Do List

  • Support different backbones
  • Support various annotations for training/testing
  • Multi-GPU training
  • FSS-1000 dataset

References

This repo is mainly built upon PFENet, RePRI, and SemSeg. Thanks for their great work!

BibTeX

If you find our work and this repository useful, please consider giving a star ⭐ and a citation 📚.

@InProceedings{lang2022bam,
  title     = {Learning What Not to Segment: A New Perspective on Few-Shot Segmentation},
  author    = {Lang, Chunbo and Cheng, Gong and Tu, Binfei and Han, Junwei},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {8057--8067},
  year      = {2022},
}

@article{lang2023bam,
  title   = {Base and Meta: A New Perspective on Few-shot Segmentation},
  author  = {Lang, Chunbo and Cheng, Gong and Tu, Binfei and Li, Chao and Han, Junwei},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  volume  = {45},
  number  = {9},
  pages   = {10669--10686},
  year    = {2023},
}

bam's People

Contributors

chunbolang


bam's Issues

Dataset Difference

Hello,

During the training of your model, I realized it uses nearly 4000+ images for training on PASCAL fold 0, while other comparable models like HSNet/PPM use different lists containing nearly 11000 images for fold 0. Why is there a difference in the number of training images? Am I missing something?

Pre-trained model download fails

Hello!
Thanks for your outstanding work! While reproducing the results, I could not download the pre-trained model of the base network; perhaps due to network congestion, the download keeps getting interrupted partway through. Could you provide a Baidu Netdisk link? Many thanks :)

Could the authors provide the complete set of trained models?

First of all, thanks to the authors for the great work. I see that only part of the trained BAM models have been released. Could the full set of models corresponding to the PASCAL and COCO results reported in the paper be made public? That would make it convenient to quickly validate and reproduce your results.

Performance of Base Learner

Hello @chunbolang,

You share model belonging to base learner without mIoU results on 16 classes for 4-fold. Would you share this evaluation result for base learner ? I want to check whether or not it results as same accuracy as my reproduced experiments in pretraining case. Thank you in advance.

Some questions about paper

Hi, Chunbo.
First of all, many thanks to you and your team for the outstanding contribution. I have a few questions about the paper:

  1. Why is there a large improvement even without the base learner? Is it only because the meta learner uses an encoder pre-trained with PSPNet on the base classes?
  2. In the ensemble module you use the Frobenius norm of the difference between the Gram matrices of two feature maps as the adjustment factor, then concatenate the meta background map with the Frobenius-norm map along the channel dimension and apply a convolution. What role does the Frobenius-norm map play here, and why can this suppress the falsely activated regions of base classes?

Best wishes,
Yueyi Wang

A question about the code; please give some guidance.

Thank you for your great work. Recently I read your code, and I have a question about the config option parser.add_argument('--viz', action='store_true', default=False). What is the function of this option?
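
For reference, this is standard argparse behavior: with action='store_true' the option is an on/off flag that defaults to False and becomes True when passed (here it presumably toggles saving visualizations):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--viz', action='store_true', default=False)

print(parser.parse_args([]).viz)          # False: flag absent
print(parser.parse_args(['--viz']).viz)   # True: flag present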

Function unknown

    Hi,

thanks for your interest!

You can refer to the code below for visual analysis.

import numpy as np
import cv2
from scipy import ndimage
# ScribblesRobot and find_bbox are helper utilities defined elsewhere in the repo.

def plot_seg_result(img, mask, type=None, size=500, alpha=0.5, anns='mask'):
    assert type in ['red', 'blue', 'yellow']
    if type == 'red':
        color = (255, 50, 50)     # red  (255, 50, 50) (255, 90, 90) (252, 60, 60)
    elif type == 'blue':
        color = (90, 90, 218)   # blue (102, 140, 255) (90, 90, 218) (90, 154, 218)
    elif type == 'yellow':
        color = (255, 218, 90)  # yellow
    color_scribble = (255, 218, 90) # (255, 218, 90) (0, 0, 255)

    img_pre = img.copy()

    if anns == 'mask':
        # Alpha-blend the color into the masked region of each RGB channel.
        for c in range(3):
            img_pre[:, :, c] = np.where(mask[:,:,0] == 1,
                                        img[:, :, c] * (1 - alpha) + alpha * color[c],
                                        img[:, :, c])
    elif anns == 'scribble':
        mask[mask==255]=0
        mask = mask[:,:,0]
        dilated_size = 5
        Scribble_Expert = ScribblesRobot()
        scribble_mask = Scribble_Expert.generate_scribbles(mask)
        # Dilate the thin scribbles so they remain visible in the rendering.
        scribble_mask = ndimage.maximum_filter(scribble_mask, size=dilated_size)
        for c in range(3):
            img_pre[:, :, c] = np.where(scribble_mask == 1,
                                        color_scribble[c],
                                        img[:, :, c])
    elif anns == 'bbox':
        mask[mask==255]=0
        mask = mask[:,:,0]
        bboxs = find_bbox(mask)
        for j in bboxs:
            cv2.rectangle(img_pre, (j[0], j[1]), (j[0] + j[2], j[1] + j[3]), (255, 0, 0), 4) # -1->fill; 2->draw_rec

    img_pre = cv2.cvtColor(img_pre, cv2.COLOR_RGB2BGR)

    if size is not None:
        img_pre = cv2.resize(img_pre, dsize=(size, size), interpolation=cv2.INTER_LINEAR)

    return img_pre

Originally posted by @chunbolang in #13 (comment)
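
A hypothetical invocation of the helper above; the file names are placeholders, and it is assumed that img is an RGB uint8 array and mask a binary (H, W, 1) array of the same spatial size:

import cv2
import numpy as np

img = cv2.cvtColor(cv2.imread('query.jpg'), cv2.COLOR_BGR2RGB)       # RGB image
mask = (np.load('pred_mask.npy') > 0).astype(np.uint8)[..., None]    # (H, W, 1)
vis = plot_seg_result(img, mask, type='red', alpha=0.5, anns='mask') # red overlay
cv2.imwrite('query_vis.png', vis)                                    # already BGR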

Base Learner

I had an issue with the implementation, but it is solved, so I am closing the issue.

A little confusion about training BAM.

Many thanks for your great work, but I am a little confused about the details of training BAM. Did you choose the model that achieves the best performance on the test set as the final, well-trained model? Does this cause some unfairness in the model evaluation?

--Thank you so much for your reply !!!

Question about Gram matrix

Hi Chunbo,

Sorry to bother you again; I'd like to ask a few questions about the ensemble module:

  1. Since the meta learner receives both support and query images as input, it is easily disturbed by the support image (its sensitivity) and mispredicts some regions. The ensemble module then has two main purposes: (a) suppressing the falsely activated parts of the meta learner's prediction via the adjustment factor, and (b) fusing the outputs of the base and meta learners via 1x1 convolution. Is this understanding of the ensemble module correct?
  2. The larger the difference between the support and query images, the larger the Frobenius norm of their Gram-matrix difference, so the corresponding adjustment factor should be larger. Why does a larger adjustment factor reduce the weight of the meta learner's prediction? In other words, when the difference is larger (the adjustment factor is larger), the meta learner's output is less reliable; shouldn't a smaller adjustment factor be assigned in that case to reduce its weight?

Best wishes,
Yueyi Wang

About test_num during training

Hello,
In your code, test_num in the Val part of train.py is set to 1000, while PFENet uses 5000; meanwhile, test_num in test.py is set to 5000 (consistent with PFENet).
Are your published results based on testing with 1000 episodes, with 5000 episodes, or on validating with 1000 episodes and then testing with 5000 episodes?

Custom dataset

Hello,
I am using your meta learner on my custom dataset. It has four classes, each with 200 images, and each image contains more than one class. I am training on the first three classes and testing on the last class using the FSS training method, but the model is overfitted to the base classes and detects them while testing on the novel class.
How can I overcome overfitting on the base classes?
How can I extend the model to multiclass segmentation so I can extract only the target class during testing?
Any other recommendations will be appreciated. Thank you.

How can the code be run on a single GPU?

Hello! When running the experiments on a single 3080 Ti I ran into GPU memory leaks. Do you have experience running on a single GPU, or could you tell me what would need to be changed? Looking forward to your reply, and have a nice day! Also, could you share the running time of each stage?

Tips for fine tuning the model (transfer learning)

Thank you for your great contribution with this model.
I would like to ask if transfer learning and/or fine-tuning is a sensible approach to acquiring a well-functioning model.
You do provide pretrained weights for the standard datasets; however, I would like to check how the model runs on classes not present in either of those datasets.
What would be a valid approach to fine-tuning this model: which layers to lock, and which parts to train (for example, the meta learner only)?
Or maybe fine-tuning via transfer learning is not valid for this model at all? Thank you for your answer.

Evaluation on another dataset

Hello,

Thanks for your great job!
I want to use the model trained on the PASCAL dataset and evaluate it on my own dataset. How should I do that? Convert my dataset to PASCAL format?

Would you please give me some suggestions?

Best wishes to you!

Questions about K-shot setting

Thanks for your exciting work. I have some questions about the K-shot setting.

  1. In Sec. 4.4 you say that a smaller value of \psi indicates a greater contribution, but Eqn. 17 seems to imply that a bigger value of \psi indicates a greater contribution, which is obviously unexpected. Am I missing something?
  2. In the implementation of the K-shot setting, the value of idx3 is just [0,1,2,3,4] in the 5-shot setting, which seems useless. I am really confused about it.

Visualization

Hi
Congratulations on your oral presentation being accepted.
Do you have the code to visualize the corresponding segmentation results? I want to see the segmentation effect of the trained model. Could I have access to that code?

Normal mIoU, but low FB-IoU

Hi, thanks for your great work! I have trained 4 models with ResNet-50 on the PASCAL dataset; the mIoU is exactly what you reported (0.678), but the FB-IoU is 0.796, about 1.5% lower than reported. Would you mind releasing all pretrained models? Thank you so much.

The val set for the base learner

Thanks for the excellent work. I read the dataset code for the base learner and found that its train/val category split seems the same as the meta learner's. Will this lead to information leakage when the base learner is trained ahead of the meta learner?

About the training dataset

Hello, I think your work is very interesting, but I observed that you re-screened the training samples of each subset. How did you filter the training samples?

Question on COCO experiments

Thank you for sharing your work!

We note that there are more than 13k images in every validation split of the COCO dataset; however, only 1k iterations are set during testing, which may cause significant fluctuation in the experimental results.

In comparison, related works like HSNet, VAT, PFENet, and so on all set more than 10k iterations during testing. So we replicated the experiments, following all configs in your original release with the shared model parameters, except that the number of iterations (the "test_num") was set to 10k.

The result is below:
[results screenshot]

Why is test_num set like this? Are there any other considerations?

How to do few-shot segmentation in 5-shot setting using BAM with my image files

Thank you very much for sharing your amazing work!

I am a complete beginner in computer vision.

I would like to do few-shot segmentation in 5-shot setting using BAM with my image data.
I have 5 jpg image files prepared by myself and their annotated png files as Support images and one Query image jpg file.
From here, I want to get the image data similar to the images shown in the BAM line of Visualization section in README.md of this repository.

However, I have no idea what code to write to achieve what I described above. Could you tell me what code to write?

Training dataset of the base learner

Hi, Lang.

It is an interesting work and I have some questions about the dataset of the base learner. In general, an FSS method uses the data split:

  split0      split1      split2      split3
  class1-5    class6-10   class11-15  class16-20
  train_s0    train_s1    train_s2    train_s3
  val_s0      val_s1      val_s2      val_s3

When we train on classes 6-20 (train_s1 + train_s2 + train_s3) and test on classes 1-5 (val_s0), we don't use the images in train_s0.

Although the images of different splits partially overlap, the images containing only classes 1-5 in train_s0 should be excluded during meta-training on classes 6-20.

But it seems that the images solely owned by train_s0 are also used to train the base learner (with an all-zero label), which may be a kind of data leakage. Do you have any ablation experiments on this point? For example, using only train_s1 + train_s2 + train_s3 to train the base learner.

I am curious about this point because having too many images (of unseen classes) in the training set may help the pre-trained model gain some awareness of those unseen classes even if no positive labels are given (for example, these images could directly be used for self-supervised learning), which excessively violates the hypothesis of generalizing to unseen classes.

[Experimental results] Extremely unfair comparison with previous methods caused by different dataset processing methods

Thanks for the author's reply. Please do not force-close unfinished issues (https://github.com/chunbolang/BAM/issues/44).

First, setting novel categories as background does not cause information leakage, and this operation has been adopted by all previous few-shot segmentation works. If its influence were small, we believe there would be no need to add this controversial operation that may bring unfair comparison.

In addition, with a ResNet-50 backbone we spent some time reproducing the results on PASCAL (keeping images containing novel classes in the training set, but setting those classes as background). The results are as follows:

1-shot: 67.85; 71.20; 58.30; 61.39 (mean: only 64.70)
5-shot: 68.65; 73.47; 66.79; 63.76 (mean: 68.16)

As can be seen, once the setting is kept fully consistent with all previous papers, BAM's result on the third split plummets by more than ten points, so the final 1-shot result is 3.1 points lower than the 67.81 reported in the paper, which is a large performance gap in the few-shot segmentation field.

Table 1 of the recently updated SVF paper (https://arxiv.org/pdf/2206.06122.pdf) shows that applying this trick (dropping images containing novel classes during training) to PFENet as well as BAM brings about a three-point improvement (SVF's reproduced BAM result is 64.59), which matches the conclusion of our experiment: this trick brings a large performance gain.

Judging by the submission landscape in 2022, the 1-shot result would have been close to the early-2021 SOTA HSNet, and the 5-shot result even below HSNet (69.5). Without the nearly three-point gain from this trick, given how fiercely performance is compared nowadays, not only the oral but even acceptance might have been in doubt.

-------------------------- Below is the content of the previous issue (https://github.com/chunbolang/BAM/issues/44)

Thanks to the authors for open-sourcing the code.
However, during reproduction we found a dataset operation different from previous works: images containing novel classes are removed from the training set, whereas in previous works (e.g., HSNet, PFENet) the novel classes in such training images are set as background.
BAM's operation naturally yields a stronger ability to detect novel classes at test time, because these novel classes were never trained as background; but this operation, different from previous methods, leads to a severely unfair comparison.

Tables 1 and 2 of a recent NeurIPS 2022 paper (Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning) point out that with BAM's strategy (images from the training set containing the novel classes of the test set were removed), even the years-old PFENet can achieve results close to, and in some cases better than, BAM. One may even infer that if the novel classes were trained as background, the effectiveness of the BAM method itself would drop greatly, erasing the performance gap over contemporaneous works.

Since code release is not mandatory during review, reviewers could not know about this operation; it may severely mislead follow-up works into adopting this operation that causes unfair comparison.
Could the authors explain this unfair comparison?

Use of the adjustment factor

In the paper there is no clear explanation of how the adjustment factor is utilized to affect the meta prediction, so I inspected your code to make sense of how you include it in your model. I realized that you concatenate the Frobenius norm of the difference between the Gram matrices with the meta prediction of foreground or background, and process this map with a shared 1-by-1 convolution. Is there any reason for initializing it specifically with [1, 0]? @chunbolang
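
A sketch of the mechanism described above, reconstructed from this description and therefore an interpretation rather than the official code: the Frobenius norm of the Gram-matrix difference gives one scalar per episode, broadcast to a map and fused with a meta prediction channel by a shared 1x1 convolution. Initializing the weights to [1, 0] plausibly makes the module start as an identity on the meta channel, letting training discover how much the psi channel should contribute.

import torch
import torch.nn as nn

def gram(feat):                            # feat: (B, C, H, W)
    b, c, h, w = feat.shape
    f = feat.flatten(2)                    # (B, C, H*W)
    return torch.bmm(f, f.transpose(1, 2)) / (h * w)

def adjustment_factor(feat_q, feat_s):
    # Frobenius norm of the Gram difference: larger -> more dissimilar pair
    return torch.norm(gram(feat_q) - gram(feat_s), p='fro', dim=(1, 2))  # (B,)

cls_merge = nn.Conv2d(2, 1, kernel_size=1, bias=False)
with torch.no_grad():                      # the reported [1, 0] initialization
    cls_merge.weight.copy_(torch.tensor([1.0, 0.0]).view(1, 2, 1, 1))

def fuse(meta_channel, psi):               # meta_channel: (B, 1, H, W); psi: (B,)
    psi_map = psi.view(-1, 1, 1, 1).expand_as(meta_channel)
    return cls_merge(torch.cat([meta_channel, psi_map], dim=1))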

Problem with the implementation of the Frobenius norm

Hi, Lang!
The $\psi$ after the Frobenius norm in the paper looks like just a scalar, but in the code the weights of self.cls_merge seem adaptive, and their dimension is torch.Size([1, 2, 1, 1]) with two parameters.
Maybe I do not understand the real meaning of the code; can you explain?
Sorry to bother you, and thank you so much!

questions about the baseline

Hello, I have some questions about the baseline. Does the baseline only use the meta learner? Some details about the baseline are not very clear; I hope you can give me some pointers. Thanks.

cls_type='Base' or 'Novel'?

Thanks for your work.
When you create the model, you use code like this: model = eval(args.arch).OneModel(args, cls_type='Base').
I want to know when to use 'Base' and when to use 'Novel'.

Training Log

Hi, thanks for your solid work! By the way, would you please provide the log files of all experiments?

Training BAM

Hello,

I am training BAM using ResNet-50 on PASCAL fold 0, but the mIoU is still zero after 7 epochs.

[training log screenshot]

Learning rate

Hello! Thanks for your excellent work! I have a question: in the second stage, when loading your provided trained base learner and training with batch size 8 and lr 0.05 as written in your paper, the loss becomes NaN. What could be the reason?
Looking forward to your reply, thanks!

Config file for training

I notice that the "warmup" parameter is set to False. I also read the PFENet paper; the learning-rate policy is like the one in the DeepLab model, but I don't understand why "warmup" is set to False.

Paper reproduction

Hello, thank you for your excellent work. I'm reproducing the results. First, I ran the util/get_mulway_base_data.py file and obtained the corresponding train and val data: 0, 1, 2, 3.
Then I finished running the train_base.sh file and got Figure 1.
[Figure 1]
Then I ran train.sh, but the mIoU at epoch 75 is still 0, as shown in Figure 2.
[Figure 2]

What is the reason?

About the data_list_0 file under train in COCO's fss_list

Hello!

Many thanks for your inspiring work!

While reproducing, I found that the data_list_0 file under train in COCO's fss_list contains only about half as many images and annotations as the other lists, and the resulting mean IoU on COCO split 0 (1-shot) is only 37.8 (5.51 lower than the result in the original paper), while the results on the other splits are fairly normal. What could be the reason?

Thanks!

About data_list and sub_class_file_list

Hello! Many thanks for your work; it is very inspiring! While reproducing your code I have two questions:

  1. PASCAL-5i consists of VOC2012 and SBD. We downloaded VOC2012, and from the SBD website we downloaded benchmark.tgz but did not understand how to process it; we then downloaded SegmentationClassAug.zip from another site, after which the code runs. Is this the correct way to handle the SBD part?

  2. How were the data_list and sub_class_file_list you provide in fss_list generated? Is there code for this processing step?

Thanks! Looking forward to your reply!

Question about the pre-training?

Hello, thanks for your great work!
But I have a question about the pre-training: are the novel classes treated as background when training the base learner? That would mean it is possible to train on an all-black mask in a specific iteration.

COCO dataset annotations/json->png

Hello! Thanks for your work!
The label part downloaded from the official COCO website consists of JSON annotations. Could you provide the code that converts the JSON annotations to PNG masks?

Many thanks!

Results cannot be reproduced

Very nice job!!! I am very interested in your work, but it seems that I cannot reproduce your results, as presented below.
[screenshot]
What have I done wrong?

--Thank you so much for your help!!! ^_^

question about eval

When batch_size_val is set to more than 1, a bug like the following occurs:

Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/mnt/cache/xuguozheng/.conda/envs/torch1.8/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/mnt/cache/xuguozheng/.conda/envs/torch1.8/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/mnt/cache/xuguozheng/.conda/envs/torch1.8/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 83, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/mnt/cache/xuguozheng/.conda/envs/torch1.8/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 83, in
    return [default_collate(samples) for samples in transposed]
  File "/mnt/cache/xuguozheng/.conda/envs/torch1.8/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 63, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/mnt/cache/xuguozheng/.conda/envs/torch1.8/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [366, 500] at entry 0 and [333, 500] at entry 1

Does batch_size_val have to be 1 (everything goes well when it is set to 1)? Or is there anything else I should pay attention to?
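
The stack error comes from default_collate trying to batch validation images of different spatial sizes. A common workaround, shown as a generic sketch rather than the repo's official fix, is either batch_size_val=1 or a collate_fn that returns lists instead of stacked tensors:

import torch
from torch.utils.data import DataLoader

def list_collate(batch):
    """Return each field as a list instead of stacking into one tensor."""
    return list(zip(*batch))

# Dummy variable-sized "images" reproducing the failure case above:
data = [(torch.rand(3, 366, 500), 0), (torch.rand(3, 333, 500), 1)]
loader = DataLoader(data, batch_size=2, collate_fn=list_collate)
images, labels = next(iter(loader))   # images is a list of differently sized tensors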

Tensorboard

Hello,
Congratulations on the acceptance of your paper as an oral presentation.
I want to track training with TensorBoard. You are already using SummaryWriter; could you elaborate on how I can track the training in TensorBoard?

Question about the COCO test metric

Hello!
I'd like to ask about the COCO metric. I loaded the 5-shot COCO trained weights you provided, and with test_num set to 1000 I cannot obtain the results given in your paper; they are about 3 points lower. I fixed the random seed in the test file and changed shot to 5 in the yaml file. Is the result in your paper based on test_num = 1000, or on some other value?

base_annotation

Many thanks for your great work. There is an operation in your work: "Run util/get_mulway_base_data.py to generate base annotations for stage1, or directly use the trained weights." But running this code fails for me. Could you share the base annotations?

--Thank you so much for your reply !!!

Reported baseline result

Thank you for the interesting paper @chunbolang, specifically for showing that a simple meta learner, with the help of a supervised network, can produce state-of-the-art results.

My first question is about the reported baseline result in Table 1.

Is the baseline just a meta learner network (without the base learner and ensemble module) on top of a frozen backbone (pre-trained with the weights of a segmentation network trained on base classes)? And is it only trained by loss_meta on base classes?

[Table 1 screenshot]

As you know, the same meta learner from previous works on top of a frozen backbone pre-trained on ImageNet can produce, for example, ~55% mIoU in the 1-shot setting of PASCAL. In your case, 65.4 is reported for the baseline. I want to know whether this big boost was achieved because of the frozen backbone pre-trained with the weights of a segmentation network trained on base classes.

Extremely unfair comparison with previous methods caused by different dataset processing methods

Thanks to the authors for open-sourcing the code.
However, during reproduction we found a dataset operation different from previous works: images containing novel classes are removed from the training set, whereas in previous works (e.g., HSNet, PFENet) the novel classes in such training images are set as background.
BAM's operation naturally yields a stronger ability to detect novel classes at test time, because these novel classes were never trained as background; but this operation, different from previous methods, leads to a severely unfair comparison.

Tables 1 and 2 of a recent NeurIPS 2022 paper (Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning) point out that with BAM's strategy (images from the training set containing the novel classes of the test set were removed), even the years-old PFENet can achieve results close to, and in some cases better than, BAM. One may even infer that if the novel classes were trained as background, the effectiveness of the BAM method itself would drop greatly, erasing the performance gap over contemporaneous works.

Since code release is not mandatory during review, reviewers could not know about this operation; it may severely mislead follow-up works into adopting this operation that causes unfair comparison.
Could the authors explain this unfair comparison?

About the test metrics

Hello! Sorry to bother you again!

  1. After training the model from scratch on the VOC dataset and running inference, we see a difference of about 1-2 points. How many GPUs did you use?

  2. Using the model weights you provided for inference, on the four VOC splits (ResNet-50 / shot=1) we obtain 68.77/73.48/67.18/60.27, while the paper reports 68.97/73.59/67.55/61.13; the averages differ by about 0.4.

Could you advise what causes this difference? (We fixed the random seed, and two different machines produce identical results.)
Many thanks!
