clovaai / wsolevaluation

Evaluating Weakly Supervised Object Localization Methods Right (CVPR 2020)

License: MIT License

Python 98.05% Shell 1.95%
Topics: wsol-methods, wsol-task, wsol-evaluation, wsol-training, dataset-contribution, evaluation-protocol, wsol-benchmark, cvpr2020, localization

wsolevaluation's Issues

OpenImages PxAP performance changed between MaxBoxAcc and MaxBoxAccV2. Why?

Hi,
In https://arxiv.org/pdf/2001.07437.pdf, the PxAP performance on OpenImages changed between using MaxBoxAcc (Tab. 2) and using MaxBoxAccV2 (Tab. 8). I didn't expect that.
Why is that?
Since validation (for model selection) on OpenImages is done using PxAP (as is the test), using MaxBoxAcc or MaxBoxAccV2 in the code configuration should not affect the PxAP results, right?

Or did you run a new set of 30 random trials for the hyperparameter search that led to different best hyperparameters for PxAP in Tab. 8? Or is it simply due to randomness if the code is not reproducible (running the same experiment twice with the same settings does not produce exactly the same results)?

[Screenshots of Tab. 2 and Tab. 8 omitted.]

thanks

num_val_sample_per_class

Hi, I wanted to ask about the argument num_val_sample_per_class in the following command:

python main.py --dataset_name OpenImages \
               --architecture vgg16 \
               --wsol_method cam \
               --experiment_name OpenImages_vgg16_CAM \
               --pretrained TRUE \
               --num_val_sample_per_class 5 \
               --large_feature_map FALSE \
               --batch_size 32 \
               --epochs 10 \
               --lr 0.00227913316 \
               --lr_decay_frequency 3 \
               --weight_decay 5.00E-04 \
               --override_cache FALSE \
               --workers 4 \
               --box_v2_metric True \
               --iou_threshold_list 30 50 70 \
               --eval_checkpoint_type last

You set it to 5 for the OpenImages dataset. Shouldn't it be 25 instead, since we have 25 samples per class in the validation set?

Also, when we use your code with the CUB and ImageNet datasets, to what value should we set the argument num_val_sample_per_class?

Thank you in advance for your reply :)

logic problem

When generating the CAM, you use the ground-truth label to pick the channel weights from the fully connected layer. But I think the model's predicted class is the right choice.
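
For concreteness, a minimal sketch of the two variants (array names are illustrative, not the repo's):

    import numpy as np

    def compute_cam(feature_map, fc_weight, class_idx):
        # CAM = the chosen class's FC weights applied to the conv feature channels.
        # feature_map: (C, H, W) activations from the last conv layer.
        # fc_weight:   (num_classes, C) weights of the final linear classifier.
        return np.tensordot(fc_weight[class_idx], feature_map, axes=([0], [0]))

    # GT-known style (what the repo does, per this issue):
    #   cam = compute_cam(feats, fc_w, gt_label)
    # Top-1 style (what I am suggesting):
    #   cam = compute_cam(feats, fc_w, logits.argmax())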

ImageNetV2 has a lot of incorrect bbox annotations?

I used the bounding-box annotations you provided for testing and found many wrong boxes.
For example:
100/5.jpeg,73,214,148,299
100/5.jpeg,180,85,200,111
100/5.jpeg,169,137,231,195
100/5.jpeg,394,163,462,207
100/5.jpeg,316,134,374,160
100/5.jpeg,136,206,171,235
100/5.jpeg,242,146,267,164

Pretrained Models

Hi.
Thank you for such a clean repository. I would like to ask if it's possible to have access to a few of the pre-trained models. As many would use your repository to reproduce the current SOTA, having access to pre-trained models would indeed speed up the process of testing code validity. I think just having access to a pretrained model on CUB would make a huge difference, because everyone could simply download it and validate the accuracies on a local machine (even without a GPU).

Many thanks.

Doubt regarding input image files in metadata

The data loader reads the image_ids file (whose entries have .jpg appended) as:

with open(metadata['image_ids' + suffix]) as f:
    for line in f.readlines():
        image_ids.append(line.strip('\n'))

But while reading the OpenImages data and parsing localization.txt, it expects path_file.jpg.npy. Isn't that wrong? Shouldn't it be path_file.npy, i.e., with the .jpg dropped? It would be great if you could clarify this.
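
To illustrate the mismatch (paths and variable names are hypothetical):

    import os

    image_id = 'path_file.jpg'             # entry from image_ids.txt (ends in .jpg)
    scoremap_root = 'train_log/scoremaps'  # hypothetical root

    # What the loader currently expects:
    current = os.path.join(scoremap_root, image_id + '.npy')
    # -> train_log/scoremaps/path_file.jpg.npy

    # What I would expect instead, with the .jpg dropped:
    expected = os.path.join(scoremap_root, os.path.splitext(image_id)[0] + '.npy')
    # -> train_log/scoremaps/path_file.npy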

evaluation_test.py failures, and a path that cannot be found

Hi,

Thanks for this great work done. I have two questions:
(1) When I run python evaluation_test.py, it outputs:
FAIL: test_compute_bboxes_from_scoremaps_degenerate (__main__.EvalUtilTest)

Traceback (most recent call last):
  File "evaluation_test.py", line 98, in test_compute_bboxes_from_scoremaps_degenerate
    self.assertListEqual(boxes, [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
AssertionError: First sequence is not a list: ([array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]])], [1, 1, 1, 1, 1])

======================================================================
FAIL: test_compute_bboxes_from_scoremaps_multimodal (__main__.EvalUtilTest)

Traceback (most recent call last):
  File "evaluation_test.py", line 125, in test_compute_bboxes_from_scoremaps_multimodal
    self.assertListEqual(boxes, [[0, 0, 4, 3],
AssertionError: First sequence is not a list: ([array([[0, 0, 4, 3]]), array([[0, 0, 2, 2]]), array([[0, 3, 3, 3]]), array([[2, 3, 3, 3]]), array([[0, 3, 1, 3]])], [1, 1, 1, 1, 1])

======================================================================
FAIL: test_compute_bboxes_from_scoremaps_unimodal (__main__.EvalUtilTest)

Traceback (most recent call last):
  File "evaluation_test.py", line 110, in test_compute_bboxes_from_scoremaps_unimodal
    self.assertListEqual(boxes, [[1, 1, 4, 3],
AssertionError: First sequence is not a list: ([array([[1, 1, 4, 3]]), array([[1, 1, 4, 3]]), array([[2, 1, 4, 3]]), array([[2, 2, 4, 3]]), array([[2, 2, 3, 3]])], [1, 1, 1, 1, 1])

(2) My second problem is when I run your suggested script:

python evaluation.py --scoremap_root=train_log/scoremaps/ \
                     --metadata_root=metadata/ \
                     --mask_root=dataset/ \
                     --dataset_name=CUB \
                     --split=val \
                     --cam_curve_interval=0.01

It gives the following error:

Loading and evaluating cams.
Traceback (most recent call last):
  File "evaluation.py", line 528, in <module>
    main()
  File "evaluation.py", line 516, in main
    evaluate_wsol(scoremap_root=args.scoremap_root,
  File "evaluation.py", line 465, in evaluate_wsol
    image_ids = get_image_ids(metadata)
  File "/egundogdu/WSOL/wsolevaluation/data_loaders.py", line 62, in get_image_ids
    with open(metadata['image_ids' + suffix]) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'metadata/image_ids.txt'

Do you have any idea about these issues?

Cropping and Resizing

Can you please provide any insight into why, regardless of the input image size, you resize images to 256x256 and then crop 224x224 patches by default?
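
For reference, the pipeline in question looks like the conventional resize-then-crop recipe; a sketch with torchvision, matching the repo's resize_size=256 / crop_size=224 defaults (the exact transform list is assumed):

    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.Resize((256, 256)),    # resize regardless of original size
        transforms.RandomCrop(224),       # 224x224 patch for training
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])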

Results of OpenImage30K

Hi! Thank you for your work in bringing new benchmarks and a unified evaluation approach to the WSOL community.
In the CVPR 2020 paper, some of the tables seem inconsistent with those in https://docs.google.com/spreadsheets/d/1O4gu69FOOooPoTTtAEmFdfjs2K0EtFneYWQFk8rNqzw/edit#gid=0. For example, the result of InceptionV3 with ACoL in the paper is 63.0, but the number on the linked sheet is lower. Is there something wrong with the way I am reading it? Looking forward to your reply!

Some issue about the CUB val dataset

Thanks for your work.

When I follow the guidance to run the code on CUB, I get this error:

sampled_indices = np.random.choice(
    indices, self.num_sample_per_class, replace=False)
...
File "mtrand.pyx", line 946, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'

Looking back at config.py, the recommended setting is 'args.num_val_sample_per_class <= 5' for CUB:

def check_dependency(args):
    if args.dataset_name == 'CUB':
        if args.num_val_sample_per_class >= 6:
            raise ValueError("num-val-sample must be <= 5 for CUB.")
    if args.dataset_name == 'OpenImages':
        if args.num_val_sample_per_class >= 26:
            raise ValueError("num-val-sample must be <= 25 for OpenImages.")

However, the bird category '059.California_Gull' contains only 3 validation images, while '002.Laysan_Albatross' and '007.Parakeet_Auklet' contain 6 each, which leads to the error above.

The solution is to expand the 059.California_Gull set from 3 to 5 images, or to change the recommended setting to num_val_sample_per_class=3. It feels strange that no one has run into this problem before. 👍
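
A quick way to spot such classes is to count the validation images per class in the metadata (a sketch; the metadata path is assumed):

    from collections import Counter

    NUM_VAL_SAMPLE_PER_CLASS = 5
    # Assumed location; ids look like '059.California_Gull/xxx.jpg'.
    with open('metadata/CUB/val/image_ids.txt') as f:
        counts = Counter(line.strip().split('/')[0] for line in f)

    for cls, n in sorted(counts.items()):
        if n < NUM_VAL_SAMPLE_PER_CLASS:
            print(f'{cls}: only {n} val images')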

ValueError: Cannot take a larger sample than population when 'replace=False'

I get the mentioned error when running this command:

!python main.py --dataset_name CUB \
                --architecture vgg16 \
                --wsol_method cam \
                --experiment_name CUB_vgg16_CAM \
                --pretrained TRUE \
                --num_val_sample_per_class 5 \
                --large_feature_map FALSE \
                --batch_size 32 \
                --epochs 10 \
                --lr 0.00227913316 \
                --lr_decay_frequency 3 \
                --weight_decay 5.00E-04 \
                --override_cache FALSE \
                --workers 4 \
                --multi_iou_eval False \
                --iou_threshold_list 30 50 70 \
                --multi_contour_eval False \
                --eval_checkpoint_type best

Custom datasets

Could you throw some light on how to modify the code for other custom datasets?
I want to use my own dataset instead of ImageNet, CUB, or OpenImages.
I have a dataset in VOC format.

Request for configs and question about the MaxBoxAcc

Hi,
Thank you for sharing the code for the awesome paper!

I was trying to reproduce the results shown in Table 6 of the appendix but have had a hard time reaching the performance you reported. Would it be possible for you to share the configs for CAM, HaS, ACoL, and SPG for V, I, and R?

Also, I was wondering if there is a specific reason why you proposed using the max of GT-known instead of the max of Top-1 Loc. To my understanding, Top-1 Loc is a more comprehensive metric since it also takes classification performance into account. Although localization is important, if the classification prediction is wrong in the first place, the model would not be considered a "good model". It would be highly appreciated if you could elaborate on this, or, if it has already been discussed in the paper, please point me to the relevant part. Thank you!
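
To make the contrast concrete, a per-image sketch of the two counting rules as I understand them (not the repo's code):

    def gt_known_hit(iou, threshold=0.5):
        # GT-known localization: only the box quality matters.
        return iou >= threshold

    def top1_loc_hit(iou, predicted_class, gt_class, threshold=0.5):
        # Top-1 localization: requires the correct class AND the correct box.
        return predicted_class == gt_class and iou >= threshold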

confusion regarding additional datasets

I am trying to understand the paper; please help clear up my confusion. I am confused about whether the newly added datasets are used for training once the optimal hyperparameters have been selected.
For example:
For each hyperparameter setting, the model is first trained on the CUB training set and then validated on the CUBv2 set. The hyperparameters of the model with the highest localization accuracy are then selected. A new model with the selected hyperparameters is then trained on the CUB training set along with the added CUBv2 set, and finally tested on the CUB test set. Am I correct, or are you doing something different?

GPU and training time required by this repo

Dear authors,

Thank you very much for this dedicated repo! It is extremely helpful to the WSOL community!

Some questions:

  1. Do the jobs covered by this repo all require only one GPU to train and evaluate? Is it helpful or necessary to use multiple GPUs?
  2. What kind of GPU did you use for the jobs? How much memory did you consume?
  3. How much time does it take to train on each dataset?

Thank you again!

Hyperparameter to reproduce Table 2

Could you also provide the recommended command/hyperparameter settings to reproduce Table 2?

Especially the learning rate, batch size, epochs, etc. for the different methods with different backbones on the three datasets.

This would help ensure a fair comparison for those who cite your work.

Thank you very much!!

Dataset Structure

Thank you for putting together this brilliant collection of WSOL methods and shedding light on the reality of the progress in the field! Truly appreciated!
I was trying to run your code on my own custom dataset, following the directions from #17.
I was wondering whether the dataset folder hierarchy plays a crucial role in determining class labels. As long as I have correctly produced the class_labels.txt file, do I need to care about the subfolders? For example, I have two labels, 0 and 1. Do I need to create two subfolders, or is putting all the images in one folder enough?
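
For reference, my understanding of the metadata format (paths illustrative; the exact layout is assumed from the repo's CUB/OpenImages metadata):

    # image_ids.txt: one relative image path per line
    images/img_001.jpg
    images/img_002.jpg

    # class_labels.txt: <image_id>,<integer label>
    images/img_001.jpg,0
    images/img_002.jpg,1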

optimal oracle value

Hello. Thank you for such interesting work.
I have one question, though.
I understand that you randomly conducted 30 trials for each method with a single backbone.

How did you then choose the optimal oracle value?
You mentioned that you find it with the train-fullsup set.
Do you mean that you conducted multiple experiments on the train-fullsup set to find the optimal oracle?

expected str, bytes or os.PathLike object, not NoneType

I get the above error after running the following commands:

!git clone https://github.com/clovaai/wsolevaluation.git
os.chdir("wsolevaluation")
!bash dataset/prepare_cub.sh

!python main.py --dataset_name CUB \
                --architecture vgg16 \
                --wsol_method cam \
                --experiment_name cub_vgg16 \
                --pretrained TRUE \
                --num_val_sample_per_class 0 \
                --large_feature_map TRUE \
                --batch_size 32 \
                --epochs 50 \
                --lr 0.000227913316 \
                --lr_decay_frequency 15 \
                --weight_decay 1.00E-04 \
                --override_cache FALSE \
                --workers 4 \
                --box_v2_metric True \
                --iou_threshold_list 30 50 70 \
                --eval_checkpoint_type last

Pretrained Resnet

Hi,

I notice that for ResNet-50 you change the stride to 1 in layer3 (the name comes from PyTorch, torchvision.models.resnet50) in order to increase the feature map from 7x7 to 14x14. I wonder: do you first make this change and then train the modified ResNet-50 on ImageNet, and finally, starting from this newly trained ResNet-50, train (or fine-tune) it on the CUB dataset?
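
For reference, the modification being asked about would look like this against torchvision's resnet50 (a sketch; where exactly the repo applies it is assumed):

    import torchvision.models as models

    model = models.resnet50(pretrained=True)
    # Set the stride of layer3's first bottleneck (and its downsample conv) to 1,
    # so a 224x224 input yields a 14x14 final feature map instead of 7x7.
    model.layer3[0].conv2.stride = (1, 1)
    model.layer3[0].downsample[0].stride = (1, 1)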

calculate MaxBoxAcc

Hi,

I tried to calculate MaxBoxAcc (version 1) for the CAM method on the CUB dataset using VGG16, and my number (~85%) is much higher than the one reported in the paper (~76%).

I calculate it as:

import numpy as np

counter = 0
for image in test_set:                         # helper names are illustrative
    cam = min_max_normalize(get_cam(image))    # normalize the CAM to [0, 1]
    ious = []
    for threshold in np.linspace(0, 1, 1000):  # I assume 1,000 threshold steps
        mask = cam >= threshold
        box = bbox_from_mask(mask)             # bbox estimated from all contours
        ious.append(iou(box, gt_box))
    if max(ious) > 0.5:                        # one hit among the 1,000 IoUs
        counter += 1

and the final MaxBoxAcc = counter / number_of_testing_images.

Am I missing something in the procedure? Btw, the bbox is estimated from all contours.

Data split for FSL

Dear the authors,

While trying to reproduce the reported results of the few-shot learning (FSL) baseline, I came up with a question. According to the paper, FSL used (10, 5, 5) samples per class for ImageNet, CUB, and OpenImages, respectively, and the same amount of supervision was applied to the CAM methods.
For FSL, the number of samples per class (e.g., 10 for ImageNet) is, I believe, the sum of the train and val samples, since FSL also needs some amount of validation data. So my question is: for FSL, how did you split the training and validation sets within the (10, 5, 5) samples per class?

Hope my question is clear to you. Looking forward to hearing from you. Thank you!

Optimal threshold for test set

Hi,

I have a question about the evaluation. I've tried to understand how the optimal threshold is used at test time. It seems you search for the optimal threshold on the test set again, instead of using the optimal threshold found on the validation set. Or am I missing something? Please correct me if I am wrong. I cannot find the line of code where the optimal threshold is stored for the test set. Thank you.
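
For what it's worth, as I read the paper's definition, MaxBoxAcc itself takes the max over the whole threshold sweep at evaluation time, so no single stored threshold would be needed; schematically (a sketch, not the repo's code):

    import numpy as np

    def max_box_acc(box_acc_per_threshold):
        # box_acc_per_threshold[i]: box accuracy with the CAM thresholded at
        # the i-th value of the sweep; the metric simply takes the best one.
        return float(np.max(box_acc_per_threshold))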

cannot reproduce the results using CAM-Inception on CUB dataset

Hi,

I've set large_feature_map = True, which means the final feature map used to generate CAMs is 28x28 (the image input size is 224x224, rather than 299x299). Also, I've set the LR of SPG_A3_1b, SPG_A3_2b, and SPG_A4 10 times higher than that of the rest of the blocks (Conv2d_4a_3x3, Conv2d_1a_3x3, Conv2d_2a_3x3, Conv2d_2b_3x3, Conv2d_3b_1x1, Mixed_5b, Mixed_5c, Mixed_5d, Mixed_6a, Mixed_6b, Mixed_6c, Mixed_6d, and Mixed_6e), whose learning rate is 0.00224844746. The WD is 5e-4, momentum is 0.9, and nesterov is True for the SGD optimizer.
The LR decay frequency is 15 epochs, and I use a StepLR scheduler with gamma = 0.1.
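
In code, my optimizer setup is roughly the following (a sketch; model stands for the repo's InceptionV3 variant, and the exact parameter split is assumed):

    import torch

    base_lr = 0.00224844746
    high_lr_prefixes = ('SPG_A3_1b', 'SPG_A3_2b', 'SPG_A4')  # 10x-LR blocks

    head_params, backbone_params = [], []
    for name, param in model.named_parameters():  # model: the InceptionV3 variant
        (head_params if name.startswith(high_lr_prefixes)
         else backbone_params).append(param)

    optimizer = torch.optim.SGD(
        [{'params': backbone_params, 'lr': base_lr},
         {'params': head_params, 'lr': base_lr * 10}],
        momentum=0.9, weight_decay=5e-4, nesterov=True)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)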

The BoxAccV2 is around 53%; more specifically, 0.92, 0.56, and 0.10 for IoU = 0.3, 0.5, and 0.7, respectively.

Could you please tell me if I am missing something? Or does someone have a similar issue?

About ImageNetV2 file name

Hi,

I downloaded the Threshold0.7 split of ImageNetV2 to use as train-fullsup.
However, the image file names are not of the form 0.jpeg to 9.jpeg; they are in a format like 0af3f1b55de791c4144e2fb6d7dfe96dfc22d3fc.jpeg, 8e1374a4e20d7af22665b7749158b7eb9fa3826e.jpeg, etc.

How can I change the file names so as to correctly use the box labels you annotated?

Thanks.

?

how much should the position be set in resnet50

Top-1 localization

Can I directly use the model trained by your code to test top-1 localization? And could you please provide the implementation of the top-1 localization evaluation? Thank you.

Sequence of normalization and resize of CAM?

Hi, Thanks for your good work.

I'm a little puzzled by lines 86-88 of your inference.py:

    cam_resized = cv2.resize(cam, image_size, interpolation=cv2.INTER_CUBIC)
    cam_normalized = normalize_scoremap(cam_resized)

In WSOL, after we get a certain class's CAM at the feature-map size (h x w, e.g., 7x7), do we resize it to the original image size (224x224) and then normalize the scores to [0, 1], or do we first normalize the CAM (at 7x7) to [0, 1] and then resize it to the original image size?
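
For context, I assume normalize_scoremap is a plain min-max rescale, something like:

    import numpy as np

    def normalize_scoremap(cam):
        # Minimal min-max rescale to [0, 1]; ignores the degenerate
        # constant-map / NaN handling the repo may do.
        cam = cam - cam.min()
        max_val = cam.max()
        return cam / max_val if max_val > 0 else cam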

I'm looking forward to your reply.

Thanks in advance!

Best,

Re-implementation confusion

Hi, I used your config params to train a ResNet-50 vanilla CAM, but I cannot reach your reported accuracy.
Here is my configuration:

CUDA_VISIBLE_DEVICES=6 python train.py --dataset_name CUB \
    --architecture resnet50 \
    --wsol_method cam \
    --experiment_name CUB_CAM_resnet50_box_v2_metric \
    --pretrained TRUE \
    --large_feature_map FALSE \
    --batch_size 32 \
    --epochs 50 \
    --lr 0.0002 \
    --lr_decay_frequency 15 \
    --weight_decay 0.0001 \
    --override_cache TRUE \
    --workers 4 \
    --box_v2_metric True \
    --iou_threshold_list 30 50 70 \
    --eval_checkpoint_type last \
    --data_root /data/lijinlong/datasets/CUB-200-2011/
result:

Final epoch evaluation on test set ...
Check train_log/CUB_CAM_resnet50_box_v2_metric/last_checkpoint.pth.tar loaded.
rank 0, Evaluate epoch 50, split test
Computing and evaluating cams.
Split train, metric loss, current value: 0.07756523653730615
Split train, metric loss, best value: 0.07533820327974217
Split train, metric loss, best epoch: 48
Split train, metric classification, current value: 99.84984984984985
Split train, metric classification, best value: 99.88321654988322
Split train, metric classification, best epoch: 43
Split val, metric classification, current value: 72.89999999999999
Split val, metric classification, best value: 74.2
Split val, metric classification, best epoch: 30
Split val, metric localization, current value: 46.36666666666667
Split val, metric localization, best value: 50.900000000000006
Split val, metric localization, best epoch: 1
Split val, metric localization_IOU_30, current value: 89.1
Split val, metric localization_IOU_30, best value: 92.6
Split val, metric localization_IOU_30, best epoch: 2
Split val, metric localization_IOU_50, current value: 43.9
Split val, metric localization_IOU_50, best value: 51.6
Split val, metric localization_IOU_50, best epoch: 1
Split val, metric localization_IOU_70, current value: 6.1
Split val, metric localization_IOU_70, best value: 8.9
Split val, metric localization_IOU_70, best epoch: 1
Split test, metric classification, current value: 77.06247842595789
Split test, metric localization, current value: 51.26567713726845
Split test, metric localization_IOU_30, current value: 95.11563686572316
Split test, metric localization_IOU_50, current value: 50.465999309630654
Split test, metric localization_IOU_70, current value: 8.215395236451501

CUDA_VISIBLE_DEVICES=5 python train.py --dataset_name CUB \
    --architecture resnet50 \
    --wsol_method cam \
    --experiment_name CUB_CAM_resnet50 \
    --pretrained TRUE \
    --large_feature_map FALSE \
    --batch_size 32 \
    --epochs 50 \
    --lr 0.0002 \
    --lr_decay_frequency 15 \
    --weight_decay 0.0001 \
    --override_cache TRUE \
    --workers 4 \
    --box_v2_metric False \
    --iou_threshold_list 30 50 70 \
    --eval_checkpoint_type last \
    --data_root /data/lijinlong/datasets/CUB-200-2011/
results:

Final epoch evaluation on test set ...
Check train_log/CUB_CAM_resnet50/last_checkpoint.pth.tar loaded.
rank 0, Evaluate epoch 50, split test
Computing and evaluating cams.
Split train, metric loss, current value: 0.078823547021007
Split train, metric loss, best value: 0.07638261178592304
Split train, metric loss, best epoch: 45
Split train, metric classification, current value: 99.76643309976645
Split train, metric classification, best value: 99.83316649983317
Split train, metric classification, best epoch: 43
Split val, metric classification, current value: 73.2
Split val, metric classification, best value: 74.0
Split val, metric classification, best epoch: 18
Split val, metric localization, current value: 43.5
Split val, metric localization, best value: 52.6
Split val, metric localization, best epoch: 1
Split val, metric localization_IOU_30, current value: 88.6
Split val, metric localization_IOU_30, best value: 93.2
Split val, metric localization_IOU_30, best epoch: 1
Split val, metric localization_IOU_50, current value: 43.5
Split val, metric localization_IOU_50, best value: 52.6
Split val, metric localization_IOU_50, best epoch: 1
Split val, metric localization_IOU_70, current value: 6.0
Split val, metric localization_IOU_70, best value: 9.4
Split val, metric localization_IOU_70, best epoch: 1
Split test, metric classification, current value: 76.61373835001726
Split test, metric localization, current value: 50.84570245081118
Split test, metric localization_IOU_30, current value: 95.11563686572316
Split test, metric localization_IOU_50, current value: 50.84570245081118
Split test, metric localization_IOU_70, current value: 8.439765274421816

Here is my model architecture and config params:
Namespace(acol_threshold=0.7, adl_drop_rate=0.75, adl_threshold=0.9, architecture='resnet50', architecture_type='cam', batch_size=32, box_v2_metric=True, cam_curve_interval=0.001, crop_size=224, cutmix_beta=1.0, cutmix_prob=1.0, data_paths=Munch({'train': '/data/lijinlong/datasets/CUB-200-2011/CUB', 'val': '/data/lijinlong/datasets/CUB-200-2011/CUB', 'test': '/data/lijinlong/datasets/CUB-200-2011/CUB'}), data_root='/data/lijinlong/datasets/CUB-200-2011/', dataset_name='CUB', dist_backend='nccl', dist_url='tcp://127.0.0.1', epochs=50, eval_checkpoint_type='last', experiment_name='CUB_CAM_resnet50_box_v2_metric', gpu=None, has_drop_rate=0.5, has_grid_size=4, iou_threshold_list=[30, 50, 70], large_feature_map=False, launcher='pytorch', local_rank=0, log_folder='train_log/CUB_CAM_resnet50_box_v2_metric', lr=0.0002, lr_classifier_ratio=10, lr_decay_frequency=15, mask_root='dataset/OpenImages', master_port='47562', metadata_root='metadata/CUB', momentum=0.9, multi_contour_eval=True, multi_iou_eval=True, multiprocessing_distributed=False, num_val_sample_per_class=0, override_cache=True, pretrained=True, pretrained_path=None, proxy_training_set=False, rank=-1, reporter=<class 'util.Reporter'>, reporter_log_root='train_log/CUB_CAM_resnet50_box_v2_metric/reports', resize_size=256, scoremap_paths=Munch({'train': 'train_log/CUB_CAM_resnet50_box_v2_metric/scoremaps/train', 'val': 'train_log/CUB_CAM_resnet50_box_v2_metric/scoremaps/val', 'test': 'train_log/CUB_CAM_resnet50_box_v2_metric/scoremaps/test'}), seed=None, spg_threshold_1h=0.7, spg_threshold_1l=0.01, spg_threshold_2h=0.5, spg_threshold_2l=0.05, spg_threshold_3h=0.7, spg_threshold_3l=0.1, spg_thresholds=((0.7, 0.01), (0.5, 0.05), (0.7, 0.1)), weight_decay=0.0001, workers=4, world_size=-1, wsol_method='cam') Loading model resnet50
ResNetCam(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=200, bias=True)
)

IoU 50 and 70 are much lower than your reported numbers; did I miss something?
For VGG16, though, the results are OK.
Thanks.

Interpretation of the result

Hi,
Hi,
Thank you for providing the code for this amazing work. I have a question for which I seek your guidance. When I run the basic ResNet-50 CAM code on CUB-200, I get the following results on the test set:

Split test, metric classification, current value: 50.517777010700726
Split test, metric localization, current value: 58.97480151881257
Split test, metric localization_IOU_30, current value: 96.08215395236452
Split test, metric localization_IOU_50, current value: 66.27545736969279
Split test, metric localization_IOU_70, current value: 14.566793234380393

I wanted to confirm whether "metric localization" corresponds to the MaxBoxAccV2 metric that you mention in your work. Also, what does "metric localization, current value" mean? In Table 6 of your paper, do you report metric localization_IOU_50? Looking forward to hearing from you.
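
(For what it's worth, the aggregate value is exactly the mean of the three per-IoU numbers above, which matches a BoxAccV2-style average:)

    # 58.9748... = mean over the three IoU thresholds
    (96.08215395236452 + 66.27545736969279 + 14.566793234380393) / 3
    # -> 58.974801518812566
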
Thanks

About dataset

I cannot obtain CUBV2 via prepare_cub.sh. Could you please check whether the link is valid?

FSL baseline

Could you please release the code of the FSL baseline? Thank you.

Can't see data for evaluation.py

I can't find the folder train_log/scoremaps/ for the CUB dataset. I am trying to use your evaluation-only script. Could you please tell me the procedure for running this code on heatmaps? How do we generate score maps, or run your evaluation.py on custom object detection datasets?

slow cv2.findContours

Hi,
cv2.findContours is extremely slow depending on the quality of the CAM (from 0.00005 s to 0.001 s per call). This easily brings the validation time from 2 minutes up to 12 minutes. Things get worse when the CAM is bad (way too many contours per threshold: >1000 per threshold).

contours = cv2.findContours(

Is there a way to speed it up without breaking the evaluation protocol?
I really appreciate your help.
Thanks

Configuration of reproducing the result in the CUB and ImageNet

Thanks first for the great work!

Following the demo configuration for OpenImages in the README.md, I could easily reproduce the expected accuracy in the given table.

python main.py --dataset_name OpenImages \
               --architecture vgg16 \
               --wsol_method cam \
               --experiment_name OpenImages_vgg16_CAM \
               --pretrained TRUE \
               --num_val_sample_per_class 5 \
               --large_feature_map FALSE \
               --batch_size 32 \
               --epochs 10 \
               --lr 0.00227913316 \
               --lr_decay_frequency 3 \
               --weight_decay 5.00E-04 \
               --override_cache FALSE \
               --workers 4 \
               --box_v2_metric True \
               --iou_threshold_list 30 50 70 \
               --eval_checkpoint_type last

May I ask what the configuration is for the CUB and ImageNet datasets?
I used the one above (only changing --dataset_name), and the experiment accuracy is much lower than reported in the given table.

Many thanks!

Test Dataset description

The test dataset description for this task says that the test data contains images with full supervision, but the test splits of ImageNet, CUB, and OpenImages don't have bounding boxes or masks. Could you please explain this in detail?
