clovaai / wsolevaluation
Evaluating Weakly Supervised Object Localization Methods Right (CVPR 2020)
License: MIT License
I need to evaluate a couple of methods in a fully weakly supervised way, without any bounding-box annotations, using the original CVPR 2016 metric. Is this possible with this code?
Hi,
In https://arxiv.org/pdf/2001.07437.pdf, the PxAP performance on OpenImages differs between the runs using MaxBoxAcc (Tab. 2) and those using MaxBoxAccV2 (Tab. 8). I didn't expect that. Why is that?
Because validation (for model selection) on OpenImages is done with PxAP, as is testing, choosing MaxBoxAcc or MaxBoxAccV2 in the code configuration should not affect the PxAP results, right?
Or did you run a new set of 30 random trials for the hyper-parameter search, which led to different best hyper-parameters for PxAP in Tab. 8? Or is it simply due to randomness, if the code is not reproducible (i.e., running the same experiment twice with the same settings does not give exactly the same results)?
Thanks
Hi, I wanted to ask about the argument num_val_sample_per_class in
python main.py --dataset_name OpenImages \
    --architecture vgg16 \
    --wsol_method cam \
    --experiment_name OpenImages_vgg16_CAM \
    --pretrained TRUE \
    --num_val_sample_per_class 5 \
    --large_feature_map FALSE \
    --batch_size 32 \
    --epochs 10 \
    --lr 0.00227913316 \
    --lr_decay_frequency 3 \
    --weight_decay 5.00E-04 \
    --override_cache FALSE \
    --workers 4 \
    --box_v2_metric True \
    --iou_threshold_list 30 50 70 \
    --eval_checkpoint_type last
You set it to 5 for the OpenImages dataset. Shouldn't it be 25 instead, since we have 25 samples per class in the validation set?
Also, when we use your code with the CUB and ImageNet datasets, what value should we set for num_val_sample_per_class?
Thank you in advance for your reply :)
Would you please share your code for the FSL baseline?
When generating the CAM, you use the ground-truth label to select the channel weights of the fully connected layer. I think the model's predicted class would be the right choice.
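For readers following this question, here is a minimal NumPy sketch of how a CAM depends on the chosen class index. It follows the standard CAM formulation (a weighted sum of the final feature maps using one row of the fully connected layer's weights). The names `compute_cam`, `features` and `fc_weight`, the random inputs, and the label value are illustrative assumptions, not code from the repository.

```python
import numpy as np


def compute_cam(features, fc_weight, class_idx):
    """Weighted sum of feature maps for one class.

    features: (C, H, W) activations of the last conv layer.
    fc_weight: (num_classes, C) weights of the classifier after global pooling.
    """
    return np.tensordot(fc_weight[class_idx], features, axes=([0], [0]))


rng = np.random.default_rng(0)
features = rng.random((4, 7, 7))            # hypothetical activations
fc_weight = rng.standard_normal((3, 4))     # hypothetical classifier weights

# Global average pooling followed by a bias-free linear layer.
logits = fc_weight @ features.mean(axis=(1, 2))
predicted = int(np.argmax(logits))
ground_truth = 2                            # hypothetical label

cam_gt = compute_cam(features, fc_weight, ground_truth)
cam_pred = compute_cam(features, fc_weight, predicted)
```

Only the selected weight row differs between the two maps, so they coincide whenever the prediction is correct. Using the ground-truth class corresponds to a GT-known style of evaluation, while using the predicted class couples localization to classification accuracy.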
I downloaded the CUB dataset using the bash wsolevaluation/dataset/prepare_cub.sh command.
Now I am getting the error mentioned after running the following command:
!python evaluation.py --scoremap_root=train_log/scoremaps/
--metadata_root=metadata/CUB/test
--mask_root=dataset/
--dataset_name=CUB
--split=val
--cam_curve_interval=0.01
I used the bounding-box annotations you provided for testing and found many boxes that look wrong.
For example:
100/5.jpeg,73,214,148,299
100/5.jpeg,180,85,200,111
100/5.jpeg,169,137,231,195
100/5.jpeg,394,163,462,207
100/5.jpeg,316,134,374,160
100/5.jpeg,136,206,171,235
100/5.jpeg,242,146,267,164
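A small sketch that may help when inspecting such annotations: it groups the lines above by image and checks that each box is well formed. It assumes the line format is `image_id,x0,y0,x1,y1` (the coordinate order is an assumption), and `group_boxes` and `is_well_formed` are hypothetical helper names. Several lines sharing one image id may simply mean several annotated instances; whether the boxes match the image content can only be judged by viewing them on the image.

```python
from collections import defaultdict


def group_boxes(lines):
    """Group 'image_id,x0,y0,x1,y1' lines into {image_id: [box, ...]}."""
    boxes = defaultdict(list)
    for line in lines:
        image_id, *coords = line.strip().split(',')
        x0, y0, x1, y1 = map(int, coords)
        boxes[image_id].append((x0, y0, x1, y1))
    return dict(boxes)


def is_well_formed(box):
    """A box is well formed if its corners are ordered and non-negative."""
    x0, y0, x1, y1 = box
    return 0 <= x0 <= x1 and 0 <= y0 <= y1


lines = [
    "100/5.jpeg,73,214,148,299",
    "100/5.jpeg,180,85,200,111",
    "100/5.jpeg,169,137,231,195",
    "100/5.jpeg,394,163,462,207",
    "100/5.jpeg,316,134,374,160",
    "100/5.jpeg,136,206,171,235",
    "100/5.jpeg,242,146,267,164",
]
grouped = group_boxes(lines)
all_valid = all(is_well_formed(b) for b in grouped["100/5.jpeg"])
```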
Hi.
Thank you for such a clean repository. I would like to ask if it's possible to have access to a few of the pre-trained models. As many would use your repository to reproduce the current SOTA, having access to pre-trained models would indeed speed up the process of testing code validity. I think just having access to a pretrained model on CUB would make a huge difference because everyone can basically download it and validate the accuracies on a local machine (even without GPU).
Many thanks.
The code in the data loader reads the image_ids file, whose entries have .jpg appended, as:
with open(metadata['image_ids' + suffix]) as f:
    for line in f.readlines():
        image_ids.append(line.strip('\n'))
But when reading OpenImages data and localization.txt, it expects path_file.jpg.npy. Isn't that wrong? Shouldn't it be path_file.npy, with the .jpg dropped? It would be great if you could clarify this.
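One possible source of the `.jpg.npy` name is NumPy itself: `np.save` appends `.npy` to any file name that does not already end with it. The sketch below demonstrates only this NumPy behavior. That the repository saves scoremaps under the raw image id is an assumption, and the image id used here is hypothetical.

```python
import os
import tempfile

import numpy as np

image_id = "train/class0/path_file.jpg"  # hypothetical entry from image_ids.txt

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, image_id)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # np.save adds the ".npy" extension because the name ends in ".jpg".
    np.save(target, np.zeros((2, 2)))
    saved_files = os.listdir(os.path.dirname(target))
    # Loading therefore needs the full "<image_id>.npy" name.
    loaded = np.load(target + ".npy")
```

If the writer and the reader both build the path as image id plus `.npy`, the doubled extension is harmless as long as it is used consistently on both sides.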
Hi,
Traceback (most recent call last):
File "evaluation_test.py", line 98, in test_compute_bboxes_from_scoremaps_degenerate
self.assertListEqual(boxes, [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
AssertionError: First sequence is not a list: ([array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]]), array([[0, 0, 0, 0]])], [1, 1, 1, 1, 1])
Traceback (most recent call last):
File "evaluation_test.py", line 125, in test_compute_bboxes_from_scoremaps_multimodal
self.assertListEqual(boxes, [[0, 0, 4, 3],
AssertionError: First sequence is not a list: ([array([[0, 0, 4, 3]]), array([[0, 0, 2, 2]]), array([[0, 3, 3, 3]]), array([[2, 3, 3, 3]]), array([[0, 3, 1, 3]])], [1, 1, 1, 1, 1])
Traceback (most recent call last):
File "evaluation_test.py", line 110, in test_compute_bboxes_from_scoremaps_unimodal
self.assertListEqual(boxes, [[1, 1, 4, 3],
AssertionError: First sequence is not a list: ([array([[1, 1, 4, 3]]), array([[1, 1, 4, 3]]), array([[2, 1, 4, 3]]), array([[2, 2, 4, 3]]), array([[2, 2, 3, 3]])], [1, 1, 1, 1, 1])
(2) My second problem is when I run your suggested script:
python evaluation.py --scoremap_root=train_log/scoremaps/ --metadata_root=metadata/ --mask_root=dataset/ --dataset_name=CUB --split=val --cam_curve_interval=0.01
It gives the following error:
Loading and evaluating cams.
Traceback (most recent call last):
File "evaluation.py", line 528, in <module>
main()
File "evaluation.py", line 516, in main
evaluate_wsol(scoremap_root=args.scoremap_root,
File "evaluation.py", line 465, in evaluate_wsol
image_ids = get_image_ids(metadata)
File "/egundogdu/WSOL/wsolevaluation/data_loaders.py", line 62, in get_image_ids
with open(metadata['image_ids' + suffix]) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'metadata/image_ids.txt'
Do you have any idea about these issues?
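Regarding the three assertion failures in the first part of this report, the messages suggest that the tested function returns a tuple of (list of box arrays, list of box counts), while the test compares it against a plain list of boxes. This is inferred from the traceback text alone, not from the repository history. A minimal sketch of unpacking such a value, using the tuple printed in the first traceback:

```python
import numpy as np

# Return value copied from the first traceback:
# (list of (1, 4) box arrays, list of box counts per threshold).
returned = (
    [np.array([[0, 0, 0, 0]]) for _ in range(5)],
    [1, 1, 1, 1, 1],
)

boxes, number_of_boxes = returned

# Convert each (1, 4) array into a plain [x0, y0, x1, y1] list so that it can
# be compared with assertListEqual against the expected nested list.
first_boxes = [b[0].tolist() for b in boxes]
```

With this shape, `first_boxes` can be compared with the expected value in the test, which hints that the test file and `evaluation.py` may come from different revisions.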
Can you please provide any insights on why you crop images in 224x224 patches by default and why you resize them to 256x256, regardless of input image size?
Hi! Thank you for your work in bringing new benchmarks and a unified approach to evaluation for the WSOL community.
In the CVPR 2020 paper, some of the tables seem to be inconsistent with those in https://docs.google.com/spreadsheets/d/1O4gu69FOOooPoTTtAEmFdfjs2K0EtFneYWQFk8rNqzw/edit#gid=0. For example, the result of InceptionV3 with ACoL in the paper is 63.0, but the value in the linked spreadsheet is lower. Am I reading it incorrectly? Looking forward to your reply!
sampled_indices = np.random.choice( indices, self.num_sample_per_class, replace=False)
........
........
File "mtrand.pyx", line 946, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'
Looking at config.py, the recommended setting for CUB is args.num_val_sample_per_class <= 5:
def check_dependency(args):
    if args.dataset_name == 'CUB':
        if args.num_val_sample_per_class >= 6:
            raise ValueError("num-val-sample must be <= 5 for CUB.")
    if args.dataset_name == 'OpenImages':
        if args.num_val_sample_per_class >= 26:
            raise ValueError("num-val-sample must be <= 25 for OpenImages.")
However, the bird category '059.California_Gull' contains only 3 validation images, while '002.Laysan_Albatross' and '007.Parakeet_Auklet' contain 6 images each. This leads to the error above.
The solution is to expand 059.California_Gull from 3 to 5 images, or to change the recommended setting to num_val_sample_per_class=3. It feels strange that no one has run into this problem before. 👍
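A third option, sketched below, is to cap the sample size at the class population so that `choice(..., replace=False)` never asks for more items than exist. The function `sample_indices` is a hypothetical helper, not the repository's code, and it changes the protocol slightly: small classes contribute fewer validation samples than the others.

```python
import numpy as np


def sample_indices(indices, num_sample_per_class, rng):
    """Sample without replacement, never asking for more than is available."""
    k = min(num_sample_per_class, len(indices))
    return rng.choice(indices, k, replace=False)


rng = np.random.default_rng(0)
small_class = np.arange(3)   # e.g. a class with only 3 validation images
large_class = np.arange(6)   # e.g. a class with 6 validation images

small_sample = sample_indices(small_class, 5, rng)
large_sample = sample_indices(large_class, 5, rng)
```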
Getting the mentioned error by running this command
!python main.py --dataset_name CUB
--architecture vgg16
--wsol_method cam
--experiment_name CUB_vgg16_CAM
--pretrained TRUE
--num_val_sample_per_class 5
--large_feature_map FALSE
--batch_size 32
--epochs 10
--lr 0.00227913316
--lr_decay_frequency 3
--weight_decay 5.00E-04
--override_cache FALSE
--workers 4
--multi_iou_eval False
--iou_threshold_list 30 50 70
--multi_contour_eval False
--eval_checkpoint_type best
Could you shed some light on how to modify the code for other custom datasets?
I want to use my own dataset instead of ImageNet, CUB or OpenImages.
I have a dataset in VOC format.
After training the model, the scoremaps folders are empty! Can you suggest what I might be missing?
Hi,
Thank you for sharing the code for the awesome paper!
I was trying to reproduce the results shown in Table 6 of the appendix but have had a hard time reaching the performance you reported. Would it be possible for you to share the configs for CAM, HaS, ACoL and SPG for V, I and R?
Also, I was wondering if there is a specific reason why you proposed to use the max of GT-known instead of the max of Top1-Loc. According to my understanding, Top1-Loc is more comprehensive metric since it also takes classification performance into account. Although localization is important, if a prediction on classification is wrong in the first place, a model would not be considered as a "good model". It would be highly appreciated if you can elaborate it, or if it has already been mentioned in the paper, please direct me to the part. Thank you!
I am trying to understand the paper and would like to clear up one point of confusion: I am not sure whether the newly added datasets are used for training after the optimal hyperparameters are selected.
For example:
For different hyperparameters, the model is first trained on the CUB dataset, then validated on the CUBv2 dataset. Hyperparameters of the model with the highest localization accuracy are then selected. A new model with the selected hyperparameters is then trained on the CUB training dataset along with the added CUBv2 dataset, and then finally tested on the CUB test dataset. Am I correct or are you doing something different?
Dear authors,
Thank you very much for this dedicated repo! This is extremely helpful to the WSOL community!
Some questions:
Thank you again!
Could you also provide the recommended command/hyperparameter settings to reproduce Table 2?
Especially for the learning rate/batch size/epochs etc. for different methods with different backbones on three datasets.
This would help those who cite your work to make a fair comparison.
Thank you very much!!
Thank you for putting together this brilliant collection of WSOL methods and shedding light on the reality of the progress in the field! Truly appreciated!
I was trying to run your code on my own custom dataset, following the directions from #17.
I was wondering whether the dataset folder hierarchy plays a crucial role in determining class labels. As long as I've correctly produced the class_labels.txt file, do I need to care about the subfolders? For example, I have two labels, 0 and 1. Do I need to create two subfolders, or is putting all the images in one folder enough?
Hello. Thank you for such interesting work.
I have one question, though.
I understand that you conducted 30 random trials for each method with a single backbone.
How, then, did you choose the optimal oracle value?
You mentioned that you find it with the train-fullsup set.
Do you mean that you conducted multiple experiments on the train-fullsup set to find the optimal oracle?
Getting the above issue after running the following commands
!git clone https://github.com/clovaai/wsolevaluation.git
os.chdir("wsolevaluation")
!bash dataset/prepare_cub.sh
!python main.py --dataset_name CUB
--architecture vgg16
--wsol_method cam
--experiment_name cub_vgg16
--pretrained TRUE
--num_val_sample_per_class 0
--large_feature_map TRUE
--batch_size 32
--epochs 50
--lr 0.000227913316
--lr_decay_frequency 15
--weight_decay 1.00E-04
--override_cache FALSE
--workers 4
--box_v2_metric True
--iou_threshold_list 30 50 70
--eval_checkpoint_type last
Hi,
I notice that for ResNet-50 you change the stride to 1 at Layer 3 (the name comes from PyTorch, torchvision.models.resnet50) in order to increase the feature map from 7x7 to 14x14. Do you first make this change, then train the modified ResNet-50 on ImageNet, and finally train (or fine-tune) that model on the CUB dataset?
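The 7x7 versus 14x14 figures in this question can be checked with the usual convolution output-size formula. The sketch below is plain arithmetic under stated assumptions: kernel sizes, strides and paddings follow the standard torchvision ResNet-50 layout, only the layers that change resolution are modeled, and only the layer3 stride is varied, as described in the question. `conv_out` and `final_map_size` are illustrative names.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def final_map_size(input_size, layer3_stride):
    """Feature map size after layer4 for a square input."""
    size = conv_out(input_size, kernel=7, stride=2, padding=3)   # conv1
    size = conv_out(size, kernel=3, stride=2, padding=1)         # maxpool
    # layer1 keeps the resolution (all strides are 1).
    size = conv_out(size, kernel=3, stride=2, padding=1)         # layer2
    size = conv_out(size, kernel=3, stride=layer3_stride, padding=1)  # layer3
    size = conv_out(size, kernel=3, stride=2, padding=1)         # layer4
    return size


default_size = final_map_size(224, layer3_stride=2)    # standard ResNet-50
enlarged_size = final_map_size(224, layer3_stride=1)   # stride removed in layer3
```

Removing one stride of 2 halves the total downsampling factor from 32 to 16, which is why the map grows from 7x7 to 14x14 for a 224x224 input.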
Hi,
I tried to calculate MaxBoxAcc version 1 for the CAM method on the CUB dataset using VGG16, and the number (~85%) is much higher than the number reported in the paper (~76%).
I calculate it as:
counter = 0
for each testing_image:
    get the current CAM and normalize it via min-max normalization (range [0, 1])
    for 1,000 steps (I assume you sample the score map threshold in 1,000 steps):
        c_CAM = c_CAM >= current_score_map_threshold
        get the current bbox and calculate the IoU
    if one of the IoUs > 0.5 (there shall be 1,000 IoUs):
        counter += 1
and the final maxboxacc = counter / number_of_testing_images.
Am I missing something in the procedure? By the way, the bbox is estimated from all contours.
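One possible explanation for the gap, offered as a sketch rather than a confirmed diagnosis: if MaxBoxAcc is computed as the paper defines it, a single CAM threshold is applied to the whole dataset and the maximum is taken over thresholds of the dataset-level accuracy. The procedure above instead lets every image pick its own best threshold, which can only give an equal or higher number. The toy IoU matrix and the function names below are illustrative assumptions.

```python
import numpy as np


def max_box_acc(ious, iou_threshold=0.5):
    """Max over shared CAM thresholds of the dataset-level box accuracy.

    ious: (num_images, num_cam_thresholds) IoU of each image's box at each
    CAM threshold, where column j uses the same threshold for all images.
    """
    correct = ious >= iou_threshold
    box_acc_per_threshold = correct.mean(axis=0)
    return float(box_acc_per_threshold.max())


def per_image_best_acc(ious, iou_threshold=0.5):
    """Accuracy when each image may use its own best threshold."""
    return float((ious >= iou_threshold).any(axis=1).mean())


# Toy example: each of the first three images succeeds at a different threshold.
ious = np.array([
    [0.6, 0.2, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.1, 0.8],
    [0.1, 0.1, 0.1],
])

shared_threshold_score = max_box_acc(ious)        # 0.25
per_image_score = per_image_best_acc(ious)        # 0.75
```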
Thanks for the great work.
This is not directly related to the codebase, but I found a very minor inconsistency in the paper. I am looking at Lemma 3.1 in arXiv v2.
You probably meant M_{bg} instead of M_{bf}, right?
Dear authors,
While trying to reproduce the reported results of the few-shot learning (FSL) baseline, I came up with a question. According to the paper, FSL used (10, 5, 5) samples per class for ImageNet, CUB and OpenImages, respectively, and the same amount of supervision was applied to the CAM methods.
For FSL, I believe the number of samples per class, e.g., 10 for ImageNet, is the sum of the train and val samples, since FSL also needs a validation set. So my question is: for FSL, how did you split the specified number of samples per class (10, 5, 5) into training and validation sets?
Hope my question is clear to you. Looking forward to hearing from you. Thank you!
Line 324 in e00842f
This code causes the error TypeError: Expected Ptr<cv::UMat> for argument 'src' when there is a mask path but no ignore file.
Could you give more detail on this?
Hi,
I have a question about the evaluation. I've tried to understand how the optimal threshold is used at test time. It seems you search for the optimal threshold on the test set again instead of using the optimal threshold found on the validation set. Or am I missing something? Please correct me if I am wrong. I cannot find the line of code where the optimal threshold is stored for the test set. Thank you.
Hi,
I've set the large_feature_map = True, which means the final feature map used to generate CAMs is 28x28 (image input size is 224x224, rather than 229x229). Also, I've set the LR of SPG_A3_1b, SPG_A3_2b and SPG_A4 10 times higher than the rest blocks (Conv2d_4a_3x3, Conv2d_1a_3x3, Conv2d_2a_3x3, Conv2d_2b_3x3, Conv2d_3b_1x1, Mixed_5b, Mixed_5c, Mixed_5d, Mixed_6a, Mixed_6b, Mixed_6c, Mixed_6d and Mixed_6e), whose learning rate is 0.00224844746. The WD is 5e-4, momentum is 0.9 and nesterov is True for the SGD optimizer.
The LR decay frequency is 15 epochs and I use the StepLR scheduler with gamma = 0.1.
The boxaccv2 is around 53%, more specifically, 0.92, 0.56, 0.1 for iou=0.3, 0.5 and 0.7, respectively.
Could you please tell me if I am missing something? Or has anyone had a similar issue?
As mentioned in the paper, where is the code for hyperparameter search?
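For orientation only, a generic random-search loop could look like the sketch below. It is not the authors' search code. The 30-trial count comes from the discussion on this page, while the log-uniform learning-rate range, the single searched parameter, and the `toy_validation_score` stand-in (which would be a full train-then-validate run in practice) are all assumptions made for illustration.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(evaluate, num_trials, rng):
    """Run independent trials and return (best_score, best_config)."""
    trials = []
    for _ in range(num_trials):
        config = {"lr": sample_log_uniform(1e-5, 1.0, rng)}  # assumed range
        trials.append((evaluate(config), config))
    return max(trials, key=lambda trial: trial[0])


def toy_validation_score(config):
    """Hypothetical stand-in for 'train, then score on the validation split'."""
    return -abs(math.log10(config["lr"]) + 3.0)  # peaks at lr = 1e-3


rng = random.Random(0)
best_score, best_config = random_search(toy_validation_score, 30, rng)
```

The configuration with the highest validation score is kept, and its test result is what would be reported.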
Hi,
I downloaded Threshold0.7 of ImageNetV2 to use it as train-fullsup.
However, the image file names are not in the range 0.jpeg to 9.jpeg; they are in a format like 0af3f1b55de791c4144e2fb6d7dfe96dfc22d3fc.jpeg, 8e1374a4e20d7af22665b7749158b7eb9fa3826e.jpeg, etc.
How can I change the file names to correctly use the box labels you annotated?
Thanks.
how much should the position be set in resnet50
Can I directly use the model trained by your code to test top-1 localization? Could you please provide the implementation of the top-1 localization evaluation? Thank you.
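As a reference point, the commonly used definitions can be sketched as follows: GT-known localization counts an image as correct when the box IoU reaches the threshold, and top-1 localization additionally requires the predicted class to match the label. This is a sketch of those definitions, not the repository's evaluator. The arrays are made-up inputs, and the IoUs are assumed to be already computed for the boxes being scored.

```python
import numpy as np


def gt_known_loc(ious, iou_threshold=0.5):
    """Fraction of images whose box reaches the IoU threshold."""
    return float(np.mean(ious >= iou_threshold))


def top1_loc(ious, predicted, labels, iou_threshold=0.5):
    """Fraction of images with a correct class and a sufficiently good box."""
    return float(np.mean((ious >= iou_threshold) & (predicted == labels)))


ious = np.array([0.8, 0.6, 0.3, 0.9])   # hypothetical per-image IoUs
predicted = np.array([1, 2, 0, 3])      # hypothetical top-1 predictions
labels = np.array([1, 0, 0, 3])         # hypothetical ground-truth classes

gt_known = gt_known_loc(ious)                   # 3 of 4 boxes are good
top1 = top1_loc(ious, predicted, labels)        # 2 of 4 are good and correct
```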
Hi, thanks for your good work.
I have a question about lines 86-88 in your inference.py:
cam_resized = cv2.resize(cam, image_size, interpolation=cv2.INTER_CUBIC)
cam_normalized = normalize_scoremap(cam_resized)
In WSOL, after we get a certain class's CAM score of the feature size (hxw, e.g, 7x7), do we resize it to the original image (224x224) and then normalize the score to [0, 1], or do we normalize the CAM score (in the 7x7 shape) to [0, 1] and then resize it to the original image?
I'm looking forward to your reply.
Thanks in advance!
Best,
Hi, I used your config params to train a vanilla ResNet50 CAM, but I cannot reach your reported accuracy.
Here are my configurations:
CUDA_VISIBLE_DEVICES=6 python train.py --dataset_name CUB --architecture resnet50 --wsol_method cam --experiment_name CUB_CAM_resnet50_box_v2_metric --pretrained TRUE --large_feature_map FALSE --batch_size 32 --epochs 50 --lr 0.0002 --lr_decay_frequency 15 --weight_decay 0.0001 --override_cache TRUE --workers 4 --box_v2_metric True --iou_threshold_list 30 50 70 --eval_checkpoint_type last --data_root /data/lijinlong/datasets/CUB-200-2011/
result:
Final epoch evaluation on test set ...
Check train_log/CUB_CAM_resnet50_box_v2_metric/last_checkpoint.pth.tar loaded.
rank 0, Evaluate epoch 50, split test
Computing and evaluating cams.
Split train, metric loss, current value: 0.07756523653730615
Split train, metric loss, best value: 0.07533820327974217
Split train, metric loss, best epoch: 48
Split train, metric classification, current value: 99.84984984984985
Split train, metric classification, best value: 99.88321654988322
Split train, metric classification, best epoch: 43
Split val, metric classification, current value: 72.89999999999999
Split val, metric classification, best value: 74.2
Split val, metric classification, best epoch: 30
Split val, metric localization, current value: 46.36666666666667
Split val, metric localization, best value: 50.900000000000006
Split val, metric localization, best epoch: 1
Split val, metric localization_IOU_30, current value: 89.1
Split val, metric localization_IOU_30, best value: 92.6
Split val, metric localization_IOU_30, best epoch: 2
Split val, metric localization_IOU_50, current value: 43.9
Split val, metric localization_IOU_50, best value: 51.6
Split val, metric localization_IOU_50, best epoch: 1
Split val, metric localization_IOU_70, current value: 6.1
Split val, metric localization_IOU_70, best value: 8.9
Split val, metric localization_IOU_70, best epoch: 1
Split test, metric classification, current value: 77.06247842595789
Split test, metric localization, current value: 51.26567713726845
Split test, metric localization_IOU_30, current value: 95.11563686572316
Split test, metric localization_IOU_50, current value: 50.465999309630654
Split test, metric localization_IOU_70, current value: 8.215395236451501
CUDA_VISIBLE_DEVICES=5 python train.py --dataset_name CUB --architecture resnet50 --wsol_method cam --experiment_name CUB_CAM_resnet50 --pretrained TRUE --large_feature_map FALSE --batch_size 32 --epochs 50 --lr 0.0002 --lr_decay_frequency 15 --weight_decay 0.0001 --override_cache TRUE --workers 4 --box_v2_metric False --iou_threshold_list 30 50 70 --eval_checkpoint_type last --data_root /data/lijinlong/datasets/CUB-200-2011/
results:
Final epoch evaluation on test set ...
Check train_log/CUB_CAM_resnet50/last_checkpoint.pth.tar loaded.
rank 0, Evaluate epoch 50, split test
Computing and evaluating cams.
Split train, metric loss, current value: 0.078823547021007
Split train, metric loss, best value: 0.07638261178592304
Split train, metric loss, best epoch: 45
Split train, metric classification, current value: 99.76643309976645
Split train, metric classification, best value: 99.83316649983317
Split train, metric classification, best epoch: 43
Split val, metric classification, current value: 73.2
Split val, metric classification, best value: 74.0
Split val, metric classification, best epoch: 18
Split val, metric localization, current value: 43.5
Split val, metric localization, best value: 52.6
Split val, metric localization, best epoch: 1
Split val, metric localization_IOU_30, current value: 88.6
Split val, metric localization_IOU_30, best value: 93.2
Split val, metric localization_IOU_30, best epoch: 1
Split val, metric localization_IOU_50, current value: 43.5
Split val, metric localization_IOU_50, best value: 52.6
Split val, metric localization_IOU_50, best epoch: 1
Split val, metric localization_IOU_70, current value: 6.0
Split val, metric localization_IOU_70, best value: 9.4
Split val, metric localization_IOU_70, best epoch: 1
Split test, metric classification, current value: 76.61373835001726
Split test, metric localization, current value: 50.84570245081118
Split test, metric localization_IOU_30, current value: 95.11563686572316
Split test, metric localization_IOU_50, current value: 50.84570245081118
Split test, metric localization_IOU_70, current value: 8.439765274421816
Here is my model architecture and config params:
Namespace(acol_threshold=0.7, adl_drop_rate=0.75, adl_threshold=0.9, architecture='resnet50', architecture_type='cam', batch_size=32, box_v2_metric=True, cam_curve_interval=0.001, crop_size=224, cutmix_beta=1.0, cutmix_prob=1.0, data_paths=Munch({'train': '/data/lijinlong/datasets/CUB-200-2011/CUB', 'val': '/data/lijinlong/datasets/CUB-200-2011/CUB', 'test': '/data/lijinlong/datasets/CUB-200-2011/CUB'}), data_root='/data/lijinlong/datasets/CUB-200-2011/', dataset_name='CUB', dist_backend='nccl', dist_url='tcp://127.0.0.1', epochs=50, eval_checkpoint_type='last', experiment_name='CUB_CAM_resnet50_box_v2_metric', gpu=None, has_drop_rate=0.5, has_grid_size=4, iou_threshold_list=[30, 50, 70], large_feature_map=False, launcher='pytorch', local_rank=0, log_folder='train_log/CUB_CAM_resnet50_box_v2_metric', lr=0.0002, lr_classifier_ratio=10, lr_decay_frequency=15, mask_root='dataset/OpenImages', master_port='47562', metadata_root='metadata/CUB', momentum=0.9, multi_contour_eval=True, multi_iou_eval=True, multiprocessing_distributed=False, num_val_sample_per_class=0, override_cache=True, pretrained=True, pretrained_path=None, proxy_training_set=False, rank=-1, reporter=<class 'util.Reporter'>, reporter_log_root='train_log/CUB_CAM_resnet50_box_v2_metric/reports', resize_size=256, scoremap_paths=Munch({'train': 'train_log/CUB_CAM_resnet50_box_v2_metric/scoremaps/train', 'val': 'train_log/CUB_CAM_resnet50_box_v2_metric/scoremaps/val', 'test': 'train_log/CUB_CAM_resnet50_box_v2_metric/scoremaps/test'}), seed=None, spg_threshold_1h=0.7, spg_threshold_1l=0.01, spg_threshold_2h=0.5, spg_threshold_2l=0.05, spg_threshold_3h=0.7, spg_threshold_3l=0.1, spg_thresholds=((0.7, 0.01), (0.5, 0.05), (0.7, 0.1)), weight_decay=0.0001, workers=4, world_size=-1, wsol_method='cam') Loading model resnet50
ResNetCam(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=2048, out_features=200, bias=True)
)
The IoU 50 and IoU 70 results are much lower than those you report. Did I miss something?
For vgg16, however, the results are fine.
Thanks.
Hi,
Thank you for providing the code for this amazing work. I have a question on which I would appreciate your guidance. When I run the basic resnet50 code for CAM on CUB200, I get the following results on the test set:
Split test, metric classification, current value: 50.517777010700726
Split test, metric localization, current value: 58.97480151881257
Split test, metric localization_IOU_30, current value: 96.08215395236452
Split test, metric localization_IOU_50, current value: 66.27545736969279
Split test, metric localization_IOU_70, current value: 14.566793234380393
I wanted to confirm whether metric localization corresponds to the MaxBoxAcc-v2 metric that you mention in your work. Also, what does metric localization, current value mean? In Table 6 of your paper, do you report metric localization_IOU_50? Looking forward to hearing from you.
Thanks
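A quick arithmetic check on the numbers in this log may help with the first question. The sketch below only uses the values printed above and tests whether the localization value equals the mean of the three localization_IOU_* values. This is inferred from the logged numbers alone, not from the repository code, so treat it as a consistency check rather than a statement of how the metric is implemented. The variable names are made up for illustration.

```python
# Values copied from the test-set log above (CUB200, resnet50, CAM).
per_threshold = {
    30: 96.08215395236452,   # localization_IOU_30
    50: 66.27545736969279,   # localization_IOU_50
    70: 14.566793234380393,  # localization_IOU_70
}
reported_localization = 58.97480151881257  # "metric localization"

# Assumption being tested: "localization" is the plain mean over the
# IoU thresholds that were evaluated.
mean_over_thresholds = sum(per_threshold.values()) / len(per_threshold)
gap = abs(mean_over_thresholds - reported_localization)

print("mean over thresholds:", mean_over_thresholds)
print("reported localization:", reported_localization)
print("absolute difference:", gap)
```

The two values agree up to floating-point rounding, so in this log the localization line behaves like an average over the IoU thresholds 30, 50 and 70, while localization_IOU_50 is the single-threshold value.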
I cannot obtain CUBV2 from prepare_cub.sh. Could you please check whether the link is still valid?
Could you please release the FSL code? Thank you.
I can't find the folder train_log/scoremaps/ for the CUB dataset. I am trying to use your evaluation-only script. Could you please explain the procedure for running this code on heatmaps? How do we generate the heatmaps and run your evaluation.py on custom object detection datasets?
hi,
cv2.findContours is extremely slow depending on the quality of the CAM (from .00005s to .001s). This easily brings the validation time from 2 mins to 12 mins. Things get worse when the CAM is bad (far too many contours per threshold: >1000/threshold).
Line 140 in e00842f
Is there a way to speed it up without breaking the evaluation protocol?
I really appreciate your help.
thanks
Thanks first for the great work!
Following the demo configuration of OpenImages in README.MD, I could easily reproduce the expected accuracy in the given table.
python main.py --dataset_name OpenImages \
--architecture vgg16 \
--wsol_method cam \
--experiment_name OpenImages_vgg16_CAM \
--pretrained TRUE \
--num_val_sample_per_class 5 \
--large_feature_map FALSE \
--batch_size 32 \
--epochs 10 \
--lr 0.00227913316 \
--lr_decay_frequency 3 \
--weight_decay 5.00E-04 \
--override_cache FALSE \
--workers 4 \
--box_v2_metric True \
--iou_threshold_list 30 50 70 \
--eval_checkpoint_type last
May I ask what the configuration is for the CUB and ImageNet datasets?
When I use the one above (changing only --dataset_name), the accuracy is much lower than reported in the given table.
Many thanks!
The description of the test dataset for this task says that the test data contains images with full supervision, but the test sets used for ImageNet, CUB and OpenImages don't have bounding boxes or masks. Could you please explain this in detail?