sandipan211 / zsd-sc-resolver

Resolving semantic confusions for improved zero-shot detection (BMVC 2022)

License: MIT License

Python 59.87% Shell 0.18% Cython 5.44% C 7.54% C++ 0.61% Jupyter Notebook 25.26% Dockerfile 0.01% Makefile 0.01% Batchfile 0.01% Cuda 1.08%
computer-vision conditional-gan deep-learning faster-rcnn multi-modal-learning object-detection pytorch-implementation triplet-loss zero-shot-learning zero-shot-object-detection

zsd-sc-resolver's Introduction


Resolving Semantic Confusions for Improved Zero-Shot Detection (BMVC Oral Presentation, 2022)

👓 At a glance

This repository contains the official PyTorch implementation of our BMVC 2022 paper: Resolving Semantic Confusions for Improved Zero-Shot Detection, a work by Sandipan Sarma, Sushil Kumar, and Arijit Sur at the Indian Institute of Technology Guwahati.

  • Supervised deep learning-based object detectors such as Faster-RCNN and YOLO have seen tremendous success over the last decade, but they are limited by the need for large-scale annotated datasets, their failure to adapt to changing object appearances over time, and their inability to detect unseen objects.

  • Zero-shot detection (ZSD) is a challenging task where we aim to recognize and localize objects simultaneously, even when our model has not been trained with visual samples of a few target (“unseen”) classes. This is achieved via knowledge transfer from the seen to unseen classes using semantics (attributes) of the object classes as a bridge.

  • Semantic confusion: Knowledge transfer in existing ZSD models is not discriminative enough to differentiate between objects with similar semantics, e.g. car and train.

  • We propose a generative approach and introduce a triplet loss during feature generation to account for inter-class dissimilarity.

  • Moreover, we show that maintaining cyclic consistency between the generated visual features and their class semantics is helpful for improving the quality of the generated features.

  • We address problems such as a high false-positive rate and misclassification of localized objects by resolving semantic confusion, and comprehensively outperform state-of-the-art methods.

Figure: The primary novelty of our model lies in the incorporation of a triplet loss based on visual features, assisted by a cyclic-consistency loss.

📰 News

  • Added code for obtaining detection results on a custom image. See our updated step 6.
  • Uploaded instructions for applying our method on custom datasets.
  • Added definitions explaining the script hyperparameters.
  • The configuration file for setting up the detection pipelines in the case of MS-COCO is mmdetection/configs/faster_rcnn_r101_fpn_1x.py. For PASCAL-VOC, always replace it with mmdetection/configs/pascal_voc/faster_rcnn_r101_fpn_1x_voc0712.py wherever you encounter any argument for the config path.
  • See step 6 for hard-coded arguments during evaluation.

📹 Video

This paper was presented as an oral at BMVC 2022.

🚄 Training the model

1. 🏢 Creating the work environment

Our code is based on PyTorch and was developed on an NVIDIA V100 32 GB DGX Station, using mmdetection, which provides a Faster-RCNN implementation, as the base object-detection framework. Install Anaconda/Miniconda on your system and create a conda environment using the following command:

conda env create -f zsd_environment.yml

Once set up, activate the environment and do the following:

cd ./mmdetection/

# install mmdetection and bind it to your project
python setup.py develop
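
To confirm that the environment and the mmdetection install work, a quick sanity check can be run in a Python interpreter inside the activated environment (a minimal sketch; it only prints version information and CUDA availability):

# quick sanity check for the installed packages
import torch
import mmdet

print('torch:', torch.__version__, '| CUDA available:', torch.cuda.is_available())
print('mmdet:', mmdet.__version__)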

The following commands are shown for the MSCOCO dataset. For the PASCAL-VOC dataset, make the appropriate changes to the command-line arguments and run the corresponding scripts.

2. ⏳ Train Faster-RCNN detector on seen data

All the configurations regarding the training and testing pipelines are stored in a configuration file. To inspect or modify it, open:

./mmdetection/configs/faster_rcnn_r101_fpn_1x.py

In zero-shot detection, the object categories in a dataset are split into two sets - seen and unseen. Such sets are defined in previous works for both MSCOCO [1] and PASCAL-VOC [2] datasets. The splits can be found in splits.py.

To train the Faster-RCNN on seen data, run:

cd ./mmdetection
./tools/dist_train.sh configs/faster_rcnn_r101_fpn_1x.py 1 --validate 

For reproducibility, it is recommended to use the pre-trained models provided below in this repository. Create a directory named work_dirs inside the mmdetection folder, with separate subdirectories for MSCOCO and PASCAL-VOC in which the weights of the trained Faster-RCNN should be stored. Our pre-trained models are named epoch_12.pth and epoch_4.pth, obtained by training Faster-RCNN on the seen data of MSCOCO and PASCAL-VOC respectively.
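
For example, the expected layout can be created as follows (a minimal sketch: the coco2014 subdirectory name matches the commands later in this README, while the PASCAL-VOC subdirectory name is an assumption and should match whatever work_dir your VOC config uses):

import os

# subdirectory names are illustrative; adjust them to your own configs
os.makedirs('mmdetection/work_dirs/coco2014', exist_ok=True)  # place epoch_12.pth here
os.makedirs('mmdetection/work_dirs/voc', exist_ok=True)       # place epoch_4.pth here (name assumed)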

The pre-trained Faster-RCNN weights use a ResNet-101 backbone that was itself pre-trained on ImageNet only after removing the classes overlapping with the unseen classes [3]. This pre-trained ResNet is given here, and the Faster-RCNN weights are uploaded for both PASCAL-VOC and MSCOCO.

3. 📤 Extract object features

Inside the data folder, the MSCOCO and PASCAL-VOC image datasets should be stored in the appropriate formats before running the following:

cd ./mmdetection
python tools/zero_shot_utils.py configs/faster_rcnn_r101_fpn_1x.py --classes seen --load_from ./work_dirs/coco2014/epoch_12.pth --save_dir ./data --data_split train
python tools/zero_shot_utils.py configs/faster_rcnn_r101_fpn_1x.py --classes unseen --load_from ./work_dirs/coco2014/epoch_12.pth --save_dir ./data --data_split test
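
The exact directory layout expected under data is defined by the data_root, ann_file, and img_prefix fields of the configuration file. A quick way to verify your layout matches the config before extracting features is sketched below (an assumption-laden sketch, to be run from inside mmdetection/; note that ann_file may be a list for the VOC configs):

# check that the dataset paths referenced by the config exist
import os
from mmcv import Config

cfg = Config.fromfile('configs/faster_rcnn_r101_fpn_1x.py')
for split in ('train', 'val', 'test'):
    ann = cfg.data[split].ann_file
    img = cfg.data[split].img_prefix
    print(split, os.path.exists(ann) if isinstance(ann, str) else ann, os.path.isdir(img))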

4. ↔️ Training a visual-semantic mapper

Train a visual-semantic mapper on the seen data to learn a function mapping the visual space to the semantic space. This trained mapper is used in the next step to compute the cyclic-consistency loss, improving the feature-synthesis quality of the GAN. Run:

python train_regressor.py 

Weights will be saved in the appropriate paths. For VOC, run train_regressor_voc.py.
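
Conceptually, the mapper is a small regression network trained to map seen-class object features onto their class semantics. A minimal sketch is shown below (layer sizes, feature dimensions, and names are illustrative assumptions, not the actual contents of train_regressor.py):

import torch
import torch.nn as nn

class VisualToSemantic(nn.Module):
    # maps object features (e.g. 1024-d, an assumption) to class semantics (e.g. 300-d word vectors)
    def __init__(self, vis_dim=1024, sem_dim=300):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(vis_dim, 512), nn.ReLU(), nn.Linear(512, sem_dim))

    def forward(self, x):
        return self.net(x)

mapper = VisualToSemantic()
optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-4)
criterion = nn.MSELoss()

features = torch.randn(32, 1024)        # a batch of extracted seen-class object features (placeholder)
class_semantics = torch.randn(32, 300)  # their class embeddings, e.g. fastText vectors (placeholder)

loss = criterion(mapper(features), class_semantics)
optimizer.zero_grad()
loss.backward()
optimizer.step()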

5. 🏭 Train the generative model using extracted features

The extracted seen-class object features constitute the real data distribution, on which a Conditional Wasserstein GAN is trained, with the class semantics of seen/unseen classes acting as the conditional variables. During GAN training, a triplet loss is computed on the synthesized object features, enforcing inter-class dissimilarity learning. Moreover, a cyclic-consistency loss between the synthesized features and their class semantics is computed, encouraging the GAN to generate visual features that correspond well to their own semantics. To train the GAN, run the script:

./script/train_coco_generator_65_15.sh
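
To illustrate the two auxiliary objectives described above, here is a minimal sketch of how a triplet loss on synthesized features and a cyclic-consistency loss through the trained mapper could be computed (function names, the margin, and the loss weights are illustrative assumptions, not the repository's actual code):

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    # push a class's synthesized features away from features of a semantically similar class
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

def cycle_consistency_loss(mapper, synth_features, class_semantics):
    # generated features, mapped back to semantic space, should match their own class semantics
    return F.mse_loss(mapper(synth_features), class_semantics)

# usage sketch inside the generator update (all tensors are placeholders):
# anchor   = features synthesized for class c
# positive = real or synthesized features of class c
# negative = features of a semantically similar but different class
# g_loss = wgan_loss + lambda_triplet * triplet_loss(anchor, positive, negative) \
#        + lambda_cyc * cycle_consistency_loss(mapper, anchor, semantics_of_c)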

6. 🔍 Evaluation

cd mmdetection

# evaluation on ZSD
./tools/dist_test.sh configs/faster_rcnn_r101_fpn_1x.py ./work_dirs/coco2014/epoch_12.pth 1 --dataset coco --out /workspace/arijit_ug/sushil/zsd/checkpoints/ab_st_final/coco_65_15_wgan_modeSeek_seen_cycSeenUnseen_tripletSeenUnseen_varMargin_try6/coco_65_15_wgan_modeSeek_seen_cycSeenUnseen_tripletSeenUnseen_varMargin_try6_zsd_result.pkl --zsd --syn_weights /workspace/arijit_ug/sushil/zsd/checkpoints/ab_st_final/coco_65_15_wgan_modeSeek_seen_cycSeenUnseen_tripletSeenUnseen_varMargin_try6/classifier_best_latest.pth

NOTE: Change the --zsd flag to --gzsd for evaluation in the generalized ZSD (GZSD) setting. Change directory names accordingly. The classifier weights required in the evaluation step are given for VOC and MSCOCO.

😅 Hard-coded argument: For GZSD evaluation, change the default 21 (for VOC) in this line to 81 if you want to test with MSCOCO.

For inference on a custom image, first put it inside the folder custom data. I have kept a few as examples. Obtain the model results using:

cd mmdetection
sh test_zsd_single_img.sh

Inference follows the Generalized ZSD setting. Finally, to visualize the detected bounding boxes, run:

python show_results_single_img.py

7. 🏆 Results

  • mAP for ZSD on MS-COCO

    | Method | ZSD (mAP in %) |
    | --- | --- |
    | PL | 12.40 |
    | BLC | 14.70 |
    | ACS-ZSD | 15.34 |
    | SUZOD | 17.30 |
    | ZSDTR | 13.20 |
    | ContrastZSD | 18.60 |
    | Ours | 20.10 |
  • Recall@100 for ZSD on MS-COCO

    | Method | ZSD (Recall@100 in %) |
    | --- | --- |
    | PL | 37.72 |
    | BLC | 54.68 |
    | ACS-ZSD | 47.83 |
    | SUZOD | 61.40 |
    | ZSDTR | 60.30 |
    | ContrastZSD | 59.50 |
    | Ours | 65.10 |
  • mAP for GZSD on MS-COCO

    | Method | Seen (mAP in %) | Unseen (mAP in %) | Harmonic Mean (mAP in %) |
    | --- | --- | --- | --- |
    | PL | 34.07 | 12.40 | 18.18 |
    | BLC | 36.00 | 13.10 | 19.20 |
    | ACS-ZSD | - | - | - |
    | SUZOD | 37.40 | 17.30 | 23.65 |
    | ZSDTR | 40.55 | 13.22 | 20.16 |
    | ContrastZSD | 40.20 | 16.50 | 23.40 |
    | Ours | 37.40 | 20.10 | 26.15 |
  • Recall@100 for GZSD on MS-COCO

    | Method | Seen (Recall@100 in %) | Unseen (Recall@100 in %) | Harmonic Mean (Recall@100 in %) |
    | --- | --- | --- | --- |
    | PL | 36.38 | 37.16 | 36.76 |
    | BLC | 56.39 | 51.65 | 53.92 |
    | ACS-ZSD | - | - | - |
    | SUZOD | 58.60 | 60.80 | 59.67 |
    | ZSDTR | 69.12 | 59.45 | 61.12 |
    | ContrastZSD | 62.90 | 58.60 | 60.70 |
    | Ours | 58.60 | 64.00 | 61.18 |
  • Results for PASCAL-VOC

    (See the PASCAL-VOC results figure in the repository.)

Log files are also uploaded for ZSD and GZSD.

🎁 Citation

If you use our work in your research, kindly star ⭐ our repository and consider citing it using the following BibTeX:

@inproceedings{Sarma_2022_BMVC,
author    = {Sandipan Sarma and SUSHIL KUMAR and Arijit Sur},
title     = {Resolving Semantic Confusions for Improved Zero-Shot Detection},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0347.pdf}
}

📜 References

[1] Shafin Rahman, Salman Khan, and Nick Barnes. Polarity loss for zero-shot object detection. arXiv preprint arXiv:1811.08982, 2018.

[2] Berkan Demirel, Ramazan Gokberk Cinbis, and Nazli Ikizler-Cinbis. Zero-shot object detection by hybrid region embedding. In BMVC, 2018.

[3] Yongqin Xian, Christoph H. Lampert, Bernt Schiele, and Zeynep Akata. Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9):2251–2265, 2018.

zsd-sc-resolver's People

Contributors

sandipan211

zsd-sc-resolver's Issues

num_classes for first step

What should num_classes be for the first step when training on seen data?

num_classes=81,

or,

num_classes=66,

Can you provide some output files?

Can you provide the output files of these two code segments, including VOC and COCO?
python tools/zero_shot_utils.py configs/faster_rcnn_r101_fpn_1x.py --classes seen --load_from ./work_dirs/coco2014/epoch_12.pth --save_dir ./data --data_split train
python tools/zero_shot_utils.py configs/faster_rcnn_r101_fpn_1x.py --classes unseen --load_from ./work_dirs/coco2014/epoch_12.pth --save_dir ./data --data_split test

Do you still have the weight file obtained by executing the fifth step

Hello, do you still have the weight file obtained by executing the fifth step, ./script/train_coco_generator_65_15.sh? I have been unable to reproduce your results even though all settings and parameters are unchanged; the mAP differs from yours by 14% in the GZSD setting, and other metrics also differ considerably. I hope you can share more details or the result file.

| Setting | Recall | mAP |
| --- | --- | --- |
| ZSD | 64.3 (65.1) | 18.5 (20.1) |
| GZSD seen | 63.1 (58.6) | 24.1 (37.4) |
| GZSD unseen | 30.2 (64.0) | 15.6 (20.1) |

The results reported in your paper are in parentheses.

I have some questions about training on VOC datasets

During the training and testing of the generative model (the fifth and sixth steps), do both the val and test sets in the VOC dataset config file need classes_to_load set to 'seen', or to 'unseen'?
val=dict(
type=dataset_type,
ann_file=data_root + 'VOC2007/ImageSets/Main/val.txt',
img_prefix=data_root + 'VOC2007/',
classes_to_load='seen',
split=split,
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
img_prefix=data_root + 'VOC2007/',
split=split,
classes_to_load='seen',
pipeline=test_pipeline))

Training on custom data

Dear @sandipan211 ,

I have one more query. I got good results on the MSCOCO data. Now I want to train on custom data. In the training steps, MSCOCO requires MSCOCO/fasttext.npy, and Pascal VOC requires /workspace/arijit_ug/sushil/zsd/VOC/fasttext_synonym.npy in step 4 and step 5.

Are these files created during training in steps 1-3? If so, similar .npy files for class embeddings could be created for custom data, and I could then run steps 4 and 5 for complete training on custom data.

Thank you for your time and consideration.

custom dataset tune

Hi @sandipan211, thanks for sharing the work and for the timely replies to issues! We are currently using the code to train on a custom dataset with about 10 seen classes and 5 unseen classes. Could you please give some suggestions on tuning the hyper-parameters in this code? We observe there are a lot of hyper-parameters and we don't know how best to tune them for a new dataset.

Thanks!

Custom Data

What steps should I take if I want to train with custom data? Thanks in advance.

class embedding vector for custom data

Dear @sandipan211 ,

I have one query regarding class embeddings. In the current repository, MSCOCO (MSCOCO/fasttext.npy) uses an 81×300-dimensional embedding, and Pascal VOC (VOC/fasttext_synonym.npy) uses a 21×300-dimensional embedding. This may be because the 81st entry in COCO and the 21st in VOC represent the background class.
I want to create a similar class embedding for custom data with 50 classes. In that case, is the class embedding (for instance, VOC/fasttext_synonym.npy) just a numerical representation of the different categorical class names? Is a Python embedding function, such as word2vec, used only to map each class-name string to a 300-dimensional numerical vector?

The class embedding weights (such as fasttext.npy) are required to train the regressor, especially in step 3.

Thank you for your time and consideration.
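
For readers with the same question, one plausible way to build such a class-embedding file, assuming (as the question above reasons) one 300-d word vector per class name plus a final background row, is sketched below. The exact construction used in this repository is not documented here, so treat this purely as an illustration; gensim, the vector file name, and the zero background row are all assumptions:

import numpy as np
from gensim.models import KeyedVectors

# hypothetical: 300-d word vectors (e.g. fastText in word2vec text format)
vectors = KeyedVectors.load_word2vec_format('wiki-news-300d-1M.vec')  # path is an assumption

class_names = ['person', 'bicycle', 'car']           # your custom class names (single tokens here)
embeddings = [vectors[name] for name in class_names]
embeddings.append(np.zeros(300, dtype=np.float32))   # assumed background row at the last index

np.save('custom_fasttext.npy', np.stack(embeddings).astype(np.float32))  # shape: (num_classes + 1, 300)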

Loss suddenly becoming very large

2024-07-11 17:07:28,476 - INFO - workflow: [('train', 1)], max: 4 epochs
2024-07-11 17:08:35,687 - INFO - Epoch [1][50/2766] lr: 0.01000, eta: 4:06:41, time: 1.344, data_time: 0.074, memory: 3611, loss_rpn_cls: 0.2045, loss_rpn_bbox: 0.0280, loss_cls: 0.5364, acc: 94.1191, loss_bbox: 0.0785, loss: 0.8474
2024-07-11 17:09:35,933 - INFO - Epoch [1][100/2766] lr: 0.01000, eta: 3:52:52, time: 1.205, data_time: 0.002, memory: 3611, loss_rpn_cls: 0.2257, loss_rpn_bbox: 0.1235, loss_cls: 1.8755, acc: 90.9961, loss_bbox: 0.2189, loss: 2.4435
2024-07-11 17:10:35,892 - INFO - Epoch [1][150/2766] lr: 0.01000, eta: 3:47:14, time: 1.199, data_time: 0.002, memory: 3611, loss_rpn_cls: 298.0166, loss_rpn_bbox: 359.4627, loss_cls: 36350.0296, acc: 91.0960, loss_bbox: 26111.0423, loss: 63118.5509
2024-07-11 17:11:34,803 - INFO - Epoch [1][200/2766] lr: 0.01000, eta: 3:42:59, time: 1.178, data_time: 0.002, memory: 3611, loss_rpn_cls: 317872648288.5507, loss_rpn_bbox: 719705513238.4994, loss_cls: 5641620281320.8613, acc: 72.2567, loss_bbox: 4112043418927.8037, loss: 10791241863295.7031
2024-07-11 17:12:31,443 - INFO - Epoch [1][250/2766] lr: 0.01000, eta: 3:38:24, time: 1.133, data_time: 0.002, memory: 3611, loss_rpn_cls: 1342402719373821617373184.0000, loss_rpn_bbox: 445532799740254198169600.0000, loss_cls: 20808144155606727859896320.0000, acc: 86.6701, loss_bbox: 39000440132089944379228160.0000, loss: 61596520540968801825456128.0000
2024-07-11 17:13:27,763 - INFO - Epoch [1][300/2766] lr: 0.01000, eta: 3:34:49, time: 1.126, data_time: 0.002, memory: 3611, loss_rpn_cls: 47490214674480742442991616.0000, loss_rpn_bbox: 14334860135127650800762880.0000, loss_cls: 612401653169420411301003264.0000, acc: 90.2228, loss_bbox: 1563048999675987437012647936.0000, loss: 2237275709971362084672241664.0000
2024-07-11 17:14:26,721 - INFO - Epoch [1][350/2766] lr: 0.01000, eta: 3:33:22, time: 1.179, data_time: 0.002, memory: 3611, loss_rpn_cls: 61938456058247602409308160.0000, loss_rpn_bbox: 17071254521314804517306368.0000, loss_cls: 726978391671261779812417536.0000, acc: 87.8476, loss_bbox: 1770044873227421734154534912.0000, loss: 2576032968830212433080483840.0000

inference to single image

Hi, Thank you for your work.

I would like to inquire about the possibility of adding inference code for a custom single image.
I am interested in conducting a qualitative evaluation of your work.

Do you have a minimal environment YAML file?

I am unable to recreate your environment using your zsd_environment.yml file; I'm getting various dependency errors.

The environment file looks like an export of your current environment, so it probably contains lots of packages (e.g. Anaconda default installs) that aren't needed; cleaning up the environment file might help resolve the issue.

In other words, do you have an environment file that lists just the absolute minimum set of packages that are needed?

How to generate class embedding files?

Hi, thanks for your great work.
I am confused about how you generated the class-embedding files (fasttext, glove).
How does the index in the class-embedding files map to the class id?
Could you provide a little more detail about generating the class-embedding files?
Thanks!

Pascal VOC data split

Hi @sandipan211, thanks for sharing this work! I was recently trying to reproduce your results on PASCAL VOC but couldn't achieve such good experimental results. I would like to know how you divided the PASCAL 2007 and 2012 datasets to make sure I didn't make a mistake in this step. Thanks for your time and consideration.

Setting batch size

Is there an option to set batch size? I'm attempting to run an example on a single GPU and encountering memory issues.

NaN metrics during Epoch 1 on coco2014

Is this normal behavior during Epoch 1 when training the backbone on the coco2014 dataset?

2022-12-20 14:32:29,144 - INFO - Epoch [1][25500/61598] lr: 0.02000, eta: 5 days, 12:43:05, time: 0.658, data_time: 0.009, memory: 3162, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 0.0000, loss_bbox: nan, loss: nan

low accuracy on unseen classes

Dear author,

I trained and got good results for COCO. Setting split = unseen, the test accuracy for both seen and unseen classes can be obtained with GZSD. However, with the Pascal VOC config, should we train separately to get test accuracy on seen and unseen classes? When I trained with split = seen, only seen-class accuracy is shown during testing, and only unseen-class accuracy (in both ZSD and GZSD) when trying split = unseen.
Moreover, when trying on custom data, the mAP for seen classes is very high, in the range of 60-70. During GAN training (step 4), the validation accuracy is higher (around 25.00), but the test accuracy is very low (around 5.00).
Can you please suggest a solution for the high accuracy on seen classes and the low accuracy on unseen classes at test time (although the validation accuracy is a little higher for the unseen classes)?

Thank you for your time and consideration.

Some hesitations related to background class label index

Hi, thanks for your work. I ran into a similar environment problem to the one nbkn865 described in #7, but I haven't tried your further response yet. Now I'm trying to port this work to the new mmdet 2.x. However, mmdet 2.x has some different characteristics; one is that the background label is no longer 0 but num_classes.

I wonder whether your method still works correctly after the background label is changed.

Also, I wonder whether these semantic vectors start from "background", that is, whether the index of the "background" vector (attribute) is 0.

And in mmdetection/tools/test.py, I found code as

 model.bbox_head.seen_bg_weight = torch.from_numpy(seen_bg_weight).cuda()
 model.bbox_head.seen_bg_bias = torch.from_numpy(seen_bg_bias).cuda()

but I think the model has no attributes named seen_bg_weight and seen_bg_bias. Do you mean they refer to index 0 (the last index in new mmdet) of the fc-layer weights, or did you modify something?

Looking forward to your answer, thanks!

The test result of detector trained only on seen classes

Hi, I ran a test of the epoch_12 checkpoint you provided in the README.
The result came out as follows:
num_classes ----------- 80
+----------------+-------+--------+--------+-----------+----------+
| class | gts | dets | recall | precision | ap |
+----------------+-------+--------+--------+-----------+----------+
| person | 15697 | 198482 | 0.895 | 0.071 | 0.660139 |
| bicycle | 290 | 4338 | 0.628 | 0.042 | 0.294971 |
| car | 2392 | 47918 | 0.788 | 0.039 | 0.192860 |
| motorcycle | 118 | 2004 | 0.720 | 0.042 | 0.510870 |
| bus | 208 | 7563 | 0.707 | 0.019 | 0.196079 |
| truck | 779 | 11144 | 0.416 | 0.029 | 0.039854 |
| boat | 211 | 17642 | 0.616 | 0.007 | 0.174057 |
| traffic_light | 563 | 18209 | 0.650 | 0.020 | 0.173152 |
| fire_hydrant | 52 | 936 | 0.827 | 0.046 | 0.546072 |
| stop_sign | 47 | 1365 | 0.745 | 0.026 | 0.521156 |
| bench | 569 | 37449 | 0.489 | 0.007 | 0.049016 |
| bird | 235 | 31663 | 0.523 | 0.004 | 0.124819 |
| dog | 437 | 8283 | 0.787 | 0.042 | 0.178770 |
| horse | 35 | 621 | 0.629 | 0.035 | 0.453032 |
| sheep | 29 | 8115 | 0.310 | 0.001 | 0.022340 |
| cow | 37 | 1310 | 0.595 | 0.017 | 0.140441 |
| elephant | 6 | 288 | 0.500 | 0.010 | 0.500000 |
| giraffe | 7 | 377 | 0.857 | 0.016 | 0.563532 |
| backpack | 955 | 8777 | 0.466 | 0.051 | 0.054695 |
| umbrella | 320 | 6690 | 0.672 | 0.032 | 0.225762 |
| handbag | 1112 | 13397 | 0.424 | 0.035 | 0.082742 |
| tie | 160 | 7823 | 0.613 | 0.013 | 0.102578 |
| skis | 468 | 20539 | 0.560 | 0.013 | 0.044583 |
| sports_ball | 77 | 12894 | 0.545 | 0.003 | 0.035616 |
| kite | 31 | 11299 | 0.742 | 0.002 | 0.248010 |
| baseball_bat | 7 | 1684 | 0.857 | 0.004 | 0.139171 |
| baseball_glove | 3 | 1530 | 0.000 | 0.000 | 0.000000 |
| skateboard | 24 | 3004 | 0.833 | 0.007 | 0.605755 |
| surfboard | 14 | 8140 | 0.357 | 0.001 | 0.011808 |
| tennis_racket | 11 | 1705 | 0.455 | 0.003 | 0.200000 |
| bottle | 2375 | 21796 | 0.742 | 0.081 | 0.392130 |
| wine_glass | 820 | 3996 | 0.687 | 0.141 | 0.467318 |
| cup | 2730 | 14335 | 0.690 | 0.131 | 0.401259 |
| knife | 1180 | 11138 | 0.402 | 0.043 | 0.052995 |
| spoon | 999 | 16973 | 0.416 | 0.025 | 0.049059 |
| bowl | 1643 | 10953 | 0.733 | 0.110 | 0.374713 |
| banana | 183 | 7767 | 0.634 | 0.015 | 0.101670 |
| apple | 168 | 2932 | 0.429 | 0.025 | 0.085470 |
| orange | 171 | 2765 | 0.702 | 0.043 | 0.123477 |
| broccoli | 446 | 18614 | 0.749 | 0.018 | 0.117123 |
| carrot | 727 | 20541 | 0.707 | 0.025 | 0.060945 |
| pizza | 599 | 5304 | 0.755 | 0.085 | 0.421441 |
| donut | 197 | 7644 | 0.594 | 0.015 | 0.143520 |
| cake | 590 | 5982 | 0.629 | 0.062 | 0.243563 |
| chair | 2788 | 70764 | 0.655 | 0.026 | 0.197088 |
| couch | 340 | 2819 | 0.494 | 0.060 | 0.211933 |
| potted_plant | 609 | 25395 | 0.800 | 0.019 | 0.261423 |
| bed | 311 | 5294 | 0.707 | 0.042 | 0.216452 |
| dining_table | 1849 | 99306 | 0.787 | 0.015 | 0.192627 |
| tv | 934 | 10099 | 0.760 | 0.070 | 0.426263 |
| laptop | 775 | 3485 | 0.796 | 0.177 | 0.631668 |
| remote | 237 | 8653 | 0.608 | 0.017 | 0.129868 |
| keyboard | 784 | 6538 | 0.778 | 0.093 | 0.297669 |
| cell_phone | 448 | 7005 | 0.592 | 0.038 | 0.163251 |
| microwave | 83 | 624 | 0.687 | 0.091 | 0.593158 |
| oven | 144 | 6232 | 0.688 | 0.016 | 0.286438 |
| sink | 856 | 33101 | 0.822 | 0.021 | 0.255440 |
| refrigerator | 97 | 15373 | 0.732 | 0.005 | 0.300810 |
| book | 2632 | 51639 | 0.697 | 0.036 | 0.083469 |
| clock | 172 | 8461 | 0.680 | 0.014 | 0.196000 |
| vase | 249 | 3609 | 0.546 | 0.038 | 0.185588 |
| scissors | 52 | 2071 | 0.404 | 0.010 | 0.115084 |
| teddy_bear | 130 | 2989 | 0.838 | 0.036 | 0.415706 |
| toothbrush | 137 | 0 | 0.000 | 0.000 | 0.000000 |
+----------------+-------+--------+--------+-----------+----------+
| mean | | | 0.627 | | 0.238852 |
+----------------+-------+--------+--------+-----------+----------+
+---------------+------+------+--------+-----------+----------+
| class | gts | dets | recall | precision | ap |
+---------------+------+------+--------+-----------+----------+
| airplane | 1444 | 0 | 0.000 | 0.000 | 0.000000 |
| train | 1602 | 0 | 0.000 | 0.000 | 0.000000 |
| parking_meter | 510 | 0 | 0.000 | 0.000 | 0.000000 |
| cat | 1669 | 0 | 0.000 | 0.000 | 0.000000 |
| bear | 462 | 0 | 0.000 | 0.000 | 0.000000 |
| suitcase | 2219 | 0 | 0.000 | 0.000 | 0.000000 |
| frisbee | 935 | 0 | 0.000 | 0.000 | 0.000000 |
| snowboard | 793 | 0 | 0.000 | 0.000 | 0.000000 |
| fork | 1775 | 0 | 0.000 | 0.000 | 0.000000 |
| sandwich | 1457 | 0 | 0.000 | 0.000 | 0.000000 |
| hot_dog | 1009 | 0 | 0.000 | 0.000 | 0.000000 |
| toilet | 1462 | 0 | 0.000 | 0.000 | 0.000000 |
| mouse | 850 | 0 | 0.000 | 0.000 | 0.000000 |
| toaster | 78 | 0 | 0.000 | 0.000 | 0.000000 |
| hair_drier | 74 | 0 | 0.000 | 0.000 | 0.000000 |
+---------------+------+------+--------+-----------+----------+
| mean | | | 0.000 | | 0.000000 |
+---------------+------+------+--------+-----------+----------+
+------+--+--+-------+--+----------+
| mean | | | 0.627 | | 0.238852 |
+------+--+--+-------+--+----------+
| mean | | | 0.000 | | 0.000000 |
+------+--+--+-------+--+----------+
mAP is : 0.19349995255470276

The number of detected bounding boxes for "toothbrush" is zero, which does not look right. I think I might have done something wrong.
Is this result the same as yours?

Confusion about validation metrics for zero-shot detection.

Hi, I am now a bit confused about validation metrics for zero-shot detection.

I understand that ZSD just loads the unseen classes at test time, so setting classes_to_load to unseen is enough, and I was able to reproduce the results you reported on the COCO and VOC datasets.

How do I test the GZSD setting? Does GZSD mean loading all classes at test time (classes_to_load = all)? In mmdetection/tools/test.py, this code seems to set up the GZSD test on COCO:

 if cfg.test_cfg.rcnn.gzsd and hasattr(dataset,'cat_ids'):
        dataset.cat_to_load = dataset.cat_ids

When I test GZSD (using --gzsd) on VOC, I can't get a GZSD result. When I test GZSD on the COCO dataset, I get a worse result, which differs from your report.
So, for GZSD, how should the correct test data be loaded for both the VOC and COCO datasets?

Hope you can answer my confusion, thank you very much!
