'Taking a Deeper Look at Co-Salient Object Detection'
Co-salient object detection (CoSOD) is a newly emerging and rapidly growing branch of salient object detection (SOD) that aims to detect the co-occurring salient objects in multiple images. However, existing CoSOD datasets often have a serious data bias: they assume that each group of images contains salient objects of similar visual appearance. This bias leads to idealized settings, and models trained on existing datasets may be less effective in real-life situations, where the similarity is usually semantic or conceptual. To tackle this issue, we first collect a new high-quality dataset, named CoSOD3k, which contains 3,316 images divided into 160 groups with multiple levels of annotation, i.e., category, bounding box, object, and instance levels. CoSOD3k makes a significant leap in terms of diversity, difficulty, and scalability, benefiting related vision tasks. Besides, we comprehensively summarize 34 cutting-edge algorithms, benchmark 19 of them over four existing CoSOD datasets (MSRC, iCoSeg, Image Pair, and CoSal2015) and our CoSOD3k with a total of ∼61K images (largest scale), and report a group-level performance analysis. Finally, we discuss the challenges and future work of CoSOD. We hope our study will give a strong boost to growth in the CoSOD community.
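As an illustration of the group-level analysis mentioned above, the following is a minimal sketch, not the benchmark's actual evaluation code. It assumes mean absolute error (MAE) as the metric and saliency maps given as nested lists of values in [0, 1]. The function names `image_mae` and `group_level_mae` and the group labels `banana` and `guitar` are hypothetical and chosen only for the example.

```python
from statistics import mean


def image_mae(pred, gt):
    """Mean absolute error between two equally sized 2D maps (values in [0, 1])."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    diffs = [
        abs(p - g)
        for row_p, row_g in zip(pred, gt)
        for p, g in zip(row_p, row_g)
    ]
    return mean(diffs)


def group_level_mae(groups):
    """Map each group name to the mean MAE over its (prediction, ground truth) pairs."""
    return {
        name: mean(image_mae(pred, gt) for pred, gt in pairs)
        for name, pairs in groups.items()
    }


# Toy data (assumption): two image groups with 2x2 maps.
groups = {
    "banana": [
        ([[1.0, 0.0], [0.0, 0.0]], [[1, 0], [0, 0]]),  # perfect prediction
        ([[0.5, 0.5], [0.5, 0.5]], [[1, 1], [0, 0]]),  # uniformly uncertain
    ],
    "guitar": [
        ([[0.0, 0.0], [0.0, 0.0]], [[1, 1], [1, 1]]),  # object missed entirely
    ],
}

scores = group_level_mae(groups)
for name, score in sorted(scores.items()):
    print(f"{name}: MAE = {score:.3f}")
```

Reporting one score per group, rather than a single average over all images, makes it easier to see which object categories a model handles poorly.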
Figure 1: Different salient object detection (SOD) tasks. (a) Traditional SOD [78]. (b) Within-image co-salient object detection (CoSOD) [93], where common salient objects are detected from a single image. (c) Existing CoSOD, where salient objects are detected according to a pair [52] or a group [85] of images with similar appearances. (d) The proposed CoSOD in the wild, which requires a large amount of semantic context, making it more challenging than existing CoSOD.
Figure 2: The 160 Objects from our CoSOD3k.
Table 1: Statistics for size and number of instances/objects in existing datasets. '-' indicates that the dataset only contains object-level annotations, so the number of instances is only one.
Year | Publisher | Paper | #Image | Download Link1 | Download Link2 |
---|---|---|---|---|---|
2005 | ICCV | MSRC | 233 | Baidu Pan: 8r27 | Google (4.17M) |
2010 | CVPR | iCoSeg | 643 | Baidu Pan: e1mz | Google (67M) |
2011 | TIP | Image Pair | 105 | Baidu Pan: fmqj | Google (0.98M) |
2016 | IJCV/CVPR | CoSal2015 | 2015 | Baidu Pan: kpvv | Google (96.1M) |
2018 | AAAI | WICOS | 364 | Baidu Pan: b5qg | Google (10.7M) |
2020 | ECCV | CoCA | 1295 | Baidu Pan: ckzt | Google (96M) |
2020 | CVPR | CoSOD3k | 3316 | Baidu Pan: 65as | Google (418M) + Google (411M) |
Overall | | | | Baidu Pan: 6mvn | Google (1.4G) |
Model | Pub. | Year | #Training | Training set | Main Component | SL. | Sp. | Po. | Ed. | Post. |
---|---|---|---|---|---|---|---|---|---|---|
WPLT | UIST | 2010 | Morphological, Translational Alignment | U | ||||||
PCSDT | ICIP | 2010 | 120,000 | 8*8 image patch | Sparse feature, Filter Bank | W | ||||
IPCST | TIP | 2011 | Ncut, co-multilayer Graph | U | √ | |||||
CBCST | TIP | 2013 | Contrast/Spatial/Corresponding Cue | U | ||||||
MIT | TMM | 2013 | Feature/Images Pyramid, Multi-scale Voting | U | √ | GCut | ||||
CSHST | SPL | 2013 | Hierarchical Segmentation, Contour Map | U | √ | |||||
ESMGT | SPL | 2014 | Efficient Manifold Ranking [84], OTSU | U | ||||||
BRT | MM | 2014 | Common/Center Cue, Global Correspondence | U | √ | |||||
SACST | TIP | 2014 | Self-adaptive Weight, Low Rank Matrix | U | √ | |||||
DIM | TNNLS | 2015 | 1,000+9,963 | ASD+PV | SDAE model, Contrast/Object Prior | S | √ | |||
CODW | IJCV | 2016 | ImageNet pre-train | SermaNet, RBM, IMC, IGS, IGC | W | √ | √ | |||
SP-MIL | TPAMI | 2017 | (240+643)*10% | MSRC-V1+iCoseg | SPL [97], SVM, GIST [69], CNNs | W | √ | |||
GD | IJCAI | 2017 | 9,213 | MSCOCO | VGGNet16 [68], Group-wise Feature | S | ||||
MVSRCC | TIP | 2017 | LBP, SIFT [61], CH, Bipartite Graph | √ | √ | |||||
UMLF | TCSVT | 2017 | (240+2015)*50% | MSRC-V1+CoSal2015 | SVM, OMR [86], metric learning | S | √ | |||
DML | BMVC | 2018 | 10,000+6,232+5,168 | M10K+THUR-15K [11]+DO | CAE, HSR, Multistage | S | ||||
DWSI | AAAI | 2018 | EdgeBox [106], Low-rank Matrix, CH | S | √ | |||||
GONet | ECCV | 2018 | ImageNet pre-train | ResNet-50 [28], Graphical Optimization | W | √ | CRF | |||
COC | IJCAI | 2018 | ImageNet pre-train | ResNet-50 [28], Co-attention Loss | W | √ | CRF | |||
FASS | MM | 2018 | ImageNet pre-train | DHS [56]/VGGNet, Graph optimization | W | √ | ||||
PJOT | TIP | 2018 | Energy Minimization, BoWs | U | √ | |||||
SPIG | TIP | 2018 | 10,000+210+2,015+240 | M10K+IPCS+CoSal2015+MSRC-V | DeepLab, Graph Representation | S | √ | |||
QGF | TMM | 2018 | ImageNet pre-train | Dense Correspondence, Quality Measure | S | √ | THR | |||
EHL | NC | 2019 | 643 | iCoseg | GoogLeNet, FSM | S | √ | |||
IML | NC | 2019 | 3624 | CoSal2015+PV+CR | VGGNet16 | S | √ | |||
DGFC | TIP | 2019 | >200,000 | MSCOCO [55] | VGGNet16, Group-wise Feature | S | √ | |||
RCANet | IJCAI | 2019 | >200,000 | MSCOCO+COS+iCoseg+CoSal2015+MSRC | VGGNet16, Recurrent Units | S | THR | |||
GS | AAAI | 2019 | 200,000 | COCO-SEG | VGGNet19, Co-category Classification | S | ||||
MGCNet | ICME | 2019 | Graph Convolutional Networks | S | √ | |||||
MGLCN | MM | 2019 | N/A | N/A | VGGNet16, PiCANet, Inter-/Intra-graph | S | √ | |||
HC | MM | 2019 | N/A | N/A | VAE-Net, Hierarchical Consistency | S | √ | √ | CRF | |
CSMG | CVPR | 2019 | 2,500 | MB | VGGNet16, Shared Superpixel Feature | S | √ | |||
DeepCO3 | CVPR | 2019 | 10,000 | M10K | SVFSal / VGGNet, Co-peak Search | W | √ | |||
GWD | ICCV | 2019 | >200,000 | MSCOCO | VGGNet19, RNN, Group-wise Loss | S | THR | |||
GCAGC | CVPR | 2020 | >200,000 | COCO-SEG | Graph Model | S | ||||
GICD | ECCV | 2020 | 8,250 | DUTS_class | Gradient | S | ||||
ICNet | NeurIPS | 2020 | 9,213 | COCO-9k | External SOD Model | S | ||||
CoADNet | NeurIPS | 2020 | >200,000 | DUTS_class+COCO-SEG | Group Mining | S | ||||
CoSformer | arXiv | 2021 | >200,000 | DUTS_class+COCO-SEG | Transformer | S | ||||
CoEG-Net | TPAMI | 2021 | 8,250 | DUTS_class | PCA | S | ||||
DeepACG | CVPR | 2021 | >200,000 | COCO-SEG | Gromov-Wasserstein Distance | S | ||||
GCoNet | CVPR | 2021 | 8,250 | DUTS_class | Group Collaborative Learning | S | ||||
CADC | ICCV | 2021 | 8,250+9,213 | DUTS_class+COCO-9k | Dynamic Convolution | S | ||||
DCFM | CVPR | 2022 | 9,213 | COCO-9k | Prototype, self-contrastive learning | S | ||||
UFO | arXiv | 2022 | >200,000 | COCO-SEG | Transformer | S | ||||
GCoNet+ | arXiv | 2022 | >200,000 | DUTS_class, COCO-9k, COCO-SEG | Inter-group Learning, Metric Learning | S |
A trailing "T" on a model name marks a traditional method rather than a deep one; for example, WPLT denotes the traditional method WPL.
Refer to the CoSOD task in papers-with-code.
Model | Baidu Pan | Google Drive |
---|---|---|
CBCS | Baidu-Disk (gtse) | Google-Drive |
CODR | Baidu-Disk (qfks) | Google-Drive |
CPD | Baidu-Disk (jxkk) | Google-Drive |
CSHS | Baidu-Disk (wda4) | Google-Drive |
CSMG | Baidu-Disk (gwm6) | Google-Drive |
DIM | Baidu-Disk (2hgk) | Google-Drive |
EGNet | Baidu-Disk (tkna) | Google-Drive |
ESMG | Baidu-Disk (hxqb) | Google-Drive |
IML | Baidu-Disk (7m1c) | Google-Drive |
UMLF | Baidu-Disk (eqpw) | Google-Drive |
GCAGC | Baidu-Disk (ij29) | Google-Drive |
GICD | Baidu-Disk (puji) | Google-Drive |
ICNet | Baidu-Disk (xwcv) | Google-Drive |
CoADNet | Baidu-Disk (MVPL) | |
CoEG-Net | Baidu-Disk (f4p3) | Google-Drive |
GCoNet | | Google-Drive |
CADC | Baidu-Disk (i59u) | Google-Drive |
DCFM | | Google-Drive |
UFO | | Google-Drive |
GCoNet+ | | Google-Drive |
Figure 3: Qualitative examples of existing top-10 models on CoSOD3k.
If you find this useful, please cite the following work:
@inproceedings{fan2020taking,
title={Taking a Deeper Look at Co-Salient Object Detection},
author={Fan, Deng-Ping and Lin, Zheng and Ji, Ge-Peng and Zhang, Dingwen and Fu, Huazhu and Cheng, Ming-Ming},
booktitle={IEEE CVPR},
year={2020}
}
@article{fan2022re,
title={Re-thinking co-salient object detection},
author={Fan, Deng-Ping and Li, Tengpeng and Lin, Zheng and Ji, Ge-Peng and Zhang, Dingwen and Cheng, Ming-Ming and Fu, Huazhu and Shen, Jianbing},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
volume={44},
number={8},
pages={4339-4354},
year={2022},
publisher={IEEE}
}