elias-ramzi / roadmap

This repository contains the official implementation of the NeurIPS'21 paper, ROADMAP: Robust and Decomposable Average Precision for Image Retrieval.

Home Page: https://arxiv.org/abs/2110.01445

License: MIT License

Python 100.00%

image-retrieval average-precision neurips-2021 metric-learning

roadmap's Issues

Train error with SOP

Hi, I ran training on SOP with the resnet model. While it was computing accuracy for the test split w.r.t. ['test'], I got this error:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:162; details: cublas failed (13): (512, 512) x (60502, 512)' = (512, 60502)
Aborted

RuntimeError: derivative for heaviside is not implemented

Hello elias-ramzi, thanks for your brilliant work.
I ran the SOP command below, but got the error 'RuntimeError: derivative for heaviside is not implemented':
CUDA_VISIBLE_DEVICES=0 python roadmap/single_experiment_runner.py \
'experience.experiment_name=sop_ROADMAP_${dataset.sampler.kwargs.batch_size}_sota' \
experience.seed=333 \
experience.max_iter=100 \
'experience.log_dir=${env:HOME}/experiments/ROADMAP' \
optimizer=sop \
model=resnet \
transform=sop_big \
dataset=sop \
dataset.sampler.kwargs.batch_size=128 \
dataset.sampler.kwargs.batches_per_super_pair=10 \
loss=roadmap

My torch version is 1.9.1. I found a non-differentiable operation, torch.heaviside, in the SupAP class (step_rank function). Is this the reason?
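For context on the question above: a hard step function has zero slope everywhere except at the jump, so it carries no training signal on its own, which is why rank-based losses usually pair it with a smooth surrogate such as a temperature-scaled sigmoid. The sketch below compares finite-difference slopes in plain Python. The temperature `tau = 0.01`, the sample points, and the helper names are illustrative assumptions, not values or functions taken from the repository.

```python
import math


def heaviside(t: float) -> float:
    """Hard step: 1 for t > 0, else 0 (the value at t == 0 is an arbitrary choice here)."""
    return 1.0 if t > 0 else 0.0


def smooth_step(t: float, tau: float = 0.01) -> float:
    """Temperature-scaled sigmoid, a common smooth stand-in for the hard step."""
    return 1.0 / (1.0 + math.exp(-t / tau))


def slope(f, t: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of f'(t)."""
    return (f(t + eps) - f(t - eps)) / (2 * eps)


# Away from the jump the hard step is flat, while the sigmoid still has a usable slope.
for t in (-0.02, 0.005, 0.02):
    print(f"t={t:+.3f}  step slope={slope(heaviside, t):.3f}  "
          f"sigmoid slope={slope(smooth_step, t):.3f}")
```

This only illustrates why a step has no useful derivative; it does not show how the repository handles the step in step_rank.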

Evaluate error

Thanks for your nice work!
I tried to reproduce the results on iNaturalist and SOP with a 32 GB V100, but I hit the same error on both during the evaluation phase.

The error is as follows:
[2023-03-31 09:27:57,441][PML][INFO] - Computing accuracy for the test split w.r.t ['test']
Traceback (most recent call last):
File "roadmap/single_experiment_runner.py", line 66, in single_experiment_runner
checkpoint_dir=resume,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/run.py", line 154, in run
restore_epoch=restore_epoch,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/train.py", line 100, in train
**dataset_dict,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/utils/get_set_random_state.py", line 32, in wrapper
output = func(*args, **kwargs)
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/evaluate.py", line 145, in evaluate
splits_to_eval=splits_to_eval,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/testers/base_tester.py", line 307, in test
reference_split_names,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/testers/global_embedding_space.py", line 26, in do_knn_and_accuracies
self.ref_includes_query(query_split_name, reference_split_names),
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/accuracy_calculator.py", line 140, in get_accuracy
query_labels, reference_labels, self.label_comparison_fn,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/utils/accuracy_calculator.py", line 153, in get_label_match_counts
comparison = unique_query_labels[:, None] == reference_labels
RuntimeError: The size of tensor a (512) must match the size of tensor b (60502) at non-singleton dimension 2

I have printed the 'query_labels' and 'reference_labels':
query_labels: tensor([[-0.0357, -0.0059, -0.0690, ..., -0.0238, -0.0273, 0.0572],
[-0.0684, 0.0068, -0.0903, ..., -0.0357, 0.0095, 0.0314],
[-0.0661, -0.0320, -0.0592, ..., 0.0084, -0.0471, 0.0413],
...,
[-0.0183, -0.0594, -0.0730, ..., 0.0730, 0.0116, 0.0096],
[ 0.0094, -0.0630, -0.0917, ..., 0.1146, -0.0042, 0.0074],
[ 0.0019, -0.0335, -0.0428, ..., 0.0115, -0.0445, 0.0971]],
device='cuda:0') torch.Size([136093, 512])
reference_labels: tensor([716, 716, 716, ..., 810, 810, 810], device='cuda:0') torch.Size([136093])

Could you please help us find out the possible reasons?
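The shapes in the report are consistent with embeddings being passed where integer labels are expected: the printed `query_labels` is a float tensor of shape [136093, 512]. A minimal sketch of the shape arithmetic follows, in plain Python so it runs without a GPU. `broadcast_shape` is a hypothetical helper that mimics the NumPy/PyTorch broadcasting rules, and `num_unique = 1000` is a placeholder. It assumes the unique "labels" keep their trailing dimension of 512, which is what "dimension 2" in the error message suggests.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError (NumPy/PyTorch rules)."""
    out = []
    # Align the shapes from the trailing dimension; missing dimensions count as 1.
    for i, (x, y) in enumerate(zip_longest(reversed(a), reversed(b), fillvalue=1)):
        if x != y and 1 not in (x, y):
            dim = max(len(a), len(b)) - 1 - i
            raise ValueError(f"size {x} must match size {y} at dimension {dim}")
        out.append(max(x, y))
    return tuple(reversed(out))


num_unique = 1000  # placeholder: number of unique query labels (or rows)

# Expected case: 1-D integer labels, so labels[:, None] has shape (num_unique, 1).
print(broadcast_shape((num_unique, 1), (60502,)))

# Reported case: 512-dimensional rows, so labels[:, None] has shape (num_unique, 1, 512).
try:
    broadcast_shape((num_unique, 1, 512), (60502,))
except ValueError as err:
    print("mismatch:", err)
```

If this reading is right, the thing to check is the order in which embeddings and labels reach get_accuracy; the thread itself does not confirm the cause.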

DG_{AP} in Eq. 8

Is there any guarantee that DG_{AP} is non-negative?

For example, we have two batches:

Batch 1: red < red < green < green < green
Batch 2: red < green < green < green < red

The AP of Batch 1 is 1.0, and the AP of Batch 2 is 0.64. The average AP over these two batches is 0.82.

If these two batches form a dataset like this: red < red < red < green < green < green < red < green < green < green,
then the AP over the dataset is 0.915, and DG_{AP} here is -0.095, right?

Could you help me understand this part? Please let me know if I misunderstood something. Thanks. @elias-ramzi
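The arithmetic in this question can be checked with a short script. It is a sketch under two assumptions: "<" orders items by increasing score, so the rightmost item is ranked first, and green items are the positives. The gap uses the sign convention of the question (mean batch AP minus AP over the merged ranking). The function and variable names are illustrative.

```python
def average_precision(ascending):
    """AP of a ranking given as labels sorted by increasing score (1 = positive)."""
    ranking = ascending[::-1]  # highest score first
    hits, precisions = 0, []
    for rank, is_positive in enumerate(ranking, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)  # precision at each positive
    return sum(precisions) / len(precisions)


R, G = 0, 1  # red = negative, green = positive
batch1 = [R, R, G, G, G]
batch2 = [R, G, G, G, R]
dataset = [R, R, R, G, G, G, R, G, G, G]

mean_batch_ap = (average_precision(batch1) + average_precision(batch2)) / 2
global_ap = average_precision(dataset)
gap = mean_batch_ap - global_ap

print(f"mean batch AP = {mean_batch_ap:.3f}")
print(f"global AP     = {global_ap:.3f}")
print(f"gap           = {gap:.3f}")
```

Under these assumptions the script reproduces the figures in the question up to rounding, and the gap is indeed negative for this particular example.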

XBM Setting -> Potential lead to out of memory

Dear author,

I noticed that the XBM size setting for the SOP dataset is set to the dataset size. Could this setting cause an out-of-memory error?

typo in equation 8

😉 Nice paper, but I think that the summation in equation 8 should be $\sum_{j=1}^{|P_i^b|}$ instead of $\sum_{j=1}^{B}$.

Train errors on CUB

After configuring the runtime environment according to the guide, I ran the training code and got the following error:
Traceback (most recent call last):
File "roadmap/single_experiment_runner.py", line 7, in <module>
import roadmap.run as run
File "/code/yx/SIGIR/ROADMAP/roadmap/run.py", line 10, in <module>
from ray import tune
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/__init__.py", line 2, in <module>
from ray.tune.tune import run_experiments, run
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/tune.py", line 14, in <module>
from ray.tune.utils.callback import create_default_callbacks
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/utils/callback.py", line 7, in <module>
from ray.tune.progress_reporter import TrialProgressCallback
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/progress_reporter.py", line 28, in <module>
raise ImportError("ray.tune in ray > 0.7.5 requires 'tabulate'. "
ImportError: ray.tune in ray > 0.7.5 requires 'tabulate'. Please re-run 'pip install ray[tune]' or 'pip install ray[rllib]'.

Expressions in code and paper do not match

Hello. Thank you for your paper.

I read through your paper and code, and I found that Equation (4) does not match the step_rank code in smooth_rank_ap.py (line i51).

The equation in the paper includes a (t - delta) term, but I cannot find (t - delta) in the code.

Why does the code not use (t - delta) and the sigmoid there? The code only has: '''tens[~target & pos_mask & margin_mask] = rho * tens[~target & pos_mask & margin_mask] + offset'''

I don't understand this. Thanks in advance for the explanation.
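One reading that would reconcile the two forms, offered as a hypothesis rather than a statement about the repository: a linear branch written as rho * (t - delta) + c is algebraically identical to rho * t + offset when offset = c - rho * delta. The sketch below checks that identity numerically. The values of `rho`, `delta`, and `c` are made up, and whether the code's `offset` is actually defined this way is not established in this thread.

```python
import math

# Hypothetical values, chosen only to exercise the identity.
rho, delta, c = 100.0, 0.05, 1.2


def linear_with_shift(t: float) -> float:
    """Linear branch written with an explicit (t - delta) shift."""
    return rho * (t - delta) + c


# Fold the constant -rho * delta into a single offset term.
offset = c - rho * delta


def linear_with_offset(t: float) -> float:
    """Same line written as rho * t + offset, with no explicit (t - delta)."""
    return rho * t + offset


for t in (0.05, 0.1, 0.5):
    assert math.isclose(linear_with_shift(t), linear_with_offset(t))
    print(f"t={t}: {linear_with_shift(t):.4f} == {linear_with_offset(t):.4f}")
```

If the repository's offset does not absorb the -rho * delta term, the two expressions differ by that constant, which shifts the values but not the slope.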

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float;

Evaluation issues regarding memory overflow on GPU

Hi,

After training the model on the SOP dataset, I tried to use the checkpoint to run evaluate.py. The command I used is: python evaluate.py --config /home/zhijue/experiments/ROADMAP/sop_ROADMAP_128_sota/weights/epoch_100.ckpt --bs 128.
The result is:

09/08/2022 02:16:27 PM - INFO - running k-nn with k=2048
09/08/2022 02:16:27 PM - INFO - embedding dimensionality is 512
Traceback (most recent call last):
File "evaluate.py", line 95, in <module>
metrics = load_and_evaluate(
File "evaluate.py", line 51, in load_and_evaluate
metrics = eng.evaluate(
File "/home/zhijue/project/models/ROADMAP/roadmap/utils/get_set_random_state.py", line 32, in wrapper
output = func(*args, **kwargs)
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/evaluate.py", line 141, in evaluate
return tester.test(
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/pytorch_metric_learning/testers/base_tester.py", line 306, in test
self.do_knn_and_accuracies(
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/pytorch_metric_learning/testers/global_embedding_space.py", line 21, in do_knn_and_accuracies
a = self.accuracy_calculator.get_accuracy(
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/accuracy_calculator.py", line 155, in get_accuracy
knn_indices, knn_distances = get_knn(
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/get_knn.py", line 16, in get_knn
distances, indices = get_knn_faiss(references, queries, num_k)
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/get_knn.py", line 47, in get_knn_faiss
index.add(references)
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/contrib/torch_utils.py", line 96, in torch_replacement_add
return self.add_numpy(x)
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/__init__.py", line 104, in replacement_add
self.add_c(n, swig_ptr(x))
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/swigfaiss.py", line 4313, in add
return _swigfaiss.IndexReplicas_add(self, n, x)
RuntimeError: Error in virtual void faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/StandardGpuResources.cpp:410: Error: 'err == cudaSuccess' failed: Failed to cudaMalloc 123908096 bytes on device 0 (error 2 out of memory
Outstanding allocations:
Alloc type TemporaryMemoryBuffer: 1 allocations, 1610612736 bytes

It seems that evaluation only runs on a single GPU. Is it possible to run it on multiple GPUs?

My GPU config is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| 0 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 22% 35C P8 5W / 250W | 344MiB / 11264MiB | 0% Default |
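The byte counts in the error message can be related to the data with a little arithmetic. The sketch below assumes float32 embeddings (4 bytes each), the embedding dimensionality of 512 from the log, and 60502 reference vectors, a count that appears in the other SOP reports on this page. The helper name is illustrative.

```python
def float32_matrix_bytes(rows: int, cols: int) -> int:
    """Size in bytes of a dense float32 matrix with the given shape."""
    return rows * cols * 4


failed_alloc = 123908096        # bytes, from "Failed to cudaMalloc ... bytes"
temp_buffer = 1610612736        # bytes, from "Alloc type TemporaryMemoryBuffer"

reference_bytes = float32_matrix_bytes(60502, 512)

print(f"reference matrix: {reference_bytes} bytes ({reference_bytes / 2**20:.1f} MiB)")
print(f"matches failed allocation: {reference_bytes == failed_alloc}")
print(f"temporary buffer: {temp_buffer / 2**30:.1f} GiB")
```

Under these assumptions the failed request is the size of the reference embedding matrix itself (about 118 MiB), next to a 1.5 GiB temporary buffer that is already allocated. This suggests, but does not prove, that the card's memory was mostly occupied by something else when index.add was called.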
