elias-ramzi / roadmap

This repository contains the official implementation of the NeurIPS'21 paper, ROADMAP: Robust and Decomposable Average Precision for Image Retrieval.

Home Page: https://arxiv.org/abs/2110.01445

License: MIT License

Python 100.00%
image-retrieval average-precision neurips-2021 metric-learning

roadmap's Issues

Train errors on CUB

After configuring the runtime environment according to the guide, I ran the training code and got the following error:

Traceback (most recent call last):
File "roadmap/single_experiment_runner.py", line 7, in <module>
import roadmap.run as run
File "/code/yx/SIGIR/ROADMAP/roadmap/run.py", line 10, in <module>
from ray import tune
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/__init__.py", line 2, in <module>
from ray.tune.tune import run_experiments, run
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/tune.py", line 14, in <module>
from ray.tune.utils.callback import create_default_callbacks
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/utils/callback.py", line 7, in <module>
from ray.tune.progress_reporter import TrialProgressCallback
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/progress_reporter.py", line 28, in <module>
raise ImportError("ray.tune in ray > 0.7.5 requires 'tabulate'. "
ImportError: ray.tune in ray > 0.7.5 requires 'tabulate'. Please re-run 'pip install ray[tune]' or 'pip install ray[rllib]'.
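The ImportError above names the missing package directly. As a quick check before launching training, a sketch like the following reports which modules cannot be found in the active environment. The helper name `missing_modules` is hypothetical (it is not part of ROADMAP or Ray), and the module list is taken only from this traceback.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules mentioned in the traceback above; the result depends on the environment.
print(missing_modules(["ray", "tabulate"]))
```

If `tabulate` appears in the printed list, the suggestion in the error message itself (`pip install ray[tune]`) is the first thing to try.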

DG_{AP} in Eq. 8

Is there any guarantee that DG_{AP} is non-negative?

For example, we have two batches:

  • Batch 1: red < red < green < green < green
  • Batch 2: red < green < green < green < red

The AP of Batch 1 is 1.0, and the AP of Batch 2 is 0.64. The average AP over these two batches is 0.82.

If the two batches are merged into a single dataset ordered like this: red < red < red < green < green < green < red < green < green < green,
then the AP over the dataset is 0.915, and DG_{AP} is -0.095, right?

Could you help me understand this part? Please let me know if I misunderstood something. Thanks. @elias-ramzi
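The arithmetic in this question can be checked with a few lines of Python. This is only a sketch under stated assumptions: green items are positives, red items are negatives, "<" orders items from lowest to highest score, and the gap is taken as the mean of the batch APs minus the AP over the merged ranking, which is how the question computes it. The function names are illustrative.

```python
def average_precision(ranking):
    """AP of a ranked list given best-first; True marks a positive item."""
    hits = 0
    precisions = []
    for rank, is_positive in enumerate(ranking, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive
    return sum(precisions) / len(precisions)


def ap_from_ascending(colors):
    """Assumption: the string lists items from lowest to highest score,
    with 'G' (green) a positive and 'R' (red) a negative."""
    return average_precision([c == "G" for c in reversed(colors)])


batch_1 = "RRGGG"
batch_2 = "RGGGR"
dataset = "RRRGGGRGGG"  # the merged ordering given in the question

ap_1 = ap_from_ascending(batch_1)
ap_2 = ap_from_ascending(batch_2)
mean_batch_ap = (ap_1 + ap_2) / 2
global_ap = ap_from_ascending(dataset)
gap = mean_batch_ap - global_ap  # sign convention assumed from the question

print(f"AP batch 1    = {ap_1:.4f}")
print(f"AP batch 2    = {ap_2:.4f}")
print(f"mean batch AP = {mean_batch_ap:.4f}")
print(f"global AP     = {global_ap:.4f}")
print(f"gap           = {gap:.4f}")
```

With unrounded values the gap comes out near -0.0956. The -0.095 quoted in the question follows from rounding the two averages to 0.82 and 0.915 first. Either way, the value is negative for this example.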

Evaluate error

Thanks for your nice work!
I tried to reproduce the results on iNaturalist and SOP with a 32 GB V100, but I ran into the same error in the evaluation phase.

The error information is as follows:
[2023-03-31 09:27:57,441][PML][INFO] - Computing accuracy for the test split w.r.t ['test']
Traceback (most recent call last):
File "roadmap/single_experiment_runner.py", line 66, in single_experiment_runner
checkpoint_dir=resume,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/run.py", line 154, in run
restore_epoch=restore_epoch,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/train.py", line 100, in train
**dataset_dict,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/utils/get_set_random_state.py", line 32, in wrapper
output = func(*args, **kwargs)
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/evaluate.py", line 145, in evaluate
splits_to_eval=splits_to_eval,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/testers/base_tester.py", line 307, in test
reference_split_names,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/testers/global_embedding_space.py", line 26, in do_knn_and_accuracies
self.ref_includes_query(query_split_name, reference_split_names),
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/accuracy_calculator.py", line 140, in get_accuracy
query_labels, reference_labels, self.label_comparison_fn,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/utils/accuracy_calculator.py", line 153, in get_label_match_counts
comparison = unique_query_labels[:, None] == reference_labels
RuntimeError: The size of tensor a (512) must match the size of tensor b (60502) at non-singleton dimension 2

I have printed the 'query_labels' and 'reference_labels':
query_labels: tensor([[-0.0357, -0.0059, -0.0690, ..., -0.0238, -0.0273, 0.0572],
[-0.0684, 0.0068, -0.0903, ..., -0.0357, 0.0095, 0.0314],
[-0.0661, -0.0320, -0.0592, ..., 0.0084, -0.0471, 0.0413],
...,
[-0.0183, -0.0594, -0.0730, ..., 0.0730, 0.0116, 0.0096],
[ 0.0094, -0.0630, -0.0917, ..., 0.1146, -0.0042, 0.0074],
[ 0.0019, -0.0335, -0.0428, ..., 0.0115, -0.0445, 0.0971]],
device='cuda:0') torch.Size([136093, 512])
reference_labels: tensor([716, 716, 716, ..., 810, 810, 810], device='cuda:0') torch.Size([136093])

Could you please help us find out the possible reasons?
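The RuntimeError is a broadcasting failure, and the printout shows a 2-D float tensor of width 512 in the `query_labels` slot where a 1-D label vector would be expected. The pure-Python sketch below mimics the usual right-aligned broadcasting rule to show why the comparison fails. It assumes that `unique_query_labels` keeps the 512 columns, so that `[:, None]` yields shape (U, 1, 512); the value of U and the helper name `broadcast_shape` are hypothetical.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Broadcast two shapes with right-aligned rules, or raise ValueError."""
    ndim = max(len(a), len(b))
    out = []
    pairs = zip_longest(reversed(a), reversed(b), fillvalue=1)
    for i, (x, y) in enumerate(pairs):
        if x != y and x != 1 and y != 1:
            raise ValueError(
                f"size {x} must match size {y} at dimension {ndim - 1 - i}"
            )
        out.append(max(x, y))
    return tuple(reversed(out))


U = 1000  # hypothetical number of unique query labels

# 1-D labels: (U, 1) against (60502,) broadcasts to a (U, 60502) match matrix.
print(broadcast_shape((U, 1), (60502,)))

# Embeddings passed as labels: (U, 1, 512) against (60502,) cannot broadcast.
try:
    broadcast_shape((U, 1, 512), (60502,))
except ValueError as err:
    print(err)
```

The second call fails at dimension 2 with sizes 512 and 60502, which is the same pair reported in the traceback. This only explains the shape mismatch; it does not establish why embeddings reach that argument.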

Train error with SOP

Hi, I ran training on SOP with the resnet model. While it was computing accuracy for the test split w.r.t ['test'], I got this error:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:162; details: cublas failed (13): (512, 512) x (60502, 512)' = (512, 60502)
Aborted
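The number in "cublas failed (13)" is a cuBLAS status code. The lookup below lists the `cublasStatus_t` values as I understand them from the cuBLAS headers. Treat it as a reading aid rather than a diagnosis, since the status alone does not say why the matrix multiplication failed. The function name `describe_cublas_status` is hypothetical.

```python
# cublasStatus_t values (assumed to match the installed cuBLAS headers).
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}


def describe_cublas_status(code):
    """Map a numeric cuBLAS status to its symbolic name."""
    return CUBLAS_STATUS.get(code, f"unknown cuBLAS status {code}")


print(describe_cublas_status(13))
```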

typo in equation 8

😉 Nice paper, but I think the summation in equation 8 should be $\sum_{j=1}^{|P_i^b|}$ instead of $\sum_{j=1}^{B}$.

Expressions in code and paper do not match

Hello, and thank you for your paper.

I read the paper and checked the code, and I found that Equation (4) does not match the step_rank code in smooth_rank_ap.py, line i51.

The paper writes the equation as
(image of the equation)
but I cannot find (t-delta) in the code.

So why are (t-delta) and the sigmoid not implemented? The code just has:
tens[~target & pos_mask & margin_mask] = rho * tens[~target & pos_mask & margin_mask] + offset

I don't understand. Thanks for the explanation.
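One possible reading, offered as an assumption and not as the authors' confirmed answer: if the linear branch of the equation has the form rho * (t - delta) + c for some constant c, then it can be rewritten as rho * t + (c - rho * delta), so the (t - delta) shift and any constant sigmoid term could be folded into a single precomputed `offset`. The sketch below checks that algebra numerically. The values of `rho`, `delta`, and `c` are hypothetical, and whether the repository defines `offset` this way is not verified here.

```python
import math


def linear_branch_equation_form(t, rho, delta, c):
    """Assumed form of the linear branch as written with (t - delta)."""
    return rho * (t - delta) + c


def linear_branch_code_form(t, rho, offset):
    """Form of the quoted code line: rho * t + offset."""
    return rho * t + offset


rho, delta, c = 100.0, 0.05, 1.5  # hypothetical values for illustration
offset = c - rho * delta          # constant that absorbs the (t - delta) shift

for t in (0.05, 0.1, 0.5, 2.0):
    lhs = linear_branch_equation_form(t, rho, delta, c)
    rhs = linear_branch_code_form(t, rho, offset)
    assert math.isclose(lhs, rhs)

print("both forms agree for offset = c - rho * delta")
```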

RuntimeError: derivative for heaviside is not implemented

Hello elias-ramzi, thanks for your brilliant work.
I ran the SOP command below, but got the error 'RuntimeError: derivative for heaviside is not implemented':

CUDA_VISIBLE_DEVICES=0 python roadmap/single_experiment_runner.py \
'experience.experiment_name=sop_ROADMAP_${dataset.sampler.kwargs.batch_size}_sota' \
experience.seed=333 \
experience.max_iter=100 \
'experience.log_dir=${env:HOME}/experiments/ROADMAP' \
optimizer=sop \
model=resnet \
transform=sop_big \
dataset=sop \
dataset.sampler.kwargs.batch_size=128 \
dataset.sampler.kwargs.batches_per_super_pair=10 \
loss=roadmap

My torch version is 1.9.1. I found a non-differentiable operation, torch.heaviside, in the SupAP class (step_rank function). Is this the reason?
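As background for this question, the pure-Python sketch below illustrates why a Heaviside step carries no useful gradient, while a temperature-scaled sigmoid does. It does not reproduce the PyTorch error and says nothing about how the repository intends gradients to flow through `step_rank`. The temperature `tau = 0.01` and the function names are assumptions chosen for illustration.

```python
import math


def heaviside(x):
    """Step function: 1 for x > 0, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smooth_step(x, tau=0.01):
    """Sigmoid relaxation of the step; tau is a hypothetical temperature."""
    return sigmoid(x / tau)


def central_difference(f, x, h=1e-4):
    """Finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Away from 0 the step is flat, so its slope is exactly zero.
print(central_difference(heaviside, 0.3))
print(central_difference(heaviside, -0.3))

# The sigmoid relaxation has a large but finite slope near 0.
print(central_difference(smooth_step, 0.0))
```

The slope of the relaxed step at 0 is 1 / (4 * tau) analytically, so a smaller `tau` gives a sharper approximation of the step.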

Evaluation issues regarding memory overflow on GPU

Hi,

After training the model on the SOP dataset, I tried to run evaluate.py with the checkpoint. The command I used is: python evaluate.py --config /home/zhijue/experiments/ROADMAP/sop_ROADMAP_128_sota/weights/epoch_100.ckpt --bs 128
The result looks like this:

09/08/2022 02:16:27 PM - INFO - running k-nn with k=2048
09/08/2022 02:16:27 PM - INFO - embedding dimensionality is 512
Traceback (most recent call last):
File "evaluate.py", line 95, in <module>
metrics = load_and_evaluate(
File "evaluate.py", line 51, in load_and_evaluate
metrics = eng.evaluate(
File "/home/zhijue/project/models/ROADMAP/roadmap/utils/get_set_random_state.py", line 32, in wrapper
output = func(*args, **kwargs)
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/evaluate.py", line 141, in evaluate
return tester.test(
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/pytorch_metric_learning/testers/base_tester.py", line 306, in test
self.do_knn_and_accuracies(
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/pytorch_metric_learning/testers/global_embedding_space.py", line 21, in do_knn_and_accuracies
a = self.accuracy_calculator.get_accuracy(
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/accuracy_calculator.py", line 155, in get_accuracy
knn_indices, knn_distances = get_knn(
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/get_knn.py", line 16, in get_knn
distances, indices = get_knn_faiss(references, queries, num_k)
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/get_knn.py", line 47, in get_knn_faiss
index.add(references)
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/contrib/torch_utils.py", line 96, in torch_replacement_add
return self.add_numpy(x)
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/__init__.py", line 104, in replacement_add
self.add_c(n, swig_ptr(x))
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/swigfaiss.py", line 4313, in add
return _swigfaiss.IndexReplicas_add(self, n, x)
RuntimeError: Error in virtual void
faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/StandardGpuResources.cpp:410: Error: 'err == cudaSuccess' failed: Failed to cudaMalloc 123908096 bytes on device 0 (error 2 out of memory
Outstanding allocations:
Alloc type TemporaryMemoryBuffer: 1 allocations, 1610612736 bytes

It seems that the evaluation only runs on a single GPU. Is it possible to run it on multiple GPUs?

My GPU config is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| 0 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 22% 35C P8 5W / 250W | 344MiB / 11264MiB | 0% Default |
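The numbers in this log can be put in context with simple arithmetic. The sketch below assumes float32 values (4 bytes each) and borrows the shape (512, 60502) from the cuBLAS message quoted in the SOP training issue on this page. The exact match with the failed request is suggestive, but it does not prove which buffer faiss was trying to allocate.

```python
BYTES_PER_FLOAT32 = 4

rows, cols = 512, 60502        # shape from the cuBLAS message quoted on this page
tile_bytes = rows * cols * BYTES_PER_FLOAT32

failed_request = 123908096     # bytes, from "Failed to cudaMalloc ..."
temp_buffer = 1610612736       # bytes, from "Alloc type TemporaryMemoryBuffer"
gpu_total_mib = 11264          # MiB, from the nvidia-smi output

print(tile_bytes == failed_request)   # a 512 x 60502 float32 block matches exactly
print(tile_bytes / 2**20)             # size of that block in MiB
print(temp_buffer / 2**30)            # temporary buffer in GiB
print(gpu_total_mib / 1024)           # total GPU memory in GiB
```

So the request that failed is roughly 118 MiB on top of a 1.5 GiB temporary buffer, on a card with 11 GiB in total. The log does not show what else was resident on the device when the allocation failed.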
