elias-ramzi / roadmap

This repository contains the official implementation of the NeurIPS'21 paper, ROADMAP: Robust and Decomposable Average Precision for Image Retrieval.

Home Page: https://arxiv.org/abs/2110.01445

License: MIT License

Python 100.00%
image-retrieval average-precision neurips-2021 metric-learning

roadmap's Issues

Train error with SOP

Hi, I ran training on SOP with the resnet model. While it was computing accuracy for the test split w.r.t. ['test'], I got this error:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:162; details: cublas failed (13): (512, 512) x (60502, 512)' = (512, 60502)
Aborted

RuntimeError: derivative for heaviside is not implemented

Hello elias-ramzi, thanks for your brilliant work.
I ran the SOP command below and got this error: 'RuntimeError: derivative for heaviside is not implemented'
CUDA_VISIBLE_DEVICES=0 python roadmap/single_experiment_runner.py \
'experience.experiment_name=sop_ROADMAP_${dataset.sampler.kwargs.batch_size}_sota' \
experience.seed=333 \
experience.max_iter=100 \
'experience.log_dir=${env:HOME}/experiments/ROADMAP' \
optimizer=sop \
model=resnet \
transform=sop_big \
dataset=sop \
dataset.sampler.kwargs.batch_size=128 \
dataset.sampler.kwargs.batches_per_super_pair=10 \
loss=roadmap

My torch version is 1.9.1. I found a non-differentiable operation, torch.heaviside, in the SupAP class (step_rank function). Is this the cause?

Evaluate error

Thanks for your nice work!
I tried to reproduce the results on iNaturalist and SOP with a 32 GB V100, but I ran into the same error on both during the evaluation phase.

The error information is as follows:
[2023-03-31 09:27:57,441][PML][INFO] - Computing accuracy for the test split w.r.t ['test']
Traceback (most recent call last):
File "roadmap/single_experiment_runner.py", line 66, in single_experiment_runner
checkpoint_dir=resume,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/run.py", line 154, in run
restore_epoch=restore_epoch,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/train.py", line 100, in train
**dataset_dict,
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/utils/get_set_random_state.py", line 32, in wrapper
output = func(*args, **kwargs)
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/evaluate.py", line 145, in evaluate
splits_to_eval=splits_to_eval,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/testers/base_tester.py", line 307, in test
reference_split_names,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/testers/global_embedding_space.py", line 26, in do_knn_and_accuracies
self.ref_includes_query(query_split_name, reference_split_names),
File "/home/yangjian/retrieval/ROADMAP-main/roadmap/engine/accuracy_calculator.py", line 140, in get_accuracy
query_labels, reference_labels, self.label_comparison_fn,
File "/home/yangjian/anaconda3/envs/roadmap/lib/python3.7/site-packages/pytorch_metric_learning/utils/accuracy_calculator.py", line 153, in get_label_match_counts
comparison = unique_query_labels[:, None] == reference_labels
RuntimeError: The size of tensor a (512) must match the size of tensor b (60502) at non-singleton dimension 2

I have printed the 'query_labels' and 'reference_labels':
query_labels: tensor([[-0.0357, -0.0059, -0.0690, ..., -0.0238, -0.0273, 0.0572],
[-0.0684, 0.0068, -0.0903, ..., -0.0357, 0.0095, 0.0314],
[-0.0661, -0.0320, -0.0592, ..., 0.0084, -0.0471, 0.0413],
...,
[-0.0183, -0.0594, -0.0730, ..., 0.0730, 0.0116, 0.0096],
[ 0.0094, -0.0630, -0.0917, ..., 0.1146, -0.0042, 0.0074],
[ 0.0019, -0.0335, -0.0428, ..., 0.0115, -0.0445, 0.0971]],
device='cuda:0') torch.Size([136093, 512])
reference_labels: tensor([716, 716, 716, ..., 810, 810, 810], device='cuda:0') torch.Size([136093])

Could you please help us find out the possible reasons?
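
The message above is what a NumPy/PyTorch-style broadcasting failure on the last axis looks like. Below is a minimal pure-Python sketch of the broadcasting rule, assuming that `query_labels` really held a 2-D embedding matrix (as the printout suggests) rather than 1-D labels. The helper `broadcast_shape` and the value `n_unique = 1000` are hypothetical and only serve the illustration; they are not part of ROADMAP or pytorch_metric_learning.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Shapes are aligned from the trailing axis; two sizes are compatible
    when they are equal or when one of them is 1.
    """
    result = []
    pairs = zip_longest(reversed(a), reversed(b), fillvalue=1)
    for axis, (x, y) in enumerate(pairs):
        if x != y and x != 1 and y != 1:
            raise ValueError(
                f"sizes {x} and {y} do not match (axis {axis} counted from the end)"
            )
        result.append(max(x, y))
    return tuple(reversed(result))


n_unique = 1000  # hypothetical number of unique query rows

# Expected case: 1-D labels, so labels[:, None] has shape (n_unique, 1).
print(broadcast_shape((n_unique, 1), (60502,)))

# Failing case: a 512-dim embedding matrix is passed where labels are expected,
# so unique_query_labels[:, None] has shape (n_unique, 1, 512).
try:
    broadcast_shape((n_unique, 1, 512), (60502,))
except ValueError as err:
    print("broadcast fails:", err)
```

If this reading is right, the comparison fails because 512 (the embedding dimension) is matched against 60502 (the number of reference labels), which would suggest that the arguments reaching get_label_match_counts are not in the order that function expects.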

DG_{AP} in Eq. 8

Is there any guarantee that DG_{AP} is non-negative?

For example, we have two batches:

  • Batch 1: red < red < green < green < green
  • Batch 2: red < green < green < green < red

The AP of Batch 1 is 1.0, and the AP of Batch 2 is 0.64. The average AP over these two batches is 0.82.

If these two batches form a dataset like this: red < red < red < green < green < green < red < green < green < green,
then the AP over the dataset is 0.915, and DG_{AP} here is -0.095, right?

Could you help me understand this part? Please let me know if I misunderstood something. Thanks. @elias-ramzi
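
The numbers in this question can be checked with a short script. This is a sketch under stated assumptions: green items are the positives, each ranking is written from lowest to highest score, and DG_{AP} is taken to be the mean per-batch AP minus the AP over the whole dataset, which is what the figures in the question imply. The function name `average_precision` is introduced here for illustration only.

```python
def average_precision(ranking):
    """AP of a ranking written from lowest to highest score.

    `ranking` is a list of booleans, True marking a positive (green) item.
    """
    hits = 0
    precisions = []
    # Walk from the highest-scored item down, so rank 1 is the top result.
    for rank, is_positive in enumerate(reversed(ranking), start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


R, G = False, True  # red = negative, green = positive (assumption)

batch_1 = [R, R, G, G, G]
batch_2 = [R, G, G, G, R]
dataset = [R, R, R, G, G, G, R, G, G, G]

ap_1 = average_precision(batch_1)
ap_2 = average_precision(batch_2)
mean_batch_ap = (ap_1 + ap_2) / 2
global_ap = average_precision(dataset)
gap = mean_batch_ap - global_ap

print(f"AP batch 1 = {ap_1:.3f}, AP batch 2 = {ap_2:.3f}")
print(f"mean batch AP = {mean_batch_ap:.3f}, global AP = {global_ap:.3f}")
print(f"mean batch AP - global AP = {gap:.3f}")
```

Under these assumptions the gap comes out at about -0.096 before rounding, which agrees with the -0.095 in the question up to rounding of the intermediate values, so the example does produce a negative value for this particular interleaving.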

typo in equation 8

😉 Nice paper, but I think that the summation in equation 8 should be $\sum_{j=1}^{|P_i^b|}$ instead of $\sum_{j=1}^{B}$.

Train errors on CUB

After configuring the runtime environment according to the guide, I ran the training code and got the following error:
Traceback (most recent call last):
File "roadmap/single_experiment_runner.py", line 7, in <module>
import roadmap.run as run
File "/code/yx/SIGIR/ROADMAP/roadmap/run.py", line 10, in <module>
from ray import tune
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/__init__.py", line 2, in <module>
from ray.tune.tune import run_experiments, run
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/tune.py", line 14, in <module>
from ray.tune.utils.callback import create_default_callbacks
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/utils/callback.py", line 7, in <module>
from ray.tune.progress_reporter import TrialProgressCallback
File "/home/yx/anaconda3/envs/osap/lib/python3.7/site-packages/ray/tune/progress_reporter.py", line 28, in <module>
raise ImportError("ray.tune in ray > 0.7.5 requires 'tabulate'. "
ImportError: ray.tune in ray > 0.7.5 requires 'tabulate'. Please re-run 'pip install ray[tune]' or 'pip install ray[rllib]'.
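
The ImportError names the missing module directly, so a quick pre-flight check can confirm whether the environment is the problem before launching training. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the module list is taken from the error message above rather than from the repository's requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'tabulate' is the module named in the ImportError; 'ray' is the package that needs it.
missing = missing_modules(["ray", "tabulate"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("ray and tabulate are both importable")
```

If `tabulate` is reported as missing, re-running one of the pip commands quoted in the error message in the same conda environment is the fix the message itself proposes.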

Expressions in code and paper do not match

Hello. Thank you for your paper.

I read through your paper and code, and I found that Equation (4) does not match the step_rank code in smooth_rank_ap.py (line i51).

The paper writes the equation as
image
but I could not find (t-delta) in the code.

So why doesn't the code implement (t-delta) and the sigmoid? The code just writes ''' tens[~target & pos_mask & margin_mask] = rho * tens[~target & pos_mask & margin_mask] + offset'''

I don't understand this. Thanks for the explanation.

Evaluation issues regarding memory overflow on GPU

Hi,

After training the model on the SOP dataset, I tried to use the checkpoint to run evaluate.py. The command I used is: python evaluate.py --config /home/zhijue/experiments/ROADMAP/sop_ROADMAP_128_sota/weights/epoch_100.ckpt --bs 128.
The result is:

09/08/2022 02:16:27 PM - INFO - running k-nn with k=2048
09/08/2022 02:16:27 PM - INFO - embedding dimensionality is 512
Traceback (most recent call last):
File "evaluate.py", line 95, in <module>
metrics = load_and_evaluate(
File "evaluate.py", line 51, in load_and_evaluate
metrics = eng.evaluate(
File "/home/zhijue/project/models/ROADMAP/roadmap/utils/get_set_random_state.py", line 32, in wrapper
output = func(*args, **kwargs)
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/evaluate.py", line 141, in evaluate
return tester.test(
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/pytorch_metric_learning/testers/base_tester.py", line 306, in test
self.do_knn_and_accuracies(
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/pytorch_metric_learning/testers/global_embedding_space.py", line 21, in do_knn_and_accuracies
a = self.accuracy_calculator.get_accuracy(
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/accuracy_calculator.py", line 155, in get_accuracy
knn_indices, knn_distances = get_knn(
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/get_knn.py", line 16, in get_knn
distances, indices = get_knn_faiss(references, queries, num_k)
File "/home/zhijue/project/models/ROADMAP/roadmap/engine/get_knn.py", line 47, in get_knn_faiss
index.add(references)
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/contrib/torch_utils.py", line 96, in torch_replacement_add
return self.add_numpy(x)
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/__init__.py", line 104, in replacement_add
self.add_c(n, swig_ptr(x))
File "/home/zhijue/anaconda3/envs/roadmap/lib/python3.8/site-packages/faiss/swigfaiss.py", line 4313, in add
return _swigfaiss.IndexReplicas_add(self, n, x)
RuntimeError: Error in virtual void
faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/StandardGpuResources.cpp:410: Error: 'err == cudaSuccess' failed: Failed to cudaMalloc 123908096 bytes on device 0 (error 2 out of memory
Outstanding allocations:
Alloc type TemporaryMemoryBuffer: 1 allocations, 1610612736 bytes

It seems that evaluation only runs on a single GPU. Is it possible to run it on multiple GPUs?

My GPU config is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| 0 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 22% 35C P8 5W / 250W | 344MiB / 11264MiB | 0% Default |
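
The sizes in the error message can be related to the data with a little arithmetic. The sketch below assumes float32 embeddings and a reference set of 60502 items (the SOP test-split size that appears in the other reports on this page); the remaining numbers are copied from the log and the nvidia-smi output above.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2

num_references = 60502  # assumption: SOP test-split size seen in the other reports
embedding_dim = 512     # from the log line "embedding dimensionality is 512"

reference_bytes = num_references * embedding_dim * BYTES_PER_FLOAT32
failed_alloc_bytes = 123908096   # from the cudaMalloc error
temp_buffer_bytes = 1610612736   # from "Outstanding allocations"
gpu_total_mib = 11264            # from the nvidia-smi output

print("reference matrix matches failed allocation:",
      reference_bytes == failed_alloc_bytes)
print("reference matrix size (MiB):", round(reference_bytes / MIB, 1))
print("temporary buffer size (MiB):", temp_buffer_bytes / MIB)
print("GPU total memory (MiB):", gpu_total_mib)
```

Under these assumptions the failed request is exactly the reference embedding matrix (about 118 MiB), and the temporary buffer is 1.5 GiB. Both are small next to the card's 11264 MiB, which may point to memory already held elsewhere on the device when index.add runs, although the log alone does not establish that.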
