
yxgeee / openibl

269 stars · 9 watchers · 41 forks · 4.88 MB

[ECCV-2020 (spotlight)] Self-supervising Fine-grained Region Similarities for Large-scale Image Localization. 🌏 PyTorch open-source toolbox for image-based localization (place recognition).

Home Page: https://yxgeee.github.io/projects/sfrs

License: MIT License

Languages: Python 93.06%, Shell 6.94%
Topics: netvlad, image-based-localisation, localization, place-recognition, image-retrieval

openibl's Introduction

Hi there 👋

  • 🌱 I’m currently a Principal Researcher at Tencent ARC Lab.
  • 🔭 I’m currently working on vision and multimodal foundation models.
  • 👯 I’m looking for self-motivated interns to collaborate on related research topics.
  • 📫 Reach me at my homepage.

openibl's People

Contributors

yxgeee


openibl's Issues

About the features used to compute loss_soft

loss_soft = (- F.softmax(sim_diff_label[:,:,0].contiguous().view(B,-1)/self.temp[gen], dim=1).detach() * log_sim_diff).mean(0).sum()

Why is only sim_diff_label[:,:,0], i.e. the first row of the region-feature similarity score map, used to compute this loss? Only a [B, diff_pos_num, 9] slice participates in the computation.
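A minimal shape sketch of what that slice selects, assuming sim_diff_label is a [B, diff_pos_num, 9, 9] region-to-region similarity map (the shapes are taken from the question above, not from the repository):

```python
import torch
import torch.nn.functional as F

B, diff_pos_num, temp = 2, 5, 0.07   # illustrative sizes and temperature

# assumed shape: one 9x9 region-to-region similarity map per (query, positive) pair
sim_diff_label = torch.randn(B, diff_pos_num, 9, 9)

row0 = sim_diff_label[:, :, 0]           # [B, diff_pos_num, 9]: only the first row
flat = row0.contiguous().view(B, -1)     # [B, diff_pos_num * 9]
weights = F.softmax(flat / temp, dim=1)  # soft labels built from those entries only
print(row0.shape, flat.shape, weights.shape)
```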

bc_features.data.copy_(torch.cat(features))

Also, with too many images the memory blows up, so I only kept 10,000 classes, with 1 query image per class and 4 gallery images per class. Do I need to set pos_num=4 and neg_num=10 accordingly?

Unable to train

I have tried a lot, but for some reason I just can't train it with a custom dataset. It throws the following error:

RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

Any way to fix it?
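Not the author, but this particular aten::_cat error is what PyTorch raises when torch.cat receives an empty list, which usually means the custom dataset yielded no samples (e.g. no images were found, or the positive/negative lists are empty). A hedged guard with a hypothetical helper that makes the failure easier to diagnose:

```python
import torch

def safe_cat(features):
    """Concatenate a list of feature tensors, failing with a clearer message
    when the list is empty (the situation behind the aten::_cat error)."""
    if len(features) == 0:
        raise RuntimeError(
            "No features were collected -- check that the custom dataset actually "
            "returns images and that its positive/negative lists are non-empty."
        )
    return torch.cat(features)

# works for a non-empty list, gives a readable error for an empty one
print(safe_cat([torch.randn(2, 4), torch.randn(3, 4)]).shape)  # torch.Size([5, 4])
```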

About reproduction

Hi, thank you for sharing this project. Good job! I tried to run this project, but there are some questions that confuse me.

1. When running train_sfrs_dist.sh, Loss_hard and Loss_soft look as follows: Loss_hard << soft-weight (0.5) * Loss_soft. Does Loss_hard therefore make only a small, or even negligible, contribution? Also, Loss_soft does not seem to converge; have you ever seen a similar phenomenon when training the network?

Epoch: [4-7][160/320]	Time 0.672 (0.674)	Data 0.069 (0.077)	Loss_hard 0.018 (0.052)	Loss_soft 1.749 (2.275)
Epoch: [4-7][170/320]	Time 0.670 (0.672)	Data 0.065 (0.076)	Loss_hard 0.041 (0.050)	Loss_soft 2.780 (2.272)
Epoch: [4-7][180/320]	Time 0.671 (0.671)	Data 0.063 (0.075)	Loss_hard 0.015 (0.049)	Loss_soft 1.535 (2.251)
Epoch: [4-7][190/320]	Time 0.665 (0.670)	Data 0.063 (0.074)	Loss_hard 0.005 (0.049)	Loss_soft 1.572 (2.239)
Epoch: [4-7][200/320]	Time 0.666 (0.669)	Data 0.060 (0.073)	Loss_hard 0.019 (0.048)	Loss_soft 2.144 (2.230)
Epoch: [4-7][210/320]	Time 0.667 (0.668)	Data 0.063 (0.073)	Loss_hard 0.022 (0.049)	Loss_soft 2.122 (2.247)
Epoch: [4-7][220/320]	Time 0.658 (0.668)	Data 0.055 (0.072)	Loss_hard 0.005 (0.048)	Loss_soft 1.374 (2.239)
Epoch: [4-7][230/320]	Time 0.504 (0.667)	Data 0.047 (0.071)	Loss_hard 0.028 (0.047)	Loss_soft 1.855 (2.239)
Epoch: [4-7][240/320]	Time 0.665 (0.667)	Data 0.061 (0.071)	Loss_hard 0.201 (0.048)	Loss_soft 3.224 (2.247)
Epoch: [4-7][250/320]	Time 0.668 (0.666)	Data 0.063 (0.070)	Loss_hard 0.001 (0.047)	Loss_soft 1.920 (2.239)
Epoch: [4-7][260/320]	Time 0.660 (0.666)	Data 0.068 (0.070)	Loss_hard 0.037 (0.047)	Loss_soft 2.350 (2.240)
Epoch: [4-7][270/320]	Time 0.658 (0.666)	Data 0.062 (0.069)	Loss_hard 0.068 (0.047)	Loss_soft 3.046 (2.240)
Epoch: [4-7][280/320]	Time 0.717 (0.668)	Data 0.060 (0.069)	Loss_hard 0.019 (0.048)	Loss_soft 2.411 (2.233)
Epoch: [4-7][290/320]	Time 0.693 (0.669)	Data 0.060 (0.068)	Loss_hard 0.096 (0.048)	Loss_soft 3.048 (2.247)
Epoch: [4-7][300/320]	Time 0.669 (0.670)	Data 0.059 (0.068)	Loss_hard 0.091 (0.049)	Loss_soft 3.546 (2.255)
Epoch: [4-7][310/320]	Time 0.669 (0.670)	Data 0.064 (0.068)	Loss_hard 0.014 (0.049)	Loss_soft 2.299 (2.247)
Epoch: [4-7][320/320]	Time 0.629 (0.669)	Data 0.026 (0.067)	Loss_hard 0.057 (0.048)	Loss_soft 3.039 (2.261)

2. The results on Pitts250k for the best model in my reproduction are slightly lower than those in your paper: 89.8% | 95.9% | 97.3% vs. 90.7% | 96.4% | 97.6%. The best model in my reproduction is the output of the 5th epoch of the third generation, rather than converging at the fourth generation as mentioned in the paper. Is your best model the output of the last iteration of training?

3. I only use one GPU (a 2080 Ti), and the other parameters are the defaults. I don't know whether the inferior results are due to too few GPUs, or whether there is something else I need to pay attention to.

Evaluation protocol

Hi,

I have a naive question on the evaluation of the task.
In your paper, you say that d <= 25 meters is considered a positive match. Is that the direct L2 distance between the UTM coordinates (UTMEasting, UTMNorthing)? Could you point to this part of your implementation?

More precisely, I see that in your evaluator.py you compute features, pairwise similarities, etc. My question is more about the ground truth used here; I didn't manage to find how you compute it.
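For reference, a minimal sketch of how such ground truth is commonly built for these benchmarks: a plain L2 radius search over the (UTMEasting, UTMNorthing) coordinates with a 25 m positive threshold. This only illustrates the protocol described above with made-up coordinates; it is not a quote of evaluator.py:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# hypothetical UTM coordinates: rows are (UTMEasting, UTMNorthing) in meters
db_utm = np.random.rand(1000, 2) * 500.0
query_utm = np.random.rand(10, 2) * 500.0

# ground-truth positives: all database images within 25 m (L2 in UTM) of each query
knn = NearestNeighbors()
knn.fit(db_utm)
_, positives = knn.radius_neighbors(query_utm, radius=25.0)  # one index array per query

# recall@k then counts a query as correct if any of its top-k retrievals is in positives
print(len(positives), positives[0][:5])
```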

In your scripts you have a spatial NMS (https://github.com/yxgeee/OpenIBL/blob/master/examples/test.py#L130); is it a standard post-processing step, and was it used to get the numbers in the paper?

Also, I notice that some queries have NaN UTM coordinates, for example Tokyo247 query 00931.csv. Are those queries dropped?

Thanks so much for your help,

Best,

Xi

About the dataset

Thanks for your work. I have some questions about the datasets. I have emailed the NetVLAD authors but cannot get a reply. Could you share download links for Tokyo 24/7, Pitts250k, and Pitts30k? I will only use them in my research.

.mat files

Hello Yixiao,

I want to ask how I can generate the .mat files for the Pittsburgh dataset:

examples/data
├── pitts
│ ├── raw
│ │ ├── pitts250k_test.mat
│ │ ├── pitts250k_train.mat
│ │ ├── pitts250k_val.mat
│ │ ├── pitts30k_test.mat
│ │ ├── pitts30k_train.mat
│ │ ├── pitts30k_val.mat

Thanks.
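Not the author, but to my knowledge these split files come with the original NetVLAD Pittsburgh release rather than being generated by this repo. Once obtained, a quick hedged way to inspect their contents with scipy; the 'dbStruct' variable name is an assumption based on the NetVLAD-style convention:

```python
from scipy.io import loadmat

# assumed path; point this at whichever split file you have
path = 'examples/data/pitts/raw/pitts30k_train.mat'
mat = loadmat(path, struct_as_record=False, squeeze_me=True)

# list the top-level variables stored in the file
print([k for k in mat.keys() if not k.startswith('__')])

# NetVLAD-style splits usually store one struct (commonly named 'dbStruct')
# holding image lists, UTM coordinates, and distance thresholds
db = mat.get('dbStruct', None)
if db is not None:
    print(db._fieldnames)
```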

Reproducing SARE results

Thank you for releasing the code. When reproducing the SARE results, I am able to reproduce the numbers in your paper when I use the dot-product-based code, but not with the original Euclidean distance. Are the results in your work based on the dot-product version?

For sare_joint the difference between the Euclidean distance and the dot product is about 15% for R@1, for both Pitts250k and Tokyo 24/7, which seems very high to me. Did you experience this as well?

About negative samples

Hi, I have some doubts about line 84 of ibl/utils/data/dataset.py:
self.train_neg = [self.train_neg[idx] for idx in select]
I don't understand the effect of this line; it doesn't seem to help produce negative samples outside 25 m.
Should it perhaps be deleted?

Extract descriptor of single image using models trained on custom datasets

Hello, thanks a lot for this valuable work. I tried your extract.py for image retrieval during visual localization, and it works great. But I want to train your SFRS on my own datasets and use the best model to extract image descriptors. I tried to load the pretrained models from the model zoo, but it seems some parameters of the base model and NetVLAD are missing; only vgg16_netvlad.pth can be used for extraction. Can you help me with this? How can I produce a new vgg16_netvlad.pth after training? I would be very grateful if you could answer my question.
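A hedged sketch of one way to reuse a training checkpoint for extraction: build the model via the torch.hub entry point used in the README's single-image example and load your own weights into it, stripping the 'module.' prefix that DistributedDataParallel checkpoints usually carry. The checkpoint path and key layout here are assumptions; adapt them to what your training run actually saved:

```python
import torch

# model definition from the README's single-image extraction example
model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=False)

# hypothetical path to the checkpoint written by your own training run
ckpt = torch.load('logs/my_sfrs_run/model_best.pth.tar', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# DDP training typically prefixes parameter names with 'module.'; strip it
state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)        # ideally empty after the remapping
print('unexpected keys:', unexpected)

model.eval()
```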

About training

Thanks for sharing the codes.
I tried to train the model in default settings and I found that the training is unstable.
The model often collapses in the final generation, and the best model usually comes from the 2nd or 3rd generation.
An example is shown as follows.

Epoch: 0-4, recalls(1/5/10): 88.7%, 96.1%, 97.8%

Epoch: 1-0, recalls(1/5/10): 88.4%, 96.2%, 97.8%
Epoch: 1-1, recalls(1/5/10): 89.5%, 96.8%, 98.1%
Epoch: 1-2, recalls(1/5/10): 89.9%, 96.8%, 98.2%
Epoch: 1-3, recalls(1/5/10): 89.8%, 96.8%, 98.2%
Epoch: 1-4, recalls(1/5/10): 90.1%, 96.8%, 98.2%

Epoch: 2-0, recalls(1/5/10): 88.7%, 96.2%, 97.6%
Epoch: 2-1, recalls(1/5/10): 89.5%, 96.8%, 97.8%
Epoch: 2-2, recalls(1/5/10): 89.3%, 96.8%, 98.0%
Epoch: 2-3, recalls(1/5/10): 89.8%, 97.0%, 98.1% (* the best model)
Epoch: 2-4, recalls(1/5/10): 89.7%, 96.8%, 98.0%

Epoch: 3-0, recalls(1/5/10): 2.8%, 10.3%, 18.1%
Epoch: 3-1, recalls(1/5/10): 2.8%, 10.5%, 18.3%
Epoch: 3-2, recalls(1/5/10): 2.8%, 10.5%, 18.1%
Epoch: 3-3, recalls(1/5/10): 2.9%, 10.9%, 18.3%
Epoch: 3-4, recalls(1/5/10): 3.0%, 10.4%, 18.4%

Did I train the model in the wrong way?
Could you please help me to figure out this problem?

Test on RParis and ROxford dataset

Thank you for open-sourcing the code and the detailed documentation. I would like to know whether you have evaluated the model on the RParis and ROxford datasets. I tried to evaluate it using the image-processing configuration in your documentation (https://github.com/yxgeee/OpenIBL#:~:text=Start%20without%20Installation-,Extract%20descriptor%20for%20a%20single%20image,-import%20torch%0Afrom) but got poor mAP scores (ROxf Medium 42.57, ROxf Hard 19.25, RPar Medium 44.8, RPar Hard 20.27). Do you have any idea why? I don't think it should be like this; something may be wrong in my image processing rather than in the model itself. (I use your model_best.pth.tar and the corresponding PCA parameters.)
I am looking forward to your reply.

Other Models

I planned to integrate more models (feature extractors) besides VGG16 into your library, and I noticed some ResNet-related artifacts in your code comments. Are you planning to integrate other models in the future? If you have already done so privately, are there any insights about the effects of other models that you could share?

How the training set is set up

Hello, may I ask how the training set is set up? The Tokyo 24/7 data I downloaded comes as images plus a CSV file, but your code expects a .mat file. Did you generate this .mat file yourself, and what format is it in?

Reproduce results on Tokyo247 dataset

Hi Dr. Ge,
Thanks for sharing this nice work!
I have successfully reproduced the reported results on Pittsburgh 250k. However, on Tokyo247 I got slightly worse results than reported. Here is what we reproduced:
Recall@1 84.1
Recall@5 91.1
Recall@10 92.4
Since we got exactly the same results on Pittsburgh, there should be no problem with the checkpoint loading; I suspect the Tokyo247 dataset is wrongly installed in my case. I notice that there are three versions of the Tokyo247 queries, so I want to check with you whether your results are based on queries_v3 or v2.
Besides, I notice that the Tokyo247 query images are preprocessed with a different resizing transformation; may I know the reason for it?

models for torch.hub

Hi, sorry to disturb you. While studying the Quick Start without Installation section, I found that you only uploaded the SFRS model for torch.hub; could you also upload the SARE and NetVLAD models? Thanks a lot.

Question about NetVLAD in MODEL_ZOO.md

Hi, @yxgeee
Thanks a lot for sharing this valuable image-based localization tool.
I would like to re-rank the retrieval results based on your work.
In MODEL_ZOO.md, I only find the SFRS and SARE models, but not the NetVLAD model.
Where could I download the pretrained NetVLAD model?
Thanks a lot for your help.

Question about CRN implementation.

Hi, @yxgeee
I read your ECCV-2020 spotlight paper, and I notice that
you compare with three methods: NetVLAD, SARE and CRN.
I would like to know if OpenIBL will implement CRN in the future.
Thanks a lot for your help.

reproduction problem

Nice work! Could you please upload the Pitts and Tokyo datasets to Google Drive? I cannot find anywhere to get them.

reproduction problem

I just ran the code without any changes and found the initial recall scores as follows:
Recall Scores:
top-1 1.2%
top-5 4.6%
top-10 8.7%
Then I continued training until generation 1, epoch 2, with these recall scores:
Recall Scores:
top-1 1.4%
top-5 6.3%
top-10 11.9%

  • Finished generation 1 epoch 2 recall@1: 1.4% recall@5: 6.3% recall@10: 11.9% best@5: 7.7%
    There seems to be no obvious improvement; is that right?

About modifying the d=25 meter threshold

Thank you for your work. If I want to change the distance threshold, what should I modify in the code?
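Not an authoritative answer, but judging from the radius_neighbors call quoted in another issue here (from ibl/utils/data/dataset.py), the positive set appears to be built with a radius-based neighbor search over UTM coordinates. A generic sketch of that mechanism, with hypothetical variable names, showing the kind of parameter you would need to change (and keep consistent between training and evaluation):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

db_utm = np.random.rand(2000, 2) * 1000.0   # hypothetical database UTM coordinates
query_utm = np.random.rand(50, 2) * 1000.0  # hypothetical query UTM coordinates

intra_thres = 25.0  # distance threshold in meters -- the value to modify

neigh = NearestNeighbors()
neigh.fit(db_utm)
dist, neighbors = neigh.radius_neighbors(query_utm, radius=intra_thres)

# neighbors[i] lists the database indices treated as within-threshold matches
# for query i; a changed threshold affects both sampling and ground truth
print(neighbors[0][:10])
```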

Pitts250k top-1 88.2%

Thanks for your work. I used your model to test on the Pitts250k data:

top-1 88.2%
top-5 95.4%
top-10 96.7%
These numbers are inconsistent with the paper.

missing keys in state_dict

Thanks for your work. When I use your model, I get missing keys in the state_dict:
missing keys in state_dict: {'base_model.base.28.bias', 'net_vlad.conv.weight', 'base_model.base.19.bias', 'base_model.base.28.weight', 'base_model.base.24.bias', 'base_model.base.26.weight', 'base_model.base.5.weight', 'base_model.base.7.weight', 'base_model.base.17.bias', 'base_model.base.12.weight', 'net_vlad.centroids', 'base_model.base.7.bias', 'base_model.base.0.bias', 'base_model.base.14.weight', 'base_model.base.5.bias', 'base_model.base.21.bias', 'base_model.base.19.weight',
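Not the author, but one common cause of a wall of missing keys like this is loading the whole checkpoint dictionary instead of the state_dict nested inside it. A hedged check, with a placeholder path:

```python
import torch

ckpt = torch.load('model_best.pth.tar', map_location='cpu')  # placeholder path

# checkpoints are sometimes a bare state_dict and sometimes a dict that nests it
# under 'state_dict' (alongside epoch, best recall, ...); passing the outer dict
# to load_state_dict yields exactly this kind of "missing keys" message
state_dict = ckpt['state_dict'] if isinstance(ckpt, dict) and 'state_dict' in ckpt else ckpt

# compare these names against model.state_dict().keys() to spot prefix mismatches
print(list(state_dict.keys())[:5])
```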

mean and std of image transform

Hello,
I was wondering where the mean and std of the transform come from. I computed them for Pittsburgh and they look very different, especially the std!

def get_transformer_test(height, width, tokyo=False):

mean=[0.48501960784313836, 0.4579568627450961, 0.4076039215686255],

std=[0.00392156862745098, 0.00392156862745098, 0.00392156862745098]

The std is way too small; could you please explain how you arrived at it?
Thanks
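A small arithmetic observation (not an official answer) that may explain the numbers: the std is exactly 1/255, and the means times 255 give the classic VGG mean pixel values (about 123.68, 116.779, 103.939), so this Normalize keeps pixels on a 0-255 scale and subtracts a fixed mean pixel rather than standardizing per-channel statistics of Pittsburgh:

```python
# std is exactly 1/255, so for an input x already scaled to [0, 1]:
#   (x - mean) / (1/255) = 255*x - 255*mean
# i.e. pixels go back to the 0-255 range with a fixed mean pixel subtracted
mean = [0.48501960784313836, 0.4579568627450961, 0.4076039215686255]
std = [0.00392156862745098] * 3

print([round(1 / s, 4) for s in std])     # [255.0, 255.0, 255.0]
print([round(m * 255, 3) for m in mean])  # [123.68, 116.779, 103.939]
```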

Reproduction failed

I used the script ./scripts/train_sfrs_dist.sh to train SFRS on 4 GPUs (~11 GB each) but failed to reproduce the results. Here are the details of our reproduction:

        Pitts250k                          Tokyo 24/7
        recall@1  recall@5  recall@10      recall@1  recall@5  recall@10
paper   90.7      96.4      97.6           85.4      91.1      93.3
run1    90.0      95.8      97.0           81.0      89.5      91.1
run2    89.9      95.8      97.0           79.0      88.9      89.8

Can you give us any advice on this situation?

Question on tab.3 of the paper

Hi, thanks for sharing your nice work!!!

Looking into the paper, I realize that it also includes comparisons on Oxford and Paris.
As far as I know, other approaches report better performance on these benchmarks, such as "Fine-tuning CNN Image Retrieval with No Human Annotation", which reports 87.8 mAP on Oxford.

I am wondering whether there are particular reasons these two lines of research are not comparable.

Thanks in advance,

Best

Performances on Oxford and Paris

Hello,
I tried your pretrained model with the cnnimageretrieval-pytorch test script, and I got:
mAP 67.90 for Oxford (73.9 in your paper)
mAP 76.64 for Paris (82.5 in your paper)

Am I missing something? (Both with and without PCA gave similar performance.)

About Normalize in get_transformer_train and get_transformer_test

Thanks for your work!
I saw you are using

T.Normalize(mean=[0.48501960784313836, 0.4579568627450961, 0.4076039215686255],
            std=[0.00392156862745098, 0.00392156862745098, 0.00392156862745098])

in get_transformer_train and get_transformer_test, which differs from the usual

T.Normalize(mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]),

which is also used in https://github.com/Nanne/pytorch-NetVlad.

(1) Is this because you are using the pretrained model vgg16_pitts_64_desc_cen.hdf5, which matches the former normalization, while the pretrained VGG model in torchvision uses the latter? Relatedly, the learning rate you use is 0.001, while Nanne/pytorch-NetVlad uses 0.0001; is that for the same reason?

(2) I'm trying to reproduce the SARE-joint result within the Nanne/pytorch-NetVlad framework. I added the loss function written on my own, and I'm using the pretrained model vgg16_pitts_64_desc_cen.hdf5. The learning rate is still 0.0001, as in Nanne/pytorch-NetVlad, and the normalization also follows Nanne/pytorch-NetVlad. But I cannot achieve the 89% Recall@1 result on Pitts250k-test. I only have 3 GPUs to use, so I have to set the batch size to

(3) I didn't add T.ColorJitter(0.7, 0.7, 0.7, 0.5) in get_transformer_train as Nanne/pytorch-NetVlad did.

Is it due to the batch size or T.ColorJitter, or should I just use the pretrained VGG model from torchvision with T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])? Or could you give me some other suggestions?

Really, thanks for your help!

A question about dataset organization

Hi!
While reproducing your code and running SFRS training (./scripts/train_sfrs_dist.sh), I ran into the following problem:
[screenshot of the error]
The problem occurs at:

assert(len(neg_indices)==self.neg_num)

Printing from inside the sampler, I found that the training set's pos_list and neg_list are identical:

self.pos_list = pos_list

self.neg_list = neg_list

[screenshot of the output]
I have tried modifying the search radius of the nearest-neighbor search in dataset.py, but the problem remains:

dist, neighbors = neigh.radius_neighbors(utm_query, radius=intra_thres)

My dataset is organized as follows:
[screenshot of the dataset directory structure]
Could you please advise what the problem might be? Thank you!

How to visualize the feature map

Hi, thanks for your excellent work!!
I have a question about how you visualize the feature map (before VLAD aggregation). Is it done by simply summing the channels to get a 30x40 heatmap?
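Not the author's method, but a common way to turn a pre-aggregation conv feature map into such a heatmap is to reduce over the channel dimension (channel sum or per-location L2 norm), normalize, and upsample to the image size. A minimal sketch with a placeholder feature map; for a VGG16 backbone and a 480x640 input the map before NetVLAD is typically 30x40:

```python
import torch
import torch.nn.functional as F

# placeholder for the conv feature map taken just before the VLAD layer
fmap = torch.randn(1, 512, 30, 40)

# collapse channels into a single response map; summing the channels (as you
# suggest) or taking the per-location L2 norm are both common choices
heat = fmap.pow(2).sum(dim=1, keepdim=True).sqrt()              # [1, 1, 30, 40]
heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-12)  # normalize to [0, 1]

# upsample to the input resolution so it can be overlaid on the image
heat_full = F.interpolate(heat, size=(480, 640), mode='bilinear', align_corners=False)
print(heat.shape, heat_full.shape)
```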
