
ubc-vision / cotr

443 stars · 443 watchers · 56 forks · 17.09 MB

Code release for "COTR: Correspondence Transformer for Matching Across Images"(ICCV 2021)

License: Apache License 2.0

Python 100.00%

cotr's People

Contributors

ducha-aiki · jahad9819jjj · jiangwei221



cotr's Issues

Question

What does the dense correspondence map in Figure 1 mean, how is it obtained, and how is it represented numerically? I only know that it is the dense correspondence between the two images. What does the color-coded 'x' channel mean?
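
For intuition, here is a minimal sketch, assuming a dense correspondence map corr of shape [H, W, 2] that stores, for every pixel of image 1, the matched (x', y') location in image 2 (the constant shift below is a toy stand-in for real COTR output). The color-coded 'x' channel is just the first channel rendered with a colormap:

import numpy as np
import matplotlib.pyplot as plt

# Toy dense correspondence map: corr[y, x] = (x', y') in the other image.
H, W = 240, 320
xs, ys = np.meshgrid(np.arange(W), np.arange(H))
corr = np.stack([xs + 5.0, ys - 3.0], axis=-1)  # placeholder: constant shift

# The "x channel" is corr[..., 0]; color-coding it shows where each column
# of image 1 lands in image 2.
plt.imshow(corr[..., 0], cmap='jet')
plt.colorbar(label="matched x' in image 2")
plt.show()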

inference_helper.py

Hello, what does the cotr_patch_flow_exhaustive function in the inference_helper.py file implement? What are the meanings of p_i and p_j?

Matching time

Hello, and thank you for the excellent work. I have a question: how should I understand that, querying one point at a time, the model achieves 35 correspondences per second?
"Our currently non-optimized prototype implementation queries one point at a time, and achieves 35 correspondences per second on a NVIDIA RTX 3090 GPU. "
I have recently been running your code: on an NVIDIA RTX 3090 GPU, demo_single_pair.py took about 30 s to match. Is that normal?
Thank you!

Question

Hello, when running the code with the pre-trained model, I get: RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 7.79 GiB total capacity; 2.90 GiB already allocated; 1.83 GiB free; 4.80 GiB reserved in total by PyTorch). Is there any solution? For example, which parameters should be adjusted?
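
Not an official fix, but a common first mitigation is to shrink the inputs before inference so the activations fit in memory; the file name and scale below are illustrative assumptions:

import cv2

img = cv2.imread('img1.jpg')           # illustrative input
scale = 0.5                            # halving each side roughly quarters the activation memory
img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)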

MLP version

Hi, could you release the MLP-version code for point correspondence?
Thanks

retrain

Hello, I would like to ask how to retrain from the pre-trained model to improve the network's robustness to rotation. How should this be implemented?

Memory footprints

Could you please share the memory footprint of COTR during the training process? Thanks.

Conda env

I found that the following commands install a basic environment for inference:

conda create -n cotr_env python=3.7
conda activate cotr_env
conda install scipy=1.2.1
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
conda install opencv=3.4.2
conda install scikit-image jupyter pytables matplotlib imageio tqdm
pip install tensorboardX
pip install easydict
pip install kornia==0.5.3
pip install --upgrade vispy

the way queries generate

Hello, thank you very much for such good work.
I am a beginner, and after reading your paper I have a general understanding of the principles of this project, but there are some details I am a little confused about. Are the query pairs on the training data randomly generated, or labelled before training? How is the number of queries decided?
If I want to fine-tune on my own training set, how should I do it?

question

1. Can I understand it this way: is the value of C the pixel value of the matching point in the right image, or is it a relational matrix?
2. What does it mean that the correspondence map C has 2 channels?
3. I still have questions: at which layer of the network is C obtained, the transformer or the MLP, and what are its specific dimensions? Is it a 256-dimensional vector? What do the two channels mentioned in the reply refer to? Do the query points input to the network have 256 dimensions with position coordinates only in [0, 1], or are they 2-dimensional integer coordinate values?
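
For intuition, a minimal sketch of the coordinate conventions these questions touch on, under the assumption (consistent with the paper) that a query is a 2-vector of coordinates normalized to [0, 1] rather than a 256-dimensional feature, and that a correspondence map C has one channel per output coordinate; the image size and query below are illustrative:

import numpy as np

# A query point: 2D pixel coordinates normalized to [0, 1] by the image size.
W, H = 640, 480                              # illustrative image size
pixel_query = np.array([320.0, 240.0])
normalized_query = pixel_query / [W, H]      # -> [0.5, 0.5]

# A dense correspondence map C with 2 channels: at each location of image 1
# it stores the matched (x', y') coordinates, one coordinate per channel.
C = np.zeros((H, W, 2), dtype=np.float32)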

Bad feature points

I matched the two binary images shown below; the feature points matched by COTR are red in img1 and green in img2. I then computed the homography matrix from img1 to img2.
(image: feature_points)

The contour and feature points of img1 are projected as the red mask and points, while the green points are the source feature points in img2. They don't seem to match well in the results; COTR doesn't extract feature points on the boundary of the contours.
(image: projection)
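
As a point of comparison, a minimal homography-from-matches sketch with OpenCV; pts1/pts2 are placeholders standing in for the COTR correspondences, and RANSAC is used so a few drifted matches do not corrupt the estimate:

import cv2
import numpy as np

# pts1/pts2: N x 2 matched pixel coordinates (e.g. COTR output), placeholders here.
pts1 = (np.random.rand(50, 2) * 256).astype(np.float32)
pts2 = pts1 + 3.0

# RANSAC rejects outlier correspondences before fitting the 3x3 homography.
H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransacReprojThreshold=3.0)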

training

Hello, I would like to ask if you are using the complete MegaDepth dataset for training data, or select a part of it, and if it is convenient, can you provide a training data?

glfw on headless machine

Hi, thanks for the great work! I am trying to run the code on a headless server but I get an error with glfw:
ends/_glfw.py", line 215, in _vispy_get_native_app
raise OSError('Could not init glfw:\n%r' % _glfw_errors)
OSError: Could not init glfw:
["Error 65544: b'X11: Failed to open display '"].

I did some research, and it seems that glfw does not support headless rendering. Have you ever tried it?
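
One thing worth trying (an assumption about your setup, not a confirmed fix): ask vispy for an offscreen backend such as EGL before the glfw backend is touched, or run the script under a virtual display with xvfb-run.

import vispy

# Select vispy's EGL (offscreen) application backend instead of glfw.
vispy.use(app='egl')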

Training cost

Dear author, could you please tell me how many GPUs are needed for training, and the time cost?

question about positional encoding

Hi. According to formula (4) in your paper, you add the positional encoding P to get a context feature map c. But in your code, you just follow the standard transformer and add the positional encoding to the key and query while keeping the value clean. Did I miss anything?
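
For reference, the DETR-style pattern the question describes looks roughly like this sketch (a paraphrase for illustration, not the repository's exact code): the positional encoding is added to the query and key only, while the value stays clean.

import torch
import torch.nn as nn

def with_pos_embed(x, pos):
    # Add the positional encoding only where attention consumes it.
    return x if pos is None else x + pos

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8)
src = torch.randn(100, 1, 256)    # flattened feature map, (L, N, C)
pos = torch.randn(100, 1, 256)    # positional encoding of the same shape

q = k = with_pos_embed(src, pos)  # q and k carry the positional encoding
out, _ = attn(q, k, value=src)    # the value is the raw feature map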

Pre-processing of Megadepth

Hi Wei,

Thanks for sharing the great code. I was a bit curious about the pre-processing procedures of COTR. They seem to do similar things to the pre-processing of D2Net. Would you mind clarifying whether I can use the data processed by D2Net for COTR, and what the potential differences between them are?

Best,
Jianyuan

How to get depth map

Hi, thank you for your nice work :)
Sorry, I'm new to this field; I wonder how I can get the depth map of two images using the COTR results?
Thanks a lot
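
COTR itself only outputs correspondences; to get depth you additionally need camera geometry. Assuming known intrinsics and relative pose (an assumption about your setup), one standard route is to triangulate the matched points; everything below is a placeholder sketch:

import cv2
import numpy as np

# Assumed known: 3x4 projection matrices for the two cameras.
K = np.eye(3)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[0.1], [0.0], [0.0]])])

# pts1/pts2: 2 x N matched pixel coordinates (e.g. from COTR), placeholders here.
pts1 = np.random.rand(2, 20)
pts2 = pts1 + 0.01

pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous, 4 x N
pts3d = pts4d[:3] / pts4d[3]
depth = pts3d[2]                                   # z in the first camera frame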

Possible redundancy in the code

Hi, I notice that when constructing the Transformer, you always return the intermediate features at this line. However, after feeding them to the MLP for correspondence regression, you only take the prediction from the last layer at this line. So I guess you could set return_intermediate=False to save some memory/computation?

Match time

Hello, a question about COTR: if I use other feature-extraction methods to get the feature point positions of the images and input those, can I reduce the time COTR spends on feature matching?

Common view image

Hello, I had the honor of reading your article. I would like to ask how to obtain the masked common-view image shown in Figure 4 of the paper from the mask matrix.

question about the code detail

Thanks for your great code and paper! I am new to this area and have two questions about the code:
1. In cotr_patch_flow_exhaustive: what is the meaning of cycle_grid? Why is the norm of cycle_grid and in_grid the confidence?
2. What does the function merge_flow_patches do with the correspondences?
Looking forward to your reply!

[Request for help] How to setup the prepare_nn_distance_mat.py ... --scene, --seq ?

Hi! Firstly, thank you very much for releasing the code and for producing this research work. I enjoyed reading your paper. It is awesome!
Here, I would like to try to train COTR.
So far I have tried to follow the steps in https://github.com/ubc-vision/COTR/blob/master/prepare_data.md...
From those steps, I have successfully generated rectify.sh, megadepth_test.json, megadepth_train.json, megadepth_val.json, and megadepth_valid_list.json...

However, at the last step, that is, to generate the distance matrix...

  • I am stuck on how to provide the input for --scene, --seq which are the required input parameters...

  • The reason for my confusion is that I am not sure how many and which scenes and sequences are needed for training COTR.

In other words, how should I run python3 prepare_nn_distance_mat.py such that it will prepare the data for training COTR?

Rotation angle

Hello, I would like to ask: when COTR extracts the common-view area, for some scenes with too large a rotation angle the common-view area cannot be extracted. What is the possible reason for this phenomenon?

About HPatches datasets

Thanks very much for your great work! I want to know how you test and evaluate on the HPatches dataset (in the code). Can you tell me how to get the relevant code?

Sharing raw data of ETH3D and KITTI

Hi everyone:

I'd like to share the raw output from COTR for ETH3D and KITTI dataset.

ETH3D eval: https://drive.google.com/file/d/1pfAuHRK7FvB6Hc9Rru-beH6F-2lpZAk6/view?usp=sharing

KITTI: https://drive.google.com/file/d/1SiN5UbqautqosUCInQN2WhyxbRcbWt8b/view?usp=sharing

The format is {src_id}->{tgt_id}.npy, and I saved the results as a dictionary with several keys: "raw_corr", "drifting_forward", and "drifting_backward".
"raw_corr" holds the raw sparse correspondences in XYXY format, and "drifting_forward" and "drifting_backward" are the masks used to filter out drifted predictions.

is demo_single_pair supposed to be slow?

Hi, thanks for your great work. I tried to run demo_single_pair.py and it took around 350 s to get the correspondences; I'm just wondering whether that's normal, even running on a GPU?

thanks
Cheng

depth information

Hello, I would like to ask: with COTR, is it possible to compute the depth map of an image?

Quantitative Results

Hi! Great paper! I was wondering how I could get quantitative results such as the AEPE and Fl from your code?

Thanks!
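
For reference, AEPE is the mean Euclidean distance between predicted and ground-truth flow, and Fl (as defined by KITTI) counts a prediction as an outlier when its endpoint error exceeds both 3 px and 5% of the ground-truth flow magnitude; a minimal sketch with placeholder arrays:

import numpy as np

pred_flow = np.random.rand(100, 2)   # predicted flow vectors (placeholder)
gt_flow = np.random.rand(100, 2)     # ground-truth flow vectors (placeholder)

epe = np.linalg.norm(pred_flow - gt_flow, axis=1)
aepe = epe.mean()
fl = np.mean((epe > 3.0) & (epe > 0.05 * np.linalg.norm(gt_flow, axis=1))) * 100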

Dense optical flow as in paper Figure 1 (c)

Hi, thanks for the great work! I wonder how I can estimate the optical flow between two images. Say img1 has shape [H, W]; can I basically reshape the grid coordinates to [H*W, 2] and then input them as queries_a, as in this demo?
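
A minimal sketch of building such a query grid (whether the demo expects pixel or [0, 1]-normalized coordinates should be checked against the demo itself; the normalization here is an assumption):

import numpy as np

H, W = 480, 640                                   # illustrative image size
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
queries_a = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float32)
queries_a /= np.array([W, H], dtype=np.float32)   # assumed [0, 1] normalization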

How is the warped image in Figure 9 generated?

Hi, thanks for the great work! I'm curious about how you generate the warped image in Figure 9 from the dense flow. If I understand correctly, you input a pixel coordinate (x, y) in img1 and get its corresponding coordinate (x', y') in img2. Then you just copy the RGB at (x, y) to (x', y') in img2, and repeat this for all coordinates in img1. Am I correct? Or is there a more efficient way of doing so (like the one mentioned in #28)?
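
For reference, the usual efficient alternative to the per-pixel copy described above is backward warping: for every output pixel, look up where it should sample from and interpolate, e.g. with cv2.remap. The identity maps below are placeholders to be filled from the dense correspondences, and the file name is illustrative:

import cv2
import numpy as np

img2 = cv2.imread('img2.jpg')                     # illustrative input
H, W = img2.shape[:2]

# map_x/map_y: for each output pixel, the source location to sample in img2.
map_x, map_y = np.meshgrid(np.arange(W, dtype=np.float32),
                           np.arange(H, dtype=np.float32))
warped = cv2.remap(img2, map_x, map_y, interpolation=cv2.INTER_LINEAR)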

About ETH3D evaluation

Hi Wei,
thanks for sharing the code.

Would it be possible to provide the ETH3D evaluation code?
I was wondering how the data flows through the model's forward propagation.

Looking forward to your reply.
Regards

About Image Size

Hi~
I want to know what image size is recommended for single-pair matching.
Thanks.

patch partition?

Thank you for such excellent work. I have some questions about COTR. During training, do you divide the scene images into 256×256 patches according to certain rules after scaling, and then input them into the network? (I'm not sure where this step is implemented in the program.) How are the corrs partitioned? Can a corresponding point fall into the next patch, and how is that handled? Is the validation process also similar to the training process, with the same split iteration?
