
parskatt / roma


[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

Home Page: https://parskatt.github.io/RoMa/

License: MIT License

Python 100.00%
3d-reconstruction dense-matching feature-matching image-matching

roma's Introduction

RoMa 🏛️: Robust Dense Feature Matching
⭐CVPR 2024⭐

Johan Edstedt · Qiyu Sun · Georg Bökman · Mårten Wadenbäck · Michael Felsberg


RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

Setup/Install

In your Python environment (tested on Linux with Python 3.10), run:

pip install -e .

Demo / How to Use

We provide two demos in the demos folder. Here's the gist of it:

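import cv2
import torch
from PIL import Image
from roma import roma_outdoor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
imA_path, imB_path = "path/to/imA.jpg", "path/to/imB.jpg"  # placeholder image paths
W_A, H_A = Image.open(imA_path).size  # PIL reports (width, height)
W_B, H_B = Image.open(imB_path).size
roma_model = roma_outdoor(device=device)
# Match
warp, certainty = roma_model.match(imA_path, imB_path, device=device)
# Sample matches for estimation
matches, certainty = roma_model.sample(warp, certainty)
# Convert to pixel coordinates (RoMa produces matches in [-1,1]x[-1,1])
kptsA, kptsB = roma_model.to_pixel_coordinates(matches, H_A, W_A, H_B, W_B)
# Find a fundamental matrix (or anything else of interest)
F, mask = cv2.findFundamentalMat(
    kptsA.cpu().numpy(), kptsB.cpu().numpy(), ransacReprojThreshold=0.2, method=cv2.USAC_MAGSAC, confidence=0.999999, maxIters=10000
)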
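RoMa's matches live in normalized coordinates, so converting them to pixels is a per-axis affine map. The sketch below is a pure-Python illustration, not the library's to_pixel_coordinates; it assumes the common convention that -1 and +1 map to the outer image borders, i.e. x_pix = W/2 · (x + 1), and the helper name normalized_to_pixels is hypothetical.

```python
def normalized_to_pixels(matches, H_A, W_A, H_B, W_B):
    """Convert (xA, yA, xB, yB) rows from [-1, 1] to pixel coordinates.

    Assumption: -1/+1 correspond to the left/top and right/bottom image borders.
    """
    kpts_A, kpts_B = [], []
    for xA, yA, xB, yB in matches:
        kpts_A.append((W_A / 2 * (xA + 1), H_A / 2 * (yA + 1)))
        kpts_B.append((W_B / 2 * (xB + 1), H_B / 2 * (yB + 1)))
    return kpts_A, kpts_B


# One match at the centre of image A and the top-left corner of image B.
kA, kB = normalized_to_pixels([(0.0, 0.0, -1.0, -1.0)], H_A=480, W_A=640, H_B=600, W_B=800)
print(kA, kB)  # [(320.0, 240.0)] [(0.0, 0.0)]
```

The argument order mirrors the to_pixel_coordinates call in the demo, which takes heights before widths.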

New: You can also match arbitrary keypoints with RoMa. See match_keypoints in RegressionMatcher.
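Matching arbitrary keypoints with a dense matcher comes down to reading the estimated warp at each query location. As a generic, hedged illustration that does not reproduce the signature or internals of match_keypoints, the sketch below bilinearly interpolates a toy warp stored as an H×W grid of target coordinates. The helper name bilinear_lookup and this warp layout are assumptions.

```python
import math


def bilinear_lookup(warp, x, y):
    """Bilinearly interpolate a dense warp at the continuous pixel position (x, y).

    warp is a list of rows; warp[i][j] is a tuple of target coordinates stored
    at integer pixel position (x=j, y=i). Queries are clamped to the grid.
    """
    H, W = len(warp), len(warp[0])
    x = min(max(x, 0.0), W - 1.0)
    y = min(max(y, 0.0), H - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    out = []
    for k in range(len(warp[0][0])):
        top = (1 - wx) * warp[y0][x0][k] + wx * warp[y0][x1][k]
        bottom = (1 - wx) * warp[y1][x0][k] + wx * warp[y1][x1][k]
        out.append((1 - wy) * top + wy * bottom)
    return tuple(out)


# Toy 2x2 warp that shifts every pixel by (+10, +5).
warp = [[(10.0, 5.0), (11.0, 5.0)],
        [(10.0, 6.0), (11.0, 6.0)]]
print(bilinear_lookup(warp, 0.5, 0.5))  # (10.5, 5.5)
```

A real pipeline would typically also read the certainty at the same location and discard low-confidence queries.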

Settings

Resolution

By default RoMa uses an initial resolution of (560,560), which is then upsampled to (864,864). You can change this at construction (see the roma_outdoor kwargs), or later by setting roma_model.w_resized, roma_model.h_resized, and roma_model.upsample_res.
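Because the coarse stage runs on a DINOv2 ViT-L/14 backbone, the coarse resolution presumably needs to be divisible by the patch size 14 (560 = 40 × 14). The upsampling resolution does not appear to be bound by this, since 864 is not a multiple of 14. The sketch below uses hypothetical helper names to check a candidate (h_resized, w_resized) before you assign it to the model.

```python
PATCH_SIZE = 14  # DINOv2 ViT-L/14 patch size (assumed to constrain the coarse resolution)


def coarse_token_grid(h_resized, w_resized, patch_size=PATCH_SIZE):
    """Return the (rows, cols) grid of backbone patch tokens for a coarse resolution.

    Raises ValueError if either side is not divisible by the patch size.
    """
    if h_resized % patch_size or w_resized % patch_size:
        raise ValueError(
            f"({h_resized}, {w_resized}) is not a multiple of the patch size {patch_size}"
        )
    return h_resized // patch_size, w_resized // patch_size


def nearest_valid_resolution(target, patch_size=PATCH_SIZE):
    """Round a desired side length to the nearest positive multiple of patch_size."""
    return max(patch_size, round(target / patch_size) * patch_size)


print(coarse_token_grid(560, 560))    # (40, 40)
print(nearest_valid_resolution(640))  # 644
```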

Sampling

roma_model.sample_thresh controls the thresholding used when sampling matches for estimation. In certain cases a lower or higher threshold may improve results.
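A minimal pure-Python sketch of thresholded sampling, assuming sample_thresh acts as a clamp: every certainty above it is raised to 1, so all sufficiently confident matches become equally likely, and matches are then drawn without replacement in proportion to the clamped certainty. The helper names are hypothetical, and the weighted draw uses the Efraimidis-Spirakis key trick in place of torch.multinomial.

```python
import random


def threshold_certainty(certainty, sample_thresh):
    """Raise every certainty above sample_thresh to 1 (assumed semantics of the threshold)."""
    return [1.0 if c > sample_thresh else c for c in certainty]


def weighted_sample_without_replacement(weights, k, rng):
    """Efraimidis-Spirakis: keep the k largest keys u**(1/w); zero-weight items are never drawn."""
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights) if w > 0]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:k]]


def sample_matches(matches, certainty, num, sample_thresh, seed=0):
    """Hypothetical helper: draw up to num distinct matches in proportion to clamped certainty."""
    rng = random.Random(seed)
    weights = threshold_certainty(certainty, sample_thresh)
    idx = weighted_sample_without_replacement(weights, min(num, len(matches)), rng)
    return [matches[i] for i in idx], [certainty[i] for i in idx]


# Toy example: five matches, one with zero certainty (never sampled).
matches = [(0.1 * i, 0.0, 0.1 * i, 0.0) for i in range(5)]
certainty = [0.9, 0.02, 0.6, 0.0, 0.3]
sampled, sampled_certainty = sample_matches(matches, certainty, num=3, sample_thresh=0.5)
```

Under this assumption, lowering the threshold flattens the distribution toward uniform over the confident matches, while raising it preserves more of the certainty ranking.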

Reproducing Results

The experiments in the paper are provided in the experiments folder.

Training

  1. First follow the instructions provided here: https://github.com/Parskatt/DKM for downloading and preprocessing datasets.
  2. Run the relevant experiment, e.g.,
torchrun --nproc_per_node=4 --nnodes=1 --rdzv_backend=c10d experiments/roma_outdoor.py

Testing

python experiments/roma_outdoor.py --only_test --benchmark mega-1500

License

All our code except DINOv2 is under the MIT license. DINOv2 is under the Apache 2 license (see DINOv2).

Acknowledgement

Our codebase builds on the code in DKM.

Tiny RoMa

If you find that RoMa is too heavy, you might want to try Tiny RoMa, which is built on top of XFeat.

import torch
from roma import tiny_roma_v1_outdoor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tiny_roma_model = tiny_roma_v1_outdoor(device=device)

Mega1500:

Method        AUC@5  AUC@10  AUC@20
XFeat         46.4   58.9    69.2
XFeat*        51.9   67.2    78.9
Tiny RoMa v1  56.4   69.5    79.5
RoMa          -      -       -

Mega-8-Scenes (See DKM):

Method        AUC@5  AUC@10  AUC@20
XFeat         -      -       -
XFeat*        50.1   64.4    75.2
Tiny RoMa v1  57.7   70.5    79.6
RoMa          -      -       -
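The AUC@k columns report the area under the cumulative pose-error curve up to k degrees, shown as percentages. Below is a pure-Python sketch of that metric, assuming the convention popularized by SuperGlue and reused in LoFTR/DKM-style MegaDepth evaluation, where the per-pair pose error is the maximum of the rotation and translation angular errors. The function name pose_auc is ours, not necessarily the benchmark code's.

```python
import bisect


def pose_auc(errors, thresholds):
    """AUC of the cumulative pose-error curve up to each threshold, normalized to [0, 1].

    errors: one pose error per image pair, in degrees (assumed here to be the
    maximum of the rotation and translation angular errors).
    """
    errs = [0.0] + sorted(errors)
    n = len(errors)
    recall = [i / n for i in range(n + 1)]  # fraction of pairs with error <= errs[i]
    aucs = []
    for t in thresholds:
        last = bisect.bisect_left(errs, t)
        # Curve up to the threshold, closed with a flat segment ending at t.
        xs = errs[:last] + [t]
        ys = recall[:last] + [recall[last - 1]]
        area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))
        aucs.append(area / t)
    return aucs


# Four pairs with errors of 1, 2, 3 and 100 degrees.
print(pose_auc([1.0, 2.0, 3.0, 100.0], [5, 10, 20]))
```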

IMC22 :'):

Method        mAA@10
XFeat         42.1
XFeat*        -
Tiny RoMa v1  42.2
RoMa          -

BibTeX

If you find our models useful, please consider citing our paper!

@article{edstedt2024roma,
title={{RoMa: Robust Dense Feature Matching}},
author={Edstedt, Johan and Sun, Qiyu and Bökman, Georg and Wadenbäck, Mårten and Felsberg, Michael},
journal={IEEE Conference on Computer Vision and Pattern Recognition},
year={2024}
}

roma's People

Contributors

dawars, dgcnz, lnexenl, parskatt, qkqhd222


roma's Issues

Attempted to integrate into hloc

Tried to get hloc to use RoMa as a matcher: https://github.com/catid/hloc

The results were not good:

My Aachen Day-Night v1.1 Submissions
Method | day | night
hloc defaults | 88.5 / 95.5 / 98.5 | - / - / -
roma | 87.1 / 94.8 / 97.7 | 66.5 / 89.0 / 96.3

Any suggestions for how to get good performance with RoMa for matching?

Training Plot

Hello!
I am trying to train RoMa myself, and I wonder if you could upload your training plot.
In addition, the training process reports several losses (delta_regression_loss_1, delta_certainty_loss_16, delta_certainty_loss_4, gm_cls_loss_16, ...), and I am not sure which output I should focus on.
Also, how long was the model trained for? After 250000, I was able to achieve an AUC@5 of 0.58 on MegaDepth-1500.

Would you please share your log?

Thank you for your wonderful work. Could you please share your training logs? Another question: if I want to train your network on a general dataset (the original training dataset is too large), how should I modify it? Looking forward to your answer.

Trying different backbone ?

Hello, your results are very promising!

I wonder if you've tried other backbones instead of DinoV2. Indeed, I wonder how much of the good results you're getting come from the extremely well pre-trained DinoV2 backbone or from the loss function you've developed.

Sharing the checkpoint

Hi there! Thanks for your great work! Do you have any plan to release the checkpoint in the near future?

Load indoor model?

from roma import roma_indoor
xFormers not available
xFormers not available

I get this error message.

about dataset

Great work. What puzzles me is why you don't use the entire MegaDepth dataset, but rather only data from two specific scenes.

Running the demo is not working

I'm trying to run demo_match.py inside an Ubuntu 20 Docker image.
I did the setup and I'm getting the error:

[Screenshot of the error, 2024-06-20 09-40-10]

In what environment did you test it?
Do you have an image with specific Python library versions?

Unexpected floating ScalarType in at::autocast::prioritize

I was just trying out the demo_fundamental.py demo and ran into the following error:

Unexpected floating ScalarType in at::autocast::prioritize

It occurs in the grid_sampler here (local_correlation.py, line 40)

local_window_coords = (coords[_,:,:,None]+local_window[:,None,None]).reshape(1,h,w*(2*r+1)**2,2).half()
window_feature = F.grid_sample(
                feature1[_:_+1], local_window_coords, padding_mode=padding_mode, align_corners=False, mode = sample_mode,)

It can be solved by removing the half precision of local_window_coords and converting feature1 to float(). However, I am not sure about any implications that might have.

local_window_coords = (coords[_,:,:,None]+local_window[:,None,None]).reshape(1,h,w*(2*r+1)**2,2)
window_feature = F.grid_sample(
                feature1[_:_+1].float(), local_window_coords, padding_mode=padding_mode, align_corners=False, mode = sample_mode,)

System/Env:

  • Python 3.10
  • Pytorch 2.0.1
  • Cuda 11.7
  • RTX 3090

Should KDE be performed on the two images separately?

Thanks for the great method, which has almost solved the WxBS problem.

I noticed symmetric matching is the default.

symmetric = True

and KDE is performed on all sampled matches from (H, 2*W) warp results:

RoMa/roma/models/matcher.py, lines 475 to 486 (commit 5052229):

matches, certainty = (
    matches.reshape(-1, 4),
    certainty.reshape(-1),
)
expansion_factor = 4 if "balanced" in self.sample_mode else 1
good_samples = torch.multinomial(certainty,
                                 num_samples = min(expansion_factor*num, len(certainty)),
                                 replacement=False)
good_matches, good_certainty = matches[good_samples], certainty[good_samples]
if "balanced" not in self.sample_mode:
    return good_matches, good_certainty
density = kde(good_matches, std=0.1)

The positions and warps of image1 and image2 should be independent.
So, should KDE be performed on the two images separately?

Another question:
Since the sampling comes from symmetric matching, will the results contain many near-duplicate matches?
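The kde call in the snippet above runs on 4-D match vectors (xA, yA, xB, yB), so density is measured jointly over both images. Below is a pure-Python sketch of an unnormalized Gaussian KDE followed by inverse-density weighting, the kind of balancing the "balanced" sample mode presumably performs. The 1/(density + 1) weighting and the helper names are our assumptions, not code taken from matcher.py. Passing only the first two (or last two) coordinates of each match gives the per-image variant raised in the question.

```python
import math


def kde(points, std=0.1):
    """Unnormalized Gaussian kernel density of every point w.r.t. all points (O(n^2))."""
    density = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2 * std ** 2))
        density.append(total)
    return density


def balanced_weights(points, std=0.1):
    """Inverse-density sampling weights: crowded regions are down-weighted."""
    return [1.0 / (d + 1.0) for d in kde(points, std)]


# Three identical matches plus one isolated match, as 4-D vectors (xA, yA, xB, yB).
matches = [(0.0, 0.0, 0.0, 0.0)] * 3 + [(1.0, 1.0, 1.0, 1.0)]
w_joint = balanced_weights(matches)                     # density over both images
w_image_a = balanced_weights([m[:2] for m in matches])  # density over image A only
```

Exact duplicates, such as the same correspondence found in both directions of symmetric matching, each add a full unit of density to one another, so under this weighting they are down-weighted rather than sampled twice as often.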

About training settings

Hi Johan,

I really appreciate your great work RoMa. I found that the training strategy of RoMa (i.e., scheduler) is different from DKM. Does using different training strategies help with performance?

Thank you so much for your help!

How is T_1to2 generated?

I am doing some epipolar work; however, I found that the rotational part of T_1to2 differs greatly from what I estimated from the matching pairs. The Sampson distance computed from T_1to2 and the matching pairs is also huge.

So I am curious how T_1to2 is generated. What does T_1to2 mean?
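To check matches against a ground-truth relative pose, the pose is commonly turned into an essential matrix E = [t]×R, and each match is scored with the Sampson distance. The pure-Python sketch below assumes that T_1to2 is a 4×4 rigid transform mapping camera-1 coordinates to camera-2 coordinates (X2 = R·X1 + t) and that keypoints have already been normalized with the intrinsics (x_n = K⁻¹x). Both conventions should be verified against the dataset loader, and the helper names are hypothetical. Getting either convention wrong, for example by using the inverse transform or pixel instead of normalized coordinates, would inflate the distances.

```python
def skew(t):
    """Cross-product matrix [t]_x such that [t]_x v == t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_from_T(T_1to2):
    """E = [t]_x R, assuming X2 = R X1 + t with R, t read from a 4x4 T_1to2."""
    R = [row[:3] for row in T_1to2[:3]]
    t = [row[3] for row in T_1to2[:3]]
    return matmul(skew(t), R)


def sampson_distance(E, x1, x2):
    """Sampson distance for normalized homogeneous points x1 (image 1) and x2 (image 2)."""
    Ex1 = matvec(E, x1)
    Etx2 = matvec([list(col) for col in zip(*E)], x2)
    num = sum(a * b for a, b in zip(x2, Ex1)) ** 2
    den = Ex1[0] ** 2 + Ex1[1] ** 2 + Etx2[0] ** 2 + Etx2[1] ** 2
    return num / den


# Pure translation along x: X2 = X1 + (1, 0, 0).
T_1to2 = [[1, 0, 0, 1],
          [0, 1, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 1]]
E = essential_from_T(T_1to2)
x1 = [0.0, 0.0, 1.0]        # projection of X1 = (0, 0, 5)
on_line = [0.2, 0.0, 1.0]   # projection of X2 = (1, 0, 5): a consistent match
off_line = [0.2, 0.1, 1.0]  # a perturbed match
print(sampson_distance(E, x1, on_line))   # 0.0
print(sampson_distance(E, x1, off_line))  # approximately 0.005
```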

small model backbone

Hi, I've modified the code to load a DinoV2 small model, but I realized that the embed_dim of the vit_small model

def vit_small(patch_size=16, **kwargs):
    model = DinoVisionTransformer(
        patch_size=patch_size,
        embed_dim=384,
        depth=12,
        num_heads=6,
        mlp_ratio=4,
        block_fn=partial(Block, attn_class=MemEffAttention),
        **kwargs,
    )
    return model

is 384, which causes a dimension mismatch with the provided RoMa checkpoint, which assumes that embed_dim is 1024, e.g.

 proj16 = nn.Sequential(nn.Conv2d(1024, 512, 1, 1), nn.BatchNorm2d(512))

Could you provide the weights for RoMa-s? The model takes ~6 GB of VRAM even after applying the change from kde to approx_kde in #23, so being able to use a small model would help a lot.

About NaN and inf grads

I found that when performing the backward pass, there are sometimes warnings like:
[screenshot of the warnings]

Do these NaN or inf grads have bad effects on training?

Dino Version

Hello!
I wonder if there is another version of DINO (for example ViT-S/14) that can be used as the backbone in this model.

Finding matches with arbitrary query points

Hi @Parskatt ,

Any ETA on when the demo for matching arbitrary keypoints will be released? The README says that it is possible and the demo will be released soon.

Is there any function in the current codebase that can be directly used to match arbitrary query points? If yes, I would be thankful for a pointer to the same.

Thanks!

Yours sincerely,
Aditya

Value of `coarse_res` and `upsample_res` for ScanNet

Hi @Parskatt ,

I am trying to run an evaluation of roma_indoor on ScanNet, like what you have used for roma_outdoor in RoMa/experiments/eval_roma_outdoor.py. Could you please tell me what values of coarse_res and upsample_res to use with ScanNet when initializing the model?

Thanks!

Regards,
Aditya

VGG19

hello!
I might be missing something, but why is the VGG19 pretrained value set to False?

out of cuda memory

Hi,
Thanks for sharing your excellent work!
I have an 11GB cuda memory card and run out of memory at the local_correlation function. Is there a way to restrict the cuda memory size other than reducing the image size?

inference code for windows system

Hi, thanks for your outstanding work. I would like to ask whether there is inference code for the Windows operating system, and whether configuring a Linux environment on Windows affects the inference speed.

License

Thank you for the work! Could you please change the LICENSE file according to the description in the README file?

Recent update greatly increased GPU memory usage

Matching 2 images with 20000 matches works with commit 69cefb1
but not with the current main.

I guess that is because of removing

    torch.backends.cuda.matmul.allow_tf32 = True # allow tf32 on matmul TODO: these probably ruin stuff, should be careful
    torch.backends.cudnn.allow_tf32 = True # allow tf32 on cudnn

Failure to generate 3D reconstruction with unknown pose of the cameras

Hi Roma contributors,

I find the RoMa model works extremely well on my dataset, as evidenced by the graphs I have drawn from the points RoMa matched. Yet it seems to be quite incompatible with pycolmap.incremental_mapping() for some unknown reason. For a set of 55 images, I keep getting only two cameras registered, while producing a very small reprojection_error. My primary aim is to estimate the poses of the cameras that captured the images.

The issue is a little complicated and I understand that it may not be solved with this little context. So, if you would like to help, I can send you my code for further inspection. Thank you a lot for working out this brilliant model!

Best regards,

Roy

Pairing Of Any Custom Points

You are doing a great job! I noticed you said you will support matching arbitrary custom points. When can I get a demo, please?

Doppelgangers

I was wondering if you had any suggestions on how to deal with doppelgangers. Is RoMa trained at all to differentiate them?

batch size

hello
I noticed that the certainty values change when I use a different batch size.
Do you have any idea why?

onnx

Is there any plan to release ONNX support?

Release some test dataset to evaluate

Hi, Authors,

The MegaDepth dataset is too large to download. Could you please release some benchmark data for us to run evaluation tests?

Thank you.

Training code in `roma_indoor.py` seems to use non-distributed sampler with DDP

Hi,

I was going through the training code in experiments/roma_indoor.py, and it seems that you have used a non-distributed sampler (WeightedRandomSampler) instead of DistributedSampler. I believe this means that the entire data will be replicated and passed to each model replica instead of shards of the data. I just wanted to confirm this and ask whether it is intentional.

Thanks!
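One common way to keep weighted sampling while still giving each DDP replica a disjoint shard is to draw the weighted sample with a seed shared by all ranks and let each rank keep every world_size-th position, which mirrors how PyTorch's DistributedSampler strides through its index list. The pure-Python sketch below illustrates the idea; weighted_shard is a hypothetical name, and this is not the code in experiments/roma_indoor.py.

```python
import random


def weighted_shard(weights, num_samples, rank, world_size, seed):
    """Hypothetical helper: one weighted draw shared by all ranks, strided per rank.

    All ranks use the same seed, so they agree on the global draw; rank r keeps
    positions r, r + world_size, r + 2 * world_size, ..., so each drawn sample
    is processed by exactly one replica.
    """
    rng = random.Random(seed)
    drawn = rng.choices(range(len(weights)), weights=weights, k=num_samples)
    return drawn[rank::world_size]


# Four replicas sharing one epoch's draw of 12 samples.
shards = [weighted_shard([1, 2, 3, 4], 12, rank, 4, seed=0) for rank in range(4)]
```

To get a fresh draw each epoch, the shared seed would change per epoch, analogous to DistributedSampler.set_epoch.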

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

    return super().forward(x_or_x_list)
  File "/Users/rkothari/Documents/Projects/compalg_horizon/src/extern/RoMa/roma/models/transformer/layers/block.py", line 105, in forward
    x = x + attn_residual_func(x)
  File "/Users/rkothari/Documents/Projects/compalg_horizon/src/extern/RoMa/roma/models/transformer/layers/block.py", line 84, in attn_residual_func
    return self.ls1(self.attn(self.norm1(x)))
  File "/Users/rkothari/anaconda3/envs/horizon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/rkothari/anaconda3/envs/horizon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/rkothari/anaconda3/envs/horizon/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 201, in forward
    return F.layer_norm(
  File "/Users/rkothari/anaconda3/envs/horizon/lib/python3.10/site-packages/torch/nn/functional.py", line 2546, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

Where is this error coming from?
