[NeurIPS 2023] Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

HC-Net: Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

🏠 About

We introduce a novel approach to fine-grained cross-view geo-localization. Our method aligns a warped ground image with a corresponding GPS-tagged satellite image covering the same area using homography estimation. We first employ a differentiable spherical transform, adhering to geometric principles, to accurately align the perspective of the ground image with the satellite map. To address challenges such as occlusion, small overlapping range, and seasonal variations, we propose a robust correlation-aware homography estimator to align similar parts of the transformed ground image with the satellite image. Our method achieves sub-pixel resolution and meter-level GPS accuracy by mapping the center point of the transformed ground image to the satellite image using a homography matrix and determining the orientation of the ground camera using a point above the central axis. Operating at a speed of 30 FPS, our method outperforms state-of-the-art techniques, reducing the mean metric localization error by 21.3% and 32.4% in same-area and cross-area generalization tasks on the VIGOR benchmark, respectively, and by 34.4% on the KITTI benchmark in same-area evaluation.
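The last localization step described above (mapping the center point through the homography and reading the orientation from a second point above it) can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the names `apply_homography`, `locate`, `size`, and `probe` are hypothetical, and the sketch assumes a square transformed ground image with the camera at its center and image rows increasing downward.

```python
import math

def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography H given as nested lists."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def locate(H, size, probe=10.0):
    """Return (x, y, heading_deg) of the ground camera in satellite pixels.

    size:  side length of the square transformed ground image (assumed).
    probe: distance in pixels of the second point above the center (assumed).
    """
    cx = cy = size / 2.0
    px, py = apply_homography(H, cx, cy)          # camera position
    qx, qy = apply_homography(H, cx, cy - probe)  # point above the center
    # Heading is measured clockwise from the "up" direction of the satellite image.
    heading = math.degrees(math.atan2(qx - px, -(qy - py))) % 360.0
    return px, py, heading

# Example: a homography that rotates by 90 degrees about the center of a 512x512 image.
H_rot = [[0.0, -1.0, 512.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
x, y, heading = locate(H_rot, size=512)
```

If the satellite image is north-up, the heading returned here would correspond to a compass bearing; that is an assumption of the sketch rather than something stated above.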

🔥 News

  • [2024-04-27] We release the training code for the VIGOR dataset.
  • [2023-10-27] We release the inference code with checkpoints, as well as the demo script. You can test HC-Net on your own machine.
  • [2023-10-01] We release the code for implementing the spherical transform. For usage instructions, please refer to Spherical_transform.ipynb.
  • [2023-09-21] HC-Net has been accepted by NeurIPS 2023! 🔥🔥🔥
  • [2023-08-30] We release the HC-Net paper and an online Gradio demo.

🤖 Online Demo

HC-Net is online! Try it at this url.

Please use the demo script to deploy the demo locally for testing.

You can test our model using data from the 'same_area_balanced_test.txt' split of the VIGOR dataset, or by providing your own panorama image along with its corresponding satellite image.

📦 Training and Evaluation

Installation

We trained and tested our code in the following environment:

  • Ubuntu 18.04
  • CUDA 12.0
  • Python 3.8.16
  • PyTorch 1.13.0

To get started, follow these steps:

  1. Clone this repository.
git clone https://github.com/xlwangDev/HC-Net.git
cd HC-Net
  2. Install the required packages.
conda create -n hcnet python=3.8 -y
conda activate hcnet
pip install -r requirements.txt

Training

sh train.sh

Evaluation

To evaluate the HC-Net model, follow these steps:

  1. Download the VIGOR dataset and set its path to '/home/<usr>/Data/VIGOR'.
  2. Download the pretrained models and place them in './checkpoints/VIGOR'.
  3. Run the following command:
chmod +x val.sh
# Usage: val.sh [same|cross]
# For same-area in VIGOR
./val.sh same 0
# For cross-area in VIGOR
./val.sh cross 0
  4. You can also inspect the model's visualization results through a Gradio-based demo. Use the following command to start the demo, then open the local URL: http://0.0.0.0:7860.
python demo_gradio.py

🏷️ Label Correction for VIGOR Dataset

We propose the use of Mercator projection to directly compute the pixel coordinates of ground images on specified satellite images using the GPS information provided in the dataset. You can find the specific code at Mercator.py.

To use our corrected labels, add the following code to the __getitem__ method of the VIGORDataset class in the datasets.py file of the CCVPE project:

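from Mercator import *  # helper functions from Mercator.py, including get_pixel_tensor

# GPS of the panorama, parsed from the last two comma-separated fields of its grd_list entry
pano_gps = np.array(self.grd_list[idx][:-5].split(',')[-2:]).astype(float)
pano_gps = torch.from_numpy(pano_gps).unsqueeze(0)

# GPS of the satellite image, parsed from the last two underscore-separated fields of its sat_list entry
sat_gps = np.array(self.sat_list[self.label[idx][pos_index]][:-4].split('_')[-2:]).astype(float)
sat_gps = torch.from_numpy(sat_gps).unsqueeze(0)

# Pixel coordinates of the panorama location on the satellite image at zoom level 20
zoom = 20
y = get_pixel_tensor(sat_gps[:,0], sat_gps[:,1], pano_gps[:,0], pano_gps[:,1], zoom)
col_offset_, row_offset_ = y[0], y[1]

# Offsets relative to the image center; sat is assumed to be the satellite image loaded earlier in __getitem__
width_raw, height_raw = sat.size
col_offset, row_offset = width_raw/2 - col_offset_.item(), row_offset_.item() - height_raw/2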
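For readers unfamiliar with the projection itself, the following is a minimal sketch of how a Mercator pixel offset between two GPS points can be computed. It uses the standard Web Mercator formulas and is not the code in Mercator.py: the names `latlon_to_pixel`, `pixel_offset`, and `TILE_SIZE` are hypothetical, the 256-pixel tile size is an assumption, and any additional image scale factor or sign convention used by the dataset is not modeled.

```python
import math

TILE_SIZE = 256  # assumed Web Mercator tile size in pixels

def latlon_to_pixel(lat, lon, zoom):
    """Convert latitude/longitude in degrees to global Web Mercator pixel coordinates."""
    size = TILE_SIZE * (2 ** zoom)  # width of the whole world map in pixels
    x = (lon + 180.0) / 360.0 * size
    s = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)) * size
    return x, y

def pixel_offset(sat_lat, sat_lon, pano_lat, pano_lon, zoom=20):
    """Pixel offset of the panorama location from the satellite image center.

    Positive dx points east (right); positive dy points south (down).
    """
    sx, sy = latlon_to_pixel(sat_lat, sat_lon, zoom)
    px, py = latlon_to_pixel(pano_lat, pano_lon, zoom)
    return px - sx, py - sy

# Example with made-up coordinates: a panorama slightly north-east of the tile center.
dx, dy = pixel_offset(40.7000, -74.0000, 40.7001, -73.9999)
```

Because the projection is evaluated directly from the GPS values, the offset does not depend on a locally linear meters-per-pixel approximation.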

📷 Get BEV Image from front-view

We have released the code corresponding to Section A.2 of the paper's supplementary material, along with an online testing platform.

Compared to traditional Inverse Perspective Mapping (IPM), our approach does not require calibration of camera parameters. Instead, it allows for manual tuning to achieve an acceptable BEV projection result.

You can use Hugging Face Spaces for online testing, or run our code locally. Online testing runs on a CPU, which is slower. If you run it locally with a GPU, the projection process takes less than 10 ms.

python demo_gradio_kitti.py

Our projection process is implemented entirely in PyTorch, which means our projection method is differentiable and can be directly deployed in any network for gradient propagation.
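For contrast, the sketch below shows the traditional, calibrated form of Inverse Perspective Mapping that this section compares against. It is not the method from the paper or the code in demo_gradio_kitti.py. It assumes an ideal pinhole camera over a flat ground plane, and all names (`ground_to_pixel`, `bev_lookup`, and their parameters) are hypothetical. It is meant only to show which calibration values the classical approach depends on.

```python
import math

def ground_to_pixel(X, Z, f, cx, cy, height, pitch_deg=0.0):
    """Project a ground-plane point into a calibrated pinhole image.

    X: lateral offset (right positive), Z: forward distance, both in meters.
    f, cx, cy: focal length and principal point in pixels (require calibration).
    height: camera height above the ground; pitch_deg: downward tilt.
    Returns (u, v), or None if the point lies behind the camera.
    """
    t = math.radians(pitch_deg)
    # Camera frame: x right, y down, z along the optical axis.
    yc = height * math.cos(t) - Z * math.sin(t)
    zc = height * math.sin(t) + Z * math.cos(t)
    if zc <= 0:
        return None
    return cx + f * X / zc, cy + f * yc / zc

def bev_lookup(rows, cols, meters_per_cell, **cam):
    """For every BEV cell, return the front-view pixel to sample from."""
    table = []
    for r in range(rows):
        Z = (rows - r) * meters_per_cell  # the top row is farthest ahead
        row = []
        for c in range(cols):
            X = (c - (cols - 1) / 2.0) * meters_per_cell
            row.append(ground_to_pixel(X, Z, **cam))
        table.append(row)
    return table

# Example with made-up calibration values.
table = bev_lookup(2, 3, 1.0, f=100.0, cx=50.0, cy=50.0, height=2.0)
```

Every value in `cam` has to be known in advance here, which is the calibration requirement that the approach described above avoids through manual tuning.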

Example of KITTI

image-20230904150231834

Example of a Random Network Image

image-20230904150208550

📝 TODO List

  • Add data preparation code.
  • Add inference and serving code with checkpoints.
  • Add evaluation code.
  • Add training code.

🔗 Citation

If you find our work helpful, please cite:

@article{wang2024fine,
  title={Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator},
  author={Wang, Xiaolong and Xu, Runsen and Cui, Zhuofan and Wan, Zeyu and Zhang, Yu},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

👏 Acknowledgements

  • This work is mainly based on IHN and RAFT. We thank the authors for their contributions.

hc-net's Issues

training codes

I kindly request access to the training code in your repository. Additionally, I would like to inquire about the expected release date for the complete code. Thank you for your attention.

Transform for CVUSA.

Hi Xiaolong,

I have reproduced the Spherical Transform on VIGOR. However, it is hard to get a reasonable result on CVUSA. Are any additional operations needed for CVUSA?

Best,
Guopeng.

complete code

Thank you for your great work! When will the complete code be released?
