bit-mjy / cvtnet

[TII 2023] A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments.

License: MIT License

Python 88.62% CMake 1.50% C++ 9.88%
global-localization lidar-place-recognition loop-closure-detection multi-view place-recognition slam

cvtnet's Introduction

CVTNet

The code for our paper accepted by IEEE Transactions on Industrial Informatics:

CVTNet: A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments.

[IEEE Xplore TII 2023] [arXiv] [Supplementary Materials]

Junyi Ma, Guangming Xiong, Jingyi Xu, Xieyuanli Chen*

CVTNet fuses the range image views (RIVs) and bird's eye views (BEVs) generated from LiDAR data to recognize previously visited places. Because RIVs and BEVs undergo the same column shift for any yaw-angle rotation, aligned features can be extracted from the two views.
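
For intuition, here is a minimal sketch (illustrative only, not code from this repository) of why a yaw rotation of the sensor corresponds to the same circular column shift in both image types, assuming both discretize the azimuth angle into proj_W columns:

import numpy as np

def yaw_to_column_shift(yaw_rad, proj_W=900):
    # Both RIVs and BEVs bin the azimuth angle into proj_W columns, so rotating
    # the sensor by yaw_rad shifts every column of both images by the same amount.
    return int(round(yaw_rad / (2.0 * np.pi) * proj_W))

# Toy example: a fake 32 x 900 range image "rotated" by 10 degrees of yaw.
riv = np.random.rand(32, 900).astype(np.float32)
shift = yaw_to_column_shift(np.deg2rad(10.0))
riv_rotated = np.roll(riv, shift, axis=1)  # circular shift along the width axis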

Table of Contents

  1. Publication
  2. Dependencies
  3. How to Use
  4. TODO
  5. Related Work
  6. License

Publication

If you use the code in your work, please cite our paper:

@ARTICLE{10273716,
  author={Ma, Junyi and Xiong, Guangming and Xu, Jingyi and Chen, Xieyuanli},
  journal={IEEE Transactions on Industrial Informatics}, 
  title={CVTNet: A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments}, 
  year={2023},
  doi={10.1109/TII.2023.3313635}}

Dependencies

Please refer to our SeqOT repo.

How to Use

[2024-07] We thank Xiongwei Zhao for helping release the code for using the KITTI dataset!

[2023-03] We provide a training and test tutorial for NCLT sequences in this repository. Before any operation, please modify the config file according to your setup.

Data Preparation

1. Data preparation for the NCLT dataset:

You need to generate RIVs and BEVs from the raw LiDAR data (a simplified projection sketch is shown after these preparation steps) by

cd tools
python gen_ri_bev.py

2. Data preparation for the KITTI dataset:

2.1 You need to generate RIVs and BEVs from the raw LiDAR data for both the training and test sets by

cd tools
python gen_ri_bev.py 

2.2 You need to generate the training indices for KITTI from the raw LiDAR data by

cd tools
python gen_training_index_kitti.py

2.3 You need to generate the ground truth for KITTI by

cd tools
python gen_ground_truth_kitti.py
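
For illustration, the following is a simplified, single-layer version of this projection (a sketch using the parameters listed in config.yaml, not the actual gen_ri_bev.py; the real script additionally splits points into multiple layers using range_th for RIVs and height_th for BEVs):

import numpy as np

def scan_to_range_image(xyz, fov_up_deg=30.67, fov_down_deg=-10.67,
                        proj_H=32, proj_W=900):
    # xyz: [N, 3] LiDAR points in the sensor frame.
    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    ranges = np.linalg.norm(xyz, axis=1)
    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])                    # azimuth in [-pi, pi]
    pitch = np.arcsin(xyz[:, 2] / np.maximum(ranges, 1e-8))   # elevation angle

    # Map azimuth to columns and elevation to rows.
    u = (0.5 * (1.0 - yaw / np.pi) * proj_W).astype(np.int32) % proj_W
    v = ((1.0 - (pitch - fov_down) / fov) * proj_H).astype(np.int32)
    v = np.clip(v, 0, proj_H - 1)

    ri = np.zeros((proj_H, proj_W), dtype=np.float32)
    ri[v, u] = ranges                                          # last hit wins per pixel
    return ri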

Training

You can start the training process with

cd train
python ./train_cvtnet.py

Note that we train our model using only the oldest sequence of the NCLT dataset (2012-01-08), to show that our model works well over long time spans even when it has seen limited data.

Test

You can test the place-recognition (PR) performance of CVTNet by

cd test
python ./test_cvtnet_prepare.py
python ./cal_topn_recall.py
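
For reference, the recall@N evaluation can be sketched as follows (an illustration with assumed array shapes and ground-truth format, not the exact cal_topn_recall.py), using a nearest-neighbour search over the global descriptors:

import faiss
import numpy as np

def recall_at_n(query_desc, db_desc, gt_positives, n_max=20):
    # query_desc, db_desc: float32 arrays of shape [num_scans, descriptor_dim]
    # gt_positives[i]: set of database indices that count as correct for query i
    index = faiss.IndexFlatL2(db_desc.shape[1])
    index.add(db_desc)
    _, topn = index.search(query_desc, n_max)   # [num_queries, n_max] indices

    recalls = []
    for n in range(1, n_max + 1):
        hits = sum(bool(set(topn[i, :n]) & gt_positives[i])
                   for i in range(len(query_desc)))
        recalls.append(hits / len(query_desc))
    return recalls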

You can also test the yaw-rotation invariance of CVTNet by

cd test
python ./test_yaw_rotation_invariance.py

It can be seen that the global descriptors generated by CVTNet are not affected by yaw-angle rotation.

C++ Implementation

We provide a toy example showing a C++ implementation of CVTNet with libtorch. First, you need to generate the model file by

cd CVTNet_libtorch
python ./gen_libtorch_model.py
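
For reference, such an export typically looks like the following sketch (module path, constructor arguments, and input shape are assumptions, not the actual gen_libtorch_model.py):

import torch
from modules.cvtnet import CVTNet   # assumed module path

model = CVTNet()                     # constructor arguments assumed to default
# Assuming the checkpoint stores a plain state_dict:
model.load_state_dict(torch.load("pretrained_model.pth", map_location="cpu"))
model.eval()

example_input = torch.rand(1, 10, 32, 900)    # assumed multi-layer RIV/BEV input shape
traced = torch.jit.trace(model, example_input)
traced.save("cvtnet.pt")                      # this file is later loaded from C++ via libtorch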

Then you can generate a descriptor of the provided 1.pcd by

cd ws
mkdir build
cd build
cmake ..
make -j6
./fast_cvtnet

TODO

  • Release the preprocessing code and pretrained model
  • Release sequence-enhanced CVTNet (SeqCVT)

Related Work

Thanks for your interest in our previous OT series for LiDAR-based place recognition.

  • OverlapNet: Loop Closing for 3D LiDAR-based SLAM
@inproceedings{chen2020rss, 
  author = {X. Chen and T. L\"abe and A. Milioto and T. R\"ohling and O. Vysotska and A. Haag and J. Behley and C. Stachniss},
  title  = {{OverlapNet: Loop Closing for LiDAR-based SLAM}},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
  year = {2020}
}
  • OverlapTransformer: An Efficient and Yaw-Angle-Invariant Transformer Network for LiDAR-Based Place Recognition
@ARTICLE{ma2022ral,
  author={Ma, Junyi and Zhang, Jun and Xu, Jintao and Ai, Rui and Gu, Weihao and Chen, Xieyuanli},
  journal={IEEE Robotics and Automation Letters}, 
  title={OverlapTransformer: An Efficient and Yaw-Angle-Invariant Transformer Network for LiDAR-Based Place Recognition}, 
  year={2022},
  volume={7},
  number={3},
  pages={6958-6965},
  doi={10.1109/LRA.2022.3178797}}
  • SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data
@ARTICLE{ma2022tie,
  author={Ma, Junyi and Chen, Xieyuanli and Xu, Jingyi and Xiong, Guangming},
  journal={IEEE Transactions on Industrial Electronics}, 
  title={SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data}, 
  year={2022},
  doi={10.1109/TIE.2022.3229385}}

License

Copyright 2023, Junyi Ma, Guangming Xiong, Jingyi Xu, Xieyuanli Chen, Beijing Institute of Technology.

This project is free software made available under the MIT License. For more details see the LICENSE file.

cvtnet's People

Contributors

bit-mjy, grandzxw

cvtnet's Issues

KITTI Dataset Configuration

Is it possible to provide a configuration for the KITTI dataset? This is very important for my experiments. Thank you very much for your open-source work.

SeqCVT release

Really nice contribution. CVTNet already performs very well at querying places, so I would like to further evaluate the more advanced SeqCVT. May I know when SeqCVT will be released?

Question about the training batch_size

Sorry to bother you. During training, how can I change the parameters to make the batch size larger and speed up training? Is it done by changing, in the config file, the number of positive and negative samples read at a time, i.e. num_pos and num_neg? Thank you very much for taking the time to reply!

Question about Generating Ground Truth Files

Hello,

I apologize for the inconvenience, but I still have questions about how to generate the Ground Truth files (e.g., gt_120108_120205.npy). I couldn't find the relevant code for this in the pole-localization repository (https://github.com/PRBonn/pole-localization) you mentioned earlier.

Could you please guide me on how to generate these files or provide alternative Ground Truth files like gt_120108_120615.npy, gt_120108_130223.npy, etc.?

Thank you for your assistance!

About bev_projection

I checked utils.bev_projection; you hard-code kitti_lidar_height = 2.0.
Does this matter when working with my own data?
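
For context, a generic sketch of where such a sensor-height offset usually enters a BEV projection (illustrative only, not the repository's utils.bev_projection; parameter names and the binning scheme are assumptions):

import numpy as np

def bev_height_layers(xyz, lidar_height=2.0, height_th=(-4, 0, 4, 8, 12),
                      proj_H=32, proj_W=900, max_range=60.0):
    # Shift z so heights are measured relative to the ground instead of the sensor;
    # a different mounting height would change this offset.
    z_ground = xyz[:, 2] + lidar_height
    yaw = np.arctan2(xyz[:, 1], xyz[:, 0])
    rho = np.linalg.norm(xyz[:, :2], axis=1)

    u = (0.5 * (1.0 - yaw / np.pi) * proj_W).astype(np.int32) % proj_W
    v = np.clip((rho / max_range * proj_H).astype(np.int32), 0, proj_H - 1)

    layers = np.zeros((len(height_th) - 1, proj_H, proj_W), dtype=np.float32)
    for k in range(len(height_th) - 1):
        mask = (z_ground >= height_th[k]) & (z_ground < height_th[k + 1])
        layers[k, v[mask], u[mask]] = 1.0   # occupancy per height slice
    return layers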

What threshold to keep positive place recognitions

Dear Authors,

Thank you for the great work and the open sourcing effort.

We tried to use the descriptors computed by OverlapTransformer in our SLAM pipeline to find loop closures. The kNN search in faiss always returns the top match (and its distance) for a descriptor, but apparently there are a lot of false positives. So the problem is how to discard the bad matches and keep only the good ones. Do we use a fixed threshold for this purpose?

But when we look at the distances computed in the example tests, we find that the distribution of distances of positive matches and the distribution of distances of negative matches largely overlap.
That is, a fixed threshold does not work in this case. So I wonder what criterion we should use to identify positive place recognitions given the descriptors?

Regards,
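
One common way to choose such a threshold (shown here only as an illustration, not as the authors' recommendation) is to sweep it over the descriptor distances of known positive and negative pairs and inspect the resulting precision/recall trade-off:

import numpy as np

def precision_recall_over_thresholds(pos_dists, neg_dists, num_steps=100):
    # pos_dists: descriptor distances of ground-truth positive pairs
    # neg_dists: descriptor distances of ground-truth negative pairs
    lo = min(pos_dists.min(), neg_dists.min())
    hi = max(pos_dists.max(), neg_dists.max())
    curve = []
    for t in np.linspace(lo, hi, num_steps):
        tp = np.sum(pos_dists <= t)          # true matches accepted
        fp = np.sum(neg_dists <= t)          # false matches accepted
        fn = np.sum(pos_dists > t)           # true matches rejected
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        curve.append((t, precision, recall))
    return curve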

Questions about ri_bev_generation

Hello, thanks for your great work and open source contributions!

I'd like to try CVTNet on the KITTI dataset, and config.yaml needs to be modified properly. However, I ran into some confusion; could you give me some help?

ri_bev_generation:
# path of source .bin files
source_scans_root: "/media/mjy/Samsung_T5/NCLT_dataset/velodyne_data/2012-01-08_vel/velodyne_sync/"
# path of target .npy files including RIVs and BEVs
target_ri_bev_root: "/media/mjy/Samsung_T5/NCLT_dataset/velodyne_data/2012-01-08_vel/ri_bev/"
# upper bound of vertical fov
fov_up: 30.67
# lower bound of vertical fov
fov_down: -10.67
# height of RIVs and BEVs
proj_H: 32
# width of RIVs and BEVs
proj_W: 900
# range thresholds to generate multi-layer inputs
range_th: [0, 15, 30, 45, 60]
# height thresholds to generate multi-layer inputs
height_th: [-4, 0, 4, 8, 12]

  1. How do you calculate the FOV of the LiDAR? It does not seem to be mentioned in the NCLT dataset paper. According to my understanding, it could be calculated as follows:
def calc_velo_fov(velo_xyz, ind):
    hori_dist = np.linalg.norm(velo_xyz[ind, :2])
    vertical = velo_xyz[ind, 2]
    fov_deg = np.degrees(np.arctan(vertical / hori_dist))
    return fov_deg

z_max_ind = np.argmax(velo_xyz[:, 2])
fov_up = calc_velo_fov(velo_xyz, z_max_ind)
z_min_ind = np.argmin(velo_xyz[:, 2])
fov_down = calc_velo_fov(velo_xyz, z_min_ind)
print("fov_up: ", fov_up)
print("fov_down: ", fov_down)
## fov_up:  19.87
## fov_down:  -1.59

But the result is quite different from yours. Is there anything wrong?

  2. Are there any factors or restrictions to take into consideration when choosing proj_H and proj_W?

  3. Could range_th and height_th be decided according to the histograms of range and height, making sure the main parts are included? Is that OK?

Could you share your calculation process or the main ideas behind these choices?

Looking forward, thanks!
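
One possible source of the FOV discrepancy (an observation added here for illustration, not the authors' answer): the point with the largest z is not necessarily the point with the largest elevation angle, so estimating the vertical FOV from per-point elevation angles, with a small percentile trim against stray returns, may be more robust:

import numpy as np

def estimate_vertical_fov(xyz, trim_percent=0.1):
    # xyz: [N, 3] LiDAR points in the sensor frame.
    hori = np.linalg.norm(xyz[:, :2], axis=1)
    elev = np.degrees(np.arctan2(xyz[:, 2], np.maximum(hori, 1e-8)))
    # Percentiles instead of min/max so single outlier returns do not dominate.
    fov_up = np.percentile(elev, 100.0 - trim_percent)
    fov_down = np.percentile(elev, trim_percent)
    return fov_up, fov_down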

About testing on my own dataset

Hello, sorry to bother you. After training the model you provide, how can I use the resulting weights to test LiDAR data I collected myself? Are there any special parameter requirements for my own LiDAR data? Looking forward to your reply, and thank you very much!

KITTI-related files

Could you provide the files used for testing on the KITTI dataset? When I test it myself the recall is very low, and I would like to understand why.

Code error in cvtnet.py

Hello, thanks for your great work. I'm inspired by the design of rotation invariance.

However, there seems to be a small mistake introduced while organizing the code for release in CVTNet/modules/cvtnet.py.

CVTNet/modules/cvtnet.py

Lines 265 to 270 in 23c5b28

feature_bev = feature_bev.permute(0, 2, 1)
feature_bev = feature_bev.unsqueeze(-1)
feature_bev_enhanced = self.net_vlad_ri(feature_bev)
feature_bev_enhanced = F.normalize(feature_bev_enhanced, dim=1)
feature_com = torch.cat((feature_ri_enhanced, feature_com), dim=1)
feature_com = torch.cat((feature_com, feature_bev_enhanced), dim=1)

In line 267:

feature_bev_enhanced = self.net_vlad_ri(feature_bev)

is supposed to be

feature_bev_enhanced = self.net_vlad_bev(feature_bev)

Best regards

xyz = xyz * 0.005 - 100.0

Hello, what is the purpose of xyz = xyz * 0.005 - 100.0 in the following code?

def data2xyzi(data, flip=True):
    xyzil = data.view(velodatatype)
    xyz = np.hstack(
        [xyzil[axis].reshape([-1, 1]) for axis in ['x', 'y', 'z']])
    xyz = xyz * 0.005 - 100.0

    if flip:
        R = np.eye(3)
        R[2, 2] = -1
        xyz = np.matmul(xyz, R)
    return xyz, xyzil['i']
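
A likely explanation (an assumption based on the NCLT development-kit convention, not confirmed here): the coordinates in velodyne_sync are stored as unsigned 16-bit integers, and this line converts them back to metres:

# Stored value -> metres (assumed NCLT dev-kit convention):
#   x_m = raw * 0.005 - 100.0
# i.e. coordinates are quantized in 5 mm steps over roughly [-100 m, +227 m],
# and the inverse mapping when writing would be raw = (x_m + 100.0) / 0.005.
raw = 25000
x_m = raw * 0.005 - 100.0   # 25.0 m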

how to calculate overlap on NCLT dataset

Hello, sorry for bothering you again: )

Although you have provided the gt files, I'm trying to calculate the overlaps on the NCLT dataset myself, since some changes might be based on overlaps. However, my overlap results are wrong (max_overlap=0, min_overlap=0).

I have read the code that calculates overlaps in OverlapNet (lines 57-77), which works on the KITTI dataset. For convenience, the relevant code is copied below:

  # load scan paths
  scan_paths = load_files(scan_folder)

  # load calibrations
  T_cam_velo = load_calib(calib_file)
  T_cam_velo = np.asarray(T_cam_velo).reshape((4, 4))
  T_velo_cam = np.linalg.inv(T_cam_velo)

  # load poses
  poses = load_poses(poses_file)
  pose0_inv = np.linalg.inv(poses[0])

  # for KITTI dataset, we need to convert the provided poses 
  # from the camera coordinate system into the LiDAR coordinate system  
  poses_new = []
  for pose in poses:
    poses_new.append(T_velo_cam.dot(pose0_inv).dot(pose).dot(T_cam_velo))
  poses = np.array(poses_new)

  # generate overlap and yaw ground truth array
  ground_truth_mapping = com_overlap_yaw(scan_paths, poses, frame_idx=0)

According to my understanding,

  • T_cam_velo, read from calib.txt, represents the coordinate transformation from velodyne to camera, namely $T_{cam}^{velo}$.
  • poses, read from poses.txt, represent the poses of the left camera.
  • To transform the poses into the LiDAR coordinate system:

$$ Pose_{velo} = T_{velo}^{cam} \times Pose_{cam,0}^{-1} \times Pose_{cam} \times T_{cam}^{velo} $$

Although I do not understand the physical meaning of the above formula, I simply imitated its structure for the NCLT dataset and wrote the following code:

x_body_vel = [0.002, -0.004, -0.957, 0.807, 0.166, -90.703]
T_velo2body = ssc_to_homo(x_body_vel)
T_body2velo = np.linalg.inv(T_velo2body)

poses = load_body_poses_nclt(dataset_path, seq)  # poses of body, [N,4,4]
pose0_inv = np.linalg.inv(poses[0])

poses_new = []
for pose in tqdm(poses):
    poses_new.append(T_body2velo.dot(pose0_inv).dot(pose).dot(T_velo2body))
poses = np.array(poses_new)

  • x_body_vel = [0.002, -0.004, -0.957, 0.807, 0.166, -90.703], read from Table 4 of the NCLT paper.
  • ssc_to_homo is copied from the NCLT dev-kit.
  • T_body2velo is the coordinate transformation from body to velodyne.
  • poses, read from groundtruth_xxx.csv, represent the poses in the body coordinate system.
  • So the cam part for KITTI should be replaced by body for the NCLT dataset.

What's wrong with the above code? Could you give me some help or share your processing code?
Looking forward, thanks!

Timestamps of the velodyne_sync point clouds and groundtruth.csv cannot be aligned in the NCLT dataset

Dear authors,
Hello!
I recently read your paper and think the idea is excellent. While following your work I ran into a small problem, and I would be very grateful if you could help me resolve it.

  1. In the NCLT dataset, taking 2012-01-08 as an example, velodyne_sync/ contains 28127 point-cloud frames in total, while the groundtruth_2012_01-08.csv provided by NCLT contains 835469 entries; this is because the GPS runs at 100 Hz while the point clouds arrive at less than 10 Hz. After downloading https://s3.us-east-2.amazonaws.com/nclt.perl.engin.umich.edu/sensor_data/2012-01-08_sen.tar.gz from the NCLT website, I found that it contains odometry_mu.csv with 28127 entries, matching the number of point-cloud frames. However, the NCLT website describes the format of odometry_mu.csv as (quoting the original wording): "contain 6-DOF odometry measurements synchronized with each image event as described in Section 4. This is calculated relative to the previous image event and follows the same format as above." My understanding is that odometry_mu.csv gives the change between consecutive point-cloud frames (synchronized with the images), but when I accumulate the csv row by row the resulting values are not correct. Since your work also uses the NCLT dataset and generates a range image for every point-cloud frame, may I ask how you obtain the precise GPS position of each frame, so that loop-closure detection can be evaluated later?
     If you could spare the time to answer my question I would be very grateful. Thank you!

NCLT dataset

Where could I get the ground-truth files for the other dates in NCLT?

About generating the ground truth and about the recall values

Hello! Your group's work is very solid; it has given me many new ideas as well as a few questions. How are ground-truth files such as gt_120108_120205.npy generated? Also, when I run cal_topn_recall the returned values are very small: [0.0, 0.0253, 0.0262, 0.0268, 0.0283, 0.0296, 0.0304, 0.032, 0.0326, 0.0337, 0.0347, 0.0356, 0.0365, 0.0369, 0.0384, 0.0384, 0.0388, 0.039, 0.0399, 0.0405]. Is this normal?
Looking forward to your reply.
