
Comments (12)

cattaneod commented on August 24, 2024

Yes, it is done in kitti_maps.py#L119
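
For reference, the change of axes is a fixed permutation; a minimal sketch, assuming the points start in the standard camera convention (X-right, Y-down, Z-forward):

import numpy as np

# Map camera axes (X-right, Y-down, Z-forward) to the frame CMRNet
# expects (X-forward, Y-right, Z-down):
# new_x = old_z, new_y = old_x, new_z = old_y
R = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])

points_cam = np.random.rand(1000, 3)  # placeholder Nx3 points in the camera frame
points_cmrnet = points_cam @ R.T      # same points, X-forward / Y-right / Z-down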


cattaneod commented on August 24, 2024

Hi @RaviBeagle,

although CMRNet is scene-agnostic, it is not camera-agnostic, which means it can only be deployed with the same camera sensor used during training. To overcome this limitation, we proposed CMRNet++, which is both scene- and sensor-agnostic, but its code is not publicly available at the moment.

To train CMRNet on a different dataset, the first and most important step is to check the quality of the ground-truth poses: if the GT poses are not accurate, you probably won't be able to train CMRNet.
Then you should preprocess the dataset by generating the map of every sequence and a local map for each camera image, as done in https://github.com/cattaneod/CMRNet/blob/master/preprocess/kitti_maps.py. As mentioned in the README, the local maps must use this reference frame: X-forward, Y-right, Z-down.

After the dataset is preprocessed, you should adapt the data loader (https://github.com/cattaneod/CMRNet/blob/master/DatasetVisibilityKitti.py) and make sure to change the camera calibration to your own calibration parameters (the images should also be undistorted beforehand).
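
For the undistortion, a minimal sketch using OpenCV; the intrinsics and distortion coefficients below are placeholders for your own calibration:

import cv2
import numpy as np

# Placeholder calibration: replace with your own camera parameters
K = np.array([[700.0,   0.0, 672.0],
              [  0.0, 700.0, 256.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.1, 0.01, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread('frame_000000.png')
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite('frame_000000_undistorted.png', undistorted)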

Finally, you should change the image_size parameter to a size suitable for your images (take into account that both width and height must be multiples of 64, due to the architecture of the network).
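
As a quick check, one way to round an arbitrary resolution down to the nearest multiples of 64 is to crop:

def crop_to_multiple_of_64(width, height):
    # Both dimensions must be divisible by 64 due to the network architecture
    return (width // 64) * 64, (height // 64) * 64

print(crop_to_multiple_of_64(1400, 500))  # -> (1344, 448)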


RaviBeagle commented on August 24, 2024

Thanks a lot for the hints.
We are now preparing our vehicle setup to capture the dataset. The sensor setup is as follows:

  1. Velodyne HDL-16 as our LiDAR for localization. The driving area already has point cloud maps generated with the same sensor.
  2. MYNTEYE S1030 stereo camera, which provides grayscale images.

The plan is to run the ROS HDL localization package to generate the GT poses and to capture the sensor data with approximate time synchronization. Do you see any limitations of our setup?
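
(For context, approximate time synchronization in ROS1 is usually handled with message_filters; a minimal sketch, with hypothetical topic names for the sensors above:)

import rospy
import message_filters
from sensor_msgs.msg import Image, PointCloud2

def callback(img_msg, cloud_msg):
    # img_msg and cloud_msg arrive with approximately matching timestamps
    pass

rospy.init_node('sync_capture')
img_sub = message_filters.Subscriber('/mynteye/left/image_raw', Image)    # hypothetical topic
cloud_sub = message_filters.Subscriber('/velodyne_points', PointCloud2)   # hypothetical topic
sync = message_filters.ApproximateTimeSynchronizer([img_sub, cloud_sub],
                                                   queue_size=10, slop=0.05)
sync.registerCallback(callback)
rospy.spin()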


cattaneod commented on August 24, 2024

I'm not familiar with the ROS HDL package, but if the generated GT poses are accurate enough, I don't see any big limitations.
CMRNet is intended for monocular localization; since you have a stereo setup, I'd guess you can further improve the localization performance by including the second camera.


RaviBeagle commented on August 24, 2024

Hello @cattaneod,
Due to delays in setting up our vehicle and sensors, we plan in the meantime to train on a synthetic dataset from CARLA (https://github.com/jedeschaud/kitti_carla_simulator). With some modifications to the scripts, we hope to obtain a dataset similar to KITTI. In that case CMRNet would not need any modifications, if I am right?

Thanks


RaviBeagle commented on August 24, 2024

Hello @cattaneod,

I have managed to generate a KITTI-style dataset from the CARLA simulator. Here is one question about a comment you put in the documentation:

"The Data Loader requires a local point cloud for each camera frame, the point cloud must be expressed with respect to the camera_2 reference frame, BUT (very important) with a different axes representation: X-forward, Y-right, Z-down."

Is this done by preprocess/kitti_maps.py, or is it something I have to do at the time of generation?


RaviBeagle commented on August 24, 2024

Finally, you should change the image_size parameter to a size suitable for your images (take into account that both width and height must be multiples of 64, due to the architecture of the network).

I have set the RGB image size in CARLA to 1344x512, which is a multiple of 64.

Yet I get an error during training:

File "/home/sxv1kor/Temp/CMRNet/DatasetVisibilityKitti.py", line 180, in __getitem__
    img = self.custom_transform(img, img_rotation, h_mirror)
  File "/home/sxv1kor/Temp/CMRNet/DatasetVisibilityKitti.py", line 135, in custom_transform
    rgb = normalization(rgb)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 269, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 360, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "/home/sxv1kor/anaconda3/envs/cmrnet2/lib/python3.7/site-packages/torchvision/transforms/functional_tensor.py", line 959, in normalize
    tensor.sub_(mean).div_(std)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0


cattaneod commented on August 24, 2024

As the error says, your images have 4 channels instead of 3 [R,G,B].

Sorry, I can't really help you with different datasets.


RaviBeagle commented on August 24, 2024

Yes, indeed that was the problem. Thanks.
I have saved the images as RGB, and now the training is running.
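
(A minimal sketch of that conversion, assuming the frames were written with an alpha channel, as CARLA's camera sensor does, and are loaded with PIL:)

from PIL import Image

# Drop the alpha channel so the tensor has 3 channels, matching the
# 3-element mean/std of the normalization transform
img = Image.open('frame_000000.png').convert('RGB')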


whu-lyh commented on August 24, 2024

Yes, it is done in kitti_maps.py#L119

Hi @cattaneod, thanks for your amazing work. I wonder how you fetch the surrounding point cloud that forms the local submap? Specifically, I mean here. The code confuses me quite a lot.


cattaneod commented on August 24, 2024

Hi @cattaneod, thanks for your amazing work. I wonder how you fetch the surrounding point cloud that forms the local submap? Specifically, I mean here. The code confuses me quite a lot.

What exactly is not clear?
To generate a local submap around the camera pose, I first transform the global map into a local map, such that the camera pose becomes the origin of the point cloud:

local_map = torch.mm(pose, local_map).t()  # pose: 4x4 world-to-camera transform; local_map: 4xN homogeneous points

Then I crop the point cloud around the origin (which is now the camera pose). Specifically, I take 25 meters to the left and right, 100 meters to the front, and 10 meters to the back, to account for the random initial pose H_init that could be behind the real pose.

indexes = local_map[:, 1] > -25.  # Crop 25 meters to the left
indexes = indexes & (local_map[:, 1] < 25.)  # Crop 25 meters to the right
indexes = indexes & (local_map[:, 0] > -10.)  # Crop 10 meters to the back
indexes = indexes & (local_map[:, 0] < 100.)  # Crop 100 meters to the front
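
The boolean mask is then applied to keep only the points inside the crop box (a hedged completion; the exact line in kitti_maps.py may differ):

local_map = local_map[indexes]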


whu-lyh commented on August 24, 2024

Ohhhhh, I figured it out. Thanks very much!

