Hi, thanks for the great work. It seems that you are not using any of the intrinsi…


camera intrinsic parameters about tsformer-vo (closed)

aofrancani commented on July 30, 2024

camera intrinsic parameters


Comments (5)

aofrancani commented on July 30, 2024

Hey, thanks for trying out this work. You're right, I'm not using the camera's intrinsic parameters. You can find them in the KITTI metadata (the calib.txt files), as I did here: https://github.com/aofrancani/TSformer-VO/blob/main/datasets/kitti.py#L122. However, the model's only input is the RGB images: with an end-to-end deep learning approach, we expect the network to learn all the necessary parameters internally during feature extraction. Since I'm not estimating the pose from the Essential matrix, I don't use the intrinsic parameters explicitly.
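For readers who want to see what "using the intrinsics" would look like, here is a minimal sketch (not the code in the linked `kitti.py`). It assumes the usual KITTI odometry `calib.txt` layout, where each line is a key such as `P0:` followed by the 12 row-major entries of a 3×4 projection matrix, and that the camera is rectified with zero skew, so the left 3×3 block of `P` is the intrinsic matrix K. The helper names `read_projection`, `intrinsics` and `pixel_to_normalized` are hypothetical. The last function shows the step a geometric (Essential-matrix) pipeline needs K for: mapping pixel coordinates to normalized camera coordinates with K⁻¹.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def read_projection(calib_text: str, key: str = "P0") -> Matrix:
    """Parse one 3x4 projection matrix (row-major) from KITTI-style calib text.

    Assumed format per line: '<key>: v0 v1 ... v11'.
    """
    for line in calib_text.splitlines():
        name, _, values = line.partition(":")
        if name.strip() == key:
            nums = [float(v) for v in values.split()]
            if len(nums) != 12:
                raise ValueError(f"{key} has {len(nums)} values, expected 12")
            return [nums[0:4], nums[4:8], nums[8:12]]
    raise KeyError(f"{key} not found in calibration text")


def intrinsics(P: Matrix) -> Dict[str, float]:
    """For a rectified, zero-skew camera, K is the left 3x3 block of P."""
    return {"fx": P[0][0], "fy": P[1][1], "cx": P[0][2], "cy": P[1][2]}


def pixel_to_normalized(u: float, v: float, k: Dict[str, float]) -> Tuple[float, float]:
    """Apply K^-1 (no skew): pixel (u, v) -> normalized image coordinates."""
    return ((u - k["cx"]) / k["fx"], (v - k["cy"]) / k["fy"])


# Example with made-up values (not a real KITTI calibration):
calib = "P0: 700.0 0.0 600.0 0.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0\n"
K = intrinsics(read_projection(calib))
print(pixel_to_normalized(670.0, 250.0, K))  # (0.1, 0.1)
```

An end-to-end model like the one discussed here skips this step entirely and has to absorb the effect of K into its learned features, which is why the intrinsics never appear explicitly.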


spokV commented on July 30, 2024

Hi @aofrancani Thanks!
Is it right to say that the model will not generalize well to other cameras with different intrinsic parameters? Did you try it?


aofrancani commented on July 30, 2024

Yes, you are right about the generalization. I believe this is the major limitation of supervised deep learning methods in the context of visual odometry. Ideally, these methods require large-scale and diverse labeled data, encompassing different cameras, dynamic environments, and varying light and meteorological conditions such as rain, snow, direct sunlight, and night.
Unfortunately, current VO datasets are not large enough to fully exploit the potential of the Transformer architecture, which benefits from large amounts of data.

And no, I haven't explored generalization to different camera configurations and datasets (at least not yet). Perhaps you could try mixing datasets with different calibrations and feeding the intrinsic parameters to the model as an additional input. However, this may only partially mitigate the generalization problem. Another strategy is transfer learning, but achieving good generalization remains a significant challenge in the field. From what I have observed in recent surveys, researchers tend to adopt hybrid approaches, using deep learning only in certain components of the visual odometry pipeline (e.g., feature extraction, matching, depth estimation) while still relying on geometric constraints to estimate the pose. They also use additional sensors, such as the IMU in visual-inertial odometry (VIO), to improve performance.
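As a rough illustration of "using the intrinsic parameters as additional input", here are two simple ways one might encode the camera for a network. This is a hypothetical sketch, not something TSformer-VO does: `intrinsic_features` builds a resolution-independent descriptor (focal lengths and principal point as fractions of the image size) that could be appended to a feature vector before the pose head, and `ray_map` builds two per-pixel channels holding the normalized coordinates (u − cx)/fx and (v − cy)/fy, which could be stacked with the RGB channels. The function names and the choice of integer pixel indices (rather than pixel centres) are assumptions.

```python
from typing import List, Tuple

Grid = List[List[float]]


def intrinsic_features(fx: float, fy: float, cx: float, cy: float,
                       width: int, height: int) -> List[float]:
    """Camera descriptor expressed as fractions of the image size, so that
    cameras with different resolutions give comparable values."""
    return [fx / width, fy / height, cx / width, cy / height]


def condition_on_camera(image_features: List[float], cam: List[float]) -> List[float]:
    """Simplest conditioning: append the camera descriptor to the visual
    feature vector (e.g. right before a pose-regression head)."""
    return image_features + cam


def ray_map(fx: float, fy: float, cx: float, cy: float,
            width: int, height: int) -> Tuple[Grid, Grid]:
    """Two H x W channels with the normalized image coordinates of each pixel
    (pixel index convention: u = column, v = row)."""
    xs = [[(u - cx) / fx for u in range(width)] for _ in range(height)]
    ys = [[(v - cy) / fy for _ in range(width)] for v in range(height)]
    return xs, ys
```

The ray-map variant keeps the camera information spatially aligned with the image, while the descriptor variant is cheaper; which one helps more (if either does) would have to be checked experimentally on mixed-calibration data.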


spokV commented on July 30, 2024

Thanks again! From what I've seen in other VO solutions, the camera's intrinsic parameters are used to remove the distortion from each frame before it is fed into the model. Could doing the same help your E2E model generalize?
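To make the undistortion step concrete, here is a minimal sketch assuming a purely radial lens model with two coefficients `k1`, `k2` (the radial part of the common Brown–Conrady model; real calibrations often add tangential and higher-order terms). A normalized point (x, y) is distorted by the factor 1 + k1·r² + k2·r⁴ with r² = x² + y². Undistorting a point means inverting that, done here by fixed-point iteration, which converges for moderate distortion. `undistort_pixel` shows where the intrinsics come in: pixel → normalized via K⁻¹, undistort, then back to pixels via K. All function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def distort(p: Point, k1: float, k2: float) -> Point:
    """Apply radial distortion to a normalized image point (x, y)."""
    x, y = p
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return (x * scale, y * scale)


def undistort(pd: Point, k1: float, k2: float, iters: int = 20) -> Point:
    """Invert `distort` by fixed-point iteration: x = x_d / scale(x)."""
    xd, yd = pd
    x, y = xd, yd  # initial guess: the distorted point itself
    for _ in range(iters):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return (x, y)


def undistort_pixel(u: float, v: float, fx: float, fy: float, cx: float, cy: float,
                    k1: float, k2: float) -> Point:
    """Pixel -> normalized (K^-1) -> undistorted -> pixel (K)."""
    x, y = undistort(((u - cx) / fx, (v - cy) / fy), k1, k2)
    return (cx + fx * x, cy + fy * y)
```

For whole images, the usual trick is the reverse direction: for every pixel of the undistorted output, compute where it lands in the distorted input with `distort` and sample there, so no iteration is needed per pixel.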


aofrancani commented on July 30, 2024

Oh, I see what you mean. To be honest, I don't know; I think it could be useful. I was also wondering whether we could use distortion for image augmentation, to increase the amount of data and help with generalization. It should work, but I think we will only know by trying it and evaluating.
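One possible shape for the "distortion as augmentation" idea, as an untested sketch: warp each grayscale frame (a list of rows) with a single radial coefficient `k1` drawn at random, using the image centre as principal point and an assumed focal length equal to the larger image side. Each output pixel is mapped to normalized coordinates, scaled by 1 + k1·r², and the input is sampled there (nearest neighbour, zeros outside). The sketch draws one `k1` per clip so all frames of a sequence look as if they came from the same camera, which seems the natural choice for VO but is itself an assumption. `radial_warp`, `augment_clip` and `max_k1` are hypothetical names.

```python
import random
from typing import List, Optional

Image = List[List[float]]


def radial_warp(img: Image, k1: float, focal: Optional[float] = None) -> Image:
    """Warp a grayscale image with a one-coefficient radial model
    (nearest-neighbour sampling; samples outside the input become 0)."""
    h, w = len(img), len(img[0])
    f = focal if focal is not None else float(max(w, h))  # assumed focal length
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # assumed principal point: image centre
    out = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            x, y = (u - cx) / f, (v - cy) / f
            s = 1.0 + k1 * (x * x + y * y)
            us = int(round(cx + f * x * s))
            vs = int(round(cy + f * y * s))
            if 0 <= us < w and 0 <= vs < h:
                out[v][u] = img[vs][us]
    return out


def augment_clip(frames: List[Image], rng: random.Random,
                 max_k1: float = 0.2) -> List[Image]:
    """Draw one k1 per clip and apply it to every frame of the clip."""
    k1 = rng.uniform(-max_k1, max_k1)
    return [radial_warp(frame, k1) for frame in frames]
```

Warping the images does not change the camera motion, so the pose labels can stay as they are; what changes is the effective lens, which is exactly what the augmentation is meant to vary. Whether this improves generalization to real, differently calibrated cameras is the open question raised above.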
