Comments (5)

aofrancani commented on July 30, 2024

Hey, thanks for trying out this work. You're right, I'm not using the camera's intrinsic parameters. You can find the intrinsic parameters in the KITTI metadata (calib.txt files), as I did here: https://github.com/aofrancani/TSformer-VO/blob/main/datasets/kitti.py#L122. However, my only input is the RGB images. Therefore, with an end-to-end deep learning approach, we expect the network to learn all necessary parameters internally during the feature extraction step. Since I'm not using the Essential matrix to estimate the pose, I don't explicitly use those intrinsic parameters.
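For readers who want to pull the intrinsics out of a KITTI `calib.txt`, here is a minimal sketch. It is not the code at the linked line of `kitti.py`; the function names are hypothetical, and the sample numbers are made up for illustration rather than taken from a real sequence. It assumes the usual odometry layout in which each line holds a name followed by the 12 values of a 3x4 projection matrix in row-major order.

```python
def parse_kitti_calib(text):
    """Parse calib.txt content into {name: 3x4 matrix as nested lists}."""
    matrices = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [float(v) for v in values.split()]
        if len(numbers) != 12:
            continue  # skip anything that is not a 3x4 matrix
        matrices[name.strip()] = [numbers[0:4], numbers[4:8], numbers[8:12]]
    return matrices


def intrinsics_from_projection(P):
    """Read focal lengths and principal point from the left 3x3 block of P."""
    return {"fx": P[0][0], "fy": P[1][1], "cx": P[0][2], "cy": P[1][2]}


# Illustrative values only (assumption), not a real KITTI calibration.
SAMPLE = (
    "P0: 700.0 0.0 600.0 0.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0\n"
    "P1: 700.0 0.0 600.0 -380.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0\n"
)

calib = parse_kitti_calib(SAMPLE)
print(intrinsics_from_projection(calib["P0"]))
```

In practice you would pass the contents of the sequence's `calib.txt` instead of `SAMPLE`.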

from tsformer-vo.

spokV commented on July 30, 2024

Hi @aofrancani, thanks!
Is it right to say that the model will not generalize well to other cameras with different intrinsic parameters? Did you try it?

aofrancani commented on July 30, 2024

Yes, you are right about the generalization. I believe this is the major limitation of supervised deep learning methods in the context of visual odometry. Ideally, these methods require large-scale and diverse labeled data, encompassing different cameras, dynamic environments, and varying light and meteorological conditions such as rain, snow, direct sunlight, and night.
Unfortunately, VO datasets are not currently large enough to fully explore the great potential of the Transformer architecture in handling extensive data.

And no, I haven't explored the generalization on different configurations and datasets (at least not yet). Perhaps you could try mixing datasets with different calibrations and using the intrinsic parameters as additional input to the model. However, this approach may only partially mitigate the generalization challenge. Another strategy is to explore transfer learning techniques, but achieving optimal generalization remains a significant challenge in the field. As I have observed in recent surveys, researchers prefer to adopt hybrid approaches, using deep learning models only in certain components of visual odometry (e.g. feature extraction, matching, depth estimation) and still incorporating geometric constraints to estimate the pose. They also make use of additional sensors, such as the IMU in visual-inertial odometry (VIO) to increase the model's performance.
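As a rough illustration of the "intrinsic parameters as additional input" idea, one option is to turn each camera's intrinsics into a small, resolution-independent vector that could be concatenated with the image features. This is only a sketch of one possible encoding, not something tested in this thread or implemented in TSformer-VO. The function name `intrinsics_to_feature` and the example values are hypothetical.

```python
def intrinsics_to_feature(fx, fy, cx, cy, width, height):
    """Normalize intrinsics by image size so cameras with different
    resolutions map to comparable values (assumed encoding)."""
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")
    return [fx / width, fy / height, cx / width, cy / height]


# Two hypothetical cameras with different calibrations and resolutions.
cam_a = intrinsics_to_feature(700.0, 700.0, 600.0, 180.0, 1200, 360)
cam_b = intrinsics_to_feature(500.0, 500.0, 320.0, 240.0, 640, 480)
print(cam_a)
print(cam_b)
```

Dividing by the image size keeps the vector unchanged when the same frames are resized, which matters if the training pipeline rescales its inputs.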

spokV commented on July 30, 2024

Thanks again! In other VO solutions I've seen, the camera's intrinsic parameters are used to remove the distortion from each frame before it is fed into the model. Could it be useful (in terms of generalization) to do the same with your E2E model?

aofrancani commented on July 30, 2024

Oh, I see what you mean. To be honest, I don't know, but I think it could be useful. I was also wondering whether we could use distortion as an image augmentation to increase the data and help with generalization. It should work, but I think we will only know by trying and evaluating.
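To make the two operations in this exchange concrete, here is a minimal per-pixel sketch of applying and removing radial lens distortion. It assumes a two-coefficient radial model (`k1`, `k2`), which is a simplification chosen for illustration. Nothing here comes from TSformer-VO, and the function names and numeric values are hypothetical. For whole images, OpenCV's `cv2.undistort` is the usual tool.

```python
def distort_pixel(u, v, fx, fy, cx, cy, k1, k2):
    """Apply radial distortion to an ideal pixel (augmentation direction)."""
    x, y = (u - cx) / fx, (v - cy) / fy      # normalized camera coordinates
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return fx * x * scale + cx, fy * y * scale + cy


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2, iterations=20):
    """Invert the model by fixed-point iteration (fine for mild distortion)."""
    xd, yd = (u - cx) / fx, (v - cy) / fy
    x, y = xd, yd                             # initial guess: no distortion
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return fx * x + cx, fy * y + cy


# Hypothetical camera and coefficients, for illustration only.
cam = dict(fx=700.0, fy=700.0, cx=600.0, cy=180.0, k1=-0.2, k2=0.05)
ud, vd = distort_pixel(900.0, 300.0, **cam)
print(undistort_pixel(ud, vd, **cam))  # should land close to (900, 300)
```

Note that both directions need the distortion coefficients in addition to the intrinsic matrix. Sampling `k1` and `k2` at random during training would be one way to try the augmentation idea.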
