Comments (5)
Hey, thanks for trying out this work. You're right, I'm not using the camera's intrinsic parameters. You can find the intrinsic parameters in the KITTI metadata (calib.txt files), as I did here: https://github.com/aofrancani/TSformer-VO/blob/main/datasets/kitti.py#L122. However, my only input is the RGB images. Therefore, with an end-to-end deep learning approach, we expect the network to learn all necessary parameters internally during the feature extraction step. Since I'm not using the Essential matrix to estimate the pose, I don't explicitly use those intrinsic parameters.
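For anyone who wants to inspect those values, here is a minimal sketch of reading the intrinsics from a KITTI odometry `calib.txt`. It assumes the usual layout of one `Pn:` line per camera followed by 12 numbers (a row-major 3x4 projection matrix), and that the left 3x3 block of a rectified projection matrix holds the focal lengths and principal point. The helper names `parse_kitti_calib` and `intrinsics_from_projection` are hypothetical and are not taken from the repository; the numbers in `EXAMPLE_CALIB` are illustrative, not copied from a real sequence.

```python
def parse_kitti_calib(text):
    """Parse calib.txt content into {name: 3x4 matrix as nested lists}."""
    matrices = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [float(v) for v in values.split()]
        if len(numbers) != 12:
            continue  # skip anything that is not a 3x4 matrix
        matrices[name.strip()] = [numbers[0:4], numbers[4:8], numbers[8:12]]
    return matrices


def intrinsics_from_projection(P):
    """Return (fx, fy, cx, cy) from the left 3x3 block of a 3x4 projection matrix."""
    return P[0][0], P[1][1], P[0][2], P[1][2]


# Illustrative values only (assumption), in the shape of a KITTI calib.txt.
EXAMPLE_CALIB = """\
P0: 700.0 0.0 600.0 0.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0
P2: 700.0 0.0 600.0 40.0 0.0 700.0 180.0 0.0 0.0 0.0 1.0 0.0
"""

calib = parse_kitti_calib(EXAMPLE_CALIB)
fx, fy, cx, cy = intrinsics_from_projection(calib["P2"])
print(fx, fy, cx, cy)
```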
from tsformer-vo.
Hi @aofrancani, thanks!
Is it right to say that the model will not generalize well to other cameras with different intrinsic parameters? Did you try it?
Yes, you are right about the generalization. I believe this is the major limitation of supervised deep learning methods in the context of visual odometry. Ideally, these methods require large-scale and diverse labeled data, encompassing different cameras, dynamic environments, and varying light and meteorological conditions such as rain, snow, direct sunlight, and night.
Unfortunately, VO datasets are not currently large enough to fully explore the great potential of the Transformer architecture in handling extensive data.
And no, I haven't explored generalization to different configurations and datasets (at least not yet). Perhaps you could try mixing datasets with different calibrations and using the intrinsic parameters as additional input to the model. However, this approach may only partially mitigate the generalization challenge. Another strategy is to explore transfer learning techniques, but achieving good generalization remains a significant challenge in the field. As I have observed in recent surveys, researchers prefer hybrid approaches that use deep learning models only in certain components of visual odometry (e.g. feature extraction, matching, depth estimation) and still rely on geometric constraints to estimate the pose. They also make use of additional sensors, such as the IMU in visual-inertial odometry (VIO), to increase the model's performance.
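One possible way to feed the intrinsics to a model as additional input is sketched below. This is an assumption about how it could be done, not something implemented in TSformer-VO: the intrinsics are first rescaled to match any image resizing, then divided by the image size so that the resulting vector is comparable across cameras and resolutions. The function names `scale_intrinsics` and `intrinsics_feature` and all numeric values are hypothetical.

```python
def scale_intrinsics(fx, fy, cx, cy, src_size, dst_size):
    """Rescale intrinsics after resizing an image from src_size to dst_size.

    Sizes are (width, height). Plain proportional scaling is used here,
    ignoring the half-pixel offset that some pixel-centre conventions add.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx, sy = dst_w / src_w, dst_h / src_h
    return fx * sx, fy * sy, cx * sx, cy * sy


def intrinsics_feature(fx, fy, cx, cy, image_size):
    """Resolution-independent 4-vector: focal lengths and principal point
    expressed as fractions of the image width and height."""
    w, h = image_size
    return [fx / w, fy / h, cx / w, cy / h]


# Illustrative camera (assumption): 1200x360 image resized to 600x180.
original = (700.0, 700.0, 600.0, 180.0)
resized = scale_intrinsics(*original, src_size=(1200, 360), dst_size=(600, 180))

feat_original = intrinsics_feature(*original, image_size=(1200, 360))
feat_resized = intrinsics_feature(*resized, image_size=(600, 180))
print(feat_original, feat_resized)
```

The normalized vector is unchanged by resizing, so it could be concatenated to the model's input tokens or features regardless of the resolution used for training. Whether this actually helps generalization would still have to be verified experimentally.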
Thanks again! From what I've seen in other VO solutions, the camera's intrinsic parameters are used to remove the distortion from each frame before it is passed to the model. Could it be useful (in terms of generalization) to do the same with your E2E model?
Oh, I see what you mean. To be honest, I don't know, but I think it could be useful. I was also wondering whether we could use distortion as an image augmentation to increase the data and help with generalization. It should work, but I think we will only know by trying and evaluating.
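To make the two ideas in this exchange concrete, here is a minimal sketch of a two-coefficient radial distortion model applied to pixel coordinates: `distort_pixel` could serve as the basis of a distortion augmentation, and `undistort_pixel` inverts it with a simple fixed-point iteration. This is only an illustration under stated assumptions: the coefficients `k1`, `k2` and the intrinsics are made-up values, tangential distortion is ignored, and a real pipeline would more likely remap whole images with a library routine such as OpenCV's `cv2.undistort`.

```python
def distort_pixel(u, v, K, k1, k2):
    """Apply radial distortion to an ideal (undistorted) pixel.

    K is (fx, fy, cx, cy). The pixel is moved to normalized camera
    coordinates, scaled by 1 + k1*r^2 + k2*r^4, and projected back.
    """
    fx, fy, cx, cy = K
    x, y = (u - cx) / fx, (v - cy) / fy
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return fx * x * factor + cx, fy * y * factor + cy


def undistort_pixel(u, v, K, k1, k2, iterations=20):
    """Invert distort_pixel by fixed-point iteration (fine for mild distortion)."""
    fx, fy, cx, cy = K
    xd, yd = (u - cx) / fx, (v - cy) / fy
    x, y = xd, yd  # initial guess: the distorted point itself
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return fx * x + cx, fy * y + cy


# Illustrative intrinsics and coefficients (assumptions, not from any dataset).
K = (700.0, 700.0, 600.0, 180.0)
k1, k2 = -0.1, 0.01

ideal = (1000.0, 300.0)
distorted = distort_pixel(*ideal, K, k1, k2)
recovered = undistort_pixel(*distorted, K, k1, k2)
print(distorted, recovered)
```

With a negative `k1` the point is pulled toward the principal point (barrel distortion), and the round trip recovers the original pixel up to numerical precision. Sampling `k1` and `k2` randomly per training sequence would be one way to try the augmentation idea.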