aofrancani / tsformer-vo

Implementation of the paper "Transformer-based model for monocular visual odometry: a video understanding approach".

Home Page: https://arxiv.org/abs/2305.06121

License: MIT License

Python 100.00%
deep-learning monocular-visual-odometry transformer-models visual-odometry visual-slam

Contributors: aofrancani, dependabot[bot]

tsformer-vo's Issues

7DoF and 6DoF

How can I visualize the 7DoF and 6DoF trajectories separately? I would be grateful if you could point me to the relevant code.

An error occurs in pretrained_ViT: True

Thank you for sharing your great work.

When setting pretrained_ViT: True in args of train.py, the following error occurs. I confirmed that the ViT model was successfully downloaded. Can you tell me how to solve it?

Building model...
--- loading pretrained to start training ---
https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth

Downloading: "https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth" to /home/dmsai3/.cache/torch/hub/checkpoints/deit_small_patch16_224-cd65a155.pth
Traceback (most recent call last):
  File "train.py", line 239, in <module>
    model, args = build_model(args, model_params)
  File "/home/dmsai3/TSformer-VO/build_model.py", line 97, in build_model
    load_pretrained(model, num_classes=model_params["num_classes"],
  File "/home/dmsai3/TSformer-VO/timesformer/models/helpers.py", line 161, in load_pretrained
    elif num_classes != state_dict[classifier_name + '.weight'].size(0):
KeyError: 'head.weight'
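
One possible cause, offered as an assumption rather than a confirmed diagnosis: the published DeiT checkpoint files store their weights under a top-level "model" key, so looking up 'head.weight' on the outer dictionary fails. Below is a minimal sketch of unwrapping such a checkpoint before the classifier check. `unwrap_state_dict` is a hypothetical helper, not part of the repository, and plain dictionaries stand in for real tensors.

```python
def unwrap_state_dict(checkpoint, wrapper_keys=("model", "state_dict")):
    """Return the flat parameter dict, unwrapping one level of nesting if present."""
    for key in wrapper_keys:
        inner = checkpoint.get(key)
        if isinstance(inner, dict):
            return inner
    # Already flat: hand it back unchanged.
    return checkpoint


# Stand-in for a checkpoint whose weights are nested under "model".
nested = {"model": {"head.weight": [[0.0] * 4] * 6, "head.bias": [0.0] * 6}}
flat = unwrap_state_dict(nested)

# The key is missing on the outer dict but present after unwrapping.
print("head.weight" in nested, "head.weight" in flat)
```

If this is indeed the cause, applying the same unwrapping to the dictionary returned by the download step, before `state_dict[classifier_name + '.weight']` is evaluated, should get past the KeyError.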

Dataset indexing issues

Hello, thank you very much for providing the code. While running train.py I get the error 'ValueError: Length of values (14860) does not match length of index (14864)', and I have not been able to resolve it. How can I fix this? @aofrancani
The full error message is as follows:
python train.py
Using CUDA: True
Loading data...
Traceback (most recent call last):
  File "train.py", line 223, in <module>
    dataset = KITTI(window_size=args["window_size"], overlap=args["overlap"], transform=preprocess)
  File "/home/sy/TSformer-VO-main/datasets/kitti.py", line 59, in __init__
    data["frames"] = frames
  File "/home/sy/anaconda3/envs/tsformer-vo/lib/python3.8/site-packages/pandas/core/frame.py", line 3044, in __setitem__
    self._set_item(key, value)
  File "/home/sy/anaconda3/envs/tsformer-vo/lib/python3.8/site-packages/pandas/core/frame.py", line 3120, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/sy/anaconda3/envs/tsformer-vo/lib/python3.8/site-packages/pandas/core/frame.py", line 3768, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "/home/sy/anaconda3/envs/tsformer-vo/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 747, in sanitize_index
    raise ValueError(
ValueError: Length of values (14860) does not match length of index (14864)
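
The message says the `frames` list has 14860 entries while the DataFrame has 14864 rows. One plausible explanation, which is only a guess, is that some sequence has fewer images on disk than pose rows (for example after an incomplete download). The sketch below shows one way to narrow it down by comparing per-sequence window counts. The helper names and the counts are made up for illustration and are not taken from the repository.

```python
def count_windows(n_frames, window_size, overlap):
    """Number of sliding windows over n_frames with the given overlap."""
    stride = window_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than window_size")
    if n_frames < window_size:
        return 0
    return (n_frames - window_size) // stride + 1


def find_mismatches(frames_per_seq, poses_per_seq, window_size, overlap):
    """Return sequences whose window count differs between images and poses."""
    bad = {}
    for seq, n_frames in frames_per_seq.items():
        from_frames = count_windows(n_frames, window_size, overlap)
        from_poses = count_windows(poses_per_seq.get(seq, 0), window_size, overlap)
        if from_frames != from_poses:
            bad[seq] = (from_frames, from_poses)
    return bad


# Made-up counts: sequence "01" is missing four images.
frames = {"00": 100, "01": 46}
poses = {"00": 100, "01": 50}
print(find_mismatches(frames, poses, window_size=2, overlap=1))
```

With real data, the two dictionaries would be filled by counting image files and pose lines for each sequence folder.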

How can I get the values for the undo-normalization step in plot_results.py?

Hi, thanks for the great work. I saw that in plot_results.py you set

        mean_angles = np.array([1.7061e-5, 9.5582e-4, -5.5258e-5])
        std_angles = np.array([2.8256e-3, 1.7771e-2, 3.2326e-3])
        mean_t = np.array([-8.6736e-5, -1.6038e-2, 9.0033e-1])
        std_t = np.array([2.5584e-2, 1.8545e-2, 3.0352e-1])

How can I obtain these values, and is this step necessary in the evaluation stage?
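
Constants like these are typically the per-component mean and standard deviation of the training labels. If the network was trained on normalized targets, its outputs have to be mapped back before plotting or evaluation. Here is a minimal sketch under that assumption. The sample values are invented, and whether the author used population or sample standard deviation is not stated on this page.

```python
from statistics import mean, pstdev


def fit_stats(samples):
    """Per-component mean and standard deviation over a list of 3-vectors."""
    columns = list(zip(*samples))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def normalize(v, mu, sigma):
    """Map a label to zero mean and unit variance, component by component."""
    return [(x - m) / s for x, m, s in zip(v, mu, sigma)]


def denormalize(v, mu, sigma):
    """Undo normalize(): map a network output back to physical units."""
    return [x * s + m for x, m, s in zip(v, mu, sigma)]


# Invented relative translations standing in for the training labels.
train_t = [[0.0, -0.02, 0.8], [0.1, 0.0, 1.0], [-0.1, -0.01, 1.2]]
mean_t, std_t = fit_stats(train_t)

z = normalize(train_t[0], mean_t, std_t)
print(denormalize(z, mean_t, std_t))  # recovers train_t[0] up to rounding
```

The same two steps apply to the angle components with their own mean and standard deviation.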

06 sequence

Why is the result on sequence 06 so poor? Are there any ways to improve it?

window_size

Hello, thank you very much for your multiple replies, and I apologize for bothering you again. Should the window_size in the three places shown in the picture (in train.py and kitti.py) be the same? As I understand it, a value of 2 corresponds to VO1, 3 to VO2, and 4 to VO3. Is this understanding correct?
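
For reference, the train.py settings quoted in another issue on this page derive both num_frames and num_classes from window_size, which suggests a single value has to be used consistently. A small sketch of that relationship follows. `derived_model_params` is a hypothetical helper written for this illustration.

```python
def derived_model_params(window_size):
    """Values that the quoted config computes from window_size."""
    if window_size < 2:
        raise ValueError("need at least two frames to form one relative pose")
    return {
        "num_frames": window_size,
        # 6 values per consecutive frame pair, as in the quoted model_params
        "num_classes": 6 * (window_size - 1),
    }


for w in (2, 3, 4):
    print(w, derived_model_params(w))
```

A checkpoint trained with one window_size therefore has an output layer of a different size than one trained with another.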

To achieve results more closely aligned with the paper

Hi. A few days ago, I encountered an error while attempting to run the pretrained_ViT model. I managed to resolve it through another issue. Actually, the reason I attempted to run the pretrained_ViT model was because the results of the non-pretrained model were inconsistent with the results in the paper provided in this GitHub repository. Therefore, after resolving the pretrained issue, I trained the model with pretrained_ViT set to True, and obtained results for sequences 01, 03, 04, 05, 06, 07, and 10 as follows:

[Images: pred_traj_01, pred_traj_03, pred_traj_04, pred_traj_05, pred_traj_06, pred_traj_07, pred_traj_10]

Here are the settings in train.py:

args = {
    "data_dir": "data",
    "bsize": 4,  # batch size
    "val_split": 0.1,  # percentage to use as validation data
    "window_size": 2,  # number of frames in window
    "overlap": 1,  # number of frames overlapped between windows
    "optimizer": "Adam",  # optimizer [Adam, SGD, Adagrad, RAdam]
    "lr": 1e-5,  # learning rate
    "momentum": 0.9,  # SGD momentum
    "weight_decay": 1e-4,  # weight decay
    "epoch": 100,  # number of training epochs
    "weighted_loss": None,  # float to weight angles in loss function
    "pretrained_ViT": True,  # load weights from pre-trained ViT
    "checkpoint_path": "checkpoints/Exp_vit_base_2",  # path to save checkpoint
    "checkpoint": None,  # checkpoint
}

# tiny  - patch_size=16, embed_dim=192, depth=12, num_heads=3
# small - patch_size=16, embed_dim=384, depth=12, num_heads=6
# base  - patch_size=16, embed_dim=768, depth=12, num_heads=12
model_params = {
    "dim": 768,
    "image_size": (192, 640),  #(192, 640),
    "patch_size": 16,
    "attention_type": 'divided_space_time',  # ['divided_space_time', 'space_only','joint_space_time', 'time_only']
    "num_frames": args["window_size"],
    "num_classes": 6 * (args["window_size"] - 1),  # 6 DoF for each frame
    "depth": 12,
    "heads": 12,
    "dim_head": 64,
    "attn_dropout": 0.1,
    "ff_dropout": 0.1,
    "time_only": False,
}

The results are similar to the pretrained models provided on GitHub, namely Model1, Model2, and Model3.

It seems like I might have made a mistake somewhere. Could you kindly advise on what I should correct?

Thank you so much

Hello! Can you tell me where the code for 6-DoF pose estimation is in the repository, and how it was implemented? Thank you for your reply.
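
As a general illustration, and not the repository's actual code: a 6-DoF relative pose can be written as three rotation angles plus a translation, and a trajectory is obtained by chaining these relative motions. The sketch below assumes the ordering [rx, ry, rz, tx, ty, tz] and the Euler convention R = Rz Ry Rx. Both are assumptions, and the function names are hypothetical.

```python
import math


def euler_to_rotation(rx, ry, rz):
    """Rotation matrix R = Rz @ Ry @ Rx (assumed convention), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def to_homogeneous(pose):
    """Build a 4x4 transform from [rx, ry, rz, tx, ty, tz]."""
    rx, ry, rz, tx, ty, tz = pose
    r = euler_to_rotation(rx, ry, rz)
    return [r[0] + [tx], r[1] + [ty], r[2] + [tz], [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def accumulate(relative_poses):
    """Chain relative motions into absolute positions, starting at the origin."""
    current = to_homogeneous([0.0] * 6)
    positions = [(0.0, 0.0, 0.0)]
    for pose in relative_poses:
        current = matmul(current, to_homogeneous(pose))
        positions.append((current[0][3], current[1][3], current[2][3]))
    return positions


# One metre along z, then a 90-degree turn about y with another metre along z,
# then one more metre along the (now rotated) z axis.
steps = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, math.pi / 2, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
for position in accumulate(steps):
    print(position)
```

Plotting the x and z components of the accumulated positions gives the usual top-down trajectory view.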

camera intrinsic parameters

Hi, Thanks for the great work.
It seems that you are not using any of the intrinsic parameters of the camera.
Does the KITTI data set already compensate for that?

GPU

Hello, what should I modify if I want to train on GPUs 2 and 3 together? I couldn't find the relevant setting. Thank you!
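
A common general-purpose approach, not verified against this repository, is to restrict the visible devices through the CUDA_VISIBLE_DEVICES environment variable before PyTorch is imported. `select_gpus` below is a hypothetical helper.

```python
import os


def select_gpus(gpu_ids):
    """Restrict the process to the given physical GPUs; call before importing torch."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


select_gpus([2, 3])
# After this, PyTorch sees the two cards as cuda:0 and cuda:1.
# Wrapping the model, e.g. model = torch.nn.DataParallel(model).to("cuda"),
# would then split each batch across both cards (untested with train.py).
```

Setting the variable on the command line when launching train.py has the same effect.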

input dimension

Thanks for your work! I tried your code. Could you please tell me the meaning of the input dimensions?
torch.Size([4, 3, 2, 192, 640]): batch size * ? * window_size * width * height
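
A likely reading, assuming the common video-model layout (batch, channels, frames, height, width): the 4, 2, and (192, 640) match the bsize, window_size, and image_size in the train.py settings quoted on this page, which leaves 3 for the RGB channels and makes 192 the height and 640 the width. The axis names in this sketch are an assumption, not taken from the repository.

```python
AXES = ("batch", "channels", "frames", "height", "width")  # assumed layout


def label_shape(shape):
    """Pair each dimension of a 5-D input shape with its assumed meaning."""
    if len(shape) != len(AXES):
        raise ValueError(f"expected {len(AXES)} dimensions, got {len(shape)}")
    return dict(zip(AXES, shape))


print(label_shape((4, 3, 2, 192, 640)))
```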

Modify training data?

Hello,

I am trying to retrain the model on KITTI sequences different from the ones used for the provided models. What would I have to change to set the sequences that are going to be used? In train.py there's a line

"data_dir": "data",

in args. Is this supposed to be the training folder, or just the data folder containing the 11 KITTI sequences?

I look forward to your answer, cheers!
