ut-austin-rpl / viola
Official implementation for VIOLA
License: MIT License
Hello! Thanks for releasing this great work!
I am trying to reproduce this model in the real world, so I am using the VIOLA dataset. I was wondering about the scale of the dataset: are the actions absolute or delta? Is translation measured in meters or centimeters, and is rotation recorded in radians or degrees? I find the units confusing.
Thanks for answering my questions!
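One way to check this empirically is to inspect the action statistics directly. A minimal sketch, assuming the demos follow a robomimic-style hdf5 layout with `data/demo_*/actions` (the file name and keys below are placeholder assumptions, not confirmed from the repo):

```python
import h5py
import numpy as np

# Hedged sketch: infer action scale/units from the data itself.
# Assumes a robomimic-style layout (data/demo_*/actions); adjust the
# file name and keys to match the actual VIOLA dataset.
with h5py.File("datasets/example_demo.hdf5", "r") as f:
    actions = np.concatenate([f[f"data/{demo}/actions"][()] for demo in f["data"]])

print("per-dim min:", actions.min(axis=0))
print("per-dim max:", actions.max(axis=0))
# Delta actions are usually normalized to roughly [-1, 1]; absolute
# end-effector positions in meters would instead span workspace-scale
# values (tens of centimeters).
```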
How do I visualize rollouts when running viola_bc/final_eval_script.py? Where does the offscreen_visualization function in img = offscreen_visualization(env, use_eye_in_hand=cfg.algo.use_eye_in_hand) come from? I couldn't find this function anywhere.
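If that helper is indeed missing from the release, a minimal stand-in using MuJoCo's offscreen renderer might look like the sketch below. The camera names and image size are assumptions: robosuite environments typically expose `agentview` and `robot0_eye_in_hand` cameras, but VIOLA's env may differ.

```python
import numpy as np

def offscreen_visualization(env, use_eye_in_hand=False, width=256, height=256):
    """Hedged stand-in: render one offscreen frame from the simulator.

    Assumes a robosuite-style env backed by mujoco_py; the camera names
    below are the usual robosuite defaults, not confirmed from VIOLA.
    """
    camera = "robot0_eye_in_hand" if use_eye_in_hand else "agentview"
    img = env.sim.render(width=width, height=height, camera_name=camera)
    return np.flipud(img)  # mujoco_py offscreen frames come out vertically flipped
```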
action_token_out = transformer_out[:, :, 0, :]
Hello, I don't understand why you directly take the first token of the output as action_token_out. After your grouping, the grouped input should follow this order: spatial_context_feature + region_feature + action_token + other obs features. Does the token order change when the sequence passes through the transformer_decoder?
In addition, about the image augmentation (padding + random crop): how many crops do you take? Looking through the code, I only see the default value num_crops=1. Doesn't the global feature get lost if there is only one crop? From what I can see, the feature map is extracted from the cropped image.
Could you help me figure out why and how? Thanks a lot.
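For intuition on the indexing question: a standard transformer keeps output position i aligned with input position i, so out[..., 0, :] always retrieves whatever token was placed first in the input sequence. A minimal sketch of that property (shapes and the prepend order are illustrative assumptions, not VIOLA's actual module):

```python
import torch
import torch.nn as nn

# Hedged sketch: a transformer layer maps a (batch, seq, dim) sequence to a
# (batch, seq, dim) sequence position-by-position, so if the action token is
# concatenated at index 0 of the input, its output also lives at index 0.
dim, seq_len, batch = 64, 10, 2
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

action_token = torch.zeros(batch, 1, dim)           # placed first in the sequence
other_tokens = torch.randn(batch, seq_len - 1, dim)
tokens = torch.cat([action_token, other_tokens], dim=1)

out = layer(tokens)                                 # (batch, seq, dim)
action_token_out = out[:, 0, :]                     # position 0 in == position 0 out
print(action_token_out.shape)                       # torch.Size([2, 64])
```

If the action token were actually concatenated at some other index, you would index that position instead; indexing 0 only makes sense if the action token is prepended.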
I have some questions.
For questions 1 and 2 (line 168 here):

```python
transformer_out = transformer_out.reshape(original_shape)
action_token_out = transformer_out[:, :, 0, :]
if per_step:
    action_token_out = action_token_out[:, -1:, :]
```

Why is TensorUtils.time_distributed used on this line, and what does gripper-history refer to? Thank you in advance!
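For context, the usual time-distributed pattern (of which robomimic's TensorUtils.time_distributed is one implementation) folds the time axis into the batch axis, applies a per-frame module, and unfolds again, so a plain image encoder can process a (batch, time, ...) tensor. A minimal sketch of the idea, not robomimic's exact code:

```python
import torch

def time_distributed(x, op):
    """Hedged sketch of the time-distributed pattern: apply `op` to every
    frame of a (batch, time, *rest) tensor by folding time into batch.
    Not robomimic's exact implementation."""
    b, t = x.shape[:2]
    y = op(x.reshape(b * t, *x.shape[2:]))   # (b*t, *out_shape)
    return y.reshape(b, t, *y.shape[1:])     # (b, t, *out_shape)

# Example: run a conv encoder over every frame of a short video batch.
conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
video = torch.randn(2, 5, 3, 32, 32)         # (batch, time, C, H, W)
feats = time_distributed(video, conv)
print(feats.shape)                           # torch.Size([2, 5, 8, 32, 32])
```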
Where do the files loaded by np.load(f"scenes/{domain_name}/normal/{eval_run_idx}_{i + rank * 50}.npz") come from?
And what does initial_mjstate = env.sim.get_state().flatten() do?
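On the second line: in mujoco_py, sim.get_state() returns an MjSimState (time, qpos, qvel, ...), and flatten() packs it into a single 1-D array that can be saved and later restored, which is how a fixed initial scene can be replayed exactly. A small sketch of that round trip; whether the eval script's .npz files are produced this way is an assumption:

```python
import numpy as np

def save_initial_state(env, path):
    """Hedged sketch: `env` is assumed to be a robosuite-style environment
    backed by mujoco_py. flatten() packs the MjSimState (time, qpos, qvel,
    ...) into one 1-D array."""
    initial_mjstate = env.sim.get_state().flatten()
    np.savez(path, initial_mjstate=initial_mjstate)

def restore_initial_state(env, path):
    """Reset the simulator to a previously saved flattened state."""
    data = np.load(path)
    env.sim.set_state_from_flattened(data["initial_mjstate"])
    env.sim.forward()  # recompute derived quantities after setting the state
```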
Thanks for releasing your great work!
Could you share the real-robot dataset's experiment config file (maybe xx_viola.yaml?) and a checkpoint .pth file?
I tried training the model with your generated datasets and got this error:
Cannot import name 'CropRandomizer' from 'robomimic.models.base_nets'
Suspecting an incorrect robomimic installation, I checked https://github.com/ARISE-Initiative/robomimic/blob/master/robomimic/models/base_nets.py. However, I can't find CropRandomizer or Randomizer there either.
Could you help me figure out why and how to fix it?
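This looks like a robomimic version mismatch: newer robomimic releases moved the observation randomizers out of base_nets into obs_core (around v0.3), while older code imports them from base_nets. A hedged workaround, assuming that move is the cause:

```python
# Hedged workaround: the randomizer classes moved between robomimic releases.
# Older releases (~0.2.x) keep CropRandomizer in base_nets; newer ones
# (~0.3+) moved it to obs_core. Try both import paths:
try:
    from robomimic.models.base_nets import CropRandomizer   # robomimic 0.2.x
except ImportError:
    from robomimic.models.obs_core import CropRandomizer    # robomimic 0.3+
```

Alternatively, pinning robomimic to the version the repo was developed against (whichever the README specifies) avoids patching imports at all.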
Do you have the scripts to collect the demos on the real robot?