derrickxunu / cobevt

[CoRL2022] CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

License: Apache License 2.0

Python 99.11% Cython 0.89%

autonomous-driving autonomous-vehicles bev-perception collaborative-perception computer-vision multi-agent-perception nuscenes segmentation semantic semantic-segmentation

cobevt's Introduction

Hi there, I'm Runsheng Xu (徐润生 in Chinese)! 👋

I am a Research Scientist at Waymo, working on the most exciting research projects in autonomous driving. I earned my PhD from UCLA in 2.5 years, with publications in CVPR/ECCV/ICCV/TPAMI/CoRL/ICRA. I was also a senior deep learning engineer at Mercedes-Benz R&D North America (MBRDNA) and a computer vision engineer at OPPO R&D US from 2018 to 2020.

🔭 Research-wise, I mainly focus on:

Autonomous Driving topics related to Perception, Simulation, and Cooperative Driving Automation
Generative AI topics related to LLM, Diffusion, and World Model.
Computer Vision topics related to Vision Transformer.

😄 I am open to:

collaboration opportunities (anytime & anywhere & any type) and
research internships

📫 Contact me by:

Email (rxx3386 [AT] ucla.edu)
Zhihu: 「叶小飞」
Homepage
Linkedin

cobevt's People

Contributors

Stargazers

Watchers

Forkers

vztu cc-xiaoyu shawcicdd phyllish mypatronsaint ucla-drivex jlqzzz deepchavan1 ucla-mobility tanjingme kaisung0102 yumengxiu jakey-young whuhxb

cobevt's Issues

How to test on nuScenes?

Excuse me, dear author! There are only train.py and benchmark.py (for measuring inference speed). How can I test the model on nuScenes?

How long does it take to train the model?

Hello! Thanks for your good work! Could you please explain which GPU you used to train the models, and how long training takes?

Code release date

Thanks for your novel work. When will you upload the CoBEVT project? Thank you~

Evaluating IoU on NuScenes Dataset

I am trying to produce IoU values on the nuScenes dataset. I am new to machine learning and was hoping you could provide a script to test the model on the nuScenes track. I have been able to set up the dataset and run the benchmark.py file to measure the inference speed, but I am unsure how to produce the IoU score reported in your paper. Thank you.
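For readers with the same question, the following is a minimal sketch of how an IoU score is commonly computed for a binary BEV segmentation map. It is not the authors' evaluation script: the function name `binary_iou`, the array layout, and the 0.5 default threshold are assumptions made for illustration.

```python
import numpy as np


def binary_iou(pred_prob, target, threshold=0.5):
    """IoU between a thresholded probability map and a binary ground-truth mask.

    pred_prob and target must have the same shape (e.g. an H x W BEV grid).
    The 0.5 default threshold is an assumption, not a value taken from the repo.
    """
    pred = np.asarray(pred_prob) >= threshold
    gt = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Return NaN when both masks are empty so the caller can skip the sample.
    return float(intersection) / float(union) if union > 0 else float("nan")


if __name__ == "__main__":
    prob = np.array([[0.9, 0.2], [0.7, 0.1]])
    mask = np.array([[1, 0], [0, 1]])
    print(binary_iou(prob, mask))  # one overlapping cell out of three in the union
```

Dataset-level scores are often obtained by accumulating the intersection and union counts over all validation samples before dividing, rather than averaging per-sample IoU values.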

Result of OPV2V LiDAR track

Thanks for the work. Can you share how to extract the results of the OPV2V LiDAR track (is there code to extract bounding boxes)?

Attempt at transferring to a detection task

Thank you for sharing the code. I tried replacing the segmentation head of CoBEVT with a simple one-stage detection head, but it hardly works: the regression branch does not converge. Have you ever tried using CoBEVT for a detection task? Do you think this is a reasonable approach?
I would really appreciate a reply. Thank you!

How to get the dropped-cameras results?

Hello! I want to run the ablation study on dropped cameras, but I don't know how to change the code to drop them. Could you please tell me how to randomly drop cameras? Thank you for your patience. <( _ _ )>
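One common way to simulate dropped cameras is to zero out randomly chosen views before they reach the model. The sketch below illustrates that idea only; it is not confirmed to be the protocol used in the paper, and the function name `drop_random_cameras` and the `(num_cameras, C, H, W)` layout are assumptions.

```python
import numpy as np


def drop_random_cameras(images, num_drop, rng=None):
    """Zero out `num_drop` randomly chosen camera views.

    images: array of shape (num_cameras, C, H, W); this layout is an assumption.
    Returns a copy of the images and the sorted indices of the dropped cameras.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_cameras = images.shape[0]
    if not 0 <= num_drop <= num_cameras:
        raise ValueError("num_drop must be between 0 and the number of cameras")
    # Sample distinct camera indices without replacement.
    dropped = np.sort(rng.choice(num_cameras, size=num_drop, replace=False))
    out = images.copy()
    out[dropped] = 0  # a dropped camera contributes an all-zero image
    return out, dropped


if __name__ == "__main__":
    views = np.ones((4, 3, 8, 8))
    masked, idx = drop_random_cameras(views, num_drop=2, rng=np.random.default_rng(0))
    print(idx, masked.shape)
```

Passing a seeded generator keeps the ablation reproducible across runs.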

Which threshold do you use?

Thanks for the good work! I see you use CVT as your baseline. Do you use a threshold of 0.5 (50%) when reporting the accuracy in your paper?

The calculation of camera_extrinsic

Hi, I noticed that the function reform_camera_param in the file BaseDataset involves three variables for extrinsic parameters: camera_extrinsic, camera_extrinsic_to_ego_lidar and camera_extrinsic_to_ego.

I have the following questions:

What's the difference between them?
If I want to fuse LiDAR and image features, which extrinsic should I use, and do I need to convert the coordinates from UE4's coordinate system to the standard camera coordinate system, as in project_3d_to_camera in camera_utils?

Thank you~

Clarification Needed on Fused Axial Attention in FAX module

In the local-global attention block of the CrossViewSwapAttention class, I noticed that two rearrange operations are applied to the key tensor in the code below. From my understanding, these two operations seem to cancel each other out, as they appear to reshape the key tensor first into a global feature map and then back into the original window-partitioned shape. Could you help explain the purpose of these operations? Why does the key tensor need to be reshaped twice in this way?

    # local-to-local cross-attention
    query = rearrange(query, 'b n d (x w1) (y w2) -> b n x y w1 w2 d',
                      w1=self.q_win_size[0], w2=self.q_win_size[1])  # window partition
    key = rearrange(key, 'b n d (x w1) (y w2) -> b n x y w1 w2 d',
                      w1=self.feat_win_size[0], w2=self.feat_win_size[1])  # window partition
    val = rearrange(val, 'b n d (x w1) (y w2) -> b n x y w1 w2 d',
                      w1=self.feat_win_size[0], w2=self.feat_win_size[1])  # window partition
    query = rearrange(self.cross_win_attend_1(query, key, val,
                                            skip=rearrange(x,
                                                        'b d (x w1) (y w2) -> b x y w1 w2 d',
                                                         w1=self.q_win_size[0], w2=self.q_win_size[1]) if self.skip else None),
                   'b x y w1 w2 d  -> b (x w1) (y w2) d')    # reverse window to feature (restore the original shape)

    query = query + self.mlp_1(self.prenorm_1(query))

    x_skip = query
    query = repeat(query, 'b x y d -> b n x y d', n=n)              # b n x y d

    # local-to-global cross-attention
    query = rearrange(query, 'b n (x w1) (y w2) d -> b n x y w1 w2 d',
                      w1=self.q_win_size[0], w2=self.q_win_size[1])  # window partition
    # TODO: don't these two operations cancel each other out?
    key = rearrange(key, 'b n x y w1 w2 d -> b n (x w1) (y w2) d')  # reverse window to feature
    key = rearrange(key, 'b n (w1 x) (w2 y) d -> b n x y w1 w2 d',
                    w1=self.feat_win_size[0], w2=self.feat_win_size[1])  # grid partition
    val = rearrange(val, 'b n x y w1 w2 d -> b n (x w1) (y w2) d')  # reverse window to feature
    val = rearrange(val, 'b n (w1 x) (w2 y) d -> b n x y w1 w2 d',
                    w1=self.feat_win_size[0], w2=self.feat_win_size[1])  # grid partition
    query = rearrange(self.cross_win_attend_2(query,
                                              key,
                                              val,
                                              skip=rearrange(x_skip,
                                                        'b (x w1) (y w2) d -> b x y w1 w2 d',
                                                        w1=self.q_win_size[0],
                                                        w2=self.q_win_size[1])
                                              if self.skip else None),
                   'b x y w1 w2 d  -> b (x w1) (y w2) d')  # reverse grid to feature
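A small NumPy check suggests the two rearranges do not cancel, because the axis order inside the parentheses differs: `(x w1)` groups contiguous patches, while `(w1 x)` groups strided entries. The toy sizes below are assumptions, and the reshape/transpose calls only mimic the index grouping of the einops patterns quoted above; this is not the repository code.

```python
import numpy as np

H = W = 4          # toy feature-map size (assumed)
w1 = w2 = 2        # window size, playing the role of feat_win_size
x, y = H // w1, W // w2

feat = np.arange(H * W).reshape(H, W)   # each entry is its own flat index

# Window partition, same grouping as '(x w1) (y w2) -> x y w1 w2'
window = feat.reshape(x, w1, y, w2).transpose(0, 2, 1, 3)

# Reverse window to feature, same as 'x y w1 w2 -> (x w1) (y w2)'
restored = window.transpose(0, 2, 1, 3).reshape(H, W)

# Grid partition, same grouping as '(w1 x) (w2 y) -> x y w1 w2'
grid = restored.reshape(w1, x, w2, y).transpose(1, 3, 0, 2)

print(window[0, 0].tolist())  # [[0, 1], [4, 5]]: a contiguous 2x2 patch
print(grid[0, 0].tolist())    # [[0, 2], [8, 10]]: entries strided across the map
```

The first group of the window partition is a local patch, while the first group of the grid partition samples the whole map at a stride, which matches the "local" and "global" labels in the code comments.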

The function of com_mask

Hi, thanks for your work. I wonder what the function of 'com_mask' in opv2v_track_task is, and whether this variable is required?

The related code is:

    com_mask = mask.unsqueeze(1).unsqueeze(2).unsqueeze(3) if not self.use_roi_mask \
        else get_roi_and_cav_mask(x.shape, mask, transformation_matrix,
                                  self.discrete_ratio, self.downsample_rate)
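As a rough illustration of the first branch, the sketch below shows what three successive `unsqueeze` calls do to the shape of a mask. It assumes `mask` has shape `(B, L)` with one entry per agent slot, and the `(B, H, W, L, L)` score tensor is hypothetical; neither is confirmed by the quoted code. NumPy's `expand_dims` stands in for PyTorch's `unsqueeze`.

```python
import numpy as np

B, L = 2, 3                       # batch size and max number of agents (assumed)
mask = np.array([[1, 1, 0],       # sample 0: two valid agents, one padded slot
                 [1, 0, 0]])      # sample 1: one valid agent

# Equivalent of mask.unsqueeze(1).unsqueeze(2).unsqueeze(3) in PyTorch
com_mask = np.expand_dims(np.expand_dims(np.expand_dims(mask, 1), 2), 3)
print(com_mask.shape)             # (2, 1, 1, 1, 3)

# The singleton axes let the mask broadcast over a hypothetical
# (B, H, W, L, L) score tensor, masking the last (agent) axis.
scores = np.zeros((B, 4, 4, L, L))
masked = np.where(com_mask == 0, -np.inf, scores)
print(masked.shape)               # (2, 4, 4, 3, 3)
```

Under these assumptions, padded agent slots end up with a score of negative infinity and would be ignored by a subsequent softmax.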

What is the range of detection performance evaluation for OPV2V Camera Track?

Dear Runsheng,
Thank you for your amazing work! I have a minor question about the experiments. Could you help me with the following issue?
I know the evaluation range is [±140, ±40] m for the LiDAR track, but I am not sure whether the same range applies to the camera track in CoBEVT.
For nuScenes, the range is a 100 m × 100 m area.
Thank you for your attention!

Dear author, can you provide the checkpoints?

Training the model again would take too long for me. Thanks a lot!

FAX local global

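When computing local attention, why does the feature map need to be reshaped to (H/P × W/P, N × P², C), i.e. with P² placed in the second-to-last dimension?

When computing global attention, the feature map is instead reshaped to (N × G², H/G × W/G, C), and then the second-to-last and third-to-last dimensions are swapped, giving (H/G × W/G, N × G², C). Since this form is the same as the local one, why not apply the same transformation directly instead of performing an extra dimension swap?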

Issue reproducing the results

Thanks for sharing the code. I trained the model using the command:

    CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --use_env opencood/tools/train_camera.py --hypes_yaml opencood/hypes_yaml/opcamera/corpbevt.yaml

The IoU is about 46.2%, while the result reported in the paper is about 60.4%. Do you know why my reproduced result is much lower?

How to modify

I changed the size of the input image from 512x512 to 256x256, but this causes a size mismatch between the query and the key/value tensors. How should I modify the code to support the new input size?

    Traceback (most recent call last):
      File "opencood/tools/train_camera.py", line 241, in <module>
        main()
      File "opencood/tools/train_camera.py", line 152, in main
        ouput_dict = model(batch_data['ego'])
      File "/home/cylunbu/anaconda3/envs/cobevt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/cylunbu/CoBEVT/opv2v/opencood/models/corpbevt.py", line 114, in forward
        x = self.fax(batch_dict)
      File "/home/cylunbu/anaconda3/envs/cobevt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/cylunbu/CoBEVT/opv2v/opencood/models/sub_modules/fax_modules.py", line 513, in forward
        x = cross_view(i, x, self.bev_embedding, feature, I_inv, E_inv)
      File "/home/cylunbu/anaconda3/envs/cobevt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/cylunbu/CoBEVT/opv2v/opencood/models/sub_modules/fax_modules.py", line 408, in forward
        w1=self.q_win_size[0], w2=self.q_win_size[1]) if self.skip else None),
      File "/home/cylunbu/anaconda3/envs/cobevt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/cylunbu/CoBEVT/opv2v/opencood/models/sub_modules/fax_modules.py", line 208, in forward
        assert q_height * q_width == kv_height * kv_width
    AssertionError
