tusimple / centerformer

Implementation for CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022)

License: MIT License

Python 83.46% Shell 0.08% C++ 4.76% Cuda 11.70%

centerformer's Issues

Code for nuScenes Dataset

Thanks for your great work! The paper also provides nuScenes results in the supplementary material. Could you upload the code for training on the nuScenes dataset? Thanks again.

Pillar-based centerformer

Hi, I wonder whether you have ever tried a pillar-based CenterFormer?

When will the code be open-sourced?

Thanks for your great work! When will the code be open-sourced?

Question about why the add & norm structure of the transformer network differs from the typical transformer

centerformer/det3d/models/utils/transformer.py

Lines 267 to 279 in 96aa375

    if pos_embedding is not None:
        x_att = self_attn(x + center_pos_embedding)
        x = x_att + x
        x_att = cross_attn(
            x + center_pos_embedding, y + neighbor_pos_embedding
        )
    else:
        x_att = self_attn(x)
        x = x_att + x
        x_att = cross_attn(x, y)

    x = x_att + x
    x = ff(x) + x

In this code, the residual connection adds the raw input, which has not passed through the norm layer. Add and norm are not treated as a single unit, which differs from the typical transformer structure, where the output of add followed by norm becomes the input to the next sublayer. Is there a specific reason for this design?
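For readers comparing the two layouts, the sketch below contrasts a post-norm residual (add, then normalize) with a pre-norm residual (normalize inside the branch, then add the raw input). All names here (`layer_norm`, `toy_sublayer`, `post_norm_block`, `pre_norm_block`) are hypothetical and not taken from the repository; whether `self_attn`, `cross_attn` and `ff` in the quoted snippet wrap a norm layer internally is an assumption that should be checked against transformer.py.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_sublayer(x):
    """Stand-in for attention or feed-forward: doubles each element (illustrative only)."""
    return [2.0 * v for v in x]

def post_norm_block(x, sublayer):
    """Post-norm layout: y = LayerNorm(x + sublayer(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_norm_block(x, sublayer):
    """Pre-norm layout: y = x + sublayer(LayerNorm(x)); the skip path is never normalized."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

x = [1.0, 2.0, 3.0, 4.0]
post = post_norm_block(x, toy_sublayer)
pre = pre_norm_block(x, toy_sublayer)
```

In the pre-norm variant the input reaches the output unchanged through the skip path, which matches the pattern `x = x_att + x` in the quoted lines if the sublayers normalize their own input.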

Why is only the 0-th entry selected in "example["ind"][0]" when there are 6 task heads?

Hi,
I'm confused about why only the 0-th entry is selected in "example["ind"][0]" in that line. I thought there are 6 task heads?

Why is the transformer applied only to the regression head? Would the results improve if the transformer were also applied to CenterHead?

Implementation of CrossAttention

Hello, I found that your code uses ChannelAttention and SpatialAttention in place of the cross-attention described for the cross-attention layer in the original paper. Was this done to reduce the computational cost of cross-attention?

Can I reproduce your great work with MMDetection3d?

Thanks for your great work! I have tried CenterPoint based on MMDetection3d. I wonder which parts were changed compared with the original CenterPoint.
Looking forward to your reply. Many thanks!

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

When I run:
python -m torch.distributed.launch --nproc_per_node=2 ./tools/train.py configs/waymo/voxelnet/waymo_centerformer.py

It shows the following error:

`2022-10-23 14:27:35,879 - INFO - Start running, work_dir: /dkliang/projects/synchronous/centerformer/work_dirs/waymo_centerformer
2022-10-23 14:27:35,880 - INFO - workflow: [('train', 1)], max: 20 epochs
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 0 (pid: 44260) of binary: /dkliang/miniconda3/envs/centerformer/bin/python
Traceback (most recent call last):
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

         ./tools/train.py FAILED

==================================================
Root Cause:
[0]:
time: 2022-10-23_14:27:44
rank: 0 (local_rank: 0)
exitcode: -11 (pid: 44260)
error_file: <N/A>
msg: "Signal 11 (SIGSEGV) received by PID 44260"

Other Failures:
[1]:
time: 2022-10-23_14:27:44
rank: 1 (local_rank: 1)
exitcode: -11 (pid: 44261)
error_file: <N/A>
msg: "Signal 11 (SIGSEGV) received by PID 44261"
**************************************************`

Is the previous frame transformed to the current frame?

Thanks for your great work and for open-sourcing it. I have a question about the coordinate transformation.
I have checked the code: you transform the previous point clouds into the current frame's coordinates during training, right?

if sweep["transform_matrix"] is not None:
        points_sweep[:3, :] = sweep["transform_matrix"].dot( 
            np.vstack((points_sweep[:3, :], np.ones(nbr_points)))
        )[:3, :]

But in deployment inference, we would just save the previous feature map in the memory bank and reuse it, and that feature map has not been transformed to the current frame. So there is a gap here.

Please correct me if I am wrong. Thanks!
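To make the point-level operation in the quoted snippet concrete, here is a minimal pure-Python sketch of applying a 4x4 homogeneous transform to points from a previous sweep. The function `transform_points` and the example pose `prev_to_cur` are hypothetical illustrations, not code or values from the repository.

```python
def transform_points(points, transform):
    """Apply a 4x4 homogeneous transform to a list of (x, y, z) points."""
    out = []
    for x, y, z in points:
        h = (x, y, z, 1.0)  # homogeneous coordinates
        out.append(
            tuple(sum(transform[r][c] * h[c] for c in range(4)) for r in range(3))
        )
    return out

# Hypothetical pose: the previous frame's origin sits 2 m ahead in x
# when expressed in the current frame.
prev_to_cur = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
prev_points = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.5)]
cur_points = transform_points(prev_points, prev_to_cur)
```

A cached BEV feature map would need an equivalent warp (or the ego motion would have to be small) for training and deployment to see the same geometry, which is the gap the question describes.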

Use CenterFormer on other datasets

Hello, really impressive work! I wonder whether I could use the method on datasets other than WOD and nuScenes. I want to use my own dataset, for which I have already written a working data pipeline. To get CenterFormer to reach similarly strong performance on my own dataset, is there any essential change I should make, or is it ready for that?

Some details to discuss

Thank you for open-sourcing your work. I was wondering why you use x_up (the current frame's BEV feature) rather than x_up_fuse (the sequential frames after spatial-aware fusion) as the center query embedding. Apologies if I missed it in the paper.

Issues with testing on the nuScenes test set

Thanks for your fantastic work. I wonder how to test the model on the nuScenes test set. I created the data for the test set and added the path to test_anno. To start testing, I run dist_test.py with the --testset flag, but it fails with an error anyway. I hope to hear from you.
Best regards!

About `disable_dbsampler`

Hi, I successfully reproduced the base CenterFormer result of 68.06 on nuScenes.
Thanks a lot.

One difference I noticed from the CenterPoint code base is that
your code contains a disable_dbsampler option.
Could you explain the motivation for this part? Is it simply turning off the augmentation from epoch 15?

Should "x_coor" and "y_coor" be swapped in Lines 466 and 468 of det3d/models/necks/rpn_transformer?

Those two lines seem fine at first glance.
But by coincidence I changed the point cloud range, which made H unequal to W, and I encountered this:

After debugging, I found that in get_multi_scale_feature, center_pos sometimes fell outside the range of feat. Tracing backwards, I found that some elements of y_coor exceed W in Line 468.
After some experiments, I found that this problem can be fixed by swapping x_coor and y_coor.
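The kind of bug described here only shows up on non-square maps. The sketch below, which assumes a row-major flattened BEV index and uses hypothetical names (`decode_index`, `H`, `W`), shows how a row/column mix-up stays hidden when H equals W and goes out of bounds otherwise. It does not reproduce the repository's Lines 466 and 468.

```python
def decode_index(ind, width):
    """Decode a flattened row-major BEV index into (x, y) = (column, row)."""
    y = ind // width   # row index, valid range [0, H)
    x = ind % width    # column index, valid range [0, W)
    return x, y

H, W = 4, 6                     # hypothetical non-square feature map
ind = 3 * W + 5                 # last cell of the last row
x, y = decode_index(ind, W)     # (5, 3)
in_bounds = 0 <= x < W and 0 <= y < H

# Swapping the roles goes unnoticed when H == W but breaks here:
x_bad, y_bad = y, x
swapped_in_bounds = 0 <= x_bad < W and 0 <= y_bad < H   # y_bad = 5 is not < H
```

A quick check of this form on the decoded coordinates, run with an asymmetric point cloud range, is one way to confirm which of the two variables indexes rows and which indexes columns.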

nuScenes result?

Thanks for your fantastic work. I'm very interested in this project, but I cannot obtain your AP and NDS on nuScenes. Could you upload your training results on nuScenes? I want to do some follow-up work based on them. I hope to hear from you.

Is it correct that NaN appears in the loss?

I ran the code as provided on GitHub.
However, after a certain point, every loss value is NaN.
I suspected a problem with the dataset, so I recreated the pkl file with create_data.py, but NaN still appears. Is it correct to run the training to the end even if NaN appears?
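Once a loss value becomes NaN, the gradients and then the weights generally become NaN as well, so later iterations do not recover on their own. A small helper like the one below can locate the first bad iteration so the preceding batch can be inspected. The function name `first_non_finite` and the sample values are hypothetical; this is a sketch, not part of the repository's trainer.

```python
import math

def first_non_finite(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None

# Hypothetical per-iteration loss values copied from a training log.
logged_losses = [2.31, 1.87, 1.52, float("nan"), float("nan")]
bad_step = first_non_finite(logged_losses)
```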

Some questions about nuScenes multi-task support

Thanks for releasing the nuScenes dataset support. I have some questions about the multi-task implementation. I see in the code that you define obj_num=500 for each task, and that the task_id is added to the positional embedding to identify each task in the RPN transformer. Unfortunately, this increases the computation, and my machine runs out of CUDA memory (OOM). My intuitive idea for multi-task support is that each task has its own head when generating the heatmap. All heatmaps are then concatenated into one tensor, from which the top 500 center queries are selected and sent to the RPN transformer. The positional feature is just the regular x and y coordinates. In the final output stage, each task has its own detection head applied to the transformer output features, which avoids the extra computation in the transformer layers. This is my first thought. Have you experimented with this approach, and does it have any drawbacks? Could you share the results or conclusions? This is very important to me. Thank you!
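The alternative proposed in this question (one shared top-K over all task heatmaps instead of K queries per task) can be sketched as follows. The helper `shared_topk` and the toy heatmaps are hypothetical; this only illustrates the selection step and says nothing about how it would affect accuracy.

```python
import heapq

def shared_topk(task_heatmaps, k):
    """Pick the k highest-scoring cells across all task heatmaps.

    task_heatmaps: list (one per task) of flattened score lists of equal length.
    Returns a list of (score, task_id, cell_index), highest score first.
    """
    candidates = (
        (score, task_id, cell)
        for task_id, heatmap in enumerate(task_heatmaps)
        for cell, score in enumerate(heatmap)
    )
    return heapq.nlargest(k, candidates, key=lambda c: c[0])

# Two hypothetical tasks over a 4-cell BEV map.
heatmaps = [
    [0.10, 0.90, 0.20, 0.05],   # task 0
    [0.80, 0.15, 0.70, 0.30],   # task 1
]
queries = shared_topk(heatmaps, k=3)
```

With this scheme the number of queries entering the transformer stays at k regardless of the number of tasks, at the cost of tasks with weak heatmap responses possibly receiving few or no queries.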

Evaluation on Waymo Open Dataset

Hello, in order to reproduce the Waymo results on my own, I trained CenterFormer on Waymo and tried to get a performance evaluation like the one shown in the README:

I followed the instructions from https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md, and I already have the gt.bin and preds.bin of CenterFormer, but I ran into this error:

I wonder whether you have encountered this issue before, or whether I have gone about it the wrong way. I really need some help here. Thanks in advance.

global_translate_noise in CenterFormer is different from that in CenterPoint

Interesting work!
The translation noise for data augmentation in CenterFormer is 0.5,
https://github.com/TuSimple/centerformer/blob/master/configs/waymo/voxelnet/waymo_centerformer.py#L132, while the translation in CenterPoint is 0. I also noticed that you use np.random.uniform, as for the rotation and scale parameters, rather than np.random.normal. Could you explain the motivation for these modifications and their influence on performance?
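The practical difference between the two sampling choices is that uniform noise is bounded while Gaussian noise is not. The sketch below uses Python's standard `random` module instead of NumPy and assumes a symmetric range of [-std, std] for the uniform case; that range, and both function names, are assumptions made for illustration rather than a reading of the repository's augmentation code.

```python
import random

def sample_translation_uniform(std, rng):
    """Uniform sampling: every offset lies within [-std, std]."""
    return [rng.uniform(-std, std) for _ in range(3)]

def sample_translation_normal(std, rng):
    """Gaussian sampling: offsets are unbounded, with standard deviation std."""
    return [rng.gauss(0.0, std) for _ in range(3)]

rng = random.Random(0)
uniform_offsets = [sample_translation_uniform(0.5, rng) for _ in range(1000)]
max_uniform = max(abs(v) for offset in uniform_offsets for v in offset)
```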

waymo coordinates

centerformer/det3d/datasets/waymo/waymo_common.py

Line 269 in 5a949b8

gt_boxes[:, -1] = -np.pi / 2 - gt_boxes[:, -1]

centerformer/det3d/datasets/waymo/waymo_common.py

Line 270 in 5a949b8

gt_boxes[:, [3, 4]] = gt_boxes[:, [4, 3]]
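For a single box, the two quoted lines amount to remapping the last element (the heading angle) and swapping the entries at indices 3 and 4. The pure-Python sketch below mirrors that on one list; the box layout in the docstring and the sample values are assumptions, and `convert_box` is a hypothetical name.

```python
import math

def convert_box(box):
    """Apply the two quoted operations to one box.

    box: [x, y, z, dim_a, dim_b, dim_c, yaw] (assumed layout); indices 3 and 4
    are swapped and the last element is mapped to -pi/2 - yaw.
    """
    out = list(box)
    out[-1] = -math.pi / 2 - out[-1]
    out[3], out[4] = out[4], out[3]
    return out

box = [10.0, 5.0, 1.0, 4.5, 1.9, 1.6, 0.3]   # hypothetical values
converted = convert_box(box)
restored = convert_box(converted)            # the mapping is its own inverse
```

Because the mapping is an involution, the same two lines convert in both directions between the two conventions.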

Redundant boxes after post processing

It seems that the configuration "use_rotate_nms = False, use_multi_class_nms = True" cannot remove all redundant boxes, and many boxes remain at the same position. Is this normal?
Also, although I set score_threshold = 0.1 in test_cfg, the final output contains many boxes with scores below 0.1.
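As a reference point for checking a detector's output, the sketch below applies a score threshold followed by greedy NMS on axis-aligned boxes. It is a simplified, class-agnostic stand-in and not the repository's post-processing; the names `iou_2d` and `filter_and_nms` and the sample boxes are hypothetical.

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_and_nms(boxes, scores, score_threshold, iou_threshold):
    """Drop low-score boxes, then greedily suppress overlapping ones."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou_2d(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 2, 2), (0.1, 0, 2.1, 2), (5, 5, 7, 7), (0, 0, 2, 2)]
scores = [0.9, 0.8, 0.6, 0.05]
kept = filter_and_nms(boxes, scores, score_threshold=0.1, iou_threshold=0.5)
```

Running saved predictions through a check like this shows whether the remaining duplicates and low scores come from the model output or from a post-processing step that is not being applied as configured.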

CenterFormer on KITTI

Hello, thank you for your very nice work. I tried to use it on KITTI, but I found that it performs very poorly: the Car mAP3D @0.7 is only between 10 and 20. Have you ever tried it on KITTI?

AUTOMATIC MIXED PRECISION

Has anyone tried torch.cuda.amp?
It seems that ms_attention doesn't support fp16, even after I modified ms_deform_attn_forward_cuda.
Is there any other way to implement AMP? Or is there any way to reduce GPU memory usage? I get a CUDA OOM error with bs=4 every time.

Training error with the newly released nuScenes support

Thank you for releasing the nuScenes dataset support. When I run training, however, I hit RuntimeError: CUDA error: device-side assert triggered. Is there an out-of-bounds array access somewhere? When I debug it, everything seems fine.

Minimum configuration requirements

Do you know the minimum GPU computing power required, and how many gigabytes of GPU memory are needed?

Trained weights on nuScenes

Hello, first of all, thank you for your excellent work. I would like to ask whether trained weights for nuScenes are available. My computer cannot run training, so I would like to use a trained model to evaluate and see the results.

By the way, how can I change the batch size, or what else can I do, to reduce the GPU requirements?

Are the x, y, z values included in the value of the variable center_pos?

If yes, what shape is it in?

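Python 3.9 does not support Pillow 6.2.1

The requirements specify a Pillow version no higher than 6.2.1, while the environment in the Install instructions was tested with Python 3.9.12; however, Pillow 6.2.1 cannot be installed on Python 3.9.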

What speed does this model reach on a 3090 or other recent GPUs?

Thank you for your excellent work. I would like to know what speed this model reaches on a 3090 or other recent GPUs. My GPU is very weak, so I want to know the speed on a better one. I would appreciate it if anyone could answer.

The effect of deformable attention

Thank you for your work. I'm a little confused: since the results in Tab. 1 and Tab. 4 indicate that deformable attention does not bring benefits, why do you use it?

/usr/include/stdio.h(189): error: attribute "malloc" does not take arguments

Hello, when I execute the setup.sh file, there is an error:

/usr/local/cuda-11.5/bin/nvcc -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/TH -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.5/include -I/root/anaconda3/envs/centerformer/include/python3.9 -c src/iou3d_nms_kernel.cu -o build/temp.linux-x86_64-cpython-39/src/iou3d_nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=iou3d_nms_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/usr/include/stdio.h(189): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(201): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(223): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(260): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(285): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(294): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(303): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(309): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(315): error: attribute "malloc" does not take arguments

/usr/include/stdio.h(830): error: attribute "malloc" does not take arguments

/usr/include/stdlib.h(566): error: attribute "malloc" does not take arguments

/usr/include/stdlib.h(570): error: attribute "malloc" does not take arguments

/usr/include/stdlib.h(799): error: attribute "malloc" does not take arguments

13 errors detected in the compilation of "src/iou3d_nms_kernel.cu".
error: command '/usr/local/cuda-11.5/bin/nvcc' failed with exit code 1

Is there any test on nuScenes?

Hello, I wonder whether there is any test on nuScenes.

Positional embedding in RPN_transformer_deformable_multitask

Hello,

I would like to know whether there is any specific reason for using task_id along with x_coor and y_coor when creating pos_embedding.

    if self.pos_embedding_type == "linear":
        if len(self.tasks)>1:
            self.pos_embedding = nn.Linear(3, self._num_filters[-1] * 2)

We know that the ct_feats of the 6 tasks are concatenated next to each other and sliced accordingly later, in the code snippet below.

    for idx, task in enumerate(self.tasks):
        out_dict_list[idx]["ct_feat"] = ct_feat[:, :, idx * self.obj_num : (idx+1) * self.obj_num]

What is the purpose of diluting the ct_feat dimensions (256) with task_id?

Thank you in advance.
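One possible reading, offered only as an interpretation and not confirmed by the authors, is that a 3-value positional input lets two queries at the same BEV cell but from different tasks receive different embeddings. The sketch below shows the concatenation along the query axis and the later per-task slicing on plain lists; `build_pos_inputs`, `slice_task` and the toy values are hypothetical, and `obj_num` is shrunk from the 500 mentioned in another issue on this page.

```python
def build_pos_inputs(centers_per_task):
    """Concatenate per-task centers into one query list of (x, y, task_id) triples."""
    pos = []
    for task_id, centers in enumerate(centers_per_task):
        pos.extend((x, y, task_id) for x, y in centers)
    return pos

def slice_task(features, task_id, obj_num):
    """Recover one task's block from the concatenated query axis."""
    return features[task_id * obj_num : (task_id + 1) * obj_num]

obj_num = 2   # hypothetical toy value
centers = [[(3, 4), (7, 1)], [(3, 4), (0, 9)]]   # both tasks contain the cell (3, 4)
pos_inputs = build_pos_inputs(centers)
task1 = slice_task(pos_inputs, 1, obj_num)
```

With only (x, y) as input, the first and third queries here would be indistinguishable to the positional embedding; the slicing recovers the task afterwards, but the transformer layers themselves would not see it.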

Issue in points np.concatenate(s_points_list, axis=0) in centerformer-master/det3d/core/sampler/sample_ops.py

Hello,
Thanks for the open-source code.
In my case s_points_list is always empty. random_crop is set to False in https://github.com/TuSimple/centerformer/blob/master/det3d/core/sampler/sample_ops.py#L195, and even when it is set to True I get no s_points. Also, after the preceding condition check at https://github.com/TuSimple/centerformer/blob/master/det3d/core/sampler/sample_ops.py#L173, s_points is empty ([]).
Concatenating an empty list then raises an error.
What could be the issue? I am using the nuScenes mini dataset, and I was able to prepare the data successfully.

`ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87 unknown device type` error

Hi, thanks for sharing the code.
I am opening this issue because I have trouble running your code.
I ran the code without DDP: python ./tools/train.py ./configs/nusc/nuscenes_centerformer_separate_detection_head.py.
sh setup.sh works fine, but the following error occurs when running train.py.

Traceback (most recent call last):
  File "./tools/train.py", line 137, in <module>
    main()
  File "./tools/train.py", line 132, in main
    logger=logger,
  File "/workspace/det3d/torchie/apis/train.py", line 335, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/workspace/det3d/torchie/trainer/trainer.py", line 546, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/workspace/det3d/torchie/trainer/trainer.py", line 413, in train
    self.model, data_batch, train_mode=True, **kwargs
  File "/workspace/det3d/torchie/trainer/trainer.py", line 371, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 52, in forward
    x, _ = self.extract_feat(example)
  File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 38, in extract_feat
    data['voxels'], data["coors"], data["batch_size"], data["input_shape"]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/det3d/models/backbones/scn.py", line 156, in forward
    x = self.conv_input(ret)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spconv/modules.py", line 134, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spconv/conv.py", line 181, in forward
    use_hash=self.use_hash)
  File "/opt/conda/lib/python3.7/site-packages/spconv/ops.py", line 95, in get_indice_pairs
    int(use_hash))
ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87
unknown device type

I have tried hard to run your code on the nuScenes dataset. We also have a setup with 8 A100 GPUs, as you do.
One difference is that I use a Docker image.
Here is the Dockerfile.

FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
MAINTAINER Junho Cho <[email protected]>

RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-get update

RUN apt-get install git -y
RUN git clone https://github.com/TuSimple/centerformer.git

RUN cd centerformer && pip install -r requirements.txt

RUN apt-get install wget libboost-all-dev libgl1 -y

# Install cmake v3.13.2
RUN apt-get purge -y cmake && \
    mkdir /root/temp && \
    cd /root/temp && \
    wget https://github.com/Kitware/CMake/releases/download/v3.13.2/cmake-3.13.2.tar.gz && \
    tar -xzvf cmake-3.13.2.tar.gz && \
    cd cmake-3.13.2 && \
    bash ./bootstrap && \
    make && \
    make install && \
    cmake --version && \
    rm -rf /root/temp

RUN git clone --branch v1.2.1  https://github.com/traveller59/spconv.git --recursive
RUN cd spconv && python setup.py bdist_wheel && cd ./dist && pip install *whl

WORKDIR /workspace
ENV PYTHONPATH="${PYTHONPATH}:/workspace"

With this Dockerfile, we build spconv v1.2.1 in a CUDA 11.1 and PyTorch 1.9.1 environment.
This gives exactly the same PyTorch and CUDA versions as your setup. The only difference is the Python version, which I do not think matters much (I also tried Python 3.9.12, with no luck).

sh setup.sh always works fine.

It seems the following error

ValueError: /root/spconv/src/spconv/spconv_ops.cc 87
unknown device type

might be solved by using another spconv version (according to traveller59/spconv#58), but I have not tried that because you specified that only spconv 1.2.1 works.

Do you have any idea how to resolve this issue?

Probably spconv 1.2.1 does not work in Docker according to this, but I confirmed that spconv 2.2 works in Docker.

If so, is there any chance this repo could support spconv 2.2? (I already tried spconv 2.2 with CenterFormer and failed repeatedly.)

About LiDAR and image fusion

Hello, first of all, thank you for your work. I have read your paper. Do you think it is necessary to fuse image features with the LiDAR features? I know that transforming image features to BEV is time-consuming, so do you think it is worth it (for the nuScenes dataset)? Alternatively, the 500 predicted center locations could be projected onto the images using the calibration to obtain the image features in the corresponding neighborhoods for fusion. Do you think these two fusion approaches are worthwhile, do you have a better approach, or does fusion not make much sense at the moment?
