tusimple / centerformer
Implementation for CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022)
License: MIT License
Thanks for your great work! The paper also provides nuScenes results in the supplementary materials. Could you upload the code for training on the nuScenes dataset? Thanks again.
Hi, I wonder whether you have ever tried a pillar-based CenterFormer?
Thanks for your great work! When will the code be open-sourced?
centerformer/det3d/models/utils/transformer.py
Lines 267 to 279 in 96aa375
Hi,
I'm confused about why only the 0-th element, example["ind"][0], is selected in the line. I think there are 6 task heads?
Thanks for your great work! I have tried CenterPoint based on MMDetection3D. I wonder which parts were changed compared with the original CenterPoint.
Looking forward to your early reply. Many thanks!
When I run:
python -m torch.distributed.launch --nproc_per_node=2 ./tools/train.py configs/waymo/voxelnet/waymo_centerformer.py
It shows the following error:
`2022-10-23 14:27:35,879 - INFO - Start running, work_dir: /dkliang/projects/synchronous/centerformer/work_dirs/waymo_centerformer
2022-10-23 14:27:35,880 - INFO - workflow: [('train', 1)], max: 20 epochs
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 0 (pid: 44260) of binary: /dkliang/miniconda3/envs/centerformer/bin/python
Traceback (most recent call last):
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/dkliang/miniconda3/envs/centerformer/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
./tools/train.py FAILED
Other Failures:
[1]:
time: 2022-10-23_14:27:44
rank: 1 (local_rank: 1)
exitcode: -11 (pid: 44261)
error_file: <N/A>
msg: "Signal 11 (SIGSEGV) received by PID 44261"
**************************************************`
Thanks for your great work and for open-sourcing it. I have a question about the coordinate transformation.
I have checked the code: you transform the previous point clouds into the current frame's coordinates during training, right?
if sweep["transform_matrix"] is not None:
    points_sweep[:3, :] = sweep["transform_matrix"].dot(
        np.vstack((points_sweep[:3, :], np.ones(nbr_points)))
    )[:3, :]
But in deployment inference, we would just save the previous feature map in a memory bank and reuse it, and that feature map has not been transformed to the current frame. So there is a gap here.
Please correct me if I am wrong. Thanks!
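For readers following this question, the quoted snippet amounts to a homogeneous-coordinate transform of a (3, N) point array. Below is a minimal standalone sketch of that operation, not the repository's code: the function name transform_to_current_frame and the example matrix are made up for illustration.

```python
import numpy as np


def transform_to_current_frame(points_xyz, transform_matrix):
    """Apply a 4x4 homogeneous transform to a (3, N) array of xyz points."""
    nbr_points = points_xyz.shape[1]
    # Append a row of ones so that translation is applied by the matrix product.
    homogeneous = np.vstack((points_xyz, np.ones(nbr_points)))  # shape (4, N)
    return transform_matrix.dot(homogeneous)[:3, :]


# Hypothetical transform: the previous frame's origin sits 2 m behind along x.
T = np.eye(4)
T[0, 3] = 2.0

# Two points from the previous sweep, stored column-wise as (3, N).
prev_points = np.array([[0.0, 1.0],
                        [0.0, 0.0],
                        [0.0, 0.5]])
cur_points = transform_to_current_frame(prev_points, T)
```

A cached feature map would need an analogous warp at the feature level to be aligned in the same way; the sketch covers raw points only.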
Hello, really impressive work! I wonder whether I could use the method on datasets other than WOD and nuScenes. I want to use my own dataset, for which I have already written a working data pipeline. If I want CenterFormer to perform well on my own dataset, is there any essential change I should make, or is it ready for that?
Thank you for open-sourcing your work. I was wondering why you use x_up (the current frame's BEV feature) rather than x_up_fuse (the sequential frames after spatial-aware fusion) as the center query embedding? Apologies if I missed it in the paper.
Hi, I successfully reproduced the base CenterFormer (68.06) on nuScenes.
Thanks a lot.
One difference I have noticed from the CenterPoint base code is that your code contains a disable_dbsampler option.
Could you explain the motivation for this part? Is it simply turning off the augmentation from epoch 15?
Those two lines seem flawless.
But coincidentally, I changed the point cloud range, which made H unequal to W, and I encountered this:
After debugging, I found that in get_multi_scale_feature, center_pos sometimes fell outside the range of feat. Tracing backwards, I found that some elements of y_coor > W exist in Line 468.
After some experiments, I found that this problem can be fixed by swapping x_coord and y_coord.
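One way such a bug can stay hidden on square grids is illustrated below. This is only a sketch under the assumption of a row-major (H, W) feature layout; the function name and the grid size are hypothetical and not taken from the repository.

```python
def unravel_index_row_major(flat_index, width):
    """Split a flat BEV index into (y, x), assuming a row-major (H, W) layout."""
    y, x = divmod(flat_index, width)
    return y, x


H, W = 4, 6        # hypothetical non-square grid
flat = 3 * W + 5   # the cell at row y=3, column x=5

y, x = unravel_index_row_major(flat, W)

# Mistakenly using H as the row length yields a row index beyond the grid.
# With H == W both calls would agree, so the mistake goes unnoticed.
y_wrong, x_wrong = divmod(flat, H)
```

Here y_wrong is 5, which is outside the valid row range of a grid with H = 4.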
Thanks for your fantastic work. I'm very interested in this project, but I cannot reproduce your AP and NDS on nuScenes. Could you upload your training results on nuScenes? I want to do some extension work based on them. Looking forward to your reply.
I ran the code as written on GitHub.
However, after a certain point, the loss is all NaN.
I thought it was a problem with the dataset, so I recreated the pkl file with create_data.py, but NaN still appears. Is it correct to run the training to the end even if NaN appears?
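When chasing a problem like this, it can help to locate the first iteration at which the loss stops being finite. The following is a small, framework-free sketch; the function name and the loss values are made up, and it does not reflect how the repository's trainer logs losses.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Made-up loss log in which training diverges at step 3.
logged_losses = [2.31, 1.87, 1.52, float("nan"), float("nan")]
bad_step = first_non_finite_step(logged_losses)
```

Knowing the first bad step narrows the search to the batch and the learning-rate value in use at that point.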
Thanks for releasing nuScenes support. I have some questions about the implementation of the multi-task setup. I see in the code that you define obj_num=500 for each task, and the task_id is then added to the positional embedding to identify each task in the RPN transformer. Unfortunately, this increases the computation, and my machine throws a CUDA out-of-memory error.
My intuitive idea for multi-task is that each task has its own head when generating the heatmap. All heatmaps are then concatenated into one tensor to select the top 500 center queries, which are sent to the RPN transformer. Meanwhile, the positional feature is just the regular x and y coordinates. In the final output stage, each task has its own detection head applied to the transformer output features, which avoids the extra computation in the transformer layers.
This is my first thought. I wonder whether you have experimented with this approach. Are there any drawbacks? Could you share the results or conclusions? It is very important to me. Thank you!
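The selection step proposed above (one shared top-K over all tasks' heatmaps) can be sketched as follows. This is an illustration of the proposal only, not something the repository implements; the function name, the tiny 2 x 2 heatmaps, and k=3 are stand-ins.

```python
import heapq


def top_k_across_tasks(task_heatmaps, k):
    """Pick the k highest-scoring cells over all tasks' heatmaps.

    task_heatmaps: one 2D list of scores per task, all of the same H x W size.
    Returns a list of (score, task_id, y, x) tuples, highest score first.
    """
    candidates = (
        (score, task_id, y, x)
        for task_id, heatmap in enumerate(task_heatmaps)
        for y, row in enumerate(heatmap)
        for x, score in enumerate(row)
    )
    return heapq.nlargest(k, candidates)


heatmaps = [
    [[0.1, 0.9], [0.2, 0.3]],   # task 0
    [[0.8, 0.05], [0.4, 0.7]],  # task 1
]
queries = top_k_across_tasks(heatmaps, k=3)
```

Each selected query keeps its task_id, so a per-task detection head can still be applied afterwards. One thing to watch is that a task with low scores everywhere may receive few or no queries under a shared budget.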
Hello, in order to reproduce the Waymo results on my own, I trained CenterFormer on Waymo and tried to get a performance evaluation like the one shown in the README:
I followed the instructions from https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md. I already have the gt.bin and preds.bin of CenterFormer, but I ran into this error:
I wonder whether you have encountered this issue before, or maybe I have gone about it the wrong way? I really need some help here. Thanks in advance.
Interesting work!
The translation value for data augmentation in CenterFormer is 0.5 (https://github.com/TuSimple/centerformer/blob/master/configs/waymo/voxelnet/waymo_centerformer.py#L132), while the translation in CenterPoint is 0. Also, I noticed that you used np.random.uniform rather than np.random.normal, like the rotation and scale parameters. Could you explain the motivation for these modifications and their influence on performance?
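The practical difference between the two samplers is that a uniform draw is bounded while a normal draw is not. Here is a sketch using Python's standard random module instead of NumPy. Treating 0.5 as the half-range of the uniform draw and as the standard deviation of the normal draw is an assumption made for illustration, and the function names are hypothetical.

```python
import random


def translation_noise_uniform(limit, rng):
    """Per-axis offset drawn uniformly from [-limit, limit]; always bounded."""
    return [rng.uniform(-limit, limit) for _ in range(3)]


def translation_noise_normal(std, rng):
    """Per-axis offset drawn from N(0, std^2); can exceed std in magnitude."""
    return [rng.gauss(0.0, std) for _ in range(3)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
uniform_offsets = [translation_noise_uniform(0.5, rng) for _ in range(1000)]
normal_offsets = [translation_noise_normal(0.5, rng) for _ in range(1000)]

max_uniform = max(abs(v) for offset in uniform_offsets for v in offset)
max_normal = max(abs(v) for offset in normal_offsets for v in offset)
```

Under these assumptions max_uniform never exceeds 0.5, whereas max_normal does, so the normal sampler occasionally produces much larger shifts.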
It seems that the configuration "use_rotate_nms = False, use_multi_class_nms = True" cannot remove all redundant boxes; there are still lots of boxes at the same position. Is this normal?
Also, although I set score_threshold = 0.1 in test_cfg, there are lots of boxes with scores below 0.1 in the final output.
Hello, very nice work. I tried to use it on KITTI, but I found that it performs very poorly: the Car mAP3D @0.7 is only between 10 and 20. Have you ever tried it on KITTI?
Has anyone tried torch.cuda.amp?
It seems that ms_attention doesn't support fp16, even after I modified ms_deform_attn_forward_cuda.
Is there any other way to implement AMP, or any way to reduce GPU memory usage? I get a CUDA OOM for bs=4 every time.
I would like to know the minimum GPU computing power required, and how many gigabytes of GPU memory are needed.
Hello, first of all, thank you for your excellent work. I would like to ask whether trained weights for nuScenes are available. My computer cannot run the training, so I would like to use a trained model to evaluate and see the results.
By the way, how can I change the batch size or some other setting to reduce the GPU memory demand?
If yes, what shape is it in?
Thank you for your excellent work. I would like to know how fast this model runs on a 3090 or another recent GPU. My GPU is very weak, so I want to know the speed on a better one. I would appreciate it if anyone could answer.
Thank you for your work. I'm a little confused that since the results shown in Tab1 and Tab4 indicate that the deformable attention does not bring benefits, why do you use it?
Hello, when I execute the setup.sh file, there is an error:
/usr/local/cuda-11.5/bin/nvcc -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/TH -I/root/anaconda3/envs/centerformer/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.5/include -I/root/anaconda3/envs/centerformer/include/python3.9 -c src/iou3d_nms_kernel.cu -o build/temp.linux-x86_64-cpython-39/src/iou3d_nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=iou3d_nms_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/usr/include/stdio.h(189): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(201): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(223): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(260): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(285): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(294): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(303): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(309): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(315): error: attribute "malloc" does not take arguments
/usr/include/stdio.h(830): error: attribute "malloc" does not take arguments
/usr/include/stdlib.h(566): error: attribute "malloc" does not take arguments
/usr/include/stdlib.h(570): error: attribute "malloc" does not take arguments
/usr/include/stdlib.h(799): error: attribute "malloc" does not take arguments
13 errors detected in the compilation of "src/iou3d_nms_kernel.cu".
error: command '/usr/local/cuda-11.5/bin/nvcc' failed with exit code 1
Hello, I wonder whether there is any test on nuScenes?
Hello,
I would like to know whether there is any specific reason for using task_id along with x_coor and y_coor when creating pos_embedding?
if self.pos_embedding_type == "linear":
    if len(self.tasks)>1:
        self.pos_embedding = nn.Linear(3, self._num_filters[-1] * 2)
We already know that the ct_feats of the 6 task_ids are concatenated next to each other and are sliced accordingly later, in the code snippet below.
for idx, task in enumerate(self.tasks):
    out_dict_list[idx]["ct_feat"] = ct_feat[:, :, idx * self.obj_num : (idx+1) * self.obj_num]
What is the purpose of diluting the ct_feat dimensions (256) with task_id?
Thank you in advance.
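To make the layout in question concrete, here is a small sketch of how per-task centers could be flattened into (x, y, task_id) rows and sliced back per task. It is not the repository's code and not an answer from the authors: both function names are hypothetical, and obj_num=2 is a tiny stand-in for the real query count.

```python
def build_position_inputs(centers_per_task):
    """Flatten per-task (x, y) centers into rows of (x, y, task_id)."""
    return [
        (x, y, task_id)
        for task_id, centers in enumerate(centers_per_task)
        for (x, y) in centers
    ]


def slice_for_task(flat_items, task_idx, obj_num):
    """Recover one task's block, mirroring idx * obj_num : (idx + 1) * obj_num."""
    return flat_items[task_idx * obj_num : (task_idx + 1) * obj_num]


obj_num = 2  # tiny stand-in for the per-task query count
# Two tasks that happen to share the center (1, 1).
centers = [[(1, 1), (4, 2)], [(1, 1), (7, 7)]]

pos_inputs = build_position_inputs(centers)
task1 = slice_for_task(pos_inputs, 1, obj_num)
```

In this toy example the shared center appears as (1, 1, 0) and (1, 1, 1). Without the third component, the two queries would receive identical positional inputs inside the transformer, even though slicing can separate them afterwards.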
Hello,
Thanks for the open-source code.
The s_point_list is always empty in my case. random_crop is set to False in https://github.com/TuSimple/centerformer/blob/master/det3d/core/sampler/sample_ops.py#L195, and even when it is set to True, I get no s_points. Also, after the previous condition check here https://github.com/TuSimple/centerformer/blob/master/det3d/core/sampler/sample_ops.py#L173, s_points is empty [].
So trying to concatenate an empty array gives me an error.
What could be the issue? I'm using the nuScenes mini dataset, and I was able to prepare the data successfully.
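The concatenation failure itself is easy to reproduce, because np.concatenate raises a ValueError when given an empty list. A defensive wrapper might look like the sketch below. The function name and the num_features argument are hypothetical, and this only avoids the crash; it does not explain why the sampler returns no points.

```python
import numpy as np


def concat_sampled_points(s_points_list, num_features):
    """Concatenate sampled point arrays, tolerating an empty list."""
    if len(s_points_list) == 0:
        # np.concatenate([]) raises ValueError, so return an empty (0, C) array.
        return np.zeros((0, num_features), dtype=np.float32)
    return np.concatenate(s_points_list, axis=0)


empty = concat_sampled_points([], num_features=5)
merged = concat_sampled_points(
    [np.ones((2, 5), dtype=np.float32), np.zeros((3, 5), dtype=np.float32)],
    num_features=5,
)
```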
Hi, thanks for sharing the code.
I am opening an issue since I have trouble running your code.
I run the code without DDP: python ./tools/train.py ./configs/nusc/nuscenes_centerformer_separate_detection_head.py
sh setup.sh works nicely, but the following error occurs when running train.py.
Traceback (most recent call last):
File "./tools/train.py", line 137, in <module>
main()
File "./tools/train.py", line 132, in main
logger=logger,
File "/workspace/det3d/torchie/apis/train.py", line 335, in train_detector
trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
File "/workspace/det3d/torchie/trainer/trainer.py", line 546, in run
epoch_runner(data_loaders[i], self.epoch, **kwargs)
File "/workspace/det3d/torchie/trainer/trainer.py", line 413, in train
self.model, data_batch, train_mode=True, **kwargs
File "/workspace/det3d/torchie/trainer/trainer.py", line 371, in batch_processor_inline
losses = model(example, return_loss=True)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 52, in forward
x, _ = self.extract_feat(example)
File "/workspace/det3d/models/detectors/voxelnet_dynamic.py", line 38, in extract_feat
data['voxels'], data["coors"], data["batch_size"], data["input_shape"]
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/det3d/models/backbones/scn.py", line 156, in forward
x = self.conv_input(ret)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/spconv/modules.py", line 134, in forward
input = module(input)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/spconv/conv.py", line 181, in forward
use_hash=self.use_hash)
File "/opt/conda/lib/python3.7/site-packages/spconv/ops.py", line 95, in get_indice_pairs
int(use_hash))
ValueError: /workplace/spconv/src/spconv/spconv_ops.cc 87
unknown device type
I have tried hard to run your code on the nuScenes dataset. We also have a setup of 8 A100 GPUs, as you do.
One difference would be that I use a Docker image.
Here is the Dockerfile.
FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
MAINTAINER Junho Cho <[email protected]>
RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-get update
RUN apt-get install git -y
RUN git clone https://github.com/TuSimple/centerformer.git
RUN cd centerformer && pip install -r requirements.txt
RUN apt-get install wget libboost-all-dev libgl1 -y
# Install cmake v3.13.2
RUN apt-get purge -y cmake && \
mkdir /root/temp && \
cd /root/temp && \
wget https://github.com/Kitware/CMake/releases/download/v3.13.2/cmake-3.13.2.tar.gz && \
tar -xzvf cmake-3.13.2.tar.gz && \
cd cmake-3.13.2 && \
bash ./bootstrap && \
make && \
make install && \
cmake --version && \
rm -rf /root/temp
RUN git clone --branch v1.2.1 https://github.com/traveller59/spconv.git --recursive
RUN cd spconv && python setup.py bdist_wheel && cd ./dist && pip install *whl
WORKDIR /workspace
ENV PYTHONPATH="${PYTHONPATH}:/workspace"
With this Dockerfile, we build spconv v1.2.1 in a CUDA 11.1 and PyTorch 1.9.1 environment.
This gives exactly the same PyTorch and CUDA versions as your setting. The only difference is the Python version, which I think is not a big difference (I also tried Python 3.9.12, but no luck).
sh setup.sh always works nicely.
It seems the following error
ValueError: /root/spconv/src/spconv/spconv_ops.cc 87
unknown device type
might be solved by using another spconv version (according to traveller59/spconv#58), but I have not tried that because you specified that only spconv 1.2.1 works.
Do you have any idea how to sort out this issue?
Probably, spconv 1.2.1 does not work in Docker according to this, but I confirmed that spconv 2.2 works in Docker.
If so, is there any chance this repo could support spconv 2.2? (I already tried spconv 2.2 with CenterFormer and failed a lot.)
Hello, first of all, thank you for your work. I have read your paper. Do you think it is necessary to fuse image features with LiDAR? I know that transforming images to BEV is time-consuming, so do you think it is necessary (for the nuScenes dataset)? Alternatively, the 500 predicted locations could be projected using the calibration to obtain the image features in the corresponding neighborhood for fusion. Do you think these two ways of fusion are worthwhile, or do you have a better way of fusing, or does it not make much sense at the moment?
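The projection step in the second option can be sketched with a plain pinhole camera model. This is a minimal illustration only: it assumes the point has already been moved from the LiDAR frame into the camera frame, and the function name and intrinsics (fx, fy, cx, cy) are made-up values rather than nuScenes calibration.

```python
def project_to_image(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates (pinhole model).

    Returns None when the point lies behind the camera.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics and points.
pixel = project_to_image((1.0, 0.5, 10.0), fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)
behind = project_to_image((1.0, 0.5, -2.0), fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)
```

Image features could then be sampled in a small window around each returned pixel, skipping centers that fall behind the camera or outside the image.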